midas is a data enrichment platform that takes the pain out of enriching CSV, JSON and Excel files.
Navigation
- Website (https://midas.science)
- How it works
- Getting started
- Pipeline Definition
- Enricher
- Examples
How it works
midas makes it easy to enrich CSV, XLSX and JSON files with data from any API. An enrichment pipeline takes three simple steps, plus an optional fourth:
- Give your pipeline a name, e.g. WeatherEnrichment
- Define your source file, e.g. Cities.xlsx
- Create an enricher for the API you want to use (see "Getting Started")
- (optional) Define your target file, e.g. CitiesWithWeatherEnriched.xlsx
midas handles the extraction and parsing of the data, the individual API requests, the format conversion (e.g. from XLSX to JSON) and the loading of the data into your target.
Getting started
- Install the midas CLI via npm install -g midas-os
- Initialize a new enrichment pipeline via midas init
- Follow the wizard 🧙 to create a new pipeline:
  - Give your pipeline a name, e.g. WeatherEnrichment
  - Define the type of your source file (json, csv or xlsx)
  - Tell midas where to find your source file. Use the absolute path to the file here.
  - Define the input parameter from your source that you want the enricher to use via JSONPath. All data that matches your JSONPath expression will be passed to the enricher. See http://goessner.net/articles/JsonPath/ for more information.
  - (optional) Create a new enricher. An enricher in midas is a JavaScript class that contains the logic to call an external data source. Each enricher must contain a process(inputData) method, where inputData is a single data point from your source file, selected by the JSONPath expression you specified earlier. Within the enricher class you have access to request-promise via this.rp to make API calls. process(inputData) must always return a Promise. See the examples for reference.
  - Choose a name for the target property. The result of the enricher's process() call will be written here.
- Start your pipeline via midas enrich -c "{pipeline_name}_midas.json"
A minimal enricher has the following shape:
"use strict";
var Enricher = class Enricher {
  //...
  process(inputData) {
    if (typeof inputData !== 'undefined' && inputData != null) {
      this.inputData = inputData;
    }
    // Do stuff here
    return Promise.resolve(this.inputData);
  }
};
module.exports.Enricher = Enricher;
Pipeline Definition
In midas, a pipeline definition is a JSON file containing the name of the pipeline, a source, an optional target and an array of enrichers.
{
  "name": "PIPELINE_NAME",
  "source": {
    "json": {
      "path": "ABSOLUTE_PATH_TO_SOURCE_FILE/transactions.json"
    }
  },
  "target": {
    "json": {
      "path": "ABSOLUTE_PATH_TO_TARGET_FILE/transactions.json"
    }
  },
  "enrichers": [
    {
      "name": "NAME_OF_YOUR_ENRICHER",
      "path": "ABSOLUTE_PATH_TO_YOUR_ENRICHER",
      "config": {
        "input_parameter": "JSON_PATH_EXPRESSION",
        "target_property": "TARGET_PROPERTY_NAME"
      }
    }
  ]
}
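Before starting a pipeline, it can help to check that a definition has the shape shown above. The following is a minimal sketch, not part of midas: `validatePipeline` and `checkFile` are hypothetical helpers, and the rules they apply (exactly one file type per source or target, at least one enricher) are assumptions derived from the example rather than documented midas behavior.

```javascript
"use strict";

// Hypothetical helpers, not part of midas.
const SUPPORTED_TYPES = ['json', 'csv', 'xlsx'];

// Checks a "source" or "target" object and records problems in `errors`.
function checkFile(label, file, errors) {
  if (typeof file !== 'object' || file === null) {
    errors.push(label + ' must be an object');
    return;
  }
  const types = Object.keys(file);
  // Assumption: exactly one file type per source/target.
  if (types.length !== 1 || !SUPPORTED_TYPES.includes(types[0])) {
    errors.push(label + ' must contain exactly one of: ' + SUPPORTED_TYPES.join(', '));
    return;
  }
  const entry = file[types[0]];
  if (!entry || typeof entry.path !== 'string') {
    errors.push(label + '.' + types[0] + '.path must be a string');
  }
}

// Returns a list of problems; an empty list means the shape looks right.
function validatePipeline(def) {
  const errors = [];
  if (typeof def.name !== 'string' || def.name.length === 0) {
    errors.push('name must be a non-empty string');
  }
  checkFile('source', def.source, errors);
  // The target is optional, so it is only checked when present.
  if (def.target !== undefined) {
    checkFile('target', def.target, errors);
  }
  // Assumption: a pipeline needs at least one enricher.
  if (!Array.isArray(def.enrichers) || def.enrichers.length === 0) {
    errors.push('enrichers must be a non-empty array');
  } else {
    def.enrichers.forEach((enricher, i) => {
      ['name', 'path'].forEach((key) => {
        if (typeof enricher[key] !== 'string') {
          errors.push('enrichers[' + i + '].' + key + ' must be a string');
        }
      });
      const config = enricher.config || {};
      ['input_parameter', 'target_property'].forEach((key) => {
        if (typeof config[key] !== 'string') {
          errors.push('enrichers[' + i + '].config.' + key + ' must be a string');
        }
      });
    });
  }
  return errors;
}
```

Calling `validatePipeline(require('./WeatherEnrichment_midas.json'))` (a hypothetical file name) would return an array of messages that can be printed before running midas enrich.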
Name
The name of the enrichment pipeline. This can be anything descriptive.
"name": "PIPELINE_NAME"
Source
The source specifies the file that you want to enrich. Currently, midas supports CSV, JSON and XLSX files.
JSON
"source": {
  "json": {
    "path": "ABSOLUTE_PATH_TO_SOURCE_FILE.json"
  }
}
CSV
"source": {
  "csv": {
    "path": "ABSOLUTE_PATH_TO_SOURCE_FILE.csv"
  }
}
XLSX
"source": {
  "xlsx": {
    "path": "ABSOLUTE_PATH_TO_SOURCE_FILE.xlsx"
  }
}
Target (optional)
The target is useful if you want to convert your source file into another format. If no target is specified, the target will be equal to the source. The notation is exactly the same as for the source.
JSON
"target": {
  "json": {
    "path": "ABSOLUTE_PATH_TO_TARGET_FILE.json"
  }
}
CSV
"target": {
  "csv": {
    "path": "ABSOLUTE_PATH_TO_TARGET_FILE.csv"
  }
}
XLSX
"target": {
  "xlsx": {
    "path": "ABSOLUTE_PATH_TO_TARGET_FILE.xlsx"
  }
}
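The fallback rule for a missing target can be sketched as follows. `resolveTarget` is a hypothetical helper for illustration, not a midas function; it only mirrors the statement that the target equals the source when none is specified. The file paths are made up.

```javascript
"use strict";

// Hypothetical helper: returns { type, path } describing where output would go.
function resolveTarget(pipeline) {
  // Fall back to the source when no target is defined.
  const file = pipeline.target || pipeline.source;
  const type = Object.keys(file)[0]; // "json", "csv" or "xlsx"
  return { type: type, path: file[type].path };
}

// Converts XLSX to JSON because an explicit target is given.
const convertingPipeline = {
  source: { xlsx: { path: '/data/Cities.xlsx' } },
  target: { json: { path: '/data/CitiesWithWeatherEnriched.json' } }
};

// No target: the resolved target is the source itself.
const inPlacePipeline = {
  source: { csv: { path: '/data/Cities.csv' } }
};

console.log(resolveTarget(convertingPipeline));
console.log(resolveTarget(inPlacePipeline));
```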
Enrichers
Enrichers are JavaScript classes that are responsible for sending data to an external data source, transforming the response and passing it back to the midas core.
"enrichers": [
  {
    "name": "NAME_OF_YOUR_ENRICHER",
    "path": "ABSOLUTE_PATH_TO_YOUR_ENRICHER",
    "config": {
      "input_parameter": "JSON_PATH_EXPRESSION",
      "target_property": "TARGET_PROPERTY_NAME"
    }
  }
]
Each enricher requires a name (which must be equal to its filename), a path to it and a configuration that specifies the input parameter from the source and the property name of the target.
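To make the roles of input_parameter and target_property concrete, here is a simplified sketch of the data flow. It is not how midas is implemented: `fieldFromPath` is a stand-in that understands only expressions of the form `$[*].<field>` rather than full JSONPath, `enrichRecords` and `upperCaseEnricher` are hypothetical, and storing each result on the record its input came from is an assumption.

```javascript
"use strict";

// Simplified stand-in for JSONPath: only understands "$[*].<field>".
function fieldFromPath(expression) {
  const match = /^\$\[\*\]\.(\w+)$/.exec(expression);
  if (!match) {
    throw new Error('Unsupported expression in this sketch: ' + expression);
  }
  return match[1];
}

// Passes each selected value to process() and stores the result
// under target_property.
// Assumption: the result lands on the record the input came from.
function enrichRecords(records, config, enricher) {
  const field = fieldFromPath(config.input_parameter);
  return Promise.all(records.map((record) =>
    enricher.process(record[field]).then((result) =>
      Object.assign({}, record, { [config.target_property]: result })
    )
  ));
}

// Toy enricher that follows the process(inputData) contract without any API call.
const upperCaseEnricher = {
  process(inputData) {
    return Promise.resolve(String(inputData).toUpperCase());
  }
};

const enrichedRows = enrichRecords(
  [{ city: 'Berlin' }, { city: 'Paris' }],
  { input_parameter: '$[*].city', target_property: 'cityUpper' },
  upperCaseEnricher
);
enrichedRows.then((rows) => console.log(rows));
```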
Additional Configuration (API Keys etc.)
Often, APIs require keys to authenticate and authorize requests. The config property of the enricher definition can be used to pass such data. For example:
"enrichers": [
  {
    "name": "NAME_OF_YOUR_ENRICHER",
    "path": "ABSOLUTE_PATH_TO_YOUR_ENRICHER",
    "config": {
      "input_parameter": "JSON_PATH_EXPRESSION",
      "target_property": "TARGET_PROPERTY_NAME",
      "api_key": "MY_API_KEY"
    }
  }
]
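Inside the enricher, such a value would be read from the config object that the constructor receives. The sketch below assumes this; the endpoint `https://api.example.com/lookup` and the query parameter names `q` and `key` are placeholders, not a real API. A stub replaces request-promise so the example runs without network access.

```javascript
"use strict";

class KeyedEnricher {
  constructor(rp, inputData, config) {
    this.rp = rp;
    this.inputData = inputData;
    this.config = config;
  }

  process(inputData) {
    if (typeof inputData !== 'undefined' && inputData !== null) {
      this.inputData = inputData;
    }
    const options = {
      uri: 'https://api.example.com/lookup', // placeholder endpoint
      qs: {
        q: this.inputData,          // hypothetical parameter name
        key: this.config.api_key    // value taken from the pipeline definition
      },
      json: true
    };
    return this.rp(options);
  }
}

// Stub standing in for request-promise; it echoes the key it was given.
const fakeKeyedRp = (options) => Promise.resolve({ receivedKey: options.qs.key });

const keyedEnricher = new KeyedEnricher(fakeKeyedRp, null, { api_key: 'MY_API_KEY' });
const keyedResult = keyedEnricher.process('Berlin');
keyedResult.then((result) => console.log(result));
```

Keeping the key in the pipeline definition rather than in the enricher source means the same enricher file can be reused with different credentials.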
Chaining
It is possible to chain multiple enrichers by listing them in the enrichers array.
"enrichers": [
  {
    "name": "NAME_OF_YOUR_ENRICHER_1",
    "path": "ABSOLUTE_PATH_TO_YOUR_ENRICHER",
    "config": {
      "input_parameter": "JSON_PATH_EXPRESSION",
      "target_property": "TARGET_PROPERTY_NAME",
      "api_key": "MY_API_KEY"
    }
  },
  {
    "name": "NAME_OF_YOUR_ENRICHER_2",
    "path": "ABSOLUTE_PATH_TO_YOUR_ENRICHER",
    "config": {
      "input_parameter": "JSON_PATH_EXPRESSION",
      "target_property": "TARGET_PROPERTY_NAME",
      "api_key": "MY_API_KEY"
    }
  }
]
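One way to picture a chain is as a sequence of promises, each step adding its own property to the record. This sketch rests on two assumptions that the text above does not confirm: that enrichers run in array order, and that a later enricher can read a property written by an earlier one. `runChain` and the two toy enrichers are hypothetical.

```javascript
"use strict";

// Runs the steps one after another; each step reads one property
// of the record and writes its result to another.
// Assumption: midas applies enrichers in array order.
function runChain(record, steps) {
  return steps.reduce(
    (pending, step) => pending.then((current) =>
      step.enricher.process(current[step.input]).then((result) =>
        Object.assign({}, current, { [step.target]: result })
      )
    ),
    Promise.resolve(record)
  );
}

// Toy enrichers following the process(inputData) contract.
const lengthEnricher = { process: (value) => Promise.resolve(String(value).length) };
const doubleEnricher = { process: (value) => Promise.resolve(value * 2) };

const chainedRecord = runChain({ city: 'Berlin' }, [
  { enricher: lengthEnricher, input: 'city', target: 'nameLength' },
  { enricher: doubleEnricher, input: 'nameLength', target: 'doubled' }
]);
chainedRecord.then((record) => console.log(record));
```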
Creating a new enricher
Every midas enricher is a JavaScript class which is responsible for taking the input data from the source file, sending it to an external data source and returning the result as a Promise. The following code skeleton shows how such an enricher class is structured.
"use strict";
var Enricher = class Enricher {
  constructor(rp, inputData, config) {
    // npm request-promise is used for handling requests
    // see: https://github.com/request/request-promise
    this.rp = rp;
    // input data from the source file specified in your pipeline definition
    this.inputData = inputData;
    // the config object of this enricher from the pipeline definition
    this.config = config;
  }
  getConfig() {
    return this.config;
  }
  getName() {
    return 'Enricher';
  }
  setData(inputData) {
    this.inputData = inputData;
  }
  process(inputData) {
    // Prefer the data passed to this call over previously stored data
    if (typeof inputData !== 'undefined' && inputData != null) {
      this.inputData = inputData;
    }
    // Do stuff here
    return Promise.resolve(this.inputData);
  }
};
module.exports.Enricher = Enricher;
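As an illustration of what could replace the "Do stuff here" placeholder, the sketch below normalizes a string without calling any API. `TrimEnricher` is a hypothetical example class; it follows the same constructor and process(inputData) shape as the skeleton but is reduced to the parts needed here. In an actual enricher file it would be exported via `module.exports.Enricher`, as in the skeleton.

```javascript
"use strict";

class TrimEnricher {
  constructor(rp, inputData, config) {
    this.rp = rp;               // unused here, no API call is made
    this.inputData = inputData;
    this.config = config;
  }

  getName() {
    return 'TrimEnricher';
  }

  process(inputData) {
    if (typeof inputData !== 'undefined' && inputData !== null) {
      this.inputData = inputData;
    }
    // "Do stuff here": strip surrounding whitespace and lower-case the value.
    const cleaned = String(this.inputData).trim().toLowerCase();
    // process() must always return a Promise.
    return Promise.resolve(cleaned);
  }
}

const trimmedValue = new TrimEnricher(null, null, {}).process('  EUR ');
trimmedValue.then((value) => console.log(value)); // logs "eur"
```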
Call an API
By default, API calls are made via request-promise, although you can use any other HTTP client. The following example uses the Fixer.io API to enrich a file with currency exchange rates.
"use strict";
var Enricher = class Enricher {
  constructor(rp, inputData, config) {
    // npm request-promise is used for handling requests
    // see: https://github.com/request/request-promise
    this.rp = rp;
    // input data from the source file specified in your pipeline definition
    this.inputData = inputData;
    // the config object of this enricher from the pipeline definition
    this.config = config;
  }
  getConfig() {
    return this.config;
  }
  getName() {
    return 'Enricher';
  }
  setData(inputData) {
    this.inputData = inputData;
  }
  process(inputData) {
    if (typeof inputData !== 'undefined' && inputData != null) {
      this.inputData = inputData;
    }
    // Request the latest rates, using the input data as the base currency
    let req_url = 'https://api.fixer.io/latest';
    let options = {
      uri: req_url,
      qs: {
        base: this.inputData
      },
      headers: {
        'User-Agent': 'midas'
      },
      json: true // Automatically parses the JSON string in the response
    };
    return this.rp(options).then((result) => {
      let _result = null;
      try {
        _result = result.rates;
      } catch (e) {
        // Fall back to null if reading the rates throws
        _result = null;
      }
      return _result;
    }).catch((err) => {
      // Request errors are not rethrown; a marker string is returned instead
      return 'ERROR HAPPENED';
    });
  }
};
module.exports.Enricher = Enricher;
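Because request-promise is passed into the constructor, an enricher like this can be exercised offline by handing it a stub instead. The sketch below uses a condensed variant of the enricher above (`RatesEnricher`, a hypothetical name) together with two stubs, one that succeeds and one that fails. The rate value in the stub response is made up for the example and is not real data.

```javascript
"use strict";

// Condensed variant of the enricher above, reduced to what the check needs.
class RatesEnricher {
  constructor(rp, inputData, config) {
    this.rp = rp;
    this.inputData = inputData;
    this.config = config;
  }

  process(inputData) {
    if (typeof inputData !== 'undefined' && inputData !== null) {
      this.inputData = inputData;
    }
    const options = {
      uri: 'https://api.fixer.io/latest',
      qs: { base: this.inputData },
      headers: { 'User-Agent': 'midas' },
      json: true
    };
    return this.rp(options)
      .then((result) => result.rates)
      .catch(() => 'ERROR HAPPENED');
  }
}

// Stub that mimics a successful response (made-up sample value).
const succeedingRp = (options) =>
  Promise.resolve({ base: options.qs.base, rates: { USD: 1.1 } });

// Stub that mimics a failed request.
const failingRp = () => Promise.reject(new Error('network down'));

const successfulRun = new RatesEnricher(succeedingRp, null, {}).process('EUR');
const failedRun = new RatesEnricher(failingRp, null, {}).process('EUR');

successfulRun.then((rates) => console.log(rates));
failedRun.then((marker) => console.log(marker));
```

Note that the failure path resolves with a string rather than rejecting, so that string would end up in the target property; returning null or rethrowing may be preferable depending on how the enriched file is consumed.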