data-projector
Load CSV datasets and map / transform the data for use.
- Load CSV datasets
- Guess types and cast data fields to types
- Calculate stats: global, fields and pairwise (field by field correlations etc)
- Map datasets to other datasets using transform functions
This is designed to take a specification object (JSON) and load a dataset and optionally map values to requested ranges.
The JSON specification objects can be saved in your application for use in presets.
Status: ALPHA
// Simple example with only the default statistics and no field mappings requested.const dp = ; const datasetPath = '/Users/crucial/code/idmx/playsplom/app/vendor/datasets/Nightingale.csv';const functions = {};const calculateStatsParams = {};const mapFieldsParams = {}; const x = dp; x;
"data": "0": 1 "Date": "1854-04-01" "Month": "Apr" "Year": 1854 "Army": 8571 "Disease": 1 "Wounds": 0 "Other": 5 "Disease.rate": 14 "Wounds.rate": 0 "Other.rate": 7 "0": 2 "Date": "1854-05-01" "Month": "May" "Year": 1854 "Army": 23333 "Disease": 12 "Wounds": 0 "Other": 9 "Disease.rate": 62 "Wounds.rate": 0 "Other.rate": 46 // etc etc "fields": "0" // first column header did not have a name "Date" "Month" "Year" "Army" "Disease" "Wounds" "Other" "Disease.rate" "Wounds.rate" "Other.rate" "path": "/Users/crucial/code/idmx/playsplom/app/vendor/datasets/Nightingale.csv" "stats": "fields": "0": "minval": 1 "maxval": 24 "type": "type": "number" "null": false "Date": "minval": null // bug: didn't calculate minval max for the date column "maxval": null "type": "type": "date" "null": false "dateFormat": "YYYY-MM-DD" // auto-detected the date format "Month": "minval": null "maxval": null "type": "type": "string" "null": false "Year": "minval": 1854 "maxval": 1856 "type": "type": "number" "null": false "Army": "minval": 8571 "maxval": 47751 "type": "type": "number" "null": false "Disease": "minval": 1 "maxval": 2761 "type": "type": "number" "null": false "Wounds": "minval": 0 "maxval": 287 "type": "type": "number" "null": false "Other": "minval": 5 "maxval": 361 "type": "type": "number" "null": false "Disease.rate": "minval": 14 "maxval": 10228 "type": "type": "number" "null": false "Wounds.rate": "minval": 0 "maxval": 1158 "type": "type": "number" "null": false "Other.rate": "minval": 25 "maxval": 1401 "type": "type": "number" "null": false ## API <a name="project"></a> ## ⇒ <code>Object</code>Load a dataset from disk calculate statistics and apply transformations **Returns**: <code>Object</code> - Dataset | Param | Type | Description || --- | --- | --- || functions | <code>Object</code> | Named ⇒ <code>Promise.<Object></code>Load and parse a dataset from path.Stats are not yet calculated so types are unknownand all fields are strings. **Returns**: <code>Promise.<Object></code> - Promise for a dataset | Param | Type | Description || --- | --- | --- || path | <code>String</code> | Absolute path to file | <a name="loadDataset"></a> ## loadDataset(path, functions, calculateStatsParams) ⇒ <code>Promise.<Object></code>Load and parse a dataset and calculate stats and coerce types of field values. **Returns**: <code>Promise.<Object></code> - Promise for a dataset | Param | Type | Description || --- | --- | --- || path | <code>String</code> | Absolute path to file || functions | <code>Object</code> | Named function registery || calculateStatsParams | <code>Object</code> | The `stats` object from params | <a name="createDataset"></a> ## createDataset(data, fields, path) ⇒ <code>Object</code>Create a dataset object from an array of objects **Returns**: <code>Object</code> - dataset - {data fields path} | Param | Type | Description || --- | --- | --- || data | <code>Object</code> | field: value field2: value ... || fields | <code>Array<String></code> | Field names || path | <code>String</code> | | <a name="_calculateStats"></a> ## ⇒ <code>Object</code>Calculate for a dataset using a stats specification **Returns**: <code>Object</code> - stats | Param | Type | Description || --- | --- | --- || functions | <code>Object</code> | Named ⇒ <code>Object</code>Calculate statistics and return a new dataset objects with .stats set **Returns**: <code>Object</code> - dataset | Param | Type | Description || --- | --- | --- || functions | <code>Object</code> | Named function registery || calculateStatsParams | <code>Object</code> | || dataset | <code>Object</code> | | <a name="castTypes"></a> ## castTypes(dataset) ⇒ <code>Object</code>Having guessed types with calculateStats, cast all fields to the guessed types. - This converts '1.1' to 1.1- Enums of strings to their integer indices- Date strings to Date objects- String fields with high cardinality remain strings **Returns**: <code>Object</code> - Dataset object with values cast to guessed types | Param | Type | Description || --- | --- | --- || dataset | <code>Object</code> | Dataset object | <a name="mapFields"></a> ## mapFields(functions, mapFieldsParams, dataset)mapFields Map input fields to output fields using mapping functions as specified inmapFieldsParams ```js{ input: 'inFieldName' output: 'outFieldName' fn: 'linear' // named function in functions registry args: 0 1 // parameters for linear mapping function}
fn may be a String key to a function in the functions registery or a function(stats, fieldName, [...args], value)
Param | Type | Description |
---|---|---|
functions | Object |
Named function registery |
mapFieldsParams | Array.<Object> |
|
dataset | Object |
function
makeMapFunction(functions, stats, mapParam) ⇒ makeMapFunction from mapParam
mapParam: .fn .args
Where fn is a Function or a String key to lookup Function in functions
Function should accept: (stats, fieldName, ...args, value)
Args are optional array of params to configure your mapping function. eg. [minval, maxval]
This curries the function and calls it with: (stats, fieldName, ...args) and returns that mapping function which accepts just value and returns the mapped value.
Returns: function
- any => any
Param | Type | Description |
---|---|---|
functions | Object |
Named function registery |
stats | Object |
|
mapParam | Object |
Object
getRow(dataset, fields) ⇒ Get a single row as an Object.
As this function is curried you can bake in dataset and fields:
getter = ; // returns a function with first two args satisfied; // get row 12
Returns: Object
- - The object for this row.
Param | Type | Description |
---|---|---|
dataset | Object |
|
fields | Array.<string> | null |
Optionally select just the fields you need. null selects all fields. |
mixed
getCell(dataset, field, index) ⇒ Get a single data value (row, column)
As this function is curried you can bake in dataset and field:
getter = ; ; // get value at row 12, field 'sepalLength'
Returns: mixed
- - The value for this cell.
Param | Type | Description |
---|---|---|
dataset | Object |
|
field | String |
key of the field to select |
index | Number |
integer index of row |
Array.<mixed>
getColumn(dataset, field) ⇒ Get all values for a column
As this function is curried you can bake in dataset:
getter = ; ; // get the array of values for the sepalLength field
Returns: Array.<mixed>
- - Array of values for this field
Param | Type | Description |
---|---|---|
dataset | Object |
|
field | String |
key of the field to select |