node-data-preprocessing
A Node.js package for data preprocessing.
The package exposes each preprocessing step individually, as well as a single function that runs the entire process.
Individual Steps
csvParser
var data = process;
options
List of options with defaults -
var options = { path: '' };
path
String - The path to the data.
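As a rough illustration of what a CSV parsing step involves, here is a minimal sketch. It is not this package's implementation: `parseCsv` and `readCsvFile` are hypothetical names, and the parser assumes a simple file with no quoted fields or embedded commas. Numeric-looking cells are converted to numbers so that later steps (such as the `formats` check in cleanse) can work on real types.

```javascript
var fs = require('fs');

// Hypothetical sketch: naive CSV parsing (assumes no quoted fields or embedded commas).
function parseCsv(text) {
  return text
    .split(/\r?\n/)                                          // one entry per line
    .filter(function (line) { return line.trim() !== ''; }) // drop blank lines
    .map(function (line) {
      return line.split(',').map(function (cell) {
        var trimmed = cell.trim();
        var num = Number(trimmed);
        // Convert numeric-looking cells to numbers, keep everything else as strings
        return trimmed !== '' && !isNaN(num) ? num : trimmed;
      });
    });
}

// Hypothetical wrapper that reads the file named by options.path first.
function readCsvFile(options) {
  return parseCsv(fs.readFileSync(options.path, 'utf8'));
}

var rows = parseCsv('height,weight\n1.8,75\n1.6,60\n');
// rows is [['height', 'weight'], [1.8, 75], [1.6, 60]]
```

A real CSV file with quoted values would need a proper parser; the naive split is only meant to show the shape of the data passed to the next step.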
extract
var extracted = process;
options
List of options with defaults -
var options = { useHeaders: true };
useHeaders
Boolean - Indicates whether the first row of data is a heading. Note: the heading is not used anywhere else in the process; setting this to true simply strips the first row from the data.
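The behaviour described for `useHeaders` amounts to dropping the first row. A minimal sketch, where `extractRows` is a hypothetical name rather than the package's API, and `useHeaders` is assumed to default to `true`:

```javascript
// Hypothetical sketch of the extract step: when useHeaders is true the first
// row is treated as a heading and dropped; it is not used afterwards.
function extractRows(rows, options) {
  // Assumed default: useHeaders = true when not supplied
  var useHeaders = options && options.useHeaders !== undefined ? options.useHeaders : true;
  // slice() returns a new array, so the input table is never mutated
  return useHeaders ? rows.slice(1) : rows.slice();
}

var table = [['height', 'weight'], [1.8, 75], [1.6, 60]];
var withHeaders = extractRows(table, { useHeaders: true }); // [[1.8, 75], [1.6, 60]]
var noHeaders = extractRows(table, { useHeaders: false });  // all three rows
```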
cleanse
var cleansed = process;
options
List of options with defaults -
var options = { formats: [], ranges: [] };
formats
Array - of strings giving the expected format of each field. Each string should match the result of applying typeof to a value in the expected format (for example 'number' or 'string').
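To make the `formats` rule concrete, here is a minimal sketch of a typeof-based check. It assumes that `formats[i]` applies to column `i` and that rows failing the check are dropped; the README does not say whether cleanse drops, reports or throws on such rows, and `matchesFormats` and `filterByFormats` are hypothetical names.

```javascript
// Hypothetical sketch: a row is valid when every field's typeof matches the
// expected format string for its column (assumes formats[i] applies to column i).
function matchesFormats(row, formats) {
  return row.length === formats.length && row.every(function (field, i) {
    return typeof field === formats[i];
  });
}

// Assumption: non-matching rows are simply removed.
function filterByFormats(rows, formats) {
  return rows.filter(function (row) { return matchesFormats(row, formats); });
}

var rows = [[1.8, 75], ['n/a', 60], [1.6, 60]];
var valid = filterByFormats(rows, ['number', 'number']); // [[1.8, 75], [1.6, 60]]
```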
ranges
Array - of objects of the form { 'validatorName': 'validatorValue' }, using the validators listed below.
validators
Available validators -
- greater - expects (value, min), returns value > min;
- greaterOrEqual - expects (value, min), returns value >= min;
- less - expects (value, max), returns value < max;
- lessOrEqual - expects (value, max), returns value <= max;
- between - expects (value, range), where range is a string of the form 'min-max', and returns greater(value, min) && less(value, max);
- betweenOrEqual - expects (value, range), where range is a string of the form 'min-max', and returns greaterOrEqual(value, min) && lessOrEqual(value, max).
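The validator semantics can be sketched directly in JavaScript. This is an illustration of the rules as listed, not the package's source. Two assumptions: each entry of `ranges` applies to the column at the same index, checked by the hypothetical helper `passesRanges`; and the 'min-max' string is parsed by splitting on '-', so this naive parser does not handle negative bounds.

```javascript
// Hypothetical sketch: parse a 'min-max' string (naive: negative bounds unsupported).
function parseRange(range) {
  var parts = String(range).split('-').map(Number);
  return { min: parts[0], max: parts[1] };
}

// Sketch of the validators as described: strict and inclusive comparisons.
var validators = {
  greater: function (value, min) { return value > min; },
  greaterOrEqual: function (value, min) { return value >= min; },
  less: function (value, max) { return value < max; },
  lessOrEqual: function (value, max) { return value <= max; },
  between: function (value, range) {
    var b = parseRange(range);
    return validators.greater(value, b.min) && validators.less(value, b.max);
  },
  betweenOrEqual: function (value, range) {
    var b = parseRange(range);
    return validators.greaterOrEqual(value, b.min) && validators.lessOrEqual(value, b.max);
  }
};

// Hypothetical helper: ranges[i] is an object { validatorName: validatorValue }
// applied to row[i]; every named validator must pass.
function passesRanges(row, ranges) {
  return ranges.every(function (rule, i) {
    return Object.keys(rule).every(function (name) {
      return validators[name](row[i], rule[name]);
    });
  });
}

passesRanges([1.8, 75], [{ between: '1-2' }, { greaterOrEqual: 40 }]); // true
passesRanges([2.5, 75], [{ between: '1-2' }, { greaterOrEqual: 40 }]); // false
```

Note the difference at the boundaries: between(2, '2-5') is false, while betweenOrEqual(2, '2-5') is true.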
standardise
var standardised = process;
options
List of options with defaults -
var options = { min: 0.1, max: 0.9, standardisationMethod: 'default' };
min
number - The minimum value for the standardisation.
max
number - The maximum value for the standardisation.
standardisationMethod
string - Can be 'default', 'normal' or 'ss' (Sum of Squares).
ignore
Array - of integers representing columns of the data to ignore while standardising. They will retain their non-standardised values.
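The README does not define what the 'default' method computes. A common choice for rescaling into a `[min, max]` interval is per-column linear min-max scaling, sketched below under that assumption. The function name `minMaxStandardise` is hypothetical, and mapping a constant column to `min` is an arbitrary choice made here to avoid dividing by zero. Columns listed in `ignore` are returned unchanged, as described above.

```javascript
// Hypothetical sketch, ASSUMING 'default' means per-column linear min-max scaling
// into [options.min, options.max]. Columns in options.ignore are left untouched.
function minMaxStandardise(rows, options) {
  var lo = options.min;
  var hi = options.max;
  var ignore = options.ignore || [];
  var width = rows.length ? rows[0].length : 0;
  var colMin = [];
  var colMax = [];

  // First pass: find the observed minimum and maximum of each column
  for (var c = 0; c < width; c++) {
    var column = rows.map(function (row) { return row[c]; });
    colMin[c] = column.reduce(function (a, b) { return Math.min(a, b); }, Infinity);
    colMax[c] = column.reduce(function (a, b) { return Math.max(a, b); }, -Infinity);
  }

  // Second pass: rescale every non-ignored value linearly into [lo, hi]
  return rows.map(function (row) {
    return row.map(function (value, col) {
      if (ignore.indexOf(col) !== -1) return value;   // keep original value
      var span = colMax[col] - colMin[col];
      if (span === 0) return lo;                       // constant column (arbitrary choice)
      return lo + (value - colMin[col]) * (hi - lo) / span;
    });
  });
}

var scaled = minMaxStandardise([[1, 10], [3, 20], [5, 30]], { min: 0.1, max: 0.9, ignore: [1] });
// first column is approximately [0.1, 0.5, 0.9]; second column stays [10, 20, 30]
```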
divide
var divided = process;
options
List of options with defaults -
var options = { split: [60, 20, 20] };
split
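Array - of weights. The number of entries sets how many subsets the data is split into, and each entry sets the relative size of its subset (for example, [60, 20, 20] gives a 60/20/20 split).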
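A weighted split like this can be sketched by turning the cumulative weights into cut points. In the sketch below, `divideRows` is a hypothetical name. It keeps the original row order; whether the package shuffles the data before splitting is not stated, so add a shuffle first if you need random subsets. The last subset takes any remainder so that no row is lost to rounding.

```javascript
// Hypothetical sketch: split rows into consecutive subsets sized by relative weights.
// Does NOT shuffle; row order is preserved.
function divideRows(rows, split) {
  var total = split.reduce(function (a, b) { return a + b; }, 0);
  var subsets = [];
  var start = 0;
  var cumulative = 0;

  split.forEach(function (weight, i) {
    cumulative += weight;
    // Cut point from the cumulative weight; the last subset takes all remaining rows
    var end = i === split.length - 1
      ? rows.length
      : Math.round(rows.length * cumulative / total);
    subsets.push(rows.slice(start, end));
    start = end;
  });
  return subsets;
}

var data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
var parts = divideRows(data, [60, 20, 20]); // subsets of 6, 2 and 2 rows
```

Because the weights are relative, [3, 1, 1] produces the same split as [60, 20, 20].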
process (combined)
var result = process;
options
The combined process takes all the options that the individual steps take, in a single object.
var options = {
  path: '',
  useHeaders: true,
  formats: [],
  min: 0.1,
  max: 0.9,
  standardisationMethod: 'default',
  split: [60, 20, 20]
};
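Because every step reads its settings from the same object, a caller usually only needs to supply the options that differ from the defaults. The sketch below shows one way to fill in the rest; it is an assumption, not the package's documented behaviour. `DEFAULT_OPTIONS` and `withDefaults` are hypothetical names, and `min`/`max` are assumed to default to 0.1 and 0.9.

```javascript
// Hypothetical defaults for the combined options object (min/max assumed to be 0.1/0.9).
var DEFAULT_OPTIONS = {
  path: '',
  useHeaders: true,
  formats: [],
  min: 0.1,
  max: 0.9,
  standardisationMethod: 'default',
  split: [60, 20, 20]
};

// Shallow merge: user-supplied keys replace defaults wholesale (arrays are not merged),
// and a fresh object is returned so DEFAULT_OPTIONS is never modified.
function withDefaults(userOptions) {
  return Object.assign({}, DEFAULT_OPTIONS, userOptions || {});
}

var merged = withDefaults({ path: './data.csv', split: [70, 30] });
// merged.split is [70, 30]; every other key keeps its default value
```

Since the merge is shallow, passing `split: [70, 30]` replaces the default array entirely rather than combining with it.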