ModelScript
Description
ModelScript is a JavaScript module with simple and efficient tools for data mining and data analysis. ModelScript can be used with ML.js, pandas-js, and numjs to approximate the equivalent R/Python tool chain in JavaScript.
In Python, data preparation is typically done in a DataFrame; ModelScript encourages a more R-like workflow where data preparation happens on the data in its native structure.
Installation
$ npm i modelscript
Full Documentation
Usage (basic)
ModelScript is an ECMAScript module designed to be imported in an ES2015+ environment. To use it in an older environment, use

```javascript
const modelscript = require('modelscript/build/modelscript.cjs.js');
```

for older versions of Node, or load the UMD build in the browser:

```html
<script type="text/javascript" src=".../path/to/.../modelscript/build/modelscript.umd.js"></script>
```
"modelscript" : ml: //see https://github.com/mljs/ml UpperConfidenceBound Class: UpperConfidenceBound // Implementation of the Upper Confidence Bound algorithm //returns next action based off of the upper confidence bound //single step training method //training method for upper confidence bound calculations ThompsonSampling Class: ThompsonSampling //Implementation of the Thompson Sampling algorithm //returns next action based off of the thompson sampling //single step training method //training method for thompson sampling calculations nlp: //see https://github.com/NaturalNode/natural ColumnVectorizer Class: ColumnVectorizer //class creating sparse matrices from a corpus // Returns a distinct array of all tokens after fit_transform //Returns array of arrays of strings for dependent features from sparse matrix word map //Fits and transforms data by creating column vectors (a sparse matrix where each row has every word in the corpus as a column and the count of appearances in the corpus) //Returns limited sets of dependent features or all dependent features sorted by word count //returns word map with counts //returns new matrix of words with counts in columns csv: loadCSV: Function: loadCSV //asynchronously loads CSVs, either a filepath or a remote URI loadTSV: Function: loadTSV //asynchronously loads TSVs, either a filepath or a remote URI model_selection: train_test_split: Function: train_test_split // splits data into training and testing sets cross_validation_split: Function: kfolds //splits data into k-folds cross_validate_score: Function: cross_validate_score//test model variance and bias grid_search: Function: grid_search // tune models with grid search for optimal performance DataSet Class: DataSet: //class for manipulating an array of objects (typically from CSV data) //returns a matrix of values by combining column arrays into a matrix // - returns a new array of a selected column from an array of objects, can filter, scale and replace values // - returns 
a new array of a selected column from an array of objects and replaces empty values, encodes values and scales values // - returns a new array of scaled values which can be reverse (descaled). The scaling transformations are stored on the DataSet // - Returns a new array of descaled values //returns a list of objects with only selected columns as properties // - returns a new array and label encodes a selected column // - returns a new array and decodes an encoded column back to the original array values // - returns a new object of one hot encoded values // - returns a matrix of values from multiple columns // - returns a new array of a selected column that is passed a reducer function, this is used to create new columns for aggregate statistics // - returns a new column that is merged onto the data set // - filtered rows of data, // - mutates data property of DataSet by replacing multiple columns in a single command static // returns an array of objects by applying labels to matrix of columns static // returns an array of objects by applying labels to column vector calc: getTransactions: Function getTransactions // Formats an array of transactions into a sparse matrix like format for Apriori/Eclat assocationRuleLearning: async Function assocationRuleLearning // returns association rule learning results using apriori util: range: Function // range helper function rangeRight: Function //range right helper function scale: Function: scale //scale / normalize data avg: Function: arithmeticMean // aritmatic mean mean: Function: arithmeticMean // aritmatic mean sum: Function: sum max: Function: max min: Function: min sd: Function: standardDeviation // standard deviation StandardScalerTransforms: Function: StandardScalerTransforms // returns two functions that can standard scale new inputs and reverse scale new outputs MinMaxScalerTransforms: Function: MinMaxScalerTransforms // returns two functions that can mix max scale new inputs and reverse scale new outputs 
StandardScaler: Function: StandardScaler // standardization (z-scores) MinMaxScaler: Function: MinMaxScaler // min-max scaling ExpScaler: Function: ExpScaler // exponent scaling LogScaler: Function: LogScaler // natual log scaling squaredDifference: Function: squaredDifference // Returns an array of the squared different of two arrays standardError: Function: standardError // The standard error of the estimate is a measure of the accuracy of predictions made with a regression line coefficientOfDetermination: Function: coefficientOfDetermination adjustedCoefficentOfDetermination: Function: adjustedCoefficentOfDetermination adjustedRSquared: Function: adjustedCoefficentOfDetermination rBarSquared: Function: adjustedCoefficentOfDetermination r: Function: coefficientOfCorrelation coefficientOfCorrelation: Function: coefficientOfCorrelation rSquared: Function: rSquared //r^2 pivotVector: Function: pivotVector // returns an array of vectors as an array of arrays pivotArrays: Function: pivotArrays // returns a matrix of values by combining arrays into a matrix standardScore: Function: standardScore // Calculates the z score of each value in the sample, relative to the sample mean and standard deviation. zScore: Function: standardScore // alias for standardScore. approximateZPercentile: Function: approximateZPercentile // approximate the p value from a z score preprocessing: DataSet: Class DataSet
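For intuition about the bandit classes listed under ml, the next-action selection in Upper Confidence Bound can be sketched in a few lines. This is an illustrative standalone function, not modelscript's implementation; the function name and exploration constant are assumptions for the sketch.

```javascript
// Standalone sketch of the Upper Confidence Bound selection rule
// (illustrative only; NOT the modelscript/mljs implementation).
// counts[arm]  = how many times each arm has been pulled
// rewards[arm] = cumulative reward observed for each arm
// step         = current time step (used in the confidence term)
function upperConfidenceBound(counts, rewards, step) {
  let best = 0;
  let bestBound = -Infinity;
  for (let arm = 0; arm < counts.length; arm += 1) {
    if (counts[arm] === 0) return arm; // try each arm at least once
    const mean = rewards[arm] / counts[arm];
    // mean reward plus an exploration bonus that shrinks as the arm is pulled more
    const bound = mean + Math.sqrt((1.5 * Math.log(step)) / counts[arm]);
    if (bound > bestBound) {
      bestBound = bound;
      best = arm;
    }
  }
  return best;
}

console.log(upperConfidenceBound([5, 5], [4, 1], 10)); // 0 (higher mean reward)
console.log(upperConfidenceBound([1, 0], [1, 0], 2)); // 1 (unexplored arm)
```

The exploration constant (1.5 here) only changes how aggressively untried arms are favored; the library's classes wrap this selection rule together with the training bookkeeping.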
Examples (JavaScript / Python / R)
Loading CSV Data
Javascript
```javascript
import { default as ms } from 'modelscript';

let dataset;
// In JavaScript, by default most I/O operations are asynchronous; see the notes section for more
ms.csv.loadCSV('/path/to/Data.csv').then(csvData => {
  dataset = new ms.DataSet(csvData);
});
// or from URL
ms.csv.loadCSV('https://example.com/Data.csv').then(csvData => {
  dataset = new ms.DataSet(csvData);
});
```
Python
```python
# Importing the dataset
import pandas as pd
dataset = pd.read_csv('Data.csv')
```
R
```r
# Importing the dataset
dataset = read.csv('Data.csv')
```
Handling Missing Data
Javascript
```javascript
// columnArray returns a column of data by name
// [ '44','27','30','38','40','35','','48','50','37' ]
const OriginalAgeColumn = dataset.columnArray('Age');

// columnReplace returns a new array with replaced missing data
// [ '44','27','30','38','40','35',38.77777777777778,'48','50','37' ]
const ReplacedAgeMeanColumn = dataset.columnReplace('Age', { strategy: 'mean' });

// fitColumns mutates the dataset (options reconstructed; see the full documentation)
dataset.fitColumns({ columns: [{ name: 'Age', options: { strategy: 'mean' } }] });
/* dataset
class DataSet
  data: [
    {
      'Country': 'Brazil',
      'Age': '38.77777777777778',
      'Salary': '72000',
      'Purchased': 'N',
    },
    ...
  ]
*/
```
Python
```python
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 3].values

# Taking care of missing data
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])
```
R
```r
# Taking care of the missing data
dataset$Age = ifelse(is.na(dataset$Age),
                     ave(dataset$Age, FUN = function(x) mean(x, na.rm = TRUE)),
                     dataset$Age)
```
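Under the hood, the mean-replacement strategy used in this section amounts to averaging the present values and substituting that average for empty entries. A dependency-free sketch (illustrative only, not modelscript's columnReplace):

```javascript
// Standalone sketch of mean imputation (NOT modelscript's implementation):
// replace empty entries in a column with the arithmetic mean of the rest.
function replaceMissingWithMean(column) {
  const present = column
    .filter((v) => v !== '' && v !== null && v !== undefined)
    .map(Number);
  const mean = present.reduce((sum, v) => sum + v, 0) / present.length;
  return column.map((v) =>
    v === '' || v === null || v === undefined ? mean : v
  );
}

const ages = ['44', '27', '30', '38', '40', '35', '', '48', '50', '37'];
console.log(replaceMissingWithMean(ages));
// [ '44','27','30','38','40','35',38.77777777777778,'48','50','37' ]
```

Note that, as in the example above, the imputed value is a number while the untouched entries remain strings; scaling or encoding steps later in the pipeline normalize the types.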
One Hot Encoding and Label Encoding
Javascript
```javascript
// [ 'Brazil','Mexico','Ghana','Mexico','Ghana','Brazil','Mexico','Brazil','Ghana','Brazil' ]
const originalCountry = dataset.columnArray('Country');

/* {
  originalCountry: {
    Country_Brazil: [ 1, 0, 0, 0, 0, 1, 0, 1, 0, 1 ],
    Country_Mexico: [ 0, 1, 0, 1, 0, 0, 1, 0, 0, 0 ],
    Country_Ghana: [ 0, 0, 1, 0, 1, 0, 0, 0, 1, 0 ],
  },
} */
const oneHotCountryColumn = dataset.oneHotEncoder('Country');

// [ 'N', 'Yes', 'No', 'f', 'Yes', 'Yes', 'false', 'Yes', 'No', 'Yes' ]
const originalPurchasedColumn = dataset.columnArray('Purchased');
// [ 0, 1, 0, 0, 1, 1, 1, 1, 0, 1 ]
const encodedBinaryPurchasedColumn = dataset.labelEncoder('Purchased', { binary: true });
// [ 0, 1, 2, 3, 1, 1, 4, 1, 2, 1 ]
const encodedPurchasedColumn = dataset.labelEncoder('Purchased');
// [ 'N', 'Yes', 'No', 'f', 'Yes', 'Yes', 'false', 'Yes', 'No', 'Yes' ]
const decodedPurchased = dataset.labelDecode('Purchased');

// fitColumns mutates the dataset (options reconstructed; see the full documentation)
dataset.fitColumns({ columns: [{ name: 'Country', options: { strategy: 'onehot' } }] });
```
Python
```python
# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
onehotencoder = OneHotEncoder(categorical_features=[0])
X = onehotencoder.fit_transform(X).toarray()
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
```
R
```r
# Encoding categorical data
dataset$Country = factor(dataset$Country,
                         levels = c('Brazil', 'Mexico', 'Ghana'),
                         labels = c(1, 2, 3))
dataset$Purchased = factor(dataset$Purchased,
                           levels = c('No', 'Yes'),
                           labels = c(0, 1))
```
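The two encodings used in this section can be sketched in plain JavaScript. This is an illustrative standalone version, not modelscript's labelEncoder/oneHotEncoder; the function names are hypothetical.

```javascript
// Standalone sketches of label encoding and one-hot encoding
// (illustrative only; NOT modelscript's implementation).

// Label encoding: map each distinct value to an integer index.
function labelEncode(column) {
  const labels = [...new Set(column)]; // distinct values in first-seen order
  return { labels, encoded: column.map((v) => labels.indexOf(v)) };
}

// One-hot encoding: one 0/1 indicator column per distinct value.
function oneHotEncode(column, name) {
  const labels = [...new Set(column)];
  const result = {};
  for (const label of labels) {
    result[`${name}_${label}`] = column.map((v) => (v === label ? 1 : 0));
  }
  return result;
}

const countries = ['Brazil', 'Mexico', 'Ghana', 'Mexico', 'Ghana'];
console.log(labelEncode(countries).encoded); // [ 0, 1, 2, 1, 2 ]
console.log(oneHotEncode(countries, 'Country'));
// { Country_Brazil: [1,0,0,0,0], Country_Mexico: [0,1,0,1,0], Country_Ghana: [0,0,1,0,1] }
```

One-hot encoding is preferred for nominal categories like Country because label encoding would impose an arbitrary ordering on them.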
Cross Validation
Javascript
```javascript
const testArray = [20, 25, 10, 33, 50, 42, 19, 34, 90, 23];

// (argument options reconstructed; see the full documentation)
// { train: [ 50, 20, 34, 33, 10, 23, 90, 42 ], test: [ 25, 19 ] }
const trainTestSplit = ms.cross_validation.train_test_split(testArray, { test_size: 0.2 });

// [ [ 50, 20, 34, 33, 10 ], [ 23, 90, 42, 19, 25 ] ]
const crossValidationArrayKFolds = ms.cross_validation.cross_validation_split(testArray, { folds: 2 });
```
Python
```python
# Splitting the dataset into training set and test set
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```
R
```r
# Splitting the dataset into the training set and test set
library(caTools)
set.seed(1)
split = sample.split(dataset$Purchased, SplitRatio = 0.8)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)
```
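The mechanics of both splits in this section can be sketched without any library. This standalone version is illustrative only (not modelscript's implementation) and deliberately unshuffled for determinism, whereas the library versions shuffle the data.

```javascript
// Standalone sketches of a train/test split and a k-fold split
// (illustrative only; NOT modelscript's implementation; unshuffled).

// Hold out the last `testSize` fraction of the data as the test set.
function trainTestSplit(data, testSize = 0.2) {
  const testCount = Math.round(data.length * testSize);
  return {
    train: data.slice(0, data.length - testCount),
    test: data.slice(data.length - testCount),
  };
}

// Partition the data into `folds` contiguous chunks.
function kfolds(data, folds = 2) {
  const size = Math.ceil(data.length / folds);
  const result = [];
  for (let i = 0; i < folds; i += 1) {
    result.push(data.slice(i * size, (i + 1) * size));
  }
  return result;
}

const split = trainTestSplit([20, 25, 10, 33, 50, 42, 19, 34, 90, 23]);
console.log(split.train.length, split.test.length); // 8 2
console.log(kfolds([1, 2, 3, 4, 5, 6], 2)); // [ [ 1, 2, 3 ], [ 4, 5, 6 ] ]
```

In k-fold cross validation each chunk takes one turn as the test set while the remaining chunks form the training set, which is what cross_validate_score uses to estimate model variance and bias.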
Scaling (z-score / min-max)
Javascript
```javascript
// scale columns in place: z-scores for Age, min-max for Salary
// (method options reconstructed; see the full documentation for exact usage)
dataset.columnScale('Age', 'standard');
dataset.columnScale('Salary', 'minmax');
```
Python
```python
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
```
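Both scaling strategies in this section reduce to simple arithmetic. A dependency-free sketch (illustrative only, not modelscript's StandardScaler/MinMaxScaler):

```javascript
// Standalone sketches of z-score and min-max scaling
// (illustrative only; NOT modelscript's implementation).

// Standardization: subtract the mean, divide by the standard deviation.
function standardScaler(values) {
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const sd = Math.sqrt(
    values.reduce((s, v) => s + (v - mean) ** 2, 0) / values.length
  );
  return values.map((v) => (v - mean) / sd);
}

// Min-max scaling: map the range [min, max] onto [0, 1].
function minMaxScaler(values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  return values.map((v) => (v - min) / (max - min));
}

console.log(standardScaler([1, 2, 3, 4, 5])); // mean 0, unit variance
console.log(minMaxScaler([10, 20, 30])); // [ 0, 0.5, 1 ]
```

As in the Python example, the scaling parameters (mean/sd or min/max) should be computed on the training set only and then reused to transform the test set; modelscript's StandardScalerTransforms and MinMaxScalerTransforms return paired scale/descale functions for exactly that reason.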
Development
Make sure you have grunt and jsdoc-to-markdown installed
$ npm i -g grunt-cli jsdoc-to-markdown
For generating documentation
$ grunt doc
$ jsdoc2md src/**/*.js > docs/api.md
Notes
Check out https://repetere.github.io/modelscript for the full ModelScript documentation
A quick word about asynchronous JavaScript
Most machine learning tutorials in Python and R do not use the asynchronous equivalents of their I/O functions; JavaScript, however, defaults to non-blocking operations.
With the advent of ES2017 and Node.js 7+ there are syntax helpers for asynchronous functions. Using async/await in JavaScript may be the easiest way to approximate what a workflow would look like in R/Python.
```javascript
// an R/Python-style workflow with async/await
// (this example is reconstructed; see the full documentation for exact usage)
import { default as ms } from 'modelscript';

void async function () {
  const csvData = await ms.csv.loadCSV('/path/to/Data.csv');
  const rawData = new ms.DataSet(csvData);
  const fittedData = rawData.fitColumns({ /* column transform options */ });
  const dataset = fittedData;
  const X = dataset.columnMatrix([['Age'], ['Salary']]); // independent variables
  const y = dataset.columnMatrix([['Purchased']]); // dependent variable
  console.log({ X, y });
}();
```
Testing
$ npm i
$ grunt test
Contributing
Fork, write tests and create a pull request!
Misc
As of Node 8, ES modules still run behind a flag when executing a script natively as an ES module

$ node --experimental-modules my-machine-learning-script.mjs

Also, there are native bindings that require Python 2.x; if you're using Anaconda, make sure you build with your Python 2.x bin

$ npm i --python=/usr/bin/python
License
MIT