jskit-learn
Description
JSkit-learn is a JavaScript module with simple and efficient tools for data mining and data analysis in JavaScript. JSkit-learn can be used with ML.js, pandas-js, and numjs to approximate the equivalent R/Python tool chain in JavaScript.
In Python, data preparation is typically done in a DataFrame; jskit-learn encourages a more R-like workflow where data preparation happens on the data in its native structure.
Installation
$ npm i jskit-learn
Full Documentation
Usage (basic)
"jskit-learn" : loadCSV: Function: loadCSV //asynchronously loads CSVs, either a filepath or a remote URI cross_validation: train_test_split: Function: train_test_split // splits data into training and testing sets cross_validation_split: Function: kfolds //splits data into k-folds preprocessing: Class DataSet: //class for manipulating an array of objects (typically from CSV data) // - returns a new array of a selected column from an array of objects, can filter, scale and replace values // - returns a new array of a selected column from an array of objects and replaces empty values, encodes values and scales values // - returns a new array and label encodes a selected column // - returns a new array and decodes an encoded column back to the original array values // - returns a new object of one hot encoded values // - returns a matrix of values from multiple columns // - returns a new array of a selected column that is passed a reducer function, this is used to create new columns for aggregate statistics // - returns a new column that is merged onto the data set // - filtered rows of data, // - mutates data property of DataSet by replacing multiple columns in a single command calc: getTransactions: Function getTransactions // Formats an array of transactions into a sparse matrix like format for Apriori/Eclat assocationRuleLearning: async Function assocationRuleLearning // returns association rule learning results using apriori util: range: Function // range helper function rangeRight: Function //range right helper function scale: Function: scale //scale / normalize data avg: Function: arithmeticMean // aritmatic mean mean: Function: arithmeticMean // aritmatic mean sum: Function: sum max: Function: max min: Function: min sd: Function: standardDeviation // standard deviation StandardScaler: Function: StandardScaler // standardization (z-scores) MinMaxScaler: Function: MinMaxScaler // min-max scaling ExpScaler: Function: ExpScaler // exponent scaling LogScaler: 
Function: LogScaler // natual log scaling squaredDifference: Function: squaredDifference // Returns an array of the squared different of two arrays standardError: Function: standardError // The standard error of the estimate is a measure of the accuracy of predictions made with a regression line coefficientOfDetermination: Function: coefficientOfDetermination // r^2 rSquared: Function: coefficientOfDetermination // alias for coefficientOfDetermination pivotVector: Function: pivotVector // returns an array of vectors as an array of arrays pivotArrays: Function: pivotArrays // returns a matrix of values by combining arrays into a matrix standardScore: Function: standardScore // Calculates the z score of each value in the sample, relative to the sample mean and standard deviation. zScore: Function: standardScore // alias for standardScore. approximateZPercentile: Function: approximateZPercentile // approximate the p value from a z score
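To make the `util` section above concrete, here is a minimal plain-JavaScript sketch of a few of the listed statistics (arithmetic mean, standard deviation, squared difference, r²). These are illustrative re-implementations, not jskit-learn's own code; the library may differ in details such as sample vs. population variance.

```javascript
// Arithmetic mean of a numeric array
function arithmeticMean(data) {
  return data.reduce((sum, v) => sum + v, 0) / data.length;
}

// Population standard deviation (the library may use the sample form instead)
function standardDeviation(data) {
  const mean = arithmeticMean(data);
  return Math.sqrt(arithmeticMean(data.map(v => (v - mean) ** 2)));
}

// Element-wise squared differences of two equal-length arrays
function squaredDifference(a, b) {
  return a.map((v, i) => (v - b[i]) ** 2);
}

// Coefficient of determination (r^2) of predictions against actual values
function coefficientOfDetermination(actuals, estimates) {
  const ssRes = squaredDifference(actuals, estimates).reduce((s, v) => s + v, 0);
  const mean = arithmeticMean(actuals);
  const ssTot = actuals.reduce((s, v) => s + (v - mean) ** 2, 0);
  return 1 - ssRes / ssTot;
}

// coefficientOfDetermination([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]) ≈ 0.98
```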
Examples (JavaScript / Python / R)
Loading CSV Data
Javascript
```javascript
import jsk from 'jskit-learn';
// In JavaScript, by default most I/O operations are asynchronous; see the Notes section for more

let dataset;
jsk.loadCSV('./Data.csv') // (file path assumed for illustration)
  .then(csvData => { dataset = csvData; });

// or from a URL (assumed for illustration)
jsk.loadCSV('https://example.com/Data.csv')
  .then(csvData => { dataset = csvData; });
```
Python
```python
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Data.csv')
```
R
```r
# Importing the dataset
dataset = read.csv('Data.csv')
```
Handling Missing Data
Javascript
```javascript
// columnArray returns a column of data by name
// [ '44','27','30','38','40','35','','48','50','37' ]
const originalAgeColumn = dataset.columnArray('Age');

// columnReplace returns a new array with replaced missing data
// [ '44','27','30','38','40','35',38.77777777777778,'48','50','37' ]
const replacedAgeMeanColumn = dataset.columnReplace('Age', { strategy: 'mean' }); // (options assumed)

// fitColumns mutates the dataset
dataset.fitColumns({ columns: [{ name: 'Age', options: { strategy: 'mean' } }] }); // (options assumed)
/*
dataset (class DataSet) data: [
  { 'Country': 'Brazil', 'Age': '38.77777777777778', 'Salary': '72000', 'Purchased': 'N' },
  ...
]
*/
```
Python
```python
# (column selections assumed from the Country/Age/Salary/Purchased dataset)
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 3].values

# Taking care of missing data
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values='NaN', strategy='mean', axis=0)
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])
```
R
```r
# Taking care of the missing data
dataset$Age = ifelse(is.na(dataset$Age),
                     ave(dataset$Age, FUN = function(x) mean(x, na.rm = TRUE)),
                     dataset$Age)
```
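Mean imputation, as `columnReplace` does above, can be sketched in plain JavaScript. `replaceEmptyWithMean` is a hypothetical helper, not part of the jskit-learn API; it reproduces the replaced Age column from the example.

```javascript
// Replace empty values in a column of CSV strings with the arithmetic mean
// of the values that are present (hypothetical helper, for illustration)
function replaceEmptyWithMean(column) {
  const present = column
    .filter(v => v !== '' && v !== null && v !== undefined)
    .map(Number);
  const mean = present.reduce((sum, v) => sum + v, 0) / present.length;
  return column.map(v => (v === '' || v === null || v === undefined ? mean : v));
}

const ages = ['44', '27', '30', '38', '40', '35', '', '48', '50', '37'];
const replaced = replaceEmptyWithMean(ages);
// → [ '44', '27', '30', '38', '40', '35', 38.77777777777778, '48', '50', '37' ]
```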
One Hot Encoding and Label Encoding
Javascript
```javascript
// (DataSet method names and options assumed)
// [ 'Brazil','Mexico','Ghana','Mexico','Ghana','Brazil','Mexico','Brazil','Ghana','Brazil' ]
const originalCountry = dataset.columnArray('Country');

/*
{
  Country_Brazil: [ 1, 0, 0, 0, 0, 1, 0, 1, 0, 1 ],
  Country_Mexico: [ 0, 1, 0, 1, 0, 0, 1, 0, 0, 0 ],
  Country_Ghana:  [ 0, 0, 1, 0, 1, 0, 0, 0, 1, 0 ],
}
*/
const oneHotCountryColumn = dataset.oneHotEncoder('Country');

// [ 'N', 'Yes', 'No', 'f', 'Yes', 'Yes', 'false', 'Yes', 'No', 'Yes' ]
const originalPurchasedColumn = dataset.columnArray('Purchased');

// [ 0, 1, 0, 0, 1, 1, 1, 1, 0, 1 ]
const encodedBinaryPurchasedColumn = dataset.labelEncoder('Purchased', { binary: true });

// [ 0, 1, 2, 3, 1, 1, 4, 1, 2, 1 ]
const encodedPurchasedColumn = dataset.labelEncoder('Purchased');

// [ 'N', 'Yes', 'No', 'f', 'Yes', 'Yes', 'false', 'Yes', 'No', 'Yes' ]
const decodedPurchased = dataset.labelDecode('Purchased');

// fitColumns mutates the dataset
dataset.fitColumns({ columns: [ /* ... */ ] });
```
Python
```python
# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
onehotencoder = OneHotEncoder(categorical_features=[0])
X = onehotencoder.fit_transform(X).toarray()
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
```
R
```r
# Encoding categorical data
dataset$Country = factor(dataset$Country,
                         levels = c('Brazil', 'Mexico', 'Ghana'),
                         labels = c(1, 2, 3))
dataset$Purchased = factor(dataset$Purchased,
                           levels = c('No', 'Yes'),
                           labels = c(0, 1))
```
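The label and one-hot encoding shown above can be sketched in plain JavaScript. `labelEncode` and `oneHotEncode` are hypothetical helpers, not the DataSet API; they assume labels are assigned in first-seen order, which matches the encoded Purchased column in the example.

```javascript
// Map each distinct value to an integer, in first-seen order
function labelEncode(column) {
  const labels = [...new Set(column)];
  return column.map(v => labels.indexOf(v));
}

// Build one binary indicator column per distinct value
function oneHotEncode(name, column) {
  const encoded = {};
  for (const label of new Set(column)) {
    encoded[`${name}_${label}`] = column.map(v => (v === label ? 1 : 0));
  }
  return encoded;
}

const purchased = ['N', 'Yes', 'No', 'f', 'Yes', 'Yes', 'false', 'Yes', 'No', 'Yes'];
const encodedPurchased = labelEncode(purchased);
// → [ 0, 1, 2, 3, 1, 1, 4, 1, 2, 1 ]

const country = ['Brazil', 'Mexico', 'Ghana', 'Mexico', 'Ghana', 'Brazil', 'Mexico', 'Brazil', 'Ghana', 'Brazil'];
const oneHotCountry = oneHotEncode('Country', country);
// oneHotCountry.Country_Brazil → [ 1, 0, 0, 0, 0, 1, 0, 1, 0, 1 ]
```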
Cross Validation
Javascript
```javascript
const testArray = [20, 25, 10, 33, 50, 42, 19, 34, 90, 23];

// { train: [ 50, 20, 34, 33, 10, 23, 90, 42 ], test: [ 25, 19 ] }
const trainTestSplit = jsk.cross_validation.train_test_split(testArray, { test_size: 0.2 }); // (options assumed)

// [ [ 50, 20, 34, 33, 10 ], [ 23, 90, 42, 19, 25 ] ]
const crossValidationArrayKFolds = jsk.cross_validation.cross_validation_split(testArray, { folds: 2 }); // (options assumed)
```
Python
```python
# Splitting the dataset into training set and test set
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)  # (split options assumed)
```
R
```r
# Splitting the dataset into the training set and test set
library(caTools)
set.seed(1)
split = sample.split(dataset$Purchased, SplitRatio = 0.8)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)
```
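The two cross-validation helpers can be sketched in plain JavaScript: a random train/test split and a round-robin k-fold split. These are illustrative stand-ins, not jskit-learn's implementations; the library's option names and shuffling strategy will differ.

```javascript
// Shuffle, then hold out a fraction of the data for testing
function trainTestSplit(data, testSize = 0.2) {
  const shuffled = [...data].sort(() => Math.random() - 0.5); // naive shuffle, for illustration only
  const testCount = Math.round(shuffled.length * testSize);
  return { train: shuffled.slice(testCount), test: shuffled.slice(0, testCount) };
}

// Distribute values across k folds round-robin
function kFolds(data, k) {
  const folds = Array.from({ length: k }, () => []);
  data.forEach((value, i) => folds[i % k].push(value));
  return folds;
}

const sample = [20, 25, 10, 33, 50, 42, 19, 34, 90, 23];
const { train, test } = trainTestSplit(sample); // train has 8 values, test has 2
const folds = kFolds(sample, 2); // two folds of 5 values each
```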
Scaling (z-score / min-max)
Javascript
```javascript
// (illustrative usage of the util scalers; options assumed)
// standardization (z-scores)
const standardizedAge = jsk.util.StandardScaler(dataset.columnArray('Age'));
// min-max scaling
const minMaxAge = jsk.util.MinMaxScaler(dataset.columnArray('Age'));
```
Python
```python
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
```
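For reference, the two scalers can be sketched in plain JavaScript, assuming `StandardScaler` returns z-scores and `MinMaxScaler` maps values onto [0, 1]; jskit-learn's actual util implementations may differ (for example, in the variance formula or output shape).

```javascript
// Standardization: subtract the mean, divide by the (population) standard deviation
function StandardScaler(data) {
  const mean = data.reduce((sum, v) => sum + v, 0) / data.length;
  const sd = Math.sqrt(data.reduce((sum, v) => sum + (v - mean) ** 2, 0) / data.length);
  return data.map(v => (v - mean) / sd);
}

// Min-max scaling: map each value onto [0, 1]
function MinMaxScaler(data) {
  const min = Math.min(...data);
  const max = Math.max(...data);
  return data.map(v => (v - min) / (max - min));
}

// StandardScaler([2, 4, 4, 4, 5, 5, 7, 9]) → [ -1.5, -0.5, -0.5, -0.5, 0, 0, 1, 2 ]
// MinMaxScaler([0, 10, 5]) → [ 0, 1, 0.5 ]
```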
Development
Make sure you have grunt installed
$ npm i -g grunt-cli jsdoc-to-markdown
For generating documentation
$ grunt doc
$ jsdoc2md src/**/*.js > docs/api.md
Notes
Check out https://github.com/repetere/jskit-learn for the full jskit-learn Documentation
A quick word about asynchronous JavaScript
Most machine learning tutorials in Python and R do not use their asynchronous equivalents; JavaScript, however, has a bias toward defaulting to non-blocking operations.
With the advent of ES7 and Node.js 7+, there are syntax helpers for asynchronous functions. It may be easier to use async/await in JS if you want a workflow that closely approximates what it would look like in R/Python.
```javascript
import jsk from 'jskit-learn';
// (a plotting helper, e.g. `const plt = mpn.plot`, was also imported here)

void async function main() {
  const csvData = await jsk.loadCSV('Data.csv'); // (path assumed)
  const rawData = new jsk.preprocessing.DataSet(csvData); // (usage assumed)
  const fittedData = rawData.fitColumns({ /* ... */ });
  const dataset = fittedData;
  const X = dataset.values; // (column selection elided in the original)
  const y = dataset.values; // (column selection elided in the original)
  console.log({ X, y });
}();
```
Testing
$ npm i
$ grunt test
Contributing
Fork, write tests and create a pull request!
Misc
As of Node 8, ES modules are still behind a flag when running natively as an ES module:
$ node --experimental-modules my-machine-learning-script.mjs

Also, some native bindings require Python 2.x; if you're using Anaconda, make sure you build with your Python 2.x bin:

$ npm i --python=/usr/bin/python
License
MIT