learningjs with added support for random forests
C4.5 with random forest: This project is a fork of learningjs.

Data format
Input files must be in CSV format, with the 1st line giving the feature names. One of the features has to be called 'label'. E.g.
    outlook, temp, humidity, wind, label
    text, real, text, text, feature_type
    'Sunny',80,'High', 'Weak', 'No'
    'Sunny',82,'High', 'Strong', 'No'
    'Overcast',73,'High', 'Weak', 'Yes'
There's also an optional 2nd line for feature types; in that line, the 'label' column has to be called 'feature_type'. This is useful if feature types are mixed.

For Logistic Regression, all features should be real numbers. E.g.
    label,a,b,c,d,e,f,g,h,i,j,k,l,m
    1,1,0.72694,1.4742,0.32396,0.98535,1,0.83592,0.0046566,0.0039465,0.04779,0.12795,0.016108,0.0052323
    2,2,0.74173,1.5257,0.36116,0.98152,0.99825,0.79867,0.0052423,0.0050016,0.02416,0.090476,0.0081195,0.002708
    3,3,0.76722,1.5725,0.38998,0.97755,1,0.80812,0.0074573,0.010121,0.011897,0.057445,0.0032891,0.00092068
    1,4,0.73797,1.4597,0.35376,0.97566,1,0.81697,0.0068768,0.0086068,0.01595,0.065491,0.0042707,0.0011544
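To make the layout concrete, here is a minimal sketch of how a file in this format could be read. It is only an illustration of the rules described in this section: `parseCsvSketch` and the shape of the object it returns are hypothetical and are not the data_util.js implementation. It also assumes that cells never contain commas.

```javascript
// Illustrative only: parseCsvSketch is a hypothetical helper, not part of data_util.js.
function parseCsvSketch(text) {
  // Split into non-empty lines, then into trimmed cells with surrounding single quotes removed.
  var rows = text.split(/\r?\n/).filter(function (line) {
    return line.trim() !== '';
  }).map(function (line) {
    return line.split(',').map(function (cell) {
      return cell.trim().replace(/^'(.*)'$/, '$1');
    });
  });

  // 1st line: feature names, one of which must be 'label'.
  var names = rows[0];
  var labelIdx = names.indexOf('label');
  if (labelIdx < 0) throw new Error("one column must be named 'label'");

  // Optional 2nd line: recognized by 'feature_type' in the label column.
  var hasTypes = rows.length > 1 && rows[1][labelIdx] === 'feature_type';
  // Without a type line, every column is treated as text (strings).
  var types = hasTypes ? rows[1] : names.map(function () { return 'text'; });

  var data = [];
  var targets = [];
  rows.slice(hasTypes ? 2 : 1).forEach(function (row) {
    var features = [];
    row.forEach(function (cell, i) {
      if (i === labelIdx) targets.push(cell);
      else features.push(types[i] === 'real' ? parseFloat(cell) : cell);
    });
    data.push(features);
  });

  var featureNames = names.filter(function (_, i) { return i !== labelIdx; });
  return { featureNames: featureNames, data: data, targets: targets };
}

var sample = [
  "outlook, temp, humidity, wind, label",
  "text, real, text, text, feature_type",
  "'Sunny',80,'High', 'Weak', 'No'",
  "'Overcast',73,'High', 'Weak', 'Yes'"
].join('\n');

var parsed = parseCsvSketch(sample);
console.log(parsed.featureNames, parsed.data[0], parsed.targets);
```

In this sketch, the 'temp' column is converted to a number because its type is 'real', while the other features stay as strings.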
Usage
Data loading: data_util.js provides three methods:
loadTextFile: the CSV-format file will be loaded from disk and columns are parsed as strings unless 2nd line specifies feature types.
loadRealFile: the CSV-format file will be loaded from disk and columns are parsed as real numbers.
loadString: a big string will be chopped into lines and columns are parsed as strings unless 2nd line specifies feature types.
In the loading callback function you will obtain a data object D on which you can apply the learning methods. Note that only Decision Tree supports both real and categorical features. Logistic Regression works on real features only.
    var learningjs = ;
    var data_util = ;
    var tree = ;
    data_util;
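The note about feature types can be turned into a quick check before training. The sketch below is a hypothetical helper, not part of learningjs: it takes the feature-type row (without the 'feature_type' label cell) and returns which of the two learners named above could handle it.

```javascript
// Illustrative only: applicableLearners is a hypothetical helper, not a learningjs function.
// featureTypes is the optional 2nd CSV line without the label column, e.g. ['text', 'real'].
function applicableLearners(featureTypes) {
  var allReal = featureTypes.every(function (t) { return t === 'real'; });
  // Decision Tree accepts mixed types; Logistic Regression needs real features only.
  return allReal ? ['decision tree', 'logistic regression'] : ['decision tree'];
}

var mixed = applicableLearners(['text', 'real', 'text', 'text']);
var numeric = applicableLearners(['real', 'real', 'real']);
console.log(mixed, numeric);
```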
Documentation
See learningjs, the original project, for more information and a demo.
License
MIT