Configurable Naive Bayes Classifier for text with cross-validation support
Classify text, analyse sentiments, and recognize user intents for chatbots using wink-naive-bayes-text-classifier. It is part of wink, a growing family of high-quality packages for Statistical Analysis, Natural Language Processing and Machine Learning in NodeJS.
Its API offers a rich set of features:
- Configure text preparation tasks such as amplify negation, tokenize, stem, remove stop words, and propagate negation using wink-nlp-utils or any other package of your choice.
- Configure Lidstone or Laplace additive smoothing.
- Configure Multinomial or Binarized Multinomial Naive Bayes model.
- Export and import learnings in JSON format that can be easily saved on hard-disk.
- Evaluate learning to perform n-fold cross validation.
- Obtain comprehensive metrics including confusion matrix, precision, and recall.
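To make the difference between the Multinomial and Binarized Multinomial models concrete, here is a minimal sketch of how token counts differ under each view. The helper names `multinomialCounts` and `binarizedCounts` are hypothetical and written for illustration only; they are not part of the package's API.

```javascript
// Multinomial view: count how often each token occurs in a document.
function multinomialCounts( tokens ) {
  const counts = {};
  tokens.forEach( ( t ) => {
    counts[ t ] = ( counts[ t ] || 0 ) + 1;
  } );
  return counts;
}

// Binarized view: record only whether each token is present.
function binarizedCounts( tokens ) {
  const counts = {};
  new Set( tokens ).forEach( ( t ) => {
    counts[ t ] = 1;
  } );
  return counts;
}

// Hypothetical token list for a single document.
const tokens = [ 'loan', 'car', 'loan', 'new', 'car', 'loan' ];
console.log( multinomialCounts( tokens ) ); // -> { loan: 3, car: 2, new: 1 }
console.log( binarizedCounts( tokens ) );   // -> { loan: 1, car: 1, new: 1 }
```

In the binarized view a word repeated many times in one document carries no more weight than a word used once, which often suits short texts such as chatbot utterances.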
Use npm to install:
npm install wink-naive-bayes-text-classifier --save
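// Load Naive Bayes Text Classifier
var Classifier = require( 'wink-naive-bayes-text-classifier' );
// Instantiate
var nbc = Classifier();
// Load NLP utilities
var nlp = require( 'wink-nlp-utils' );
// Configure preparation tasks
nbc.definePrepTasks( [
  // Simple tokenizer
  nlp.string.tokenize0,
  // Common stop words remover
  nlp.tokens.removeWords,
  // Stemmer to obtain base word
  nlp.tokens.stem
] );
// Configure behavior
nbc.defineConfig( { considerOnlyPresence: true, smoothingFactor: 0.5 } );
// Train!
nbc.learn( 'I want to prepay my loan', 'prepay' );
nbc.learn( 'I want to close my loan', 'prepay' );
nbc.learn( 'I want to foreclose my loan', 'prepay' );
nbc.learn( 'I would like to pay the loan balance', 'prepay' );
nbc.learn( 'I would like to borrow money to buy a vehicle', 'autoloan' );
nbc.learn( 'I need loan for car', 'autoloan' );
nbc.learn( 'I need loan for a new vehicle', 'autoloan' );
nbc.learn( 'I need loan for a new mobike', 'autoloan' );
nbc.learn( 'I need money for a new car', 'autoloan' );
// Consolidate all the training!!
nbc.consolidate();
// Start predicting...
console.log( nbc.predict( 'I would like to borrow 50000 to buy a new Audi R8 in New York' ) );
// -> autoloan
console.log( nbc.predict( 'I want to pay my car loan early' ) );
// -> prepay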
definePrepTasks( tasks )
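Defines the text preparation tasks that transform raw incoming text into an array of tokens, as required during learn(), evaluate() and predict() operations. The tasks should be an array of functions: the first must accept a string as input, the last must return an array of tokens, and each function must accept one argument and return a single value. definePrepTasks returns the count of tasks.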
As illustrated in the usage, wink-nlp-utils offers a rich set of such functions.
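As a rough illustration of how a chain of preparation tasks can turn a raw string into tokens, the sketch below applies an array of functions in order, each receiving the previous one's output. The functions `tokenize`, `removeStopWords` and `runPrepTasks` are hypothetical stand-ins written for this example; they are neither part of this package nor of wink-nlp-utils, and the classifier's internal handling may differ.

```javascript
// Hypothetical preparation tasks; a real project would typically use
// functions from wink-nlp-utils or a similar package instead.
const tokenize = ( text ) => text.toLowerCase().split( /[^a-z0-9]+/ ).filter( Boolean );
const stopWords = new Set( [ 'i', 'to', 'my', 'a', 'the' ] );
const removeStopWords = ( tokens ) => tokens.filter( ( t ) => !stopWords.has( t ) );

// Apply each task to the output of the previous one.
function runPrepTasks( tasks, input ) {
  return tasks.reduce( ( value, task ) => task( value ), input );
}

// The first task accepts a string; the last one returns an array of tokens.
const tasks = [ tokenize, removeStopWords ];
console.log( runPrepTasks( tasks, 'I want to prepay my loan' ) );
// -> [ 'want', 'prepay', 'loan' ]
```

Because every task takes one argument and returns one value, tasks can be reordered or swapped without changing the surrounding code.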
defineConfig( config )
Defines the configuration from the config object. This object must define two properties: (a) considerOnlyPresence, which must be a boolean, where true indicates a binarized model (default false), and (b) smoothingFactor, which defines the value for additive smoothing (default 1). defineConfig() must be called before attempting to learn.
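To show what the additive smoothing value does, here is a small sketch of the textbook formula, (count + alpha) / (total + alpha * vocabularySize). It assumes this standard form; the package's internal computation is not shown in this document and may differ in detail. The function name `smoothedProbability` and the counts are hypothetical.

```javascript
// Additive smoothing of a word's probability within a label.
// alpha = 1 is Laplace smoothing; other positive values (commonly
// below 1) are usually referred to as Lidstone smoothing.
function smoothedProbability( count, total, vocabularySize, alpha ) {
  return ( count + alpha ) / ( total + ( alpha * vocabularySize ) );
}

// Hypothetical counts: a word never seen under a label that has
// 8 words in total, with a vocabulary of 4 distinct words.
console.log( smoothedProbability( 0, 8, 4, 1 ) );   // Laplace: 1 / 12
console.log( smoothedProbability( 0, 8, 4, 0.5 ) ); // Lidstone: 0.5 / 10 = 0.05
```

Without smoothing, an unseen word would receive probability zero and wipe out the score of the whole label; a smaller alpha gives unseen words less probability mass.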
learn( input, label )
Simply learns that the input belongs to the label. definePrepTasks() must be called before learning.
consolidate()
Consolidates the learning. It is a prerequisite for evaluate() and predict().
evaluate( input, label )
It is used to evaluate the learning against a test data set. The input is used to predict the label, which is compared with the actual label to populate a confusion matrix.
metrics()
Computes detailed metrics consisting of macro-averaged precision, recall and f-measure, along with their label-wise values and the confusion matrix.
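The relationship between a confusion matrix and these metrics can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the matrix layout (`matrix[ actual ][ predicted ]`), the counts, and the function `computeMetrics` are hypothetical, and the macro f-measure is taken here as the unweighted mean of the label-wise f-measures.

```javascript
// Hypothetical confusion matrix: matrix[ actual ][ predicted ] = count.
const matrix = {
  prepay: { prepay: 8, autoloan: 2 },
  autoloan: { prepay: 1, autoloan: 9 }
};

function computeMetrics( cm ) {
  const labels = Object.keys( cm );
  const perLabel = {};
  labels.forEach( ( label ) => {
    const truePositives = cm[ label ][ label ] || 0;
    // Everything predicted as this label (column sum).
    const predicted = labels.reduce( ( sum, actual ) => sum + ( cm[ actual ][ label ] || 0 ), 0 );
    // Everything that actually carries this label (row sum).
    const actualTotal = labels.reduce( ( sum, p ) => sum + ( cm[ label ][ p ] || 0 ), 0 );
    const precision = predicted ? truePositives / predicted : 0;
    const recall = actualTotal ? truePositives / actualTotal : 0;
    const fMeasure = ( precision + recall ) ?
      ( 2 * precision * recall ) / ( precision + recall ) : 0;
    perLabel[ label ] = { precision, recall, fMeasure };
  } );
  // Macro-average: unweighted mean over labels.
  const mean = ( key ) => labels.reduce( ( s, l ) => s + perLabel[ l ][ key ], 0 ) / labels.length;
  return {
    perLabel,
    macro: { precision: mean( 'precision' ), recall: mean( 'recall' ), fMeasure: mean( 'fMeasure' ) }
  };
}

const result = computeMetrics( matrix );
console.log( result.perLabel );
console.log( result.macro );
```

Macro-averaging treats every label equally regardless of how many samples it has, so a poorly predicted rare label pulls the averages down noticeably.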
predict( input )
Predicts the label for the input. If it is unable to predict, it returns the value 'unknown'.
computeOdds( input )
Computes the log base-2 odds of every label for the input and returns an array of [ label, odds ] pairs in descending order of odds. Here is an example of the returned array:
[
  [ 'prepay', 6.169686751688911 ],
  [ 'autoloan', -6.169686751688911 ]
]
If it is unable to make a prediction, it returns [ [ 'unknown', 0 ] ].
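For readers unfamiliar with log odds, the sketch below shows how such values relate to probabilities. It assumes the odds of a label are p / (1 - p), where p is the label's posterior probability; the probabilities and the function `logOdds` are hypothetical, and the package may compute its odds differently.

```javascript
// Hypothetical posterior probabilities for each label (summing to 1).
const posteriors = { autoloan: 0.2, prepay: 0.8 };

// Log base-2 odds of each label, sorted in descending order of odds.
function logOdds( probabilities ) {
  return Object.keys( probabilities )
    .map( ( label ) => {
      const p = probabilities[ label ];
      return [ label, Math.log2( p / ( 1 - p ) ) ];
    } )
    .sort( ( a, b ) => b[ 1 ] - a[ 1 ] );
}

const odds = logOdds( posteriors );
console.log( odds );
```

Here 'prepay' comes first with odds of roughly 2 and 'autoloan' second with roughly -2. A positive value means the label is more likely than not, and with exactly two labels the values mirror each other.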
exportJSON()
The learning can be exported as JSON text that may be saved in a file.
importJSON( json )
An existing JSON learning can be imported for prediction. It is essential to call consolidate() before attempting to predict.
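Saving the exported JSON text and reading it back can be sketched with Node's built-in fs module. The `exportedJSON` string below is a hypothetical placeholder standing in for the text produced by the classifier's export (its real structure is not shown in this document), and the file name is arbitrary.

```javascript
const fs = require( 'fs' );
const os = require( 'os' );
const path = require( 'path' );

// Hypothetical stand-in for the JSON text exported by the classifier.
const exportedJSON = JSON.stringify( { note: 'placeholder for exported learning' } );

// Save the JSON text to a file.
const file = path.join( os.tmpdir(), 'nbc-learning.json' );
fs.writeFileSync( file, exportedJSON, 'utf8' );

// Later, possibly in another process: read the text back.
const loadedJSON = fs.readFileSync( file, 'utf8' );

// loadedJSON could now be passed to importJSON(), followed by
// consolidate() before predicting.
console.log( loadedJSON === exportedJSON ); // -> true
```

Keeping the learning as plain text means it can equally be stored in a database or sent over the network instead of being written to disk.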
stats()
Returns basic stats of the learning: the count of samples under each label, total words, and the size of the vocabulary.
reset()
Completely resets the classifier by re-initializing all the learning-related variables, except the preparatory tasks. It is useful during n-fold cross-validation.
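A minimal sketch of how data might be partitioned for n-fold cross-validation follows. The helpers `makeFolds` and `crossValidationSplits` and the sample data are hypothetical; the sketch covers only the splitting, and the classifier calls are indicated in a comment rather than executed.

```javascript
// Split an array of samples into n folds of near-equal size (round-robin).
function makeFolds( samples, n ) {
  const folds = Array.from( { length: n }, () => [] );
  samples.forEach( ( sample, i ) => folds[ i % n ].push( sample ) );
  return folds;
}

// For each fold, pair the test set with a training set made of all other folds.
function crossValidationSplits( samples, n ) {
  const folds = makeFolds( samples, n );
  return folds.map( ( testSet, i ) => ( {
    train: folds.filter( ( _, j ) => j !== i ).flat(),
    test: testSet
  } ) );
}

// Hypothetical labelled samples: [ text, label ].
const samples = [
  [ 'text a', 'x' ], [ 'text b', 'y' ], [ 'text c', 'x' ],
  [ 'text d', 'y' ], [ 'text e', 'x' ], [ 'text f', 'y' ]
];

crossValidationSplits( samples, 3 ).forEach( ( split, i ) => {
  // With the classifier, a plausible round would be: reset it, learn()
  // every training sample, consolidate(), then evaluate() every test sample.
  console.log( 'fold', i, 'train:', split.train.length, 'test:', split.test.length );
} );
```

In practice the samples would usually be shuffled before splitting so that each fold contains a representative mix of labels.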
If you spot a bug and the same has not yet been reported, raise a new issue or consider fixing it and sending a pull request.
Copyright & License
wink-naive-bayes-text-classifier is copyright 2017 GRAYPE Systems Private Limited.
It is licensed under the terms of the GNU Affero General Public License as published by the Free Software Foundation, version 3 of the License.