Node.js is teeming with Naive Bayes classifiers, so before someone asks why we need another, let me explain why this classifier is different and why you'll love it! :-)
Almost all naive classifiers out there save and consume their data in JSON format, allowing you to persist the data to file.
While this works for most cases, it is problematic when you want to train your classifier over several thousand large documents. It becomes worse when you want to keep training persistently over a long period.
Imagine you were tracking BuzzFeed headlines and training your classifier to understand clickbait. Would it be convenient to train over a period of months using a JSON file that has to be loaded & held in memory?
What happens if your code exits unexpectedly on the millionth document just before you had persisted to disk?
Is this method of training sustainable and, more importantly, scalable?
So, it turns out there's no simple SQL-based Naive Bayes classifier out there. Know of one? Please show me.
Actually, there are a few gists and examples, but most are written for a specific dataset and their logic is often convoluted, involving copying this data to that temporary table and so on.
But Naive Bayes classifiers, in their simplest form are simple. All they need to know is which document goes into what class. The rest, really, is just arithmetic.
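To make that "just arithmetic" claim concrete, here is a minimal, self-contained sketch of the counting and probability math a Naive Bayes classifier does. The function names and structure here are illustrative only, not this package's API: it counts documents per class and word occurrences per class, then scores a new document with Laplace-smoothed log-probabilities.

```javascript
// A minimal sketch of Naive Bayes arithmetic (illustrative, not this package's API).
function train(docs) {
  // docs: [{ text, category }]
  const model = { classDocs: {}, wordCounts: {}, totalDocs: 0, vocab: new Set() };
  for (const { text, category } of docs) {
    model.totalDocs += 1;
    model.classDocs[category] = (model.classDocs[category] || 0) + 1;
    model.wordCounts[category] = model.wordCounts[category] || {};
    for (const word of text.toLowerCase().split(/\W+/).filter(Boolean)) {
      model.wordCounts[category][word] = (model.wordCounts[category][word] || 0) + 1;
      model.vocab.add(word);
    }
  }
  return model;
}

function categorize(model, text) {
  const words = text.toLowerCase().split(/\W+/).filter(Boolean);
  let best = null;
  let bestScore = -Infinity;
  for (const category of Object.keys(model.classDocs)) {
    // log P(class) + sum of log P(word | class), with Laplace (add-one) smoothing
    let score = Math.log(model.classDocs[category] / model.totalDocs);
    const counts = model.wordCounts[category];
    const total = Object.values(counts).reduce((a, b) => a + b, 0);
    for (const word of words) {
      score += Math.log(((counts[word] || 0) + 1) / (total + model.vocab.size));
    }
    if (score > bestScore) {
      bestScore = score;
      best = category;
    }
  }
  return best;
}
```

The point of the SQL-backed approach is that those per-class and per-word counts are the *only* state — which is exactly the kind of data a couple of database tables hold naturally.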
So this classifier implements a database schema that mimics the JSON objects, encoding classes, documents and their respective counts.
Using simple, straightforward SQL, your database is atomically updated each time you classify a new document and the probabilities change automagically.
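For illustration, a schema along these lines is enough to hold the same counts the JSON format stores — note that these table and column names are hypothetical, not the package's actual schema:

```sql
-- Hypothetical schema (illustrative names, not the package's actual tables)
CREATE TABLE IF NOT EXISTS classes (
  class     TEXT PRIMARY KEY,
  doc_count INTEGER NOT NULL DEFAULT 0   -- documents seen per class
);

CREATE TABLE IF NOT EXISTS tokens (
  class TEXT NOT NULL,
  token TEXT NOT NULL,
  count INTEGER NOT NULL DEFAULT 0,      -- occurrences of token in class
  PRIMARY KEY (class, token)
);

-- Learning a document then becomes a pair of atomic upserts:
INSERT INTO classes (class, doc_count) VALUES ('positive', 1)
  ON CONFLICT(class) DO UPDATE SET doc_count = doc_count + 1;

INSERT INTO tokens (class, token, count) VALUES ('positive', 'awesome', 1)
  ON CONFLICT(class, token) DO UPDATE SET count = count + 1;
```

Because each learn step is just a handful of increments, the database stays consistent even if the process dies mid-training — the counts committed so far are still valid.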
You will never need to load heavy files again, and because this is SQL(ite), you can carry your data and plug it in wherever you go!
Best of all, you can train whenever you come across new documents without affecting any ongoing classifications.
npm install bayes
```js
// NOTE: reconstructed example — the require'd module name, learn arguments,
// and save method are illustrative.
var bayes = require('bayes');
var path = __dirname; // where to save the database

// Some options
var options = {
  "dbPath": path,            // path to save database
  "dbName": 'sentiment-db',  // database name
  "stopwords": ['en', 'sw'], // stopwords to use
  "stemmer": 'lancaster',    // which stemmer to use; currently supports 'lancaster' & 'porter'
  "returnProbabilities": 3,  // how many probabilities you want returned; important where you have many classes
  "trace": true              // do you want to log what's happening?
};

var classifier = bayes(options);

// teach our classifier a few facts (example documents)
classifier.learn('amazing, awesome movie!!', 'positive');
classifier.learn('terrible, boring film. Sucks!!', 'negative');
classifier.learn('I am not sure what to make of this.', 'neutral');

// must save docs to commit data to database.
// Also, if it's the first time you are training the classifier,
// run categorize only after the data has been committed.
classifier.save();
```
Returns an instance of a SQLite Bayes classifier. Pass in an optional `options` object to configure the instance. If you specify a `stemmer` function in `options`, it will be used as the instance's tokenizer. The default tokenizer removes punctuation and splits on spaces.
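As a sketch of the default behavior described above (remove punctuation, split on spaces), something like the following would do — this exact implementation is an assumption, not the package's source:

```javascript
// Illustrative default tokenizer: strips punctuation, splits on whitespace.
// This sketches the documented behavior; it is not the package's actual code.
function defaultTokenizer(text) {
  return text
    .toLowerCase()
    .replace(/[^\w\s]/g, ' ') // replace punctuation with spaces
    .split(/\s+/)             // split on runs of whitespace
    .filter(Boolean);         // drop empty strings
}

// defaultTokenizer("Amazing, awesome movie!!") → ["amazing", "awesome", "movie"]
```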
Teach your classifier what category the `text` belongs to. The more you teach your classifier, the more reliable it becomes. It will use what it has learned to identify new documents that it hasn't seen before.
Returns the category it thinks the `text` belongs to. Its judgement is based on what you have taught it with `.learn()`.