Content Based Recommender
Once the recommender has been trained on an array of documents, it can return the documents that are most similar to a given document.
The training process involves 3 main steps:
- content pre-processing, such as HTML tag stripping, stopword removal and stemming
- document vector formation using tf-idf
- cosine similarity computation between all document vectors
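The three steps above can be sketched in plain JavaScript as follows. This is a minimal illustration under stated assumptions, not the library's actual implementation: the stopword list is a tiny made-up sample, stemming is left out to keep the example dependency-free, the smoothed idf formula is one common variant, and the function names preprocess, buildTfIdfVectors and cosineSimilarity are hypothetical.

```javascript
// Hypothetical, tiny stopword list used only for this illustration.
const STOPWORDS = new Set(['a', 'an', 'and', 'the', 'is', 'of', 'to', 'in']);

// Step 1: strip HTML tags, lowercase, tokenize and drop stopwords.
// (Stemming is omitted in this sketch.)
function preprocess(content) {
  return content
    .replace(/<[^>]*>/g, ' ')
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0 && !STOPWORDS.has(token));
}

// Step 2: build one tf-idf vector (Map of term -> weight) per document.
function buildTfIdfVectors(tokenizedDocs) {
  const docFrequency = new Map();
  for (const tokens of tokenizedDocs) {
    for (const term of new Set(tokens)) {
      docFrequency.set(term, (docFrequency.get(term) || 0) + 1);
    }
  }
  const n = tokenizedDocs.length;
  return tokenizedDocs.map((tokens) => {
    const vector = new Map();
    for (const term of tokens) {
      vector.set(term, (vector.get(term) || 0) + 1); // raw term frequency
    }
    for (const [term, tf] of vector) {
      // Assumed smoothed idf variant: log((1 + n) / (1 + df)) + 1.
      const idf = Math.log((1 + n) / (1 + docFrequency.get(term))) + 1;
      vector.set(term, tf * idf);
    }
    return vector;
  });
}

// Step 3: cosine similarity between two sparse vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [term, weight] of a) {
    normA += weight * weight;
    if (b.has(term)) dot += weight * b.get(term);
  }
  for (const weight of b.values()) normB += weight * weight;
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

const docs = [
  { id: '1', content: '<p>The cat sat on the mat</p>' },
  { id: '2', content: 'A cat sat on a sofa' },
  { id: '3', content: 'Stock markets fell sharply today' },
];
const vectors = buildTfIdfVectors(docs.map((d) => preprocess(d.content)));
// Documents 1 and 2 share most terms, so they score higher than 1 and 3,
// which share none.
console.log(cosineSimilarity(vectors[0], vectors[1]));
console.log(cosineSimilarity(vectors[0], vectors[2]));
```

Sparse Map-based vectors keep the sketch short; a real implementation would typically also cap the vector size, as the maxVectorSize option suggests.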
Special thanks to the natural library, which helps a lot by providing many NLP functionalities, such as tf-idf and word stemming.
I haven't tested how this recommender performs with a large dataset. I will share more results after further testing.
npm install content-based-recommender
And then import the ContentBasedRecommender class
const ContentBasedRecommender = require('content-based-recommender')
trainBidirectional(collectionA, collectionB) to allow recommendations between two different datasets
Upgrade dependencies to fix security alerts
Introduce the use of unigrams, bigrams and trigrams when constructing the word vector
Simplify the implementation by no longer using a sorted set data structure to store the similar documents data. Also support the maxSimilarDocuments and minScore options to reduce the memory used by the recommender.
Update to a newer version of vector-object
This example shows how to automatically match posts with related tags
To create the recommender instance
- options (optional): an object to configure the recommender
- maxVectorSize - controls the maximum size of the word vector after tf-idf processing. A smaller vector size helps training performance while not affecting recommendation quality. Defaults to 100.
- minScore - the minimum score a document must reach to be considered similar. Filtering out low-scoring documents saves memory. Allowed values range from 0 to 1. Defaults to 0.
- debug - show progress messages so that the training progress can be monitored
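To illustrate how a score threshold and a cap on the number of stored matches can save memory, here is a minimal sketch. It is not the library's internal code: the function name pruneSimilarDocuments is hypothetical, and treating minScore as an inclusive threshold is an assumption. The maxSimilarDocuments option name is taken from the changelog above.

```javascript
// Hypothetical helper: keep only candidates that reach minScore, sorted by
// descending score, and keep at most maxSimilarDocuments of them.
function pruneSimilarDocuments(scored, options = {}) {
  const { minScore = 0, maxSimilarDocuments = Infinity } = options;
  return scored
    .filter((entry) => entry.score >= minScore) // assumed inclusive threshold
    .sort((a, b) => b.score - a.score) // filter() returned a copy, input is untouched
    .slice(0, maxSimilarDocuments);
}

const candidates = [
  { id: 'a', score: 0.1 },
  { id: 'b', score: 0.9 },
  { id: 'c', score: 0.5 },
];

// Only 'b' and 'c' reach the threshold of 0.3.
console.log(pruneSimilarDocuments(candidates, { minScore: 0.3 }));
// Only the single best match is kept.
console.log(pruneSimilarDocuments(candidates, { maxSimilarDocuments: 1 }));
```

Pruning at training time means low-value pairs are never stored, which is where the memory saving comes from.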
To tell the recommender about your documents; it will then start training itself.
- documents - an array of objects, each with the fields id and content
Works like the normal train function, but it creates recommendations between two different collections instead of within one collection.
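The idea of matching across two collections, rather than within one, can be sketched as below. This is only an illustration of the concept, not the library's implementation: plain term counts stand in for tf-idf vectors, the function names termCounts, cosine and matchBidirectional are hypothetical, and the sketch assumes ids are unique across both collections.

```javascript
// Count terms in a text; a simplified stand-in for the tf-idf step.
function termCounts(text) {
  const counts = new Map();
  for (const token of text.toLowerCase().split(/[^a-z0-9]+/)) {
    if (token) counts.set(token, (counts.get(token) || 0) + 1);
  }
  return counts;
}

// Cosine similarity between two sparse term-count vectors.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [term, weight] of a) {
    normA += weight * weight;
    dot += weight * (b.get(term) || 0);
  }
  for (const weight of b.values()) normB += weight * weight;
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}

// Compare every document in collectionA with every document in collectionB
// (never A with A or B with B) and record each match in both directions.
function matchBidirectional(collectionA, collectionB) {
  const toVectors = (docs) =>
    docs.map((d) => ({ id: d.id, vector: termCounts(d.content) }));
  const vectorsA = toVectors(collectionA);
  const vectorsB = toVectors(collectionB);
  const matches = new Map();
  for (const { id } of [...vectorsA, ...vectorsB]) matches.set(id, []);
  for (const a of vectorsA) {
    for (const b of vectorsB) {
      const score = cosine(a.vector, b.vector);
      if (score > 0) {
        matches.get(a.id).push({ id: b.id, score });
        matches.get(b.id).push({ id: a.id, score });
      }
    }
  }
  for (const list of matches.values()) list.sort((x, y) => y.score - x.score);
  return matches;
}

const posts = [
  { id: 'post-1', content: 'Learning JavaScript closures' },
  { id: 'post-2', content: 'Baking sourdough bread at home' },
];
const tags = [
  { id: 'tag-js', content: 'javascript programming' },
  { id: 'tag-baking', content: 'baking bread recipes' },
];
const matches = matchBidirectional(posts, tags);
console.log(matches.get('post-1')); // tags related to post-1
console.log(matches.get('tag-baking')); // posts related to tag-baking
```

Because each match is stored under both ids, a lookup works the same way whether it starts from a post or from a tag.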
getSimilarDocuments(id, [start], [size])
To get an array of documents similar to the document with the given id
- id - the id of the document
- start - the start index, inclusive. Defaults to 0
- size - the maximum number of similar documents to return. If omitted, the whole list from the start index onward is returned
It returns an array of objects, with fields id and score (ranging from 0 to 1)
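The start and size semantics described above amount to simple paging over a ranked list. A minimal sketch, assuming the list is already sorted by descending score; the helper name pageSimilarDocuments is hypothetical and this is not the library's own code:

```javascript
// Return at most `size` entries beginning at index `start` (inclusive).
// When `size` is omitted, everything from `start` onward is returned.
function pageSimilarDocuments(similarDocuments, start = 0, size = undefined) {
  const end = size === undefined ? undefined : start + size;
  return similarDocuments.slice(start, end);
}

const ranked = [
  { id: '7', score: 0.92 },
  { id: '3', score: 0.71 },
  { id: '9', score: 0.4 },
  { id: '5', score: 0.15 },
];

console.log(pageSimilarDocuments(ranked, 0, 2)); // the two best matches
console.log(pageSimilarDocuments(ranked, 2)); // everything from index 2 onward
```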
To export the recommender as a JSON object.
const recommender = new ContentBasedRecommender();
recommender.train(documents);
const object = recommender.export();
// can save the object to disk, database or otherwise
To update the recommender by importing from a JSON object exported by the export() method
const recommender = new ContentBasedRecommender();
recommender.import(object); // object can be loaded from disk, database or otherwise
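Saving the exported object to disk and loading it back could look roughly like this. It is a sketch using only Node's built-in fs, os and path modules; the helper names saveModel and loadModel are hypothetical, and the exported value below is a made-up placeholder, since the real shape of the exported object is not documented here.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write a JSON-serializable object to a file.
function saveModel(filePath, object) {
  fs.writeFileSync(filePath, JSON.stringify(object), 'utf8');
}

// Read the object back from the file.
function loadModel(filePath) {
  return JSON.parse(fs.readFileSync(filePath, 'utf8'));
}

// Placeholder standing in for the result of the export step;
// its fields are invented for this example.
const exported = {
  options: { minScore: 0.1 },
  data: { 1: [{ id: '2', score: 0.8 }] },
};

const filePath = path.join(os.tmpdir(), 'recommender-model.json');
saveModel(filePath, exported);
const restored = loadModel(filePath);
// `restored` can now be handed to the import step shown above.
```

Persisting the trained state this way avoids retraining every time the process starts.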
npm install
npm run test