A Collaborative Filtering Recommendation Engine for Node.js utilizing Redis
An easy-to-use collaborative filtering based recommendation engine and NPM module built on top of Node.js and Redis. The engine uses the Jaccard coefficient to determine the similarity between users and k-nearest-neighbors to create recommendations. This module is useful for anyone with users, a store of products/movies/items, and the desire to give their users the ability to like/dislike and receive recommendations based on similar users. Raccoon takes care of all the recommendation and rating logic. It can be paired with any database as it does not keep track of any user/item information besides a unique ID.
Updated for ES6.
If you enjoy using this module, please contribute by trying the benchmark repo and helping to optimize raccoon. Thanks! https://github.com/guymorita/benchmark_raccoon_movielens
Raccoon keeps track of the ratings and recommendations from your users. It does not need to store any metadata about the user or product aside from an ID. To get started:
npm install raccoon
npm install redis
redis-server
If Redis is remote, or you need to customize the connection settings, use the process.env variables:
const raccoon = require('raccoon');
// these are the default values but you can change them
raccoon.config.nearestNeighbors = 5; // number of neighbors you want to compare a user against
raccoon.config.className = 'movie'; // prefix for your items (used for redis)
raccoon.config.numOfRecsStore = 30; // number of recommendations to store per user
raccoon.liked('userId', 'itemId');
// after a user likes an item, the rating data is immediately
// stored in Redis in various sets for the user/item, then the similarity,
// wilson score and recommendations are updated for that user.

raccoon.liked('userId', 'itemId', { updateRecs: false });
// available options are:
// updateRecs: false
// this will stop the update sequence for this rating
// and greatly speed up the time to input all the data.
// however, there will not be any recommendations at the end.
// if you fire a like/dislike with updateRecs on, it will only update
// recommendations for that user.
// default === true
// options are available to liked, disliked, unliked, and undisliked.

raccoon.unliked('userId', 'itemId');
// removes the liked rating from all sets and updates. not the same as disliked.

raccoon.disliked('userId', 'itemId');
// negative rating of the item. if user1 liked movie1 and user2 disliked it, their
// jaccard would be -1, meaning they have opposite preferences.

raccoon.undisliked('userId', 'itemId');
// similar to unliked. removes the negative disliked rating as if it was never rated.
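The updateRecs option described above suggests one way to import a large batch of existing ratings. The sketch below is only an illustration, not part of raccoon: `bulkLoad` and the `{ userId, itemId, liked }` record shape are hypothetical, and it assumes the rating methods take an options object as the third argument and return promises. It stores all but each user's final rating with updates switched off, then sends the final rating per user with updateRecs on so that each user ends up with recommendations.

```javascript
// Hypothetical helper: bulk-load ratings while skipping most update sequences.
// `engine` is assumed to expose liked/disliked(userId, itemId, options).
async function bulkLoad(engine, ratings) {
  // Remember the position of each user's final rating in the batch.
  const lastIndexByUser = new Map();
  ratings.forEach((r, i) => lastIndexByUser.set(r.userId, i));

  const send = (r, updateRecs) => {
    const method = r.liked ? 'liked' : 'disliked';
    return engine[method](r.userId, r.itemId, { updateRecs });
  };

  // Pass 1: store everything except each user's final rating, without updates.
  for (const [i, r] of ratings.entries()) {
    if (lastIndexByUser.get(r.userId) !== i) await send(r, false);
  }
  // Pass 2: send each user's final rating with updates on, so similarity
  // and recommendations are computed once per user against the loaded data.
  for (const [i, r] of ratings.entries()) {
    if (lastIndexByUser.get(r.userId) === i) await send(r, true);
  }
}
```

Because an update only refreshes the user who rated, users processed early in the second pass will not see the final ratings of users processed after them until their next rating.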
There are many ways to gauge the likeness of two users. The original implementation of recommendation Raccoon used the Pearson Coefficient, which is good for measuring discrete values in a small range (e.g. 1-5 stars). However, to optimize for quicker calculations and a simpler interface, recommendation Raccoon instead uses the Jaccard Coefficient, which is useful for measuring binary rating data (e.g. like/dislike). Many top companies, such as YouTube, have gone this route because users were primarily rating things 4-5 or 1. The choice to use Jaccard instead of Pearson was largely inspired by David Celis, who designed Recommendable, the top recommendation engine on Rails. The Jaccard Coefficient also pairs very well with Redis, which can union and diff sets of likes and dislikes in O(N) time.
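As a rough illustration of the idea, here is a minimal in-memory sketch of a like/dislike variant of the Jaccard Coefficient. It is not raccoon's actual source: the function names and the `{ likes, dislikes }` user shape are assumptions, and plain JavaScript Sets stand in for the Redis sets. Agreements (both liked or both disliked) count for the pair, disagreements count against it, and the total is divided by the number of distinct items either user has rated.

```javascript
// Count the items two Sets have in common.
function intersectionSize(a, b) {
  let count = 0;
  for (const item of a) {
    if (b.has(item)) count += 1;
  }
  return count;
}

// Hypothetical helper: similarity in [-1, 1] between two users,
// each given as { likes: Set, dislikes: Set } of item IDs.
function jaccardSimilarity(u1, u2) {
  const agreements =
    intersectionSize(u1.likes, u2.likes) +
    intersectionSize(u1.dislikes, u2.dislikes);
  const disagreements =
    intersectionSize(u1.likes, u2.dislikes) +
    intersectionSize(u1.dislikes, u2.likes);
  const allRated = new Set([
    ...u1.likes, ...u1.dislikes, ...u2.likes, ...u2.dislikes,
  ]);
  if (allRated.size === 0) return 0; // nothing rated yet, no signal
  return (agreements - disagreements) / allRated.size;
}
```

With this definition, a user who liked movie1 and a user who disliked movie1 (and nothing else) score -1, while two users with identical ratings score 1.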
To deal with large user bases, it's essential to make optimizations that don't involve comparing every user against every other user. One way to deal with this is the K-Nearest Neighbors algorithm, which allows you to compare a user against only their 'nearest' neighbors. After a user's similarity is calculated with the Jaccard Coefficient, a sorted set is created which represents how similar that user is to every other user. The top users from that list are considered their nearest neighbors. By default, recommendation Raccoon uses 5 nearest neighbors, but this can easily be changed based on your needs.
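The neighbor selection step can be sketched in a few lines. This is an in-memory stand-in for the Redis sorted set, not raccoon's implementation; `nearestNeighbors` and the shape of the `similarities` object (other user ID mapped to similarity score) are assumptions made for illustration.

```javascript
// Hypothetical helper: return the IDs of the k most similar users.
// `similarities` maps each other user's ID to a similarity score.
function nearestNeighbors(similarities, k = 5) {
  return Object.entries(similarities)
    .sort(([, scoreA], [, scoreB]) => scoreB - scoreA) // highest score first
    .slice(0, k)
    .map(([userId]) => userId);
}

// Example: keep only the two closest users.
const neighbors = nearestNeighbors({ u2: 0.5, u3: -1, u4: 0.9, u5: 0.1 }, 2);
console.log(neighbors);
```

Only the ratings of these neighbors then need to be consulted when building recommendations, which keeps the work per user bounded by k rather than by the size of the user base.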
If you've ever been to Amazon or another site with tons of reviews, you've probably run into a sorted page of top ratings only to find that some of the top items have only one review. The Wilson Score Interval at 95% calculates the chance that the 'real' fraction of positive ratings is at least x. This lets you leave off items/products that have not been rated enough or have an abnormally high ratio. It's a great proxy for a 'best rated' list.
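For reference, the lower bound of the Wilson score interval can be computed as in the sketch below. This is the standard textbook formula rather than code taken from raccoon; the function name `wilsonLowerBound` and its parameters are assumptions, and z = 1.96 corresponds to the 95% level mentioned above.

```javascript
// Hypothetical helper: lower bound of the Wilson score interval for the
// fraction of positive ratings, given counts of likes and dislikes.
function wilsonLowerBound(likes, dislikes, z = 1.96) {
  const n = likes + dislikes;
  if (n === 0) return 0; // unrated items sort to the bottom
  const p = likes / n; // observed fraction of positive ratings
  const z2 = z * z;
  const centre = p + z2 / (2 * n);
  const margin = z * Math.sqrt((p * (1 - p) + z2 / (4 * n)) / n);
  return (centre - margin) / (1 + z2 / n);
}
```

An item with a single like scores roughly 0.21, while an item with 90 likes and 10 dislikes scores roughly 0.83, so the well-reviewed item ranks higher even though its raw ratio is lower.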
When combined with hiredis, Redis can get/set at ~40,000 operations/second using 50 concurrent connections without pipelining. In short, Redis is extremely fast at set math and is a natural fit for a recommendation engine of this scale. Redis is integral to many top companies, such as Twitter, which uses it for its Timeline (in place of Memcached).
grunt test
grunt mochacov:coverage
For testing, raccoon uses Mocha and Chai as its testing suite, automates it with Grunt.js, and gets test coverage with Blanket.js/Travis-CI/Coveralls.