- Fetch raw HTML from a web server and save it to disk
- Norch-indexer pushes documents into a norch.js search server
- A JSONified and simplified version of the famous Reuters-21578 dataset
- A text search index module for Node.js. Search-index allows applications to add, delete and retrieve documents from a corpus. Retrieved documents are ordered by tf-idf relevance, and results can be filtered on metadata and tuned with field weighting.
- A Node.js module that takes in text and returns it with stopwords removed
- A Node.js module that creates a term vector from mixed text input. Supports customisable stopwords and separators.
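The stopword, term-vector and tf-idf ideas in the list above fit together as one small pipeline. The sketch below is a self-contained illustration under stated assumptions, not the API of `stopword`, `term-vector` or `search-index`: the names `STOPWORDS`, `tokenize`, `termVector` and `rankByTfIdf`, the tiny stopword list, the default separator regex, and the smoothed idf formula `ln(1 + N/df)` are all hypothetical choices made for this example. Metadata filtering and field weighting are not modelled.

```typescript
// Illustrative sketch only; NOT the API of stopword, term-vector or search-index.

// Hypothetical, deliberately tiny stopword list.
const STOPWORDS = new Set<string>(["a", "an", "and", "the", "of", "to", "in", "is"]);

// Lowercase, then split on a separator (assumed default: any non-alphanumeric run).
function tokenize(text: string, separator: RegExp = /[^a-z0-9]+/): string[] {
  return text.toLowerCase().split(separator).filter((t) => t.length > 0);
}

// Drop every token that appears in the (customisable) stopword set.
function removeStopwords(tokens: string[], stopwords: Set<string> = STOPWORDS): string[] {
  return tokens.filter((t) => !stopwords.has(t));
}

// Term vector: [term, count] pairs, sorted alphabetically by term.
function termVector(
  text: string,
  stopwords: Set<string> = STOPWORDS,
  separator: RegExp = /[^a-z0-9]+/
): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const t of removeStopwords(tokenize(text, separator), stopwords)) {
    counts.set(t, (counts.get(t) ?? 0) + 1);
  }
  return Array.from(counts.entries()).sort((a, b) =>
    a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0
  );
}

interface Doc {
  id: string;
  body: string;
}

// Score each document as the sum over query terms of tf * idf, where
// tf = count / document length (after stopword removal) and
// idf = ln(1 + N / df) (an assumed smoothing that keeps idf > 0).
function rankByTfIdf(docs: Doc[], query: string): Array<{ id: string; score: number }> {
  const vectors = docs.map((d) => new Map<string, number>(termVector(d.body)));
  const n = docs.length;
  const queryTerms = removeStopwords(tokenize(query));
  const scored = docs.map((d, i) => {
    const vec = vectors[i];
    const length = Array.from(vec.values()).reduce((s, c) => s + c, 0);
    let score = 0;
    for (const term of queryTerms) {
      const df = vectors.filter((v) => v.has(term)).length;
      if (df === 0) continue; // term absent from the whole corpus
      const tf = (vec.get(term) ?? 0) / Math.max(length, 1);
      score += tf * Math.log(1 + n / df);
    }
    return { id: d.id, score };
  });
  // Keep only matching documents, highest relevance first.
  return scored.filter((s) => s.score > 0).sort((a, b) => b.score - a.score);
}

// Usage with made-up example documents.
const docs: Doc[] = [
  { id: "1", body: "Oil prices rise as the supply of crude falls" },
  { id: "2", body: "Wheat and corn exports rise" },
  { id: "3", body: "Crude oil and crude futures" },
];
console.log(termVector("The cat and the hat")); // [ [ 'cat', 1 ], [ 'hat', 1 ] ]
console.log(rankByTfIdf(docs, "crude oil").map((r) => r.id)); // [ '3', '1' ]
```

Document 3 ranks first because "crude" makes up a larger share of its (shorter) term vector; document 2 contains neither query term and is dropped. Normalising tf by document length is one common choice among several and is assumed here.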