17 packages found
A language-independent 'word finding' tool, useful for stemming, tokenizing, indexing, spell checking, and other common NLP tasks. Works on any human language and any Unicode character set, and learns from the data you give it. (Uses compression, maximum entro
This module covers some basic NLP principles and implementations. Every implementation in this module is written as a stream, so only the data currently being processed is held in memory at any step.
- frequency distribution
- cross validation
- term frequency
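
As a rough illustration of the streaming idea applied to frequency distribution and term frequency, here is a minimal Python sketch. It is not the module's own API: the function names `tokens`, `frequency_distribution`, and `term_frequency` are hypothetical, and the whitespace tokenizer is an assumption chosen only to keep the example short. Input is consumed one line at a time, so only the current line and the running counts are held in memory.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield lowercase, whitespace-separated tokens one at a time (hypothetical helper)."""
    for line in lines:
        for token in line.lower().split():
            yield token


def frequency_distribution(stream: Iterable[str]) -> Counter:
    """Count tokens from a stream without materializing the whole input."""
    counts: Counter = Counter()
    for token in stream:
        counts[token] += 1
    return counts


def term_frequency(counts: Counter) -> Dict[str, float]:
    """Convert raw counts into relative frequencies (count / total tokens)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: n / total for term, n in counts.items()}


# Any iterable of lines works here, including an open file object.
lines = iter(["the cat sat", "the cat ran"])
counts = frequency_distribution(tokens(lines))
tf = term_frequency(counts)
```

Because `tokens` is a generator, the same functions can be fed from a large file (`open(path)`) without loading it fully; only the counts dictionary grows with the vocabulary.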