32 packages found
Corpus representation stored in JSON and wrapped in a Corpus CRUD API
Spam Assassin public mail corpus.
The text of Moby Dick by Herman Melville.
State of the Union addresses by U.S. Presidents.
Text corpora from Project Gutenberg used by NLTK.
Translate languages using a statistical model
A package that finds the frequency of a word per million words, using Chapter 1, List 1.2 from https://ucrel.lancs.ac.uk/bncfreq/flists.html as its source of word frequency data.
A dashboard that visualizes a synthesis of a structured corpus using several chart types (pie, histogram, ...)
A Node.js library for concordancing a corpus formatted according to the Data Format for Digital Linguistics (DaFoDiL)
Calculate how many documents contain a certain term, within a list (`Array`) of text documents.
A CJK text tokenizer
List of ~636,000 Spanish words
Merge multiple sentiment libraries for better sentiment analysis
List of ~336,000 French words
A wrapper for CETEMPúblico, a European Portuguese corpus of news extracts from the newspaper Público, with 180 million words tagged automatically using PALAVRAS.