32 packages found
Corpus representation stored in JSON and wrapped in a Corpus CRUD API
A corpus of technical books written in Japanese
SpamAssassin public mail corpus.
The text of Moby Dick by Herman Melville.
State of the Union addresses by U.S. Presidents.
- stdlib
- datasets
- dataset
- data
- speeches
- politics
- usa
- us
- president
- sotu
- state of the union
- addresses
- text
- corpus
Text corpora from Project Gutenberg used by NLTK.
Translate languages using a statistical model
A package that finds the frequency of a word per million words, using Chapter 1, List 1.2 from https://ucrel.lancs.ac.uk/bncfreq/flists.html as its source of word frequency data.
A dashboard to visualize a synthesis on a structured corpus, using several charts (pie, histogram, ...)
A Node.js library for concordancing a corpus formatted according to the Data Format for Digital Linguistics (DaFoDiL)
Calculate how many documents contain a certain term, within a list (`Array`) of text documents.
A CJK text tokenizer
List of ~636,000 Spanish words
A JavaScript (Node.js) library that converts a tagged (monolinear) text to DLx JSON format
Merge multiple sentiment libraries for better sentiment analysis
List of ~336,000 French words
A wrapper for CETEMPúblico, a European Portuguese corpus of news extracts from the newspaper Público, with 180 million words tagged automatically using PALAVRAS.