31 packages found
Spam Assassin public mail corpus.
The text of Moby Dick by Herman Melville.
State of the Union addresses by U.S. Presidents.
- stdlib
- datasets
- dataset
- data
- speeches
- politics
- usa
- us
- president
- sotu
- state of the union
- addresses
- text
- corpus
A dashboard for visualizing a summary of a structured corpus with several chart types (pie, histogram, ...).
A Node.js library for concordancing a corpus formatted according to the Data Format for Digital Linguistics (DaFoDiL).
Corpus representation stored in JSON and wrapped in a Corpus CRUD API.
A JavaScript (Node.js) library that converts tagged (monolinear) text to DLx JSON format.
A corpus of technical books written in Japanese.
A wrapper for CETEMPúblico, a European Portuguese corpus of news extracts from the newspaper Público, with 180 million words tagged automatically using PALAVRAS.
Translate between languages using a statistical model.
Text corpora from Project Gutenberg used by NLTK.
A tool for converting SRT subtitle files into a plain-text corpus.
Classes for representing elements of a text corpus.
State of the Union addresses by U.S. Presidents as a UMD bundle.
- stdlib
- datasets
- dataset
- data
- speeches
- politics
- usa
- us
- president
- sotu
- state of the union
- addresses
- text
- corpus
Spam Assassin public mail corpus as a UMD bundle.
- stdlib
- datasets
- dataset
- data
- spam
- spam assassin
- ham
- text
- classification
- classifier
- corpus
The text of Moby Dick by Herman Melville as a UMD bundle.
A Standard Corpus of Present-Day Edited American English, for use with Digital Computers.