Text Preprocessor
Normalizes text before any natural language processing.
Installation
Using Yarn:
yarn add text-preprocessor
Or using NPM:
npm i --save text-preprocessor
Usage
const preprocessor = ;
const text = ;
text ;
console;
// OUTPUT: "that is great! & but do not take too long okay? bjork-yo"
TextPreprocessor
preprocessor(text) ⇒ Constructs a TextPreprocessor instance
Param | Type |
---|---|
text | String |
Methods
- TextPreprocessor
- new TextPreprocessor(text)
- .clean()
- .unescape()
- .toLowerCase()
- .toString()
- .expandContractions()
- .killUnicode()
- .replace(regexp, value)
- .remove(regexp)
- .removeTagsAndMentions()
- .removeSpecialCharachters()
- .removeURLs()
- .removeParenthesesContents()
- .removePunctuation()
- .normalizeSingleCurlyQuotes()
- .normalizeDoubleCurlyQuotes()
- .defaults()
- .chain()
new TextPreprocessor(text)
Creates a preprocessor for normalizing text before any natural language processing.
Param | Type |
---|---|
text | String |
textPreprocessor.clean()
Strips extra whitespace from the text, leaving at most one whitespace character between any two other characters.
Kind: instance method of TextPreprocessor
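As a rough illustration of the whitespace rule, here is a minimal sketch in plain JavaScript. It is not the library's code: `collapseWhitespace` is a hypothetical name, and it assumes that leading and trailing whitespace is also trimmed and that any whitespace run (including tabs and newlines) becomes a single space.

```javascript
// Hypothetical stand-in for .clean(); not taken from text-preprocessor.
// Assumes every whitespace run collapses to one space and both ends are trimmed.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}

console.log(collapseWhitespace("  that   is\n\tgreat  ")); // prints: that is great
```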
textPreprocessor.unescape()
Converts the HTML entities `&amp;`, `&lt;`, `&gt;`, `&quot;`, and `&#39;` in the text to their corresponding characters.
Kind: instance method of TextPreprocessor
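The sketch below shows one way to decode the five entities that lodash's `_.unescape` handles (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&#39;`). `unescapeEntities` is a hypothetical helper, and whether this library covers exactly these entities is an assumption.

```javascript
// Hypothetical helper, not the library's implementation.
const ENTITY_MAP = {
  "&amp;": "&",
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&#39;": "'",
};

function unescapeEntities(text) {
  // One left-to-right pass, so "&amp;lt;" decodes to "&lt;" rather than "<".
  return text.replace(/&(?:amp|lt|gt|quot|#39);/g, (entity) => ENTITY_MAP[entity]);
}

console.log(unescapeEntities("fish &amp; chips &lt;3")); // prints: fish & chips <3
```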
textPreprocessor.toLowerCase()
Converts all the alphabetic characters in a string to lowercase.
Kind: instance method of TextPreprocessor
textPreprocessor.toString()
Returns the text produced by the chained operations so far, as a string.
Kind: instance method of TextPreprocessor
textPreprocessor.expandContractions()
Replaces English contractions in the text with their expanded equivalents, e.g. "don't" becomes "do not".
Kind: instance method of TextPreprocessor
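A dictionary lookup is one common way to expand contractions. This sketch uses a tiny, hypothetical table (the library's actual list is not documented here) and does not preserve the original capitalization of the contraction it replaces.

```javascript
// Tiny, hypothetical contraction table; the library's own list is not shown.
const CONTRACTIONS = {
  "don't": "do not",
  "can't": "cannot",
  "it's": "it is",
  "that's": "that is",
  "won't": "will not",
};

// \b anchors keep matches on whole words; the "i" flag matches any casing.
const CONTRACTION_RE = new RegExp(`\\b(${Object.keys(CONTRACTIONS).join("|")})\\b`, "gi");

function expandContractions(text) {
  // The lookup key is lowercased, so "Don't" also maps to "do not".
  return text.replace(CONTRACTION_RE, (match) => CONTRACTIONS[match.toLowerCase()]);
}

console.log(expandContractions("that's great but don't wait")); // prints: that is great but do not wait
```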
textPreprocessor.killUnicode()
Replaces Latin, Cyrillic, and Greek Unicode characters with English ASCII, using a deliberately rough and widely subjective transliteration.
Kind: instance method of TextPreprocessor
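For Latin letters with diacritics, a common technique (swapped in here, not necessarily what the library does) is Unicode NFD decomposition followed by removal of the combining marks. It only strips accents: Cyrillic and Greek letters are not transliterated and would need an explicit mapping table. `stripDiacritics` is a hypothetical name.

```javascript
// Swapped-in technique: NFD splits "ö" into "o" + U+0308, then the combining
// marks in U+0300..U+036F are removed. Cyrillic/Greek letters pass through unchanged.
function stripDiacritics(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

console.log(stripDiacritics("Björk café")); // prints: Bjork cafe
```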
textPreprocessor.replace(regexp, value)
Replaces any occurrence of the given regular expression with the given string.
Kind: instance method of TextPreprocessor
Param | Type |
---|---|
regexp | RegExp |
value | String |
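If `.replace()` passes its arguments on to `String.prototype.replace` (an assumption, since the implementation is not shown), the regular expression's `g` flag decides whether only the first match or every match is replaced. The built-in behaves like this:

```javascript
const sample = "okay?? okay??";

// Without the g flag, only the first match is replaced.
console.log(sample.replace(/\?+/, "?")); // prints: okay? okay??

// With the g flag, every match is replaced.
console.log(sample.replace(/\?+/g, "?")); // prints: okay? okay?
```

The same consideration would apply to `.remove(regexp)` if it is built the same way.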
textPreprocessor.remove(regexp)
Removes any occurrence of the given expression
Kind: instance method of TextPreprocessor
Param | Type |
---|---|
regexp | RegExp |
textPreprocessor.removeTagsAndMentions()
Removes #tags and @mentions from the start of the text.
Kind: instance method of TextPreprocessor
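A minimal sketch of removing a leading run of #tags and @mentions, assuming a tag or mention is `#` or `@` followed by word characters (`\w`: ASCII letters, digits, underscore). `removeLeadingTags` is a hypothetical name; tags later in the text are left alone, in line with the "start of the text" description.

```javascript
// Hypothetical sketch. ^ anchors at the start of the text; the repeated group
// consumes each "#word" or "@word" token together with the whitespace after it.
function removeLeadingTags(text) {
  return text.replace(/^(?:[#@]\w+\s*)+/, "");
}

console.log(removeLeadingTags("@bob #nlp this is great #fun")); // prints: this is great #fun
```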
textPreprocessor.removeSpecialCharachters()
Removes all special characters.
Kind: instance method of TextPreprocessor
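Which characters count as "special" is not defined here. This sketch assumes a strict reading: keep Unicode letters (`\p{L}`), digits (`\p{N}`), and whitespace, and drop everything else, punctuation included. `stripSpecialCharacters` is hypothetical; the property escapes require the `u` flag.

```javascript
// Hypothetical sketch: keeps letters, numbers and whitespace; removes all other characters.
function stripSpecialCharacters(text) {
  return text.replace(/[^\p{L}\p{N}\s]/gu, "");
}

console.log(stripSpecialCharacters("hello, world! #1 café")); // prints: hello world 1 café
```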
textPreprocessor.removeURLs()
Removes URLs and email addresses.
Kind: instance method of TextPreprocessor
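URL and email removal is usually done with regular expressions. The patterns below are deliberately simple, hypothetical approximations (they will miss some URLs and accept some invalid addresses) and are not the library's own patterns.

```javascript
// Hypothetical patterns: "http(s)://..." or "www...." up to the next whitespace,
// and a simple local@domain.tld shape for email addresses.
const URL_RE = /\bhttps?:\/\/\S+|\bwww\.\S+/gi;
const EMAIL_RE = /\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b/g;

function removeUrlsAndEmails(text) {
  return text.replace(URL_RE, "").replace(EMAIL_RE, "");
}

console.log(removeUrlsAndEmails("see https://example.com/a?b=1 or mail me@example.org now"));
// prints: see  or mail  now
```

The double spaces left behind are the kind of thing `.clean()` is documented to collapse.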
textPreprocessor.removeParenthesesContents()
Removes brackets and parentheses together with their contents.
Kind: instance method of TextPreprocessor
Example
`Hello, this is Mike (example)` to `Hello, this is Mike `
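The example above can be reproduced with a single regular expression. This hypothetical sketch removes only innermost pairs (a bracket group with no brackets inside it), so nested input such as `(a (b) c)` is not fully cleared in one pass; like the example, it keeps the space before the removed group.

```javascript
// Hypothetical sketch: removes "(...)" and "[...]" groups that contain no nested brackets.
function removeBracketed(text) {
  return text.replace(/\([^()]*\)|\[[^\[\]]*\]/g, "");
}

console.log(JSON.stringify(removeBracketed("Hello, this is Mike (example)")));
// prints: "Hello, this is Mike "
```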
textPreprocessor.removePunctuation()
Removes punctuation from the end of the text.
Kind: instance method of TextPreprocessor
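A minimal sketch of trimming trailing punctuation, assuming "punctuation" means the Unicode punctuation category `\p{P}`. `removeTrailingPunctuation` is hypothetical; punctuation elsewhere in the text is kept.

```javascript
// Hypothetical sketch: $ anchors at the end of the string, \p{P}+ matches a run of punctuation.
function removeTrailingPunctuation(text) {
  return text.replace(/\p{P}+$/u, "");
}

console.log(removeTrailingPunctuation("wait, what?!")); // prints: wait, what
```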
textPreprocessor.normalizeSingleCurlyQuotes()
Coerces single curly quotes to straight ones, e.g. `don’t` to `don't`.
Kind: instance method of TextPreprocessor
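One way to coerce single curly quotes is a character-class replacement. This hypothetical sketch assumes only U+2018 (‘) and U+2019 (’) need mapping; the library may handle further variants.

```javascript
// Hypothetical sketch: maps the left (U+2018) and right (U+2019) single quotes to "'".
function straightenSingleQuotes(text) {
  return text.replace(/[\u2018\u2019]/g, "'");
}

console.log(straightenSingleQuotes("don\u2019t")); // prints: don't
```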
textPreprocessor.normalizeDoubleCurlyQuotes()
Coerces double curly quotes to straight ones, e.g. `it is «Khorzu”` to `it is "Khorzu"`.
Kind: instance method of TextPreprocessor
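The same approach works for double quotes. Going by the «Khorzu” example, this hypothetical sketch maps guillemets as well as curly double quotes; the exact character set (U+201C, U+201D, U+201E, U+00AB, U+00BB) is an assumption.

```javascript
// Hypothetical sketch: curly double quotes, the low-9 quote and both guillemets become '"'.
function straightenDoubleQuotes(text) {
  return text.replace(/[\u201C\u201D\u201E\u00AB\u00BB]/g, '"');
}

console.log(straightenDoubleQuotes("it is \u00ABKhorzu\u201D")); // prints: it is "Khorzu"
```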
textPreprocessor.defaults()
Applies the default normalizations: clean, toLowerCase, unescape, killUnicode, and normalizeSingleCurlyQuotes.
Kind: instance method of TextPreprocessor
textPreprocessor.chain()
Executes a chain of the given method names.
Kind: instance method of TextPreprocessor
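The parameters of `.chain()` are not documented. Assuming it receives method names as strings and calls each in turn on the same instance, the fluent pattern (each method updates the text and returns `this`, and `.toString()` reads the result) can be sketched as follows. `MiniPreprocessor`, `miniPreprocessor`, `CHAINABLE`, and the argument format of `chain` are all hypothetical.

```javascript
// Hypothetical allow-list of methods that chain() may call by name.
const CHAINABLE = new Set(["clean", "toLowerCase"]);

// Hypothetical sketch of a chainable preprocessor; not the library's code.
class MiniPreprocessor {
  constructor(text) {
    this.text = text;
  }

  clean() {
    this.text = this.text.replace(/\s+/g, " ").trim();
    return this; // returning this is what allows calls to be chained
  }

  toLowerCase() {
    this.text = this.text.toLowerCase();
    return this;
  }

  // Calls each named method in order; rejects names outside the allow-list.
  chain(...methodNames) {
    for (const name of methodNames) {
      if (!CHAINABLE.has(name)) {
        throw new Error(`Unknown method: ${name}`);
      }
      this[name]();
    }
    return this;
  }

  toString() {
    return this.text;
  }
}

// Factory mirroring the documented preprocessor(text) entry point.
const miniPreprocessor = (text) => new MiniPreprocessor(text);

console.log(miniPreprocessor("  That   IS Great ").chain("clean", "toLowerCase").toString());
// prints: that is great
```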
TextPreprocessor
preprocessor(text) ⇒ Creates a TextPreprocessor for normalizing text before any natural language processing
Kind: global function
Param | Type |
---|---|
text | String |