Light-weight sentence tokenizer for Chinese languages.
published 1.0.1, 3 years ago

Light-weight sentence tokenizer for Korean. Supports both full-width and half-width punctuation marks.
published 1.0.1, 3 years ago

Light-weight tool for normalizing whitespace, splitting lines, and accurately tokenizing words (no regex). Multiple natural languages supported.
published 1.0.3, 2 years ago

Tool for stripping and normalizing punctuation and other non-alphanumeric characters. Supports multiple natural languages. Useful for scraping, machine learning, and data analysis.
published 1.0.2, 2 years ago

Light-weight tool for converting characters in a string into common HTML entities (without regex).
published 1.0.2, 2 years ago

Tool for escaping script tags using backslashes (no regex).
published 1.0.4, 2 years ago
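A minimal sketch of how the Chinese and Korean sentence tokenizers above might work, splitting on full-width (。！？) and half-width (.!?) sentence-ending punctuation without regular expressions. The names SENTENCE_ENDINGS and splitSentences are illustrative, not the packages' actual API.

```javascript
// Sentence-ending punctuation in both full-width and half-width forms.
const SENTENCE_ENDINGS = new Set(['。', '！', '？', '.', '!', '?']);

function splitSentences(text) {
  const sentences = [];
  let current = '';
  for (const ch of text) { // for…of iterates code points, so CJK is safe
    current += ch;
    if (SENTENCE_ENDINGS.has(ch)) {
      const trimmed = current.trim();
      if (trimmed) sentences.push(trimmed);
      current = '';
    }
  }
  const rest = current.trim();
  if (rest) sentences.push(rest); // keep a trailing fragment with no terminator
  return sentences;
}

console.log(splitSentences('안녕하세요. 반갑습니다! 이름이 뭐예요?'));
// [ '안녕하세요.', '반갑습니다!', '이름이 뭐예요?' ]
```

A real tokenizer would also need to handle closing quotes and runs of punctuation like "?!", which this sketch splits after the first terminator.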
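The whitespace-normalizing and punctuation-stripping tools above could be sketched with plain character scans instead of regex, roughly as follows. Both function names and the ASCII-only alphanumeric check are assumptions for illustration; a production version would need Unicode-aware classification for full multi-language support.

```javascript
// Common whitespace characters to collapse or preserve.
const WHITESPACE = new Set([' ', '\t', '\n', '\r', '\f', '\v', '\u00a0']);

// Collapse runs of whitespace to a single space and trim the ends.
function normalizeWhitespace(text) {
  let out = '';
  let inSpace = false;
  for (const ch of text.trim()) {
    if (WHITESPACE.has(ch)) {
      inSpace = true;
    } else {
      if (inSpace && out) out += ' ';
      inSpace = false;
      out += ch;
    }
  }
  return out;
}

// Drop punctuation and other non-alphanumeric characters, keeping whitespace.
function stripPunctuation(text) {
  let out = '';
  for (const ch of text) {
    const keep =
      (ch >= '0' && ch <= '9') ||
      (ch >= 'a' && ch <= 'z') ||
      (ch >= 'A' && ch <= 'Z') ||
      ch.codePointAt(0) > 127 || // crude pass-through for non-ASCII letters
      WHITESPACE.has(ch);
    if (keep) out += ch;
  }
  return out;
}

console.log(normalizeWhitespace('  hello   world \n')); // "hello world"
console.log(stripPunctuation("it's done!"));            // "its done"
```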
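Converting characters to HTML entities without regex, as the entity tool above describes, can be done with a single pass over the string and a lookup table. The entity set and the toHtmlEntities name here are a hedged sketch, not the package's actual API.

```javascript
// The five characters most commonly escaped for safe HTML embedding.
const ENTITIES = new Map([
  ['&', '&amp;'],
  ['<', '&lt;'],
  ['>', '&gt;'],
  ['"', '&quot;'],
  ["'", '&#39;'],
]);

function toHtmlEntities(text) {
  let out = '';
  for (const ch of text) {
    out += ENTITIES.get(ch) ?? ch; // replace if mapped, otherwise keep as-is
  }
  return out;
}

console.log(toHtmlEntities('<a href="x">&</a>'));
// &lt;a href=&quot;x&quot;&gt;&amp;&lt;/a&gt;
```

Note that `&` must map to `&amp;` in the same pass, not a second one, or already-escaped output would be double-escaped.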
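Escaping script tags with backslashes, per the last tool above, usually means rewriting `</script>` as `<\/script>` so a string can sit safely inside an inline script block. A minimal regex-free sketch using indexOf and slice (the function name is illustrative):

```javascript
// Replace every "</script>" (case-insensitively) with "<\/script>",
// preserving the original casing of the "script" part.
function escapeScriptTags(html) {
  const needle = '</script>';
  let out = '';
  let i = 0;
  for (;;) {
    const j = html.toLowerCase().indexOf(needle, i); // case-insensitive search
    if (j === -1) {
      out += html.slice(i);
      return out;
    }
    // Emit text before the match, then "<\/" plus the original "script>".
    out += html.slice(i, j) + '<\\/' + html.slice(j + 2, j + needle.length);
    i = j + needle.length;
  }
}

console.log(escapeScriptTags('var s = "</script>";'));
// var s = "<\/script>";
```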