rairye-nlp


  • Lightweight sentence tokenizer for Japanese.

    published 1.0.2 2 years ago
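
    The package name is not shown in this listing, but the core idea of a regex-free sentence tokenizer can be sketched as follows; the function name and terminator set are illustrative assumptions, not the package's actual API:

    ```javascript
    // Minimal sketch of a regex-free Japanese sentence splitter.
    // The terminator set and function name are assumptions, not the
    // package's actual API.
    const JA_TERMINATORS = new Set(['。', '！', '？']);

    function tokenizeSentencesJa(text) {
      const sentences = [];
      let current = '';
      for (const ch of text) {      // for...of iterates code points, not bytes
        current += ch;
        if (JA_TERMINATORS.has(ch)) {
          sentences.push(current);  // keep the terminator with its sentence
          current = '';
        }
      }
      if (current.trim() !== '') sentences.push(current);
      return sentences;
    }
    ```

    Scanning character by character keeps each terminator attached to its sentence and avoids regular expressions entirely.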
  • Lightweight sentence tokenizer for Chinese.

    published 1.0.1 3 years ago
  • Lightweight sentence tokenizer for Korean. Supports both full-width and half-width punctuation marks.

    published 1.0.1 3 years ago
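
    Supporting both widths can be as simple as listing both forms of each terminator in the split set; a minimal sketch (names and the exact set are assumptions, not the package's API):

    ```javascript
    // Sketch: a terminator set containing both half-width and full-width
    // punctuation, so "안녕하세요." and "안녕하세요。" split the same way.
    // The set and function name are assumptions, not the package's API.
    const TERMINATORS = new Set(['.', '!', '?', '。', '．', '！', '？']);

    function splitSentences(text) {
      const out = [];
      let buf = '';
      for (const ch of text) {
        buf += ch;
        if (TERMINATORS.has(ch)) {
          out.push(buf.trim());
          buf = '';
        }
      }
      const rest = buf.trim();
      if (rest !== '') out.push(rest);
      return out;
    }
    ```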
  • Lightweight tool for normalizing whitespace, splitting lines, and accurately tokenizing words (no regex). Supports multiple natural languages.

    published 1.0.3 2 years ago
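
    A regex-free word tokenizer reduces to a single pass over the string with an explicit whitespace set; the names and the set below are assumptions, not the package's API:

    ```javascript
    // Sketch: regex-free whitespace normalization and word tokenization.
    // Function names and the whitespace set are illustrative assumptions.
    const WHITESPACE = new Set([' ', '\t', '\n', '\r', '\f', '\v', '\u00a0']);

    function tokenizeWords(text) {
      const words = [];
      let buf = '';
      for (const ch of text) {
        if (WHITESPACE.has(ch)) {
          if (buf !== '') {         // flush the word in progress
            words.push(buf);
            buf = '';
          }
        } else {
          buf += ch;
        }
      }
      if (buf !== '') words.push(buf);
      return words;
    }

    // Normalization falls out of tokenization: split, then rejoin
    // with single spaces.
    function normalizeWhitespace(text) {
      return tokenizeWords(text).join(' ');
    }
    ```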
  • Tool for stripping and normalizing punctuation and other non-alphanumeric characters. Supports multiple natural languages. Useful for scraping, machine learning, and data analysis.

    published 1.0.2 2 years ago
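
    One way to strip punctuation without touching letters and digits is a per-character filter; this ASCII-only sketch is illustrative and not the package's actual (multi-language) behavior:

    ```javascript
    // Sketch: drop non-alphanumeric characters, keeping letters, digits,
    // and spaces. A real multi-language version would need Unicode
    // category data; this ASCII check is an assumption for illustration.
    function stripPunctuation(text) {
      let out = '';
      for (const ch of text) {
        const isAlnum =
          (ch >= '0' && ch <= '9') ||
          (ch >= 'a' && ch <= 'z') ||
          (ch >= 'A' && ch <= 'Z');
        if (isAlnum || ch === ' ') out += ch;
      }
      return out;
    }
    ```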
  • Lightweight tool for converting characters in a string into common HTML entities (without regex).

    published 1.0.2 2 years ago
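
    Entity conversion without regex typically reduces to a character-to-entity lookup applied in one pass; the map and function name below are assumptions, not the package's API:

    ```javascript
    // Sketch: replace HTML-reserved characters via a lookup map
    // instead of a regex. Map contents and name are assumptions.
    const ENTITIES = new Map([
      ['&', '&amp;'],
      ['<', '&lt;'],
      ['>', '&gt;'],
      ['"', '&quot;'],
      ["'", '&#39;'],
    ]);

    function toHtmlEntities(text) {
      let out = '';
      for (const ch of text) {
        out += ENTITIES.get(ch) ?? ch;  // pass unmapped characters through
      }
      return out;
    }
    ```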
  • Tool for escaping script tags using backslashes (no regex).

    published 1.0.4 2 years ago
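
    Escaping script tags with a backslash usually means breaking the "</script" sequence so arbitrary text can sit safely inside an inline <script> block. This sketch scans without regex and handles only the closing tag, which is an assumption about the package's actual behavior:

    ```javascript
    // Sketch: turn "</script" into "<\/script" with a case-insensitive
    // scan and no regex. The exact behavior is an assumption, not the
    // package's documented API.
    function escapeScriptTags(text) {
      const needle = '</script';
      const lower = text.toLowerCase();
      let out = '';
      let i = 0;
      while (i < text.length) {
        if (lower.startsWith(needle, i)) {
          // Keep the original casing of "script" from the input.
          out += '<\\/' + text.slice(i + 2, i + needle.length);
          i += needle.length;
        } else {
          out += text[i];
          i += 1;
        }
      }
      return out;
    }
    ```

    Browsers end an inline script at the first "</script" they see regardless of string context, which is why the backslash break matters.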