morpheme-splitter-np

0.1.2 • Public • Published

Morpheme Splitter - Nepali

This script is adapted from code originally written in an IPython notebook.

Nepali words are composed of morphemes, which can be broadly divided into two categories: vowels and consonants. A given word can be resolved into its morphemes with a few elementary rules. The rules themselves are straightforward, but the Unicode representation makes them slightly non-trivial to apply. Consider these scenarios:

  • क is actually a single character in Unicode, while it is two morphemes, क् + अ in Nepali.
  • क + ् in Unicode representation translates to क्, a single morpheme in Nepali.
  • क + ि in Unicode representation translates to क् + इ in Nepali.
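The mismatch between code points and morphemes is easier to see when the sequences above are listed character by character. The following is a small illustrative sketch, not part of the package; the `describe` helper is a hypothetical name introduced here, and it relies only on Python's standard `unicodedata` module.

```python
import unicodedata

def describe(text):
    """Return a (code point, Unicode name) pair for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# The three scenarios from the list above, written as explicit code points.
samples = [
    "\u0915",        # क   : one code point, two morphemes in Nepali
    "\u0915\u094D",  # क्  : two code points, one morpheme in Nepali
    "\u0915\u093F",  # कि  : consonant followed by a dependent vowel sign
]

for text in samples:
    print(text, describe(text))
```

Note that the halanta is named `DEVANAGARI SIGN VIRAMA` in the Unicode character database.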

In this script, we define rules for separating morphemes in the Unicode representation of Nepali. This serves as a building block for later systems that separate syllables in multi-syllable Nepali words.

Rules

  • If a character is a vowel, leave it as it is.
  • If a character is a single Unicode consonant (क to ह):
    • If it is the last letter of the word, it yields two morphemes: the consonant with a halanta, followed by the independent vowel अ.
    • If the next character is a halanta (्), the consonant and the halanta together form a single morpheme.
    • If the next character is a vowel sign, the consonant and that vowel make two morphemes (क् + ि).
    • If the next character is a consonant, the current consonant yields two morphemes: the consonant with a halanta, followed by the independent vowel अ.
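The rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the package's actual implementation or API: `split_morphemes`, `is_consonant`, and `MATRA_TO_VOWEL` are hypothetical names. The sketch assumes that a dependent vowel sign is reported as its independent vowel (as in the क + ि example in the introduction), and that any character the rules do not mention is passed through unchanged.

```python
HALANTA = "\u094D"  # ्
VOWEL_A = "\u0905"  # अ, the inherent vowel

# Dependent vowel signs mapped to independent vowels (assumed output form).
MATRA_TO_VOWEL = {
    "\u093E": "\u0906",  # ा -> आ
    "\u093F": "\u0907",  # ि -> इ
    "\u0940": "\u0908",  # ी -> ई
    "\u0941": "\u0909",  # ु -> उ
    "\u0942": "\u090A",  # ू -> ऊ
    "\u0943": "\u090B",  # ृ -> ऋ
    "\u0947": "\u090F",  # े -> ए
    "\u0948": "\u0910",  # ै -> ऐ
    "\u094B": "\u0913",  # ो -> ओ
    "\u094C": "\u0914",  # ौ -> औ
}

def is_consonant(ch):
    """True for a single Unicode consonant in the range क (U+0915) to ह (U+0939)."""
    return "\u0915" <= ch <= "\u0939"

def split_morphemes(word):
    """Split a Nepali word into a list of morphemes following the rules above."""
    morphemes = []
    i = 0
    while i < len(word):
        ch = word[i]
        if not is_consonant(ch):
            # Vowels are kept as they are; other signs are passed through (assumption).
            morphemes.append(ch)
            i += 1
            continue
        nxt = word[i + 1] if i + 1 < len(word) else ""
        morphemes.append(ch + HALANTA)  # the bare consonant, e.g. क्
        if nxt == HALANTA:
            i += 2  # consonant + halanta is a single morpheme
        elif nxt in MATRA_TO_VOWEL:
            morphemes.append(MATRA_TO_VOWEL[nxt])  # consonant + vowel
            i += 2
        else:
            # Last letter, or followed by a consonant (or any other
            # character, by assumption): add the inherent vowel अ.
            morphemes.append(VOWEL_A)
            i += 1
    return morphemes

print(split_morphemes("\u0915\u092E\u0932"))  # कमल
```

Handling both the last-letter case and the next-consonant case in the final branch keeps the sketch short, since the rules give them the same result.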

License

MIT License

Copyright

    Install

    npm i morpheme-splitter-np

    Collaborators

    • dineshdb