    morpheme-splitter-np

    0.1.2 • Public • Published

    Morpheme Splitter - Nepali

    Script adapted from this code in an IPython notebook.

    Nepali words are composed of morphemes, which can be broadly divided into two categories: vowels and consonants. A given word can be resolved into its morphemes by a few elementary rules. While these rules are relatively straightforward, the Unicode representation makes them non-trivial to apply, because one Unicode character does not always correspond to one morpheme. Consider these scenarios:

    • क is actually a single character in Unicode, while it is two morphemes, क् + अ in Nepali.
    • क + ् in Unicode representation translates to क्, a single morpheme in Nepali.
    • क + ि in Unicode representation translates to क् + इ in Nepali.
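
    The mismatch between Unicode characters and morphemes in the scenarios above can be checked by listing the code points of each string. This is only an illustrative sketch; the helper name codePoints is hypothetical and is not part of this package.

```javascript
// Hypothetical helper (not part of this package): list the code points of a string.
const codePoints = (s) =>
  Array.from(s, (c) =>
    "U+" + c.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );

// क: one code point, although it stands for two morphemes (क् + अ).
console.log(codePoints("\u0915")); // [ 'U+0915' ]

// क्: two code points (consonant + halanta), although it is one morpheme.
console.log(codePoints("\u0915\u094D")); // [ 'U+0915', 'U+094D' ]

// कि: consonant + dependent vowel sign, which corresponds to क् + इ.
console.log(codePoints("\u0915\u093F")); // [ 'U+0915', 'U+093F' ]
```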

    In this script, we define rules for separating morphemes in the Unicode representation of Nepali. This serves as a building block for later systems that separate syllables in multi-syllable Nepali words.

    Rules

    • If a character is a vowel, leave it as it is.
    • If a character is a single Unicode consonant (क - ह):
      • If it is the last letter, it yields two morphemes: the consonant with halanta (क्) and the independent vowel अ.
      • If the next character is a halanta (्), the consonant and the halanta together form a single morpheme (क्).
      • If the next character is a vowel sign, the consonant and that vowel yield two morphemes (क + ि gives क् + इ).
      • If the next character is a consonant, the current consonant yields two morphemes: the consonant with halanta and the independent vowel अ.
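
    The rules above can be sketched roughly as follows. This is a minimal illustration, not the package's actual implementation or API: the function name splitMorphemes, the vowel-sign table, and the handling of characters the rules do not mention (they are passed through unchanged, and a consonant followed by one is treated like a consonant followed by another consonant) are all assumptions.

```javascript
// Hypothetical sketch of the rules; names and table are assumptions, not this package's API.
const HALANTA = "\u094D"; // ्
const VOWEL_A = "\u0905"; // अ

// Dependent vowel signs mapped to the matching independent vowels.
const SIGN_TO_VOWEL = new Map([
  ["\u093E", "\u0906"], // ा -> आ
  ["\u093F", "\u0907"], // ि -> इ
  ["\u0940", "\u0908"], // ी -> ई
  ["\u0941", "\u0909"], // ु -> उ
  ["\u0942", "\u090A"], // ू -> ऊ
  ["\u0943", "\u090B"], // ृ -> ऋ
  ["\u0947", "\u090F"], // े -> ए
  ["\u0948", "\u0910"], // ै -> ऐ
  ["\u094B", "\u0913"], // ो -> ओ
  ["\u094C", "\u0914"], // ौ -> औ
]);

// Single Unicode consonants क (U+0915) through ह (U+0939).
const isConsonant = (ch) => ch >= "\u0915" && ch <= "\u0939";

function splitMorphemes(word) {
  const chars = Array.from(word);
  const morphemes = [];
  for (let i = 0; i < chars.length; i++) {
    const ch = chars[i];
    if (!isConsonant(ch)) {
      // Vowels (and anything the rules do not cover) are left as they are.
      morphemes.push(ch);
      continue;
    }
    const next = chars[i + 1];
    // Every consonant contributes its bare form, consonant + halanta.
    morphemes.push(ch + HALANTA);
    if (next === HALANTA) {
      i++; // the halanta is already part of the morpheme just pushed
    } else if (SIGN_TO_VOWEL.has(next)) {
      morphemes.push(SIGN_TO_VOWEL.get(next)); // vowel sign becomes its independent vowel
      i++;
    } else {
      morphemes.push(VOWEL_A); // last letter, or next is a consonant: inherent अ
    }
  }
  return morphemes;
}

console.log(splitMorphemes("\u0915\u093F")); // कि -> [ 'क्', 'इ' ]
```

    Under these assumptions, नेपाल would split into न्, ए, प्, आ, ल्, अ.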

    License

    MIT License

    Copyright

    Install

    npm i morpheme-splitter-np

    License

    ISC

    Unpacked Size

    14.2 kB

    Total Files

    9

    Collaborators

    • dineshdb