morpheme-splitter-np

0.1.2 • Public • Published

Morpheme Splitter - Nepali

Script adapted from this code in an IPython notebook.

Nepali words are composed of various morphemes, which can be broadly divided into two categories: vowels and consonants. A given word can be resolved into its morphemes by a few elementary rules. While these rules are relatively straightforward, the Unicode representation makes them somewhat non-trivial to work with. Consider these scenarios:

  • In Unicode, क is a single character, but in Nepali it is two morphemes: क् + अ.
  • क + ् in Unicode representation translates to क्, a single morpheme in Nepali.
  • क + ि in Unicode representation translates to क् + इ in Nepali.
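To see these scenarios at the code-point level, the short TypeScript snippet below lists the Unicode code points behind each string. It is only an illustration: `codePoints` is a hypothetical helper written for this example, not part of this package.

```typescript
// Hypothetical helper: list the Unicode code points of a string as "U+XXXX".
function codePoints(text: string): string[] {
  // Array.from iterates by code point, not by UTF-16 code unit.
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0"),
  );
}

console.log(codePoints("क"));  // ["U+0915"]            one character, two morphemes (क् + अ)
console.log(codePoints("क्")); // ["U+0915", "U+094D"]  two characters, one morpheme
console.log(codePoints("कि")); // ["U+0915", "U+093F"]  consonant + vowel sign ि
```

The mismatch is visible directly: the number of code points does not match the number of morphemes in any of the three cases.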

In this script, we define rules for separating morphemes in the Nepali Unicode representation. This will serve as a building block when we later build systems for splitting multi-syllable Nepali words into syllables.

Rules

  • If a character is a vowel, leave it as it is.
  • If a character is a single Unicode consonant (क to ह):
    • If it is the last letter, it yields two morphemes: the consonant with halanta and the independent vowel अ (क → क् + अ).
    • If the next character is a halanta (्), the consonant and the halanta form a single morpheme (क्).
    • If the next character is a vowel sign, the consonant and the vowel yield two morphemes (e.g. क + ि → क् + इ).
    • If the next character is a consonant, the current consonant yields two morphemes: the consonant with halanta and the independent vowel अ.
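The rules above can be sketched in TypeScript as a single left-to-right pass. This is a minimal sketch under stated assumptions, not this package's own implementation: `splitMorphemes` and `MATRA_TO_VOWEL` are hypothetical names; "vowel" in the consonant rules is taken to mean a dependent vowel sign (matra), which is emitted as its independent vowel, following the क + ि → क् + इ scenario; only the common vowel signs are mapped; and characters the rules do not mention (ं, ः, digits, spaces) are passed through unchanged.

```typescript
// Hypothetical sketch of the splitting rules; names are illustrative only.
const HALANTA = "\u094D"; // ्
const INHERENT_A = "अ";

// Assumption: common dependent vowel signs (matras) mapped to independent vowels.
const MATRA_TO_VOWEL = new Map<string, string>([
  ["ा", "आ"], ["ि", "इ"], ["ी", "ई"], ["ु", "उ"], ["ू", "ऊ"],
  ["ृ", "ऋ"], ["े", "ए"], ["ै", "ऐ"], ["ो", "ओ"], ["ौ", "औ"],
]);

const inRange = (ch: string, lo: number, hi: number): boolean => {
  const cp = ch.codePointAt(0)!;
  return cp >= lo && cp <= hi;
};
const isVowel = (ch: string): boolean => inRange(ch, 0x0905, 0x0914);     // अ to औ
const isConsonant = (ch: string): boolean => inRange(ch, 0x0915, 0x0939); // क to ह

function splitMorphemes(word: string): string[] {
  const chars = Array.from(word);
  const out: string[] = [];
  for (let i = 0; i < chars.length; i++) {
    const ch = chars[i];
    if (isVowel(ch)) {
      // Rule: a vowel is left as it is.
      out.push(ch);
      continue;
    }
    if (!isConsonant(ch)) {
      // Assumption: characters the rules do not cover pass through unchanged.
      out.push(ch);
      continue;
    }
    const next: string | undefined = chars[i + 1];
    const vowel = next !== undefined ? MATRA_TO_VOWEL.get(next) : undefined;
    if (next === HALANTA) {
      // Consonant + halanta is a single morpheme, e.g. क्.
      out.push(ch + HALANTA);
      i++; // consume the halanta
    } else if (vowel !== undefined) {
      // Consonant + vowel sign gives two morphemes, e.g. क + ि → क् + इ.
      out.push(ch + HALANTA, vowel);
      i++; // consume the vowel sign
    } else {
      // Last letter or followed by a consonant (or anything else):
      // the consonant carries the inherent vowel अ.
      out.push(ch + HALANTA, INHERENT_A);
    }
  }
  return out;
}

console.log(splitMorphemes("नमस्ते")); // ["न्", "अ", "म्", "अ", "स्", "त्", "ए"]
```

In नमस्ते, स is followed by a halanta, so it becomes the single morpheme स्, while न and म are each followed by a consonant and therefore pick up the inherent अ.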

License

MIT License


Package Sidebar

Install

npm i morpheme-splitter-np


Version

0.1.2

License

ISC

Unpacked Size

14.2 kB

Total Files

9

Collaborators

  • dineshdb