Morpheme Splitter - Nepali

This script is adapted from code in an IPython notebook.

Nepali words are composed of morphemes, which fall broadly into two categories: vowels and consonants. A word can be resolved into its morphemes using a few elementary rules. The rules themselves are straightforward, but the Unicode representation makes them non-trivial to apply. Consider these scenarios:

क is a single character in Unicode, but it is two morphemes in Nepali: क् + अ.
क + ् is two characters in Unicode, but it is क्, a single morpheme in Nepali.
क + ि is two characters in Unicode and translates to two morphemes in Nepali: क् + इ.
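
The mismatch between Unicode characters and morphemes can be checked directly. The snippet below is a small illustration using Python's standard `unicodedata` module; the helper name `code_points` is made up for this example and is not part of the package.

```python
import unicodedata


def code_points(text):
    """Return the Unicode name of every code point in text."""
    return [unicodedata.name(ch) for ch in text]


# Each string is built from explicit escapes so the code points are unambiguous.
ka = "\u0915"                 # क
ka_halanta = "\u0915\u094d"   # क्
ki = "\u0915\u093f"           # कि

for text in (ka, ka_halanta, ki):
    # Print the string, its length in code points, and the code point names.
    print(text, len(text), code_points(text))
```

The first string is one code point although it stands for two morphemes, while the second is two code points although it stands for one.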

In this script, we define rules for separating morphemes in the Unicode representation of Nepali. This will serve as a building block for later systems that separate syllables in multi-syllable Nepali words.

Rules

If a character is a vowel, leave it as it is.
If a character is a single Unicode consonant (क - ह):
- If it is the last letter of the word, it makes two morphemes: the consonant with a halanta, followed by the independent vowel अ.
- If the next character is a halanta (्), the consonant together with the halanta is a single morpheme.
- If the next character is a vowel, the consonant and this vowel make two morphemes (क् + ि).
- If the next character is a consonant, the current consonant makes two morphemes: the consonant with a halanta, followed by the independent vowel अ.
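
The rules above can be sketched in Python roughly as follows. This is a minimal illustration, not the package's actual implementation: the names `split_morphemes`, `is_consonant`, and `MATRA_TO_VOWEL` are hypothetical, and the sketch assumes that a dependent vowel sign is reported as its independent vowel (ि becomes इ), following the third scenario described earlier. Characters that are neither consonants nor listed vowel signs (for example, anusvara) are passed through unchanged.

```python
HALANTA = "\u094d"  # ्

# Assumption: dependent vowel signs are mapped to their independent forms.
MATRA_TO_VOWEL = {
    "\u093e": "आ",  # ा
    "\u093f": "इ",  # ि
    "\u0940": "ई",  # ी
    "\u0941": "उ",  # ु
    "\u0942": "ऊ",  # ू
    "\u0943": "ऋ",  # ृ
    "\u0947": "ए",  # े
    "\u0948": "ऐ",  # ै
    "\u094b": "ओ",  # ो
    "\u094c": "औ",  # ौ
}


def is_consonant(ch):
    """True for a single Unicode consonant in the range क (U+0915) to ह (U+0939)."""
    return "\u0915" <= ch <= "\u0939"


def split_morphemes(word):
    """Split a Nepali word into a list of morphemes using the rules above."""
    morphemes = []
    i = 0
    while i < len(word):
        ch = word[i]
        if not is_consonant(ch):
            # Vowels (and any other character) are left as they are.
            morphemes.append(ch)
            i += 1
            continue
        nxt = word[i + 1] if i + 1 < len(word) else None
        if nxt == HALANTA:
            # Consonant + halanta is a single morpheme.
            morphemes.append(ch + HALANTA)
            i += 2
        elif nxt in MATRA_TO_VOWEL:
            # Consonant + vowel sign makes two morphemes.
            morphemes.append(ch + HALANTA)
            morphemes.append(MATRA_TO_VOWEL[nxt])
            i += 2
        else:
            # Last letter, or followed by another consonant: inherent vowel अ.
            morphemes.append(ch + HALANTA)
            morphemes.append("अ")
            i += 1
    return morphemes


print(split_morphemes("नेपाल"))
```

Treating the "last letter" and "next is a consonant" cases in one branch works because both produce the same result: the consonant with a halanta, followed by अ.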

License

MIT License

Copyright

Dinesh Bhattarai dbhattarai252@gmail.com
