my-diacritic-sort

1.0.1 • Public • Published

I received some PDFs that use Myanmar Unicode characters mixed with empty codepoints standing in for various Myanmar diacritics and other combining characters. Converting this text to standard Unicode order by hand is painstaking, and we need it in several applications, so I am putting it into a module.

Sample text

We received a PDF where the name "Mohnyin Township" appears like this:

မိုးညှင်းမြို့နယ်အတွင်းရှိ

but when you copy and paste the actual characters, you get this:

မိ􏰀းည􏰋င်း􏰅မိုန့ယ်အတွင်းရှိ

Here are its issues:

The first syllable, မိုး, is missing the ု because an empty codepoint is used in its place. This separates the next diacritic, း, from the rest of the syllable.

မြို့န is written 􏰅မိုန့ - the ြ diacritic is an empty codepoint that is placed before the character that it modifies. The ့ diacritic is placed after the character န instead of the character that it modifies.

In other text samples, there are multiple diacritics in a nonstandard order.
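The reordering issues above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the module's actual implementation: the names `MARK_RANK`, `moveMedialRa`, `sortMarkRuns`, and `reorderSketch` are hypothetical, the rank table covers only a handful of marks, and it assumes the empty codepoints have already been replaced by the diacritics they stand for. It does not handle a diacritic attached to the wrong consonant, such as the ့ in the sample.

```javascript
// Hypothetical, partial rank table: marks with a lower rank are stored first.
// Only a few marks are listed; a real table would need to cover many more.
const MARK_RANK = {
  "\u103C": 1, // medial ra
  "\u103E": 2, // medial ha
  "\u102D": 3, // vowel sign i
  "\u102F": 4, // vowel sign u
  "\u1037": 5, // dot below
  "\u1038": 6, // visarga
};

// Move a medial ra that was placed before its consonant (U+1000..U+1021)
// so that it follows the consonant instead.
function moveMedialRa(text) {
  return text.replace(/\u103C([\u1000-\u1021])/g, "$1\u103C");
}

// Sort each run of consecutive known marks into rank order.
function sortMarkRuns(text) {
  return text.replace(/[\u103C\u103E\u102D\u102F\u1037\u1038]+/g, (run) =>
    Array.from(run)
      .sort((a, b) => MARK_RANK[a] - MARK_RANK[b])
      .join("")
  );
}

// Apply both steps: fix the medial ra position, then order the marks.
function reorderSketch(text) {
  return sortMarkRuns(moveMedialRa(text));
}
```

For example, `reorderSketch("\u103C\u1019\u102F\u102D")` returns `"\u1019\u103C\u102D\u102F"`: the medial ra moves behind the consonant, and the two vowel signs swap into rank order.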

On the web

Include the my-diacritic.js file, then pass some text to sortDiacritics:

sortDiacritics("မိ􏰀းည􏰋င်း􏰅မိုန့ယ်အတွင်းရှိ");
> "မိုးညှင်းမြို့နယ်အတွင်းရှိ"

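The conversion is one-way: it doesn't convert sorted text back to the original encoding, and there are no plans to add that (wontfix).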

In NodeJS

npm install my-diacritic-sort
var sortDiacritics = require("my-diacritic-sort");
sortDiacritics("မိ􏰀းည􏰋င်း􏰅မိုန့ယ်အတွင်းရှိ");
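Text copied out of a PDF usually arrives as many lines rather than one string. The following sketch shows one way to apply a sorter line by line; `sortLines` is a hypothetical helper, not part of the package, and it takes the sorter as a parameter so the same code works with `sortDiacritics` or any other string-to-string function.

```javascript
// Hypothetical helper: apply a sorter function to every non-blank line.
// Blank lines are passed through unchanged; line endings become "\n".
function sortLines(text, sorter) {
  return text
    .split(/\r?\n/)
    .map((line) => (line.trim() === "" ? line : sorter(line)))
    .join("\n");
}

// Possible usage, assuming the package is installed:
//   var sortDiacritics = require("my-diacritic-sort");
//   var cleaned = sortLines(pdfText, sortDiacritics);
```

Passing the sorter in keeps the helper independent of how the module is loaded, so it can be reused in the browser as well.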

License

Open source under the MIT license.

Collaborators

  • ndoiron