    my-diacritic-sort

    1.0.1 • Public • Published

    I received some PDFs whose text mixes Myanmar Unicode characters with empty codepoints that stand in for various Myanmar diacritics and other combining characters. Converting this text to standard Unicode order by hand is painstaking, and we need the conversion in several applications, so I am putting it into a module.

    Sample text

    We received a PDF where the name "Mohnyin Township" appears like this:

    မိုးညှင်းမြို့နယ်အတွင်းရှိ

    but when you copy and paste the actual characters, you get this:

    မိ􏰀းည􏰋င်း􏰅မိုန့ယ်အတွင်းရှိ

    Here are its issues:

    The first syllable မိုး is missing the ု because an empty codepoint is used in its place. This also separates the next diacritic း from the rest of the syllable.

    မြို့န is written 􏰅မိုန့ - the ြ diacritic is an empty codepoint that is placed before the character that it modifies. The ့ diacritic is placed after the character န instead of the character that it modifies.
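    The two fixes described above (swapping empty codepoints for real diacritics, and moving a prefixed ြ behind its consonant) can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the placeholder code points U+E000 to U+E002 and the function name normalizePlaceholders are hypothetical, and the real values depend on the font embedded in the PDF. Moving a misplaced ့ back to its syllable is not covered here.

```javascript
// Hypothetical placeholder-to-diacritic table (assumption for illustration only;
// the real empty codepoints depend on the PDF's font).
const PLACEHOLDERS = new Map([
  ["\uE000", "\u102F"], // stand-in for vowel sign u
  ["\uE001", "\u103E"], // stand-in for medial ha
  ["\uE002", "\u103C"], // stand-in for medial ra
]);

// Hypothetical placeholder for medial ra, which is stored BEFORE its consonant.
const PREFIXED_RA = "\uE002";

function normalizePlaceholders(text) {
  // Step 1: move the prefixed medial ra behind the consonant (U+1000..U+1021)
  // that follows it, emitting the real medial ra (U+103C).
  const raFirst = new RegExp(PREFIXED_RA + "([\\u1000-\\u1021])", "g");
  const moved = text.replace(raFirst, "$1\u103C");

  // Step 2: replace every remaining placeholder with its real diacritic.
  return [...moved]
    .map((ch) => (PLACEHOLDERS.has(ch) ? PLACEHOLDERS.get(ch) : ch))
    .join("");
}
```

    Only the placeholder is moved, never a real U+103C, so text that is already in standard order passes through unchanged.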

    In other text samples, there are multiple diacritics in a nonstandard order.
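    For the nonstandard-order case, one possible approach is to sort each run of consecutive diacritics by a fixed rank. The sketch below assumes a simplified ordering (medials, then vowel signs, then anusvara, dot below, and visarga); the RANK table and the sortMarkRuns name are hypothetical and are not taken from the module. Asat and stacked consonants are deliberately left out to keep the example small.

```javascript
// Hypothetical rank table: a lower rank sorts earlier within a syllable.
// The ordering is a simplified assumption, not the module's actual table.
const RANK = new Map([
  ["\u103B", 1], // medial ya
  ["\u103C", 2], // medial ra
  ["\u103D", 3], // medial wa
  ["\u103E", 4], // medial ha
  ["\u1031", 5], // vowel sign e
  ["\u102D", 6], // vowel sign i
  ["\u102E", 6], // vowel sign ii
  ["\u1032", 6], // vowel sign ai
  ["\u102F", 7], // vowel sign u
  ["\u1030", 7], // vowel sign uu
  ["\u102B", 8], // vowel sign tall aa
  ["\u102C", 8], // vowel sign aa
  ["\u1036", 9], // anusvara
  ["\u1037", 10], // dot below
  ["\u1038", 11], // visarga
]);

function sortMarkRuns(text) {
  // Match runs of two or more ranked diacritics in a row.
  const marks = "[" + [...RANK.keys()].join("") + "]";
  const run = new RegExp(marks + "{2,}", "g");

  // Sort each run by rank; marks with equal rank keep their relative order
  // because Array.prototype.sort is stable in current JavaScript engines.
  return text.replace(run, (m) =>
    [...m].sort((a, b) => RANK.get(a) - RANK.get(b)).join("")
  );
}
```

    Base characters are never moved by this step, so it only helps once every diacritic already follows the character it modifies.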

    On the web

    Include the my-diacritic.js file, then pass some text to sortDiacritics:

    sortDiacritics("မိ􏰀းည􏰋င်း􏰅မိုန့ယ်အတွင်းရှိ");
    > "မိုးညှင်းမြို့နယ်အတွင်းရှိ"

    The conversion is one-way: it does not convert text back to the original encoding (wontfix).

    In NodeJS

    npm install my-diacritic-sort

    var sortDiacritics = require("my-diacritic-sort");
    sortDiacritics("မိ􏰀းည􏰋င်း􏰅မိုန့ယ်အတွင်းရှိ");

    License

    Open source under the MIT license.
