cjk-length

1.0.0 • Public • Published

CJK Length

Returns string length with wide characters counting as two

In CJK (Chinese, Japanese and Korean) text, "wide" or "fullwidth" characters are Unicode characters that occupy two columns instead of one when displayed in a fixed-width font. Examples include the Japanese kana (あいうえお), fullwidth romaji (ＡＢＣＤＥ), and kanji/hanzi ideographs (一所懸命).

Because these characters occupy two columns but count as only one in a string's length, measuring a string for a fixed-width environment such as a terminal gives the wrong result: each fullwidth character makes the string appear one column wider than its length value indicates. This breaks, for example, tabulated layouts.

This function scans a given string for characters from the relevant Unicode ranges and counts each one as two, giving the string's correct visual length.
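The approach can be sketched in a few lines. This is a minimal illustration, not the package's source: the function name sketchCjkLength and the handful of ranges in the regular expression are assumptions chosen to cover the examples above (kana, fullwidth Latin letters, CJK ideographs, Hangul syllables), whereas characters.js lists more. Since JavaScript's .length already counts each of these characters once, the sketch adds one extra column per match.

```javascript
// ASSUMPTION: an illustrative subset of wide ranges, not the package's list:
// kana, CJK unified ideographs, Hangul syllables, fullwidth ASCII variants.
const WIDE_CHARS = /[\u3040-\u30FF\u4E00-\u9FFF\uAC00-\uD7A3\uFF01-\uFF60]/g

// Hypothetical name, not the package's export.
function sketchCjkLength(str) {
  // .length already counts each of these (single code unit) characters once,
  // so add one extra column per wide character found.
  const matches = str.match(WIDE_CHARS)
  return str.length + (matches ? matches.length : 0)
}

console.log(sketchCjkLength('abcde'))           // 5
console.log(sketchCjkLength('あいうえお'))       // 10
console.log(sketchCjkLength('abcdeＡＢＣＤＥ')) // 15
```

String.prototype.match() with a global regex returns every match (or null when there are none), so no manual loop or lastIndex handling is needed here.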

For a full list of the character ranges used, see the characters.js source.

Usage

To use, replace property accesses such as myString.length with function calls to cjkLength(myString):

const cjkLength = require('cjk-length').default

// Using cjkLength() to get a visually correct string length for fixed-width fonts:
// In this case, 'abcdeＡＢＣＤＥ' has length 10 but is displayed as though its length were 15.
const myString = 'abcdeＡＢＣＤＥ'
console.log(myString.length)      // 10
console.log(cjkLength(myString))  // 15

// Verifying that this longer string width value looks correct (in a terminal):
console.log(`.${myString}.`)                         // .abcdeＡＢＣＤＥ.
console.log(`.${'a'.repeat(myString.length)}.`)      // .aaaaaaaaaa.
console.log(`.${'a'.repeat(cjkLength(myString))}.`)  // .aaaaaaaaaaaaaaa.
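The broken tabulated layouts mentioned earlier typically come from padding helpers such as String.prototype.padEnd(), which count code units rather than columns. Below is a sketch of a width-aware alternative; padEndVisual and visualWidth are hypothetical names, and visualWidth is a stand-in covering only a few wide ranges. The measure parameter lets you pass cjkLength instead, to use the package's full range list.

```javascript
// Stand-in width function (ASSUMPTION: only a few illustrative wide ranges).
const WIDE = /[\u3040-\u30FF\u4E00-\u9FFF\uAC00-\uD7A3\uFF01-\uFF60]/g

function visualWidth(str) {
  const wide = str.match(WIDE)
  return str.length + (wide ? wide.length : 0)
}

// Hypothetical helper: append spaces until `str` spans `width` columns,
// as measured by `measure` (e.g. cjkLength from this package).
function padEndVisual(str, width, measure = visualWidth) {
  return str + ' '.repeat(Math.max(0, width - measure(str)))
}

const rows = [['abcde', 1], ['ＡＢＣＤＥ', 2], ['一所懸命', 3]]
for (const [name, id] of rows) {
  // name.padEnd(12) would leave the fullwidth rows 5 and 4 columns too wide.
  console.log(`${padEndVisual(name, 12)}| ${id}`)
}
```

With the package installed, padEndVisual(name, 12, cjkLength) would measure using the package's own ranges. Strings already wider than the target width are returned unchanged.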

If you need to process a string's wide characters in some other way, you can import the regular expression used to match them:

const { charsRegex } = require('cjk-length')

console.log(charsRegex instanceof RegExp)  // true

Note: charsRegex is structured like new RegExp('[\u1100-\u11F9\u3000-\u303F .. etc. \uFFE0-\uFFE6]', 'g').
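Because charsRegex carries the 'g' flag, String.prototype.match() and replace() will find every wide character, but RegExp.prototype.test() and exec() remember their position in lastIndex between calls. The sketch below illustrates both behaviors with a stand-in regex, wideRegex, whose few ranges are an assumption rather than the package's list.

```javascript
// ASSUMPTION: stand-in for charsRegex with only a few ranges, same 'g' flag.
const wideRegex = new RegExp('[\u3040-\u30FF\u4E00-\u9FFF\uFF01-\uFF60]', 'g')

const text = 'abc一所懸命ＡＢ'

// List the wide characters found.
console.log(text.match(wideRegex))          // [ '一', '所', '懸', '命', 'Ａ', 'Ｂ' ]

// Replace each wide character with two ASCII placeholders.
console.log(text.replace(wideRegex, '##'))  // abc############

// Caution: with the 'g' flag, test() advances lastIndex after a match,
// so calling it twice on the same input can give different results.
console.log(wideRegex.test('一'), wideRegex.test('一'))  // true false
wideRegex.lastIndex = 0  // reset before reusing with test() or exec()
```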

Sources

License

MIT license

Install

npm i cjk-length

Collaborators

  • msikma