pure-JS library to handle codepages

Codepages for JS

Codepages are character encodings: each one maps between characters and numbers. In many contexts, single- or double-byte character sets are used in lieu of Unicode encodings. The build script automatically downloads and parses published lists of mappings in order to generate the full script. The pages.csv description controls which codepages are used.

In node:

var cptable = require('codepage');

In the browser:

<script src="cptable.js"></script>
<script src="cputils.js"></script>

Alternatively, use the full version in the dist folder:

<script src="cptable.full.js"></script>

The complete set of codepages is large because of some double-byte character set (DBCS) encodings. A much smaller file that includes only the single-byte (SBCS) codepages is provided in this repo (sbcs.js), along with a file for other projects (cpexcel.js).

If you know which codepages you need, you can include individual scripts for each codepage. The individual files are provided in the bits/ directory. For example, to include only the Mac codepages:

<script src="bits/10000.js"></script>
<script src="bits/10006.js"></script>
<script src="bits/10007.js"></script>
<script src="bits/10029.js"></script>
<script src="bits/10079.js"></script>
<script src="bits/10081.js"></script>

All of the browser scripts define and append to the cptable object. To rename the object, edit the JSVAR shell variable and run the script.
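The "define and append" behavior can be pictured with a minimal sketch. The helper name `registerCodepage`, the `root` object, and the tiny tables are hypothetical and for illustration only; the actual generated scripts may be structured differently.

```javascript
// Hypothetical sketch: each codepage script creates the shared object
// if it does not exist yet, then adds its own table under its number.
function registerCodepage(root, name, number, table) {
  if (typeof root[name] === 'undefined') root[name] = {}; // define on first load
  root[name][number] = table;                             // append on every load
  return root[name];
}

var root = {}; // stands in for the browser's global object

// Simulate loading two individual codepage scripts (placeholder contents).
registerCodepage(root, 'cptable', 10000, { dec: { 255: '\u02c7' }, enc: { '\u02c7': 255 } });
registerCodepage(root, 'cptable', 10006, { dec: {}, enc: {} });

console.log(Object.keys(root.cptable)); // [ '10000', '10006' ]
```

Because every script only adds its own numbered entry, scripts can be included in any order and in any combination.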

The utility functions are contained in cputils.js, which assumes that the appropriate codepage scripts have been loaded.

The codepages are indexed by number. To get the Unicode character for a given codepoint, use the dec property:

var unicode_cp10000_255 = cptable[10000].dec[255]; // ˇ

To get the codepoint for a given character, use the enc property:

var cp10000_711 = cptable[10000].enc[String.fromCharCode(711)]; // 255
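Since dec and enc are inverse lookups, one can be derived from the other. The following is a minimal sketch with a small illustrative table shaped like a cptable entry; `buildEnc` is a hypothetical helper, not part of the library, and only a few sample codes are included.

```javascript
// Illustrative table: dec maps code -> character (sample entries only).
var dec = { 65: 'A', 128: '\u00c4', 255: '\u02c7' };

// Derive the inverse map (character -> code) from dec.
function buildEnc(dec) {
  var enc = {};
  Object.keys(dec).forEach(function (code) {
    enc[dec[code]] = Number(code);
  });
  return enc;
}

var enc = buildEnc(dec);
console.log(enc[String.fromCharCode(711)]); // 255
console.log(dec[enc['A']]);                 // A
```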

There are a few utilities that deal with strings and buffers:

var 汇总 = cptable.utils.decode(936, [0xbb,0xe3,0xd7,0xdc]);
var buf =  cptable.utils.encode(936,  汇总);
var sushi= cptable.utils.decode(65001, [0xf0,0x9f,0x8d,0xa3]); // 🍣
var sbuf = cptable.utils.encode(65001, sushi);

cptable.utils.encode(CP, data, ofmt) accepts a String or Array of characters and returns a representation controlled by ofmt:

  • Default output is a Buffer (or Array) of bytes (integers between 0 and 255).
  • If ofmt == 'str', return a String o where o.charCodeAt(i) is the ith byte.
  • If ofmt == 'arr', return an Array of bytes.
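The three output formats differ only in how the same bytes are packaged. A minimal sketch follows; `packBytes` is a hypothetical helper that mimics the behavior described above and is not the library's implementation.

```javascript
// Hypothetical helper: package an array of byte values according to ofmt.
function packBytes(bytes, ofmt) {
  if (ofmt === 'str') {
    // String whose ith char code is the ith byte
    return bytes.map(function (b) { return String.fromCharCode(b); }).join('');
  }
  if (ofmt === 'arr') return bytes.slice(); // plain Array of bytes
  // Default: Buffer when available (node), otherwise an Array
  return typeof Buffer !== 'undefined' ? Buffer.from(bytes) : bytes.slice();
}

var bytes = [0xbb, 0xe3, 0xd7, 0xdc]; // the codepage 936 bytes used earlier
var s = packBytes(bytes, 'str');
console.log(s.charCodeAt(1) === 0xe3); // true
console.log(packBytes(bytes, 'arr'));  // [ 187, 227, 215, 220 ]
console.log(packBytes(bytes).length);  // 4
```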

A much smaller script, including only the codepages known to be used in Excel, is available under the name cpexcel. It exposes the same variable cptable and is suitable as a drop-in replacement when the full codepage tables are not needed.

In node:

var cptable = require('codepage/dist/cpexcel.full');

The script in the repo can take a manifest and generate JS source.


bash path_to_manifest output_file_name JSVAR


  • JSVAR is the name of the exported variable (generally cptable)
  • output_file_name is the output file (e.g. cpexcel.js, cptable.js)
  • path_to_manifest is the path to the manifest file.

The manifest file is expected to be a CSV with 3 columns:

<codepage number>,<source>,<size>
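A manifest line in this format can be split into its three fields as follows. This is a sketch only: `parseManifestLine` is a hypothetical helper, the sample lines and the URL are made up, and the real build script may parse the file differently.

```javascript
// Hypothetical helper: split "<codepage number>,<source>,<size>" into fields.
function parseManifestLine(line) {
  var parts = line.split(',');
  return {
    codepage: parseInt(parts[0], 10),
    source: parts[1] || null, // an empty source column becomes null
    size: parseInt(parts[2], 10)
  };
}

// Made-up sample manifest (the URL is a placeholder, not a real source).
var manifest = [
  '10000,http://example.invalid/ROMAN.TXT,1',
  '936,,2'
].map(parseManifestLine);

console.log(manifest[0].codepage, manifest[0].size); // 10000 1
console.log(manifest[1].source);                     // null
```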

If a source is specified, the script will try to download and parse the specified file. The file format is expected to follow the format from the site. The size should be 1 for a single-byte codepage and 2 for a double-byte codepage. For mixed codepages (which use some single- and some double-byte codes), the script assumes the mapping is a prefix code and generates efficient JS code.
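To illustrate why the prefix-code assumption helps with mixed codepages: if no single-byte code is also the first byte of a double-byte code, a decoder can tell from the current byte alone whether to consume one byte or two. The sketch below uses a tiny invented table (`sbcs`, `dbcs` and `decodeMixed` are hypothetical names); it is not the code the script generates.

```javascript
// Invented mini-table: single-byte codes and double-byte codes never share
// a first byte, so the mapping is a prefix code.
var sbcs = { 0x41: 'A', 0x42: 'B' };
var dbcs = { 0xbbe3: '\u6c47', 0xd7dc: '\u603b' };

function decodeMixed(bytes) {
  var out = '';
  for (var i = 0; i < bytes.length; ++i) {
    var b = bytes[i];
    if (sbcs.hasOwnProperty(b)) { out += sbcs[b]; continue; }
    // Not a single-byte code, so treat it as a lead byte: consume two bytes.
    var code = (b << 8) | bytes[++i];
    out += dbcs.hasOwnProperty(code) ? dbcs[code] : '\ufffd';
  }
  return out;
}

console.log(decodeMixed([0x41, 0xbb, 0xe3, 0x42, 0xd7, 0xdc]) === 'A\u6c47B\u603b'); // true
```

Unknown or truncated double-byte sequences fall back to U+FFFD in this sketch; that choice is an assumption made for the example.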

Generated scripts only include the mapping. Concatenate (cat) a mapping with cputils.js to produce a complete script like cpexcel.full.js.

This script uses voc. The script to build the codepage tables and the JS source is, so building is as simple as voc

The complete list of hardcoded codepages can be found in the file pages.csv.

Some codepages are easier to implement algorithmically. Since these are hardcoded in utils, there is no corresponding entry (they are marked "magic" in the table below).
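As an illustration of why no table is needed, UTF-16 little endian (codepage 1200 in the list below) can be produced directly from the code units of a JS string. This is a minimal sketch; `encodeUtf16le` is a hypothetical name and not necessarily how the utilities implement it.

```javascript
// Sketch: encode a JS string as UTF-16LE bytes without any lookup table.
function encodeUtf16le(str) {
  var out = [];
  for (var i = 0; i < str.length; ++i) {
    var unit = str.charCodeAt(i);     // JS strings are already UTF-16 code units
    out.push(unit & 0xff, unit >> 8); // low byte first, then high byte
  }
  return out;
}

console.log(encodeUtf16le('A\u02c7')); // [ 65, 0, 199, 2 ]
```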

CP#    Information      Description
437                     OEM United States
500                     IBM EBCDIC International
620    NLS              Mazovia (Polish) MS-DOS
708    MakeEncoding.cs  Arabic (ASMO 708)
720    MakeEncoding.cs  Arabic (Transparent ASMO); Arabic (DOS)
737                     OEM Greek (formerly 437G); Greek (DOS)
775                     OEM Baltic; Baltic (DOS)
850                     OEM Multilingual Latin 1; Western European (DOS)
852                     OEM Latin 2; Central European (DOS)
855                     OEM Cyrillic (primarily Russian)
857                     OEM Turkish; Turkish (DOS)
858    MakeEncoding.cs  OEM Multilingual Latin 1 + Euro symbol
860                     OEM Portuguese; Portuguese (DOS)
861                     OEM Icelandic; Icelandic (DOS)
862                     OEM Hebrew; Hebrew (DOS)
863                     OEM French Canadian; French Canadian (DOS)
864                     OEM Arabic; Arabic (864)
865                     OEM Nordic; Nordic (DOS)
866                     OEM Russian; Cyrillic (DOS)
869                     OEM Modern Greek; Greek, Modern (DOS)
870    MakeEncoding.cs  IBM EBCDIC Multilingual/ROECE (Latin 2)
874                     Windows Thai
875                     IBM EBCDIC Greek Modern
895    NLS              Kamenický (Czech) MS-DOS
932                     Japanese Shift-JIS
936                     Simplified Chinese GBK
949                     Korean
950                     Traditional Chinese Big5
1026                    IBM EBCDIC Turkish (Latin 5)
1047   MakeEncoding.cs  IBM EBCDIC Latin 1/Open System
1140   MakeEncoding.cs  IBM EBCDIC US-Canada (037 + Euro symbol)
1141   MakeEncoding.cs  IBM EBCDIC Germany (20273 + Euro symbol)
1142   MakeEncoding.cs  IBM EBCDIC Denmark-Norway (20277 + Euro symbol)
1143   MakeEncoding.cs  IBM EBCDIC Finland-Sweden (20278 + Euro symbol)
1144   MakeEncoding.cs  IBM EBCDIC Italy (20280 + Euro symbol)
1145   MakeEncoding.cs  IBM EBCDIC Latin America-Spain (20284 + Euro symbol)
1146   MakeEncoding.cs  IBM EBCDIC United Kingdom (20285 + Euro symbol)
1147   MakeEncoding.cs  IBM EBCDIC France (20297 + Euro symbol)
1148   MakeEncoding.cs  IBM EBCDIC International (500 + Euro symbol)
1149   MakeEncoding.cs  IBM EBCDIC Icelandic (20871 + Euro symbol)
1200   magic            Unicode UTF-16, little endian (BMP of ISO 10646)
1201   magic            Unicode UTF-16, big endian
1250                    Windows Central Europe
1251                    Windows Cyrillic
1252                    Windows Latin I
1253                    Windows Greek
1254                    Windows Turkish
1255                    Windows Hebrew
1256                    Windows Arabic
1257                    Windows Baltic
1258                    Windows Vietnam
1361   MakeEncoding.cs  Korean (Johab)
10000                   MAC Roman
10001  MakeEncoding.cs  Japanese (Mac)
10002  MakeEncoding.cs  MAC Traditional Chinese (Big5)
10003  MakeEncoding.cs  Korean (Mac)
10004  MakeEncoding.cs  Arabic (Mac)
10005  MakeEncoding.cs  Hebrew (Mac)
10006                   Greek (Mac)
10007                   Cyrillic (Mac)
10008  MakeEncoding.cs  MAC Simplified Chinese (GB 2312)
10010  MakeEncoding.cs  Romanian (Mac)
10017  MakeEncoding.cs  Ukrainian (Mac)
10021  MakeEncoding.cs  Thai (Mac)
10029                   MAC Latin 2 (Central European)
10079                   Icelandic (Mac)
10081                   Turkish (Mac)
10082  MakeEncoding.cs  Croatian (Mac)
12000  magic            Unicode UTF-32, little endian byte order
12001  magic            Unicode UTF-32, big endian byte order
20000  MakeEncoding.cs  CNS Taiwan (Chinese Traditional)
20001  MakeEncoding.cs  TCA Taiwan
20002  MakeEncoding.cs  Eten Taiwan (Chinese Traditional)
20003  MakeEncoding.cs  IBM5550 Taiwan
20004  MakeEncoding.cs  TeleText Taiwan
20005  MakeEncoding.cs  Wang Taiwan
20105  MakeEncoding.cs  Western European IA5 (IRV International Alphabet 5) 7-bit
20106  MakeEncoding.cs  IA5 German (7-bit)
20107  MakeEncoding.cs  IA5 Swedish (7-bit)
20108  MakeEncoding.cs  IA5 Norwegian (7-bit)
20127  magic            US-ASCII (7-bit)
20261  MakeEncoding.cs  T.61
20269  MakeEncoding.cs  ISO 6937 Non-Spacing Accent
20273  MakeEncoding.cs  IBM EBCDIC Germany
20277  MakeEncoding.cs  IBM EBCDIC Denmark-Norway
20278  MakeEncoding.cs  IBM EBCDIC Finland-Sweden
20280  MakeEncoding.cs  IBM EBCDIC Italy
20284  MakeEncoding.cs  IBM EBCDIC Latin America-Spain
20285  MakeEncoding.cs  IBM EBCDIC United Kingdom
20290  MakeEncoding.cs  IBM EBCDIC Japanese Katakana Extended
20297  MakeEncoding.cs  IBM EBCDIC France
20420  MakeEncoding.cs  IBM EBCDIC Arabic
20423  MakeEncoding.cs  IBM EBCDIC Greek
20424  MakeEncoding.cs  IBM EBCDIC Hebrew
20833  MakeEncoding.cs  IBM EBCDIC Korean Extended
20838  MakeEncoding.cs  IBM EBCDIC Thai
20866  MakeEncoding.cs  Russian Cyrillic (KOI8-R)
20871  MakeEncoding.cs  IBM EBCDIC Icelandic
20880  MakeEncoding.cs  IBM EBCDIC Cyrillic Russian
20905  MakeEncoding.cs  IBM EBCDIC Turkish
20924  MakeEncoding.cs  IBM EBCDIC Latin 1/Open System (1047 + Euro symbol)
20932  MakeEncoding.cs  Japanese (JIS 0208-1990 and 0212-1990)
20936  MakeEncoding.cs  Simplified Chinese (GB2312-80)
20949  MakeEncoding.cs  Korean Wansung
21025  MakeEncoding.cs  IBM EBCDIC Cyrillic Serbian-Bulgarian
21027  NLS              Extended/Ext Alpha Lowercase
21866  MakeEncoding.cs  Ukrainian Cyrillic (KOI8-U)
28591                   ISO 8859-1 Latin 1 (Western European)
28592                   ISO 8859-2 Latin 2 (Central European)
28593                   ISO 8859-3 Latin 3
28594                   ISO 8859-4 Baltic
28595                   ISO 8859-5 Cyrillic
28596                   ISO 8859-6 Arabic
28597                   ISO 8859-7 Greek
28598                   ISO 8859-8 Hebrew (ISO-Visual)
28599                   ISO 8859-9 Turkish
28600                   ISO 8859-10 Latin 6
28601                   ISO 8859-11 Latin (Thai)
28603                   ISO 8859-13 Latin 7 (Estonian)
28604                   ISO 8859-14 Latin 8 (Celtic)
28605                   ISO 8859-15 Latin 9
28606                   ISO 8859-16 Latin 10
29001  MakeEncoding.cs  Europa 3
38598  MakeEncoding.cs  ISO 8859-8 Hebrew (ISO-Logical)
50220  MakeEncoding.cs  ISO 2022 JIS Japanese with no halfwidth Katakana
50221  MakeEncoding.cs  ISO 2022 JIS Japanese with halfwidth Katakana
50222  MakeEncoding.cs  ISO 2022 Japanese JIS X 0201-1989 (1 byte Kana-SO/SI)
50225  MakeEncoding.cs  ISO 2022 Korean
50227  MakeEncoding.cs  ISO 2022 Simplified Chinese
51932  MakeEncoding.cs  EUC Japanese
51936  MakeEncoding.cs  EUC Simplified Chinese
51949  MakeEncoding.cs  EUC Korean
52936  MakeEncoding.cs  HZ-GB2312 Simplified Chinese
54936  MakeEncoding.cs  GB18030 Simplified Chinese (4 byte)
57002  MakeEncoding.cs  ISCII Devanagari
57003  MakeEncoding.cs  ISCII Bengali
57004  MakeEncoding.cs  ISCII Tamil
57005  MakeEncoding.cs  ISCII Telugu
57006  MakeEncoding.cs  ISCII Assamese
57007  MakeEncoding.cs  ISCII Oriya
57008  MakeEncoding.cs  ISCII Kannada
57009  MakeEncoding.cs  ISCII Malayalam
57010  MakeEncoding.cs  ISCII Gujarati
57011  MakeEncoding.cs  ISCII Punjabi
65000  magic            Unicode (UTF-7)
65001  magic            Unicode (UTF-8)

Note that MakeEncoding.cs deviates from the listing for some codepages. In the case of direct conflicts, the listing takes precedence. In cases where the listing does not prescribe a value, the MakeEncoding.cs value is used.
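The precedence rule amounts to a two-level lookup. The sketch below is purely illustrative: the `listing` and `makeEncoding` objects hold invented values, and `resolve` is a hypothetical helper, not part of the build.

```javascript
// Invented sample data: byte -> Unicode code point from two sources.
var listing = { 0x80: 0x20ac };                    // the listing
var makeEncoding = { 0x80: 0x0080, 0x81: 0x0081 }; // MakeEncoding.cs values

// The listing wins on conflict; MakeEncoding.cs fills in what it leaves open.
function resolve(code) {
  if (listing.hasOwnProperty(code)) return listing[code];
  if (makeEncoding.hasOwnProperty(code)) return makeEncoding[code];
  return undefined;
}

console.log(resolve(0x80).toString(16)); // 20ac
console.log(resolve(0x81).toString(16)); // 81
```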

NLS refers to the National Language Support files supplied in various versions of Windows. In older versions of Windows (e.g. Windows 98) these files followed the pattern CP_#.NLS, but newer versions use the pattern C_#.NLS.