lzutf8-light

0.6.5 • Public • Published

LZ-UTF8-LIGHT

LZ-UTF8-LIGHT, a lightweight version forked from LZ-UTF8, is a string compression library and format. It is an extension to the UTF-8 character encoding, augmenting the UTF-8 bytestream with optional compression based on the LZ77 algorithm. Some of its properties:

  • Compresses strings only. Doesn't support arbitrary byte sequences.
  • Strongly optimized for speed, both in the choice of algorithm and its implementation. Approximate measurements using a low-end desktop and 1MB strings: 3-14MB/s compression, 20-120MB/s decompression (detailed benchmarks and a comparison to other JavaScript libraries can be found in the technical paper). Due to the focus on time efficiency, the resulting compression ratio can be significantly lower than that of more size-efficient algorithms like LZW + entropy coding.
  • Byte-level superset of UTF-8. Any valid UTF-8 bytestream is also a valid LZ-UTF8 stream (but not vice versa). This special property allows both compressed and plain UTF-8 streams to be freely concatenated and decompressed as a single unit (or with any arbitrary partitioning). Some possible applications:
    • Sending static pre-compressed data followed by dynamically generated uncompressed data from a server (and possibly appending a compressed static "footer", or repeating the process several times).
    • Appending both uncompressed/compressed data to a compressed log file/journal without needing to rewrite it.
    • Joining multiple source files, where some are possibly pre-compressed, and serving them as a single concatenated file without additional processing.
  • Patent free (all relevant patents have long expired).
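
The concatenation property can be illustrated with a minimal sketch. It assumes `lz` is the module object returned by `require('lzutf8-light')`; `concatBytes` and `buildMixedStream` are hypothetical helper names and not part of the library.

```javascript
// Joins several byte arrays end to end using plain Uint8Array operations.
function concatBytes(chunks) {
  var total = chunks.reduce(function (sum, chunk) { return sum + chunk.length; }, 0);
  var joined = new Uint8Array(total);
  var offset = 0;
  chunks.forEach(function (chunk) {
    joined.set(chunk, offset);
    offset += chunk.length;
  });
  return joined;
}

// Builds one stream from a compressed part followed by a plain UTF-8 part.
// `lz` is assumed to be the object returned by require('lzutf8-light').
function buildMixedStream(lz, staticHeader, dynamicBody) {
  var compressedHeader = lz.compress(staticHeader); // "ByteArray" output by default
  var plainBody = lz.encodeUTF8(dynamicBody);       // left uncompressed
  return concatBytes([compressedHeader, plainBody]);
}
```

With the package installed, the result of `buildMixedStream(require('lzutf8-light'), header, body)` can, per the property described above, be handed to `decompress` in a single call.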

JavaScript implementation:

  • Tested on most popular browsers and platforms: Node.js 4+, Chrome, Firefox, Opera, Edge, IE10+ (IE8 and IE9 may work with a typed array polyfill), Android 4+, Safari 5+.
  • Allows compressed data to be efficiently packed in plain JavaScript UTF-16 strings (see the "StorageBinaryString" encoding described later in this document) when binary storage is not available or desired (e.g. when using LocalStorage or older IndexedDB).
  • Supports Node.js streams.
  • Written in TypeScript.

API Reference

Getting started

Node.js:

npm install lzutf8-light
var LZUTF8_LIGHT = require('lzutf8-light');

Type Identifier Strings

"ByteArray" - An array of bytes. As of 0.3.2, always a Uint8Array. In versions up to 0.2.3 the type was determined by the platform (Array for browsers that don't support typed arrays, Uint8Array for supporting browsers and Buffer for Node.js).

IE8/9 support was dropped in 0.3.0, though these browsers can still be used with a typed array polyfill.

"Buffer" - A Node.js Buffer object.

"StorageBinaryString" - A string containing compacted binary data encoded to fit in valid UTF-16 strings. Note that the older, deprecated "BinaryString" encoding is still supported internally by the library but has been removed from this document. More details are included further in this document.

"Base64" - A Base64 string.

Core Methods

LZUTF8_LIGHT.compress(..)

var output = LZUTF8_LIGHT.compress(input, [options]);

Compresses the given input data.

input can be either a String or UTF-8 bytes stored in a Uint8Array or Buffer

options (optional): an object that may have any of the properties:

  • outputEncoding: "ByteArray" (default), "Buffer", "StorageBinaryString" or "Base64"

returns: compressed data encoded as specified by outputEncoding, or as a ByteArray if not specified.

LZUTF8_LIGHT.decompress(..)

var output = LZUTF8_LIGHT.decompress(input, [options]);

Decompresses the given compressed data.

input: can be either a Uint8Array, Buffer or String (where encoding scheme is then specified in inputEncoding)

options (optional): an object that may have the properties:

  • inputEncoding: "ByteArray" (default), "StorageBinaryString" or "Base64"
  • outputEncoding: "String" (default), "ByteArray" or "Buffer" to return UTF-8 bytes

returns: decompressed data encoded as specified by outputEncoding, or as a String if not specified.
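
As an illustration of how the compress and decompress options pair up, here is a minimal round-trip sketch. It assumes `lz` is the module object returned by `require('lzutf8-light')`; `roundTripBase64` is a hypothetical helper name. The encoding chosen as outputEncoding when compressing has to be given again as inputEncoding when decompressing.

```javascript
// Compresses a string to a Base64 string and decompresses it back.
// `lz` is assumed to be the object returned by require('lzutf8-light').
function roundTripBase64(lz, text) {
  // Ask for a Base64 string instead of the default "ByteArray" output.
  var packed = lz.compress(text, { outputEncoding: 'Base64' });

  // Tell decompress how the input is encoded; "String" is the default output.
  return lz.decompress(packed, { inputEncoding: 'Base64', outputEncoding: 'String' });
}
```

The same pattern should apply to "StorageBinaryString", which may be the better fit when the compressed value is kept in LocalStorage.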

Lower-level Methods

LZUTF8_LIGHT.Compressor

var compressor = new LZUTF8_LIGHT.Compressor();

Creates a compressor object. Can be used to incrementally compress a multi-part stream of data.

returns: a new LZUTF8_LIGHT.Compressor object

LZUTF8_LIGHT.Compressor.compressBlock(..)

var compressor = new LZUTF8_LIGHT.Compressor();
var compressedBlock = compressor.compressBlock(input);

Compresses the given input UTF-8 block.

input can be either a String, or UTF-8 bytes stored in a Uint8Array or Buffer

returns: compressed bytes as ByteArray

This can be used to incrementally create a single compressed stream. For example:

var compressor = new LZUTF8_LIGHT.Compressor();
var compressedBlock1 = compressor.compressBlock(block1);
var compressedBlock2 = compressor.compressBlock(block2);
var compressedBlock3 = compressor.compressBlock(block3);
..

LZUTF8_LIGHT.Decompressor

var decompressor = new LZUTF8_LIGHT.Decompressor();

Creates a decompressor object. Can be used to incrementally decompress a multi-part stream of data.

returns: a new LZUTF8_LIGHT.Decompressor object

LZUTF8_LIGHT.Decompressor.decompressBlock(..)

var decompressor = new LZUTF8_LIGHT.Decompressor();
var decompressedBlock = decompressor.decompressBlock(input);

Decompresses the given block of compressed bytes.

input can be either a Uint8Array or Buffer

returns: decompressed UTF-8 bytes as ByteArray

Remarks: will always return the longest valid UTF-8 stream of bytes possible from the given input block. Incomplete input or output byte sequences will be prepended to the next block.
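
The "longest valid UTF-8 stream" behaviour on the output side can be pictured with a small sketch. This is not the library's implementation; `incompleteTailLength` and `splitAtCharacterBoundary` are hypothetical names, and the sketch only handles a multi-byte character cut off at the end of a block (it does not model truncated compressed input).

```javascript
// Returns how many bytes at the end of `bytes` belong to a truncated
// multi-byte UTF-8 character (0 if the block ends on a character boundary).
function incompleteTailLength(bytes) {
  // A UTF-8 character is at most 4 bytes long, so a truncated one
  // can only start within the last 3 bytes.
  for (var back = 1; back <= 3 && back <= bytes.length; back++) {
    var b = bytes[bytes.length - back];
    if ((b & 0xC0) === 0x80) continue; // continuation byte: keep scanning backwards
    // Lead byte (or ASCII): work out how many bytes the character needs in total.
    var needed = b >= 0xF0 ? 4 : b >= 0xE0 ? 3 : b >= 0xC0 ? 2 : 1;
    return needed > back ? back : 0;
  }
  return 0;
}

// Splits a block into the part that can be emitted now and the part
// that has to wait for the next block.
function splitAtCharacterBoundary(bytes) {
  var cut = bytes.length - incompleteTailLength(bytes);
  return { complete: bytes.subarray(0, cut), pending: bytes.subarray(cut) };
}
```

For example, a block ending in the first two bytes of a three-byte character would have those two bytes held back as `pending` until the third byte arrives.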

Note: This can be used to incrementally decompress a single compressed stream. For example:

var decompressor = new LZUTF8_LIGHT.Decompressor();
var decompressedBlock1 = decompressor.decompressBlock(block1);
var decompressedBlock2 = decompressor.decompressBlock(block2);
var decompressedBlock3 = decompressor.decompressBlock(block3);
..

LZUTF8_LIGHT.Decompressor.decompressBlockToString(..)

var decompressor = new LZUTF8_LIGHT.Decompressor();
var decompressedBlockAsString = decompressor.decompressBlockToString(input);

Decompresses the given block of compressed bytes and converts the result to a String.

input can be either a Uint8Array or Buffer

returns: decompressed String

Remarks: will always return the longest valid string possible from the given input block. Incomplete input or output byte sequences will be prepended to the next block.
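
Because each call returns only complete characters, the per-block strings can simply be joined. A minimal sketch, assuming `lz` is the module object returned by `require('lzutf8-light')`; `decompressBlocksToString` is a hypothetical helper name.

```javascript
// Decompresses an array of consecutive blocks of one stream into a single string.
// `lz` is assumed to be the object returned by require('lzutf8-light').
function decompressBlocksToString(lz, blocks) {
  // One Decompressor instance is reused so that bytes held back from
  // one block can be completed by the next.
  var decompressor = new lz.Decompressor();
  return blocks
    .map(function (block) { return decompressor.decompressBlockToString(block); })
    .join('');
}
```

The blocks may be split at arbitrary byte positions, for example as they arrive from a network socket.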

Node.js only methods

LZUTF8_LIGHT.createCompressionStream()

var compressionStream = LZUTF8_LIGHT.createCompressionStream();

Creates a compression stream. The stream will accept both Buffers and Strings in any encoding supported by Node.js (e.g. utf8, utf16, ucs2, base64, hex, binary etc.) and return Buffers.

example:

var fs = require('fs');

var sourceReadStream = fs.createReadStream("content.txt");
var destWriteStream = fs.createWriteStream("content.txt.lzutf8");
var compressionStream = LZUTF8_LIGHT.createCompressionStream();

// Pipe the file through the compressor and into the destination file.
sourceReadStream.pipe(compressionStream).pipe(destWriteStream);

On error: emits an error event with the Error object as parameter.

LZUTF8_LIGHT.createDecompressionStream()

var decompressionStream = LZUTF8_LIGHT.createDecompressionStream();

Creates a decompression stream. The stream will accept and return Buffers.

On error: emits an error event with the Error object as parameter.
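
A possible way to wire up a decompression stream with error handling is sketched below. It assumes `lz` is the module object returned by `require('lzutf8-light')`; `decompressFile` is a hypothetical helper name. Node's standard `stream.pipeline` is used here so that an error event from any stage reaches a single callback.

```javascript
var fs = require('fs');
var stream = require('stream');

// Decompresses the file at sourcePath into destPath.
// `lz` is assumed to be the object returned by require('lzutf8-light').
function decompressFile(lz, sourcePath, destPath, callback) {
  stream.pipeline(
    fs.createReadStream(sourcePath),
    lz.createDecompressionStream(), // accepts and returns Buffers
    fs.createWriteStream(destPath),
    callback // receives the Error if any stage failed, otherwise no error
  );
}
```

The same shape should work for compression by swapping in `createCompressionStream()`.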

Character encoding methods

LZUTF8_LIGHT.encodeUTF8(..)

var output = LZUTF8_LIGHT.encodeUTF8(input);

Encodes a string to UTF-8.

input as String

returns: encoded bytes as ByteArray

LZUTF8_LIGHT.decodeUTF8(..)

var outputString = LZUTF8_LIGHT.decodeUTF8(input);

Decodes UTF-8 bytes to a String.

input as either a Uint8Array or Buffer

returns: decoded bytes as String
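
For comparison, the same string-to-bytes conversion can be written with the standard TextEncoder and TextDecoder APIs, which the release history says the library uses when available. This sketch calls the platform APIs directly rather than the library, and the sample string is arbitrary.

```javascript
var encoder = new TextEncoder();        // always produces UTF-8
var decoder = new TextDecoder('utf-8');

// 'a' takes 1 byte, U+00E9 takes 2 bytes and U+20AC takes 3 bytes in UTF-8.
var bytes = encoder.encode('a\u00e9\u20ac'); // Uint8Array of 6 bytes
var text = decoder.decode(bytes);            // back to the original string
```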

LZUTF8_LIGHT.encodeBase64(..)

var outputString = LZUTF8_LIGHT.encodeBase64(bytes);

Encodes bytes to a Base64 string.

input as either a Uint8Array or Buffer

returns: resulting Base64 string.

remarks: Maps every 3 consecutive input bytes to 4 output characters from the set A-Z, a-z, 0-9, +, / (a total of 64 characters). Increases the stored byte size to 133.33% of the original (when stored as ASCII or UTF-8) or 266.67% (when stored as UTF-16).
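
The size figures can be checked with a few lines of arithmetic. This sketch uses Node's built-in Buffer Base64 encoder rather than the library's own, on the assumption that both follow the standard 3-bytes-to-4-characters mapping.

```javascript
var bytes = Buffer.alloc(3000);          // 3000 zero bytes; the content does not affect the size
var base64 = bytes.toString('base64');   // every 3 input bytes become 4 characters

var sizeAsUtf8 = Buffer.byteLength(base64, 'utf8');     // 1 byte per Base64 character
var sizeAsUtf16 = Buffer.byteLength(base64, 'utf16le'); // 2 bytes per Base64 character

var ratioUtf8 = sizeAsUtf8 / bytes.length;   // 4000 / 3000
var ratioUtf16 = sizeAsUtf16 / bytes.length; // 8000 / 3000
```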

LZUTF8_LIGHT.decodeBase64(..)

var output = LZUTF8_LIGHT.decodeBase64(input);

Decodes a Base64 string to bytes.

input as String

returns: decoded bytes as ByteArray

remarks: the decoder cannot decode concatenated Base64 strings. Although it is possible to add this capability to the JS version, compatibility with other decoders (such as the Node.js decoder) prevents this feature from being added.

LZUTF8_LIGHT.encodeStorageBinaryString(..)

Note: the older BinaryString encoding has been deprecated due to a compatibility issue with the IE browser's LocalStorage/SessionStorage implementation. This newer version works around that issue by avoiding the 0 codepoint.

var outputString = LZUTF8_LIGHT.encodeStorageBinaryString(input);

Encodes binary bytes to a valid UTF-16 string.

input as either a Uint8Array or Buffer

returns: String

remarks: To comply with the UTF-16 standard, it only uses the bottom 15 bits of each character, effectively mapping every 15 input bits to a single 16-bit output character. This increases the stored byte size to about 106.67% of the original.
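
The overhead figure follows from the 15-in-16 bit mapping and can be reproduced with a short calculation. `storageBinaryStringPayloadBytes` is a hypothetical helper for illustration only; it counts payload characters and ignores any extra marker character the encoder may append.

```javascript
// Estimates the stored size, in bytes, of `byteCount` input bytes
// packed at 15 payload bits per 16-bit character.
function storageBinaryStringPayloadBytes(byteCount) {
  var characters = Math.ceil((byteCount * 8) / 15); // 15 input bits per character
  return characters * 2;                            // each character is stored as 2 bytes
}

var ratio = storageBinaryStringPayloadBytes(15000) / 15000; // 16000 / 15000 = 16 / 15
```

For short inputs the rounding up to a whole character makes the relative overhead somewhat larger than 16/15.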

LZUTF8_LIGHT.decodeStorageBinaryString(..)

Note: the older BinaryString encoding has been deprecated due to a compatibility issue with the IE browser's LocalStorage/SessionStorage implementation. This newer version works around that issue by avoiding the 0 codepoint.

var output = LZUTF8_LIGHT.decodeStorageBinaryString(input);

Decodes a binary string.

input as String

returns: decoded bytes as ByteArray

remarks: Multiple binary strings may be freely concatenated and decoded as a single string. This is made possible by ending every sequence with a special marker (char code 32768 for an even-length sequence and 32769 for an odd-length sequence).

Release history

  • 0.1.x: Initial release.
  • 0.2.x: Added async error handling. Added support for TextEncoder and TextDecoder when available.
  • 0.3.x: Removed support for IE8/9. Removed support for plain Array inputs. All "ByteArray" outputs are now Uint8Array objects. A separate "Buffer" encoding setting can be used to return Buffer objects.
  • 0.4.x: Major code restructuring. Removed support for versions of Node.js prior to 4.0.
  • 0.5.x: Added the "StorageBinaryString" encoding.

License

Copyright (c) 2014-2018, Rotem Dan <rotemdan@gmail.com>.

Source code and documentation are available under the MIT license.

Collaborators

  • chunlaw