leb

1.0.0 • Public • Published

leb: LEB128 utilities for Node

This Node module provides several utility functions for dealing with the LEB128 family of integer representation formats.

LEB128, which is short for "Little-Endian Base 128", is somewhat like UTF-8 in that it represents numbers using a variable number of bytes. Unlike UTF-8, LEB128 uses just the high bit of each byte to determine the role of that byte. This makes it a somewhat more compact representation, at some cost in complexity on the read side.

LEB128 was first defined as part of the DWARF 3 specification, and it is also used in Android's DEX file format.

This module provides encoders and decoders for both signed and unsigned values, with the decoded form being a 32-bit integer, a 64-bit integer, or an arbitrary-length buffer (taken to be a bigint-style representation in little-endian order).

The 64-bit integer variants require a special note: JavaScript's native number type is a double-precision float, which represents integers exactly only up to a magnitude of 2^53, so it can't hold every possible 64-bit integer. For that reason, the 64-bit decoder methods also return a lossy flag, which indicates whether the decoded result differs from the number represented in the encoded form.
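To make the reason for the lossy flag concrete, here is a small sketch in plain JavaScript that does not use this module at all. It only shows where the native number type stops being exact; `fitsExactly` is a hypothetical helper name chosen for illustration, and how the module itself computes the flag is not shown here.

```javascript
// JavaScript numbers are IEEE 754 doubles, so not every 64-bit integer fits.
const big = 2n ** 53n + 1n;    // exact as a BigInt
const asNumber = Number(big);  // rounds to the nearest representable double

// Hypothetical helper: does an integer survive a round trip through a number?
function fitsExactly(value) {
  return BigInt(Number(value)) === value;
}

console.log(asNumber === 2 ** 53);    // true: the lowest bit was rounded away
console.log(fitsExactly(big));        // false: this kind of value is what "lossy" refers to
console.log(fitsExactly(2n ** 60n));  // true: some large values still convert exactly
```

Under this reading, a value is lossy precisely when the conversion to a number changes it, which is why not every large 64-bit value is affected.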

Format Details

The LEB128 format is really quite simple.

An encoded value is a series of bytes where the high bit (bit #7 or 0x80) is set on each byte but the final one. The other seven bits of each byte are the payload bits.

To interpret an encoded value, one concatenates the payload bits in little-endian order (so the first payload byte contains the least significant bits). After that, if the encoded value is a signed representation, one sign-extends the result.

Schematically, here are the one-byte encodings:

               +--------+
encoded        |0GFEDCBA|
               +--------+

unsigned       +--------+
interpretation |0GFEDCBA|
               +--------+

signed         +--------+
interpretation |GGFEDCBA|
               +--------+

That is: The unsigned interpretation of a single-byte encoding is the byte value itself. The signed interpretation is of the value as a signed seven-bit integer.

Similarly, here are the two-byte encodings:

               +--------+ +--------+
encoded        |1GFEDCBA| |0NMLKJIH|
               +--------+ +--------+

unsigned       +----------------+
interpretation |00NMLKJIHGFEDCBA|
               +----------------+

signed         +----------------+
interpretation |NNNMLKJIHGFEDCBA|
               +----------------+

That is: The unsigned interpretation of a two-byte encoding is a 14-bit integer consisting of the first-byte payload bits and second-byte payload bits concatenated together. The signed interpretation is the same as the unsigned, except that bit #13 is treated as the sign and is hence extended to fill the remaining bits.

Some concrete examples (all numbers are hex):

encoded      unsigned          signed
bytes        interpretation    interpretation
-------      --------------    --------------
10           +10               +10
45           +45               -3b
8e 32        +190e             +190e
c1 57        +2bc1             -143f
80 80 80 3f  +7e00000          +7e00000
80 80 80 4f  +9e00000          -6200000
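The interpretation rules and the table above can be checked with a short sketch. This is an illustration of the format only, not this module's source: `sketchDecode` is a hypothetical name, and it uses BigInt so that no width limit gets in the way.

```javascript
// Illustrative LEB128 decoder (not the module's implementation).
// Returns the decoded BigInt and the index just past the encoded form.
function sketchDecode(bytes, signed, index = 0) {
  let result = 0n;
  let shift = 0n;
  let i = index;
  for (;;) {
    if (i >= bytes.length) {
      throw new Error("no complete LEB128 value at this index");
    }
    const byte = bytes[i++];
    // The low seven bits are payload; earlier bytes are less significant.
    result |= BigInt(byte & 0x7f) << shift;
    shift += 7n;
    if ((byte & 0x80) === 0) {
      // Final byte: for signed values, its bit 6 is the sign bit.
      if (signed && (byte & 0x40) !== 0) {
        result -= 1n << shift; // sign-extend
      }
      return { value: result, nextIndex: i };
    }
  }
}

// The encoded byte sequences from the table of concrete examples.
const rows = [
  [0x10], [0x45], [0x8e, 0x32], [0xc1, 0x57],
  [0x80, 0x80, 0x80, 0x3f], [0x80, 0x80, 0x80, 0x4f],
];
for (const row of rows) {
  const encoded = Buffer.from(row);
  const unsigned = sketchDecode(encoded, false).value;
  const signedValue = sketchDecode(encoded, true).value;
  console.log(encoded.toString("hex"), unsigned.toString(16), signedValue.toString(16));
}
```

Note that the only difference between the two interpretations is the final subtraction, which is applied when bit 6 of the last byte is set.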

Building and Installing

npm install leb

Or grab the source. As of this writing, this module has no dependencies, so once you have the source, there's nothing more to do to "build" it.

Testing

npm test

Or

node ./test/test.js

API Details

decodeInt32(buffer, [index]) -> { value: num, nextIndex: num }

Takes a signed LEB128-encoded byte sequence in the given buffer at the given index (defaults to 0), returning the decoded value and the index just past the end of the encoded form. The value is expected to be a 32-bit integer.

This throws an exception if the buffer doesn't have a valid encoding at the index (possible only if the last byte in the buffer has its high bit set) or if the decoded value is out of range for the expected type.
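The returned nextIndex makes it possible to read several values stored back to back. The sketch below shows that pattern with a hypothetical stand-in, `sketchDecodeInt32`, which mirrors the documented return shape and the two documented failure cases. It is not the module's code, and as a simplifying assumption it rejects any encoding longer than five bytes.

```javascript
// Hypothetical stand-in for a signed 32-bit decoder (not the module's source).
// Simplification: encodings longer than five bytes are rejected outright.
function sketchDecodeInt32(buffer, index = 0) {
  let value = 0;
  let scale = 1; // 128 ** (number of payload groups read so far)
  let i = index;
  for (;;) {
    if (i >= buffer.length) {
      throw new Error("no valid encoding at index " + index);
    }
    if (i - index >= 5) {
      throw new RangeError("encoding too long for a 32-bit value");
    }
    const byte = buffer[i++];
    value += (byte & 0x7f) * scale; // multiply instead of shifting to stay exact
    scale *= 128;
    if ((byte & 0x80) === 0) {
      if ((byte & 0x40) !== 0) {
        value -= scale; // sign-extend from bit 6 of the final byte
      }
      break;
    }
  }
  if (value < -(2 ** 31) || value > 2 ** 31 - 1) {
    throw new RangeError("decoded value does not fit in 32 bits");
  }
  return { value, nextIndex: i };
}

// Walk a buffer of consecutive values by feeding nextIndex back in.
function sketchDecodeAll(buffer) {
  const values = [];
  let index = 0;
  while (index < buffer.length) {
    const result = sketchDecodeInt32(buffer, index);
    values.push(result.value);
    index = result.nextIndex;
  }
  return values;
}

const packed = Buffer.from([0x10, 0x45, 0xc1, 0x57]);
console.log(sketchDecodeAll(packed)); // [ 16, -59, -5183 ]
```

The same loop should work with any of the decode functions described here, since each one is documented as returning nextIndex.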

decodeInt64(buffer, [index]) -> { value: num, nextIndex: num, lossy: bool }

Takes a signed LEB128-encoded byte sequence in the given buffer at the given index (defaults to 0), returning the decoded value, the index just past the end of the encoded form, and a boolean indicating whether the decoded value experienced numeric conversion loss. The value is expected to be a 64-bit integer.

This throws an exception if the buffer doesn't have a valid encoding at the index (possible only if the last byte in the buffer has its high bit set) or if the decoded value is out of range for the expected type.

decodeIntBuffer(encodedBuffer, [index]) -> { value: buffer, nextIndex: num }

Takes a signed LEB128-encoded byte sequence in the given buffer at the given index (defaults to 0), returning the decoded value and the index just past the end of the encoded form. The decoded value is a bigint-style buffer representing a signed integer, in little-endian order.

This throws an exception if the buffer doesn't have a valid encoding at the index (possible only if the last byte in the buffer has its high bit set).
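As a rough illustration of what a bigint-style buffer in little-endian order looks like, the sketch below converts such a buffer into a native BigInt. `sketchBufferToBigInt` is a hypothetical helper, not part of this module, and it assumes that signed buffers are two's complement with the sign taken from the top bit of the last byte; the description above does not spell that detail out.

```javascript
// Hypothetical helper: read a little-endian buffer as a BigInt.
// Assumption: signed buffers are two's complement, sign in the last byte's top bit.
function sketchBufferToBigInt(buffer, signed) {
  let result = 0n;
  // The last byte is the most significant, so fold from the end toward the start.
  for (let i = buffer.length - 1; i >= 0; i--) {
    result = (result << 8n) | BigInt(buffer[i]);
  }
  if (signed && buffer.length > 0 && (buffer[buffer.length - 1] & 0x80) !== 0) {
    result -= 1n << BigInt(8 * buffer.length); // sign-extend
  }
  return result;
}

const twoBytes = Buffer.from([0xc1, 0xeb]); // least significant byte first
console.log(sketchBufferToBigInt(twoBytes, false).toString(16)); // ebc1
console.log(sketchBufferToBigInt(twoBytes, true).toString(16));  // -143f
```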

decodeUint32(buffer, [index]) -> { value: num, nextIndex: num }

Like decodeInt32, but with the unsigned LEB128 format and unsigned 32-bit integer type.

decodeUint64(buffer, [index]) -> { value: num, nextIndex: num, lossy: bool }

Like decodeInt64, but with the unsigned LEB128 format and unsigned 64-bit integer type.

decodeUintBuffer(encodedBuffer, [index]) -> { value: buffer, nextIndex: num }

Like decodeIntBuffer, but with the unsigned LEB128 format.

encodeInt32(num) -> buffer

Takes a 32-bit signed integer, returning the signed LEB128 representation of it.
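For readers who want to see how a signed encoding comes about, here is a minimal sketch of the usual signed LEB128 encoding loop. `sketchEncodeInt32` is a hypothetical name and this is not the module's source; it only follows the format described under Format Details.

```javascript
// Illustrative signed LEB128 encoder for 32-bit integers (not the module's code).
function sketchEncodeInt32(num) {
  if (!Number.isInteger(num) || num < -(2 ** 31) || num > 2 ** 31 - 1) {
    throw new RangeError("not a 32-bit signed integer");
  }
  const bytes = [];
  let value = num;
  for (;;) {
    const payload = value & 0x7f; // next seven payload bits
    value >>= 7;                  // arithmetic shift keeps the sign
    // Stop once the rest is pure sign extension and bit 6 already shows the sign.
    const done =
      (value === 0 && (payload & 0x40) === 0) ||
      (value === -1 && (payload & 0x40) !== 0);
    bytes.push(done ? payload : payload | 0x80);
    if (done) {
      return Buffer.from(bytes);
    }
  }
}

console.log(sketchEncodeInt32(-0x143f).toString("hex")); // c157
console.log(sketchEncodeInt32(0x190e).toString("hex"));  // 8e32
```

The extra check on bit 6 is why a small positive value such as 64 needs two bytes in the signed format: a single byte 0x40 would be read back as negative.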

encodeInt64(num) -> buffer

Takes a 64-bit signed integer, returning the signed LEB128 representation of it.

encodeIntBuffer(buffer) -> encodedBuf

Takes a bigint-style buffer representing a signed integer, returning the signed LEB128 representation of it.

encodeUint32(num) -> buffer

Like encodeInt32, but with the unsigned 32-bit integer type and returning unsigned LEB128.
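The unsigned encoding is simpler because there is no sign bit to protect. The following sketch shows the idea; `sketchEncodeUint32` is a hypothetical name and not the module's implementation. It uses division rather than bitwise operators, since JavaScript's bitwise operators treat their operands as signed 32-bit values.

```javascript
// Illustrative unsigned LEB128 encoder for 32-bit integers (not the module's code).
function sketchEncodeUint32(num) {
  if (!Number.isInteger(num) || num < 0 || num > 2 ** 32 - 1) {
    throw new RangeError("not a 32-bit unsigned integer");
  }
  const bytes = [];
  let value = num;
  do {
    const payload = value % 128;     // low seven bits
    value = Math.floor(value / 128); // remaining higher bits
    // Set the high bit on every byte except the final one.
    bytes.push(value === 0 ? payload : payload | 0x80);
  } while (value !== 0);
  return Buffer.from(bytes);
}

console.log(sketchEncodeUint32(0x2bc1).toString("hex"));     // c157
console.log(sketchEncodeUint32(0xffffffff).toString("hex")); // ffffffff0f
```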

encodeUint64(num) -> buffer

Like encodeInt64, but with the unsigned 64-bit integer type and returning unsigned LEB128.

encodeUintBuffer(buffer) -> encodedBuf

Like encodeIntBuffer, but with the buffer argument in unsigned bigint form and returning unsigned LEB128.

Contributing

Questions, comments, bug reports, and pull requests are all welcome.

Bug reports that include steps-to-reproduce (including code) are the best. Even better, make them in the form of pull requests that update the test suite. Thanks!


Copyright 2012-2024 the Leb Authors (Dan Bornstein et alia).
SPDX-License-Identifier: Apache-2.0
