lexicodec

0.0.4 • Public • Published

Lexicographical Codec

Lexicographical encodings are useful for indexing information in an ordered key-value store such as LevelDB, FoundationDB, or DynamoDB.

Why

Existing ordered key-value stores generally accept only bytes as keys, and it is non-trivial to convert a tuple into a byte string that preserves a consistent order.

For numbers, you can't just stringify them because 2 < 11 but "2" > "11". So this package uses elen to encode signed float64 numbers into lexicographically ordered strings.
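To make the number problem concrete, here is a minimal sketch. The `padEncode` helper is hypothetical and is not how elen works: it uses plain fixed-width zero padding, which only handles non-negative safe integers, whereas elen covers signed float64 values.

```typescript
// Plain stringification sorts numbers in the wrong order.
const naive = [2, 11, 100].map(String).sort()
// naive is ["100", "11", "2"]

// Hypothetical stand-in for an order-preserving encoding (NOT elen):
// fixed-width zero padding for non-negative safe integers only.
function padEncode(n: number, width = 16): string {
	if (!Number.isSafeInteger(n) || n < 0 || String(n).length > width) {
		throw new RangeError(`cannot pad-encode ${n} to width ${width}`)
	}
	return String(n).padStart(width, "0")
}

// With equal-length strings, lexicographic order matches numeric order.
const padded = [2, 11, 100].map((n) => padEncode(n, 4)).sort()
// padded is ["0002", "0011", "0100"]
```

Padding breaks down for negative numbers and fractions, which is why a dedicated encoding such as elen is needed for general float64 values.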

For arrays / tuples, if you simply concatenate the array components then you won't maintain component-wise order, because ["jon", "smith"] < ["jonathan", "smith"] but "jonsmith" > "jonathansmith". So this package joins elements using a null byte \x00, escapes null bytes with \x00 => \x01\x00, and escapes the escape bytes with \x01 => \x01\x01. Thus, ["jon", "smith"] => "jon\x00smith" and ["jonathan", "smith"] => "jonathan\x00smith", which maintains component-wise lexicographical order because \x00 sorts before every other byte.
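The escaping rules above can be sketched as follows. This is an illustration for tuples of strings only, not the package's source; `encodeTuple` and `decodeTuple` are hypothetical names. The sketch assumes each element is terminated by \x00 (rather than only joined), which keeps decoding unambiguous for empty strings and empty tuples.

```typescript
const SEP = "\x00" // element terminator
const ESC = "\x01" // escape byte

// Escape the escape byte first, then the null byte, so the rules don't interfere.
function escapeElement(s: string): string {
	return s.replace(/\x01/g, ESC + ESC).replace(/\x00/g, ESC + SEP)
}

// Assumption: every element is followed by a terminator.
function encodeTuple(tuple: string[]): string {
	return tuple.map((element) => escapeElement(element) + SEP).join("")
}

function decodeTuple(encoded: string): string[] {
	const out: string[] = []
	let current = ""
	for (let i = 0; i < encoded.length; i++) {
		const ch = encoded[i]
		if (ch === ESC) {
			// The character after an escape byte is always a literal.
			i++
			if (i >= encoded.length) throw new Error("dangling escape byte")
			current += encoded[i]
		} else if (ch === SEP) {
			// An unescaped null byte ends the current element.
			out.push(current)
			current = ""
		} else {
			current += ch
		}
	}
	return out
}

// Component-wise order survives encoding.
const jonFirst = encodeTuple(["jon", "smith"]) < encodeTuple(["jonathan", "smith"])
```

Here `jonFirst` is true because the terminator after "jon" sorts before the "a" in "jonathan".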

Lastly, we use a single byte prefix to encode the type of value we are encoding.
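As a rough illustration of the type prefix, the sketch below encodes only null, strings, and booleans. The `encodeScalar` function is hypothetical; the prefix letters b, f, and g are assumed to follow the codec configuration shown under Getting Started.

```typescript
type Scalar = null | string | boolean

// Hypothetical helper: one prefix character per type, followed by the value.
function encodeScalar(value: Scalar): string {
	if (value === null) return "b"
	if (typeof value === "string") return "f" + value
	return "g" + String(value)
}

const mixed: Scalar[] = [true, "zebra", null, "apple", false]

// Sorting the encoded strings groups values by type first, then by value.
const sortedEncoded = mixed.map(encodeScalar).sort()
// sortedEncoded is ["b", "fapple", "fzebra", "gfalse", "gtrue"]
```

Because the prefix is the first character compared, the letters chosen for each type determine the ordering between types.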

Getting Started

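npm install lexicodec

// Assumption: these names are exported from the package root.
import {
	Codec,
	NullEncoding,
	ObjectEncoding,
	ArrayEncoding,
	NumberEncoding,
	StringEncoding,
	BooleanEncoding,
} from "lexicodec"

export const jsonCodec = new Codec({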
	// null < object < array < number < string < boolean
	b: NullEncoding,
	c: ObjectEncoding,
	d: ArrayEncoding,
	e: NumberEncoding,
	f: StringEncoding,
	g: BooleanEncoding,
})

jsonCodec.encode(null) // => "b"
jsonCodec.encode(true) // => "gtrue"
jsonCodec.encode("hello world") // => "fhello world"
jsonCodec.encode(10) // => "e>;;41026;;;2161125899906842624"
jsonCodec.encode(["chet", "corcos"]) // => "dfchet\u0000fcorcos\u0000"
jsonCodec.encode({date: "2020-03-10"}) // => "cfdate\u0000f2020-03-10\u0000"

Objects are encoded as entries with ordered keys. They aren't all that useful except for duck typing; instead of duck typing, you can create your own custom encodings.

const DateEncoding: Encoding<Date> = {
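	match: (value: unknown) =>
		// Guard against null: typeof null is "object" and Object.getPrototypeOf(null) throws.
		typeof value === "object" &&
		value !== null &&
		Object.getPrototypeOf(value) === Date.prototype,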
	encode: (value) => value.toISOString(),
	decode: (value) => new Date(value),
	compare: (a, b) => (a > b ? 1 : b > a ? -1 : 0),
}

const codec = new Codec({
	b: NullEncoding,
	c: ObjectEncoding,
	d: ArrayEncoding,
	e: NumberEncoding,
	f: StringEncoding,
	g: BooleanEncoding,
	h: DateEncoding
})

codec.encode(new Date()) // => "h2023-11-29T18:44:54.942Z"
codec.encode(["created", new Date()]) // => "dfcreated\u0000h2023-11-29T18:44:54.943Z\u0000"
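The date encoding above relies on ISO 8601 UTC strings sorting in chronological order. A small self-contained check of that property, using only the standard Date API and made-up sample dates, might look like this:

```typescript
const dates = [
	new Date("2023-11-29T18:44:54.942Z"),
	new Date("2020-03-10T00:00:00.000Z"),
	new Date("2021-01-01T12:00:00.000Z"),
]

// Sort by the encoded string.
const byString = dates.map((d) => d.toISOString()).sort()

// Sort by the underlying timestamp, then encode.
const byTime = [...dates]
	.sort((a, b) => a.getTime() - b.getTime())
	.map((d) => d.toISOString())

// Both orders agree for years 0000 through 9999.
const sameOrder = JSON.stringify(byString) === JSON.stringify(byTime)
```

Dates outside years 0000 to 9999 use an expanded year format with a sign prefix in toISOString, so this ordering guarantee does not extend to them.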

Encodings also have a compare property so that you can compare values without having to serialize them. That way you can create in-memory abstractions that mimic the serialized ordering, which is useful for caching and similar cases.

codec.compare(["jon", "smith"], ["jonathan", "smith"]) // => -1
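To show what a component-wise comparison looks like without serializing, here is a minimal sketch for tuples of strings. The `compareTuple` function is hypothetical and is not the package's implementation; it is meant to give the same order that the encoded strings would.

```typescript
// Compare element by element; a shorter tuple that is a prefix of a longer one sorts first.
function compareTuple(a: string[], b: string[]): number {
	const shared = Math.min(a.length, b.length)
	for (let i = 0; i < shared; i++) {
		if (a[i] < b[i]) return -1
		if (a[i] > b[i]) return 1
	}
	if (a.length === b.length) return 0
	return a.length < b.length ? -1 : 1
}

const result = compareTuple(["jon", "smith"], ["jonathan", "smith"])
// result is -1
```

An in-memory sorted structure can use a comparator like this so that cached entries iterate in the same order as keys in the store.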

License

MIT

Collaborators

  • ccorcos