data-record
Record type for Node.js.
Record Format
A record is consist of an array of field definitions, describing a physical data
structure in memory that can be mapped to a struct
type in C or C++.
The following array defines a simple fixed size record with some nested records.
// A record definition
const def = [
{ name: 'value1', type: 'uint32_le' },
{ name: 'value2', type: 'int32_be' },
{ name: 'custom1', type: 'int_le', size: 3 },
{ name: 'custom2', type: 'int_le', size: 5 },
{
name: 'nested',
type: 'record',
def: [
{ name: 'a', type: 'uint32_le' },
{ name: 'b', type: 'uint32_le' },
],
},
{
name: 'x',
type: 'record',
def: [
{ name: 'a', type: 'uint32_le' },
{ name: 'y', type: 'record', def: [{ name: 'a', type: 'uint32_le' }] },
],
},
{ name: 'firstName', type: 'cstring', size: 15 },
]
const compiled = compile(def)
The compilation result contains the same information as the original definition but in an optimized data structure that can be accessed more efficiently than the human-readable record definition object.
Types
Type | Description |
---|---|
int8 |
8-bit signed integer |
int16 |
16-bit signed integer in host byte order |
int16_be |
16-bit signed integer in big-endian order |
int16_le |
16-bit signed integer in little-endian order |
int32 |
32-bit signed integer in host byte order |
int32_be |
32-bit signed integer in big-endian order |
int32_le |
32-bit signed integer in little-endian order |
int64 |
64-bit signed integer in host byte order |
int64_be |
64-bit signed integer in big-endian order |
int64_le |
64-bit signed integer in little-endian order |
uint8 |
8-bit unsigned integer |
uint16 |
16-bit unsigned integer in host byte order |
uint16_be |
16-bit unsigned integer in big-endian order |
uint16_le |
16-bit unsigned integer in little-endian order |
uint32 |
32-bit unsigned integer in host byte order |
uint32_be |
32-bit unsigned integer in big-endian order |
uint32_le |
32-bit unsigned integer in little-endian order |
uint64 |
64-bit unsigned integer in host byte order |
uint64_be |
64-bit unsigned integer in big-endian order |
uint64_le |
64-bit unsigned integer in little-endian order |
float |
32-bit single-precision floating-point in host byte order |
float_be |
32-bit single-precision floating-point in big-endian order |
float_le |
32-bit single-precision floating-point in little-endian order |
double |
64-bit double-precision floating-point in host byte order |
double_be |
64-bit double-precision floating-point in big-endian order |
double_le |
64-bit double-precision floating-point in little-endian order |
int_be |
0 to 48 bit variable size big-endian signed integer |
int_le |
0 to 48 bit variable size little-endian signed integer |
uint_be |
0 to 48 bit variable size big-endian unsigned integer |
uint_le |
0 to 48 bit variable size little-endian unsigned integer |
cstring |
null-terminated C-string (termination not enforced, same behavior as strcpy() ) |
record |
A nested record |
record_p |
A pointer to an array of records |
int8_p |
A pointer to an array of 8-bit signed integers |
int16_p |
A pointer to an array of 16-bit signed integers in host byte order |
int16_be_p |
A pointer to an array of 16-bit signed integers in big-endian order |
int16_le_p |
A pointer to an array of 16-bit signed integers in little-endian order |
int32_p |
A pointer to an array of 32-bit signed integers in host byte order |
int32_be_p |
A pointer to an array of 32-bit signed integers in big-endian order |
int32_le_p |
A pointer to an array of 32-bit signed integers in little-endian order |
int64_p |
A pointer to an array of 64-bit signed integers in host byte order |
int64_be_p |
A pointer to an array of 64-bit signed integers in big-endian order |
int64_le_p |
A pointer to an array of 64-bit signed integers in little-endian order |
uint8_p |
A pointer to an array of 8-bit unsigned integers |
uint16_p |
A pointer to an array of 16-bit unsigned integers in host byte order |
uint16_be_p |
A pointer to an array of 16-bit unsigned integers in big-endian order |
uint16_le_p |
A pointer to an array of 16-bit unsigned integers in little-endian order |
uint32_p |
A pointer to an array of 32-bit unsigned integers in host byte order |
uint32_be_p |
A pointer to an array of 32-bit unsigned integers in big-endian order |
uint32_le_p |
A pointer to an array of 32-bit unsigned integers in little-endian order |
uint64_p |
A pointer to an array of 64-bit unsigned integers in host byte order |
uint64_be_p |
A pointer to an array of 64-bit unsigned integers in big-endian order |
uint64_le_p |
A pointer to an array of 64-bit unsigned integers in little-endian order |
float_p |
A pointer to an array of 32-bit single-precision floating-points in host byte order |
float_be_p |
A pointer to an array of 32-bit single-precision floating-points in big-endian order |
float_le_p |
A pointer to an array of 32-bit single-precision floating-points in little-endian order |
double_p |
A pointer to an array of 64-bit double-precision floating-points in host byte order |
double_be_p |
A pointer to an array of 64-bit double-precision floating-points in big-endian order |
double_le_p |
A pointer to an array of 64-bit double-precision floating-points in little-endian order |
cstring_p |
A pointer to a C-string |
Arrays
Any type can be used to create an array but there is a caveat, all the items
inside an must have the same fixed size. The size can be implicit from the type
or a variable size given in the field definition (int_be
, int_le
, uint_be
,
cstring
, and record
).
The array notation is as follows:
// TYPE[SIZE]
{ name: 'intArr', type: 'int8[80]' }
Pointers
Pointer types can point to variable size data (at runtime) without need to
recompile the record definition. This is different from variable size field
types (int_be
, int_le
, uint_be
, uint_l
, cstring
) as the size of those
fields is locked in compilation (fixed size array) and have a fixed position in
the data structure. Pointer types are marked with a _p
suffix in the type name.
For example a cstring_p
pointer can point to the string "Hello"
during one
serialization call and to the string "world!!"
on the next call. The string is
copied into the dynamic heap section of the resulting buffer which is reserved
for storing variable sized payloads.
Data Structure
In the following examples the data structure is represented in 32bit big-endian format, but all common architectures are supported 32-bit BE/LE, 64-bit BE/LE, or even mixed endianness is possible.
The serialize()
function returns a Buffer
object that contains a
record structure and a heap sections. The heap is only populated if the record
contains pointers to the data in heap.
The following example shows a record definition, what is stored in the buffer, and a matching C struct.
[
{ "name": "sport", "type": "uint16_be" },
{ "name": "dport", "type": "uint16_be" },
{ "name": "seqno", "type": "uint32_be" },
{ "name": "ackno", "type": "uint32_be" },
{ "name": "options", "type": "uint_be", "size": 3 },
{ "name": "data", "type": "cstring_p" }
]
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ RECORD
| .sport | .dport |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| .seqno |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| .ackno |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| .options | PADDING |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| .data_offset |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| .data_size |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ HEAP
| DATA |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Here data
is inside the heap area.
struct frame {
uint16_t sport;
uint16_t dport;
uint32_t seqno;
uint32_t ackno;
struct {
unsigned int options : 24;
};
char * data;
size_t data_len;
};
API
Functions
compile(recordDef[, { align: true }])
allocRecord(compiledDef[, { unpool, heapSize }])
calcHeapSize(compiledDef, obj)
createRecord(compiledDef, obj)
generateRecordDef(obj)
generateCHeader(compiledDef)
serialize(compiledDef, buf, obj)
deserialize(compiledDef, buf)
readValue(compiledDef, buf, path)
readString(compiledDef, buf, path[, encoding])
writeValue(compiledDef, buf, path, value)
writeString(compiledDef, buf, path, value[, encoding])
createReader(compiledDef, buf, path)
createStringReader(compiledDef, buf, path[, encoding])
createWriter(compiledDef, buf, path)
generateRecordDef()
makes a best effort guess on how an object could be
serialized. Strings will be serialized to the size they were seen in the
example object, and numbers will be stored using the same endianness as the
host architecture is currently using.
Record alignment
By default compile()
aligns the resulting data for optimal access in C.
If align
is set true for compile()
then the resulting buffers will be
aligned to the expected C struct alignment on the underlying architecture.
if align
is false, then the resulting data is packed as compact as
possible. generateCHeader()
does not support unaligned mode.
However, currently subrecords/nested records are not aligned as C structures
even if align
is set. Therefore, if nested records and especially record
arrays will be accessed in C care should be taking to ensure that all the
records are aligned to word size. This was a common manual task in
pre-ANSI C world.
Typically in C this manual alignment would look something like (assuming 32bit little-endian):
struct x {
struct {
int16_t value;
int16_t _spare;
} a;
uint32_t flags;
};
With the definition language here we can do the following:
[
{
"name": "x",
"type": "record",
"def": [
{ "name": "value", "type": "int16_le" },
{ "name": "_spare", "type": "int16_le" }
]
},
{ "name": "flags", "type": "uint32_le" }
]
This is the exact bitwise equivalent of the previous C struct.
Scripts
-
yarn build
- run TS build -
yarn lint
- run ESlint -
yarn prettier
- run Prettier -
yarn test
- run tests -
yarn perf
- run a perf test
Examples
const recordDefEx = [
{ name: 'a', type: 'uint32_le' },
{ name: 'b', type: 'int32_le' },
{ name: 'c', type: 'int_le', size: 3 },
{ name: 'd', type: 'int_le', size: 5 },
{
name: 'nested',
type: 'record',
def: [
{ name: 'a', type: 'uint32_le' },
{ name: 'b', type: 'uint32_le' },
],
},
{
name: 'x',
type: 'record',
def: [
{ name: 'a', type: 'uint32_le' },
{ name: 'y', type: 'record', def: [{ name: 'a', type: 'uint32_le' }] },
],
},
]
const obj = {
a: 4,
b: -128,
c: 10,
d: 5,
nested: {
a: 5,
b: 5,
},
x: {
a: 5,
y: {
a: 5,
},
},
}
const compiled = compile(recordDefEx)
const buf = createRecord(compiled, obj)
const objSerialized = v8.serialize(obj)
const jsonStr = JSON.stringify(obj)
console.log(
`buf.length = ${buf.length}, objSerialized.length = ${objSerialized.length}, jsonStr.length = ${jsonStr.length}`
)
// buf.length = 32, objSerialized.length = 69, JSON.length = 76
Performance Testing
$ yarn perf
$ node --prof ./node_modules/.bin/ts-node __perf__/perf.ts
modify
======
nativeObjectTest 1.21 ms
nativeV8SerializerTest 18544.97 ms
jsonTest 171.72 ms
dataRecordTestSlow 55.11 ms
dataRecordTestFast 5.60 ms
serialization
=============
./data/simple.json
nativeV8SerializerTest 10953.32 ms
jsonTest 489.59 ms
dataRecordSerializeTest 370.34 ms
./data/nesting.json
nativeV8SerializerTest 5362.70 ms
jsonTest 2724.44 ms
dataRecordSerializeTest 2800.08 ms
./data/mega-flat.json
nativeV8SerializerTest 10268.37 ms
jsonTest 20080.70 ms
dataRecordSerializeTest 6110.32 ms
./data/numbers.json
nativeV8SerializerTest 5280.86 ms
jsonTest 7111.44 ms
dataRecordSerializeTest 769.95 ms
The performance tests are located under the __perf__
directory and can be executed with yarn perf
.
The test modules can be run individually by giving one or more module names as an argument to the
comman, e.g. yarn perf serialization
.
Each run will create a isolate file that can be parsed as follows:
node --prof-process isolate-0x5ecbef0-130826-v8.log > processed.txt