Markov Pack
A Node package for generating and using compacted Markov chains. Here's an example of what I use it for:
A Lax Equivalence Theorem for Multiple Perspective Views via Iterative Depth Estimation Algorithm with Admission Control Algorithm Combining Static and Interactive Distance Education Technical Specifications into Relational Database Design Using the Level Set Framework with Pairwise Constraints for Interval and Fuzzy ART Network for Computing Minimal-Norm Solutions to the Maximally Flat FIR Fractional Delay Filter and Linearized LNA Applied in Hydrology Domain.
Specification
The binary chain is an array of bytes (Uint8Array
in JavaScript). Numbers are
stored bigendian.
The chain has the following structure:
-
header
- Describes lengths and positions. -
pairOfLengths
- The lengths of the unique words. -
wordList
- A list of all the unique words. -
hashTable
- A table ofwordTuple
tochain
offset. -
chain
- List of possible continuation words.
header
Some number fields are included for reference purposes. They don't occupy any space in the binary but their method of calculation is specified. All the included lengths and sizes are on 4 bytes. All sizes refer to sizes in bits, not bytes.
-
magicNumber
(7 bytes) - 0x13e3ff45be9c06 (computed by runningecho 'markov-pack' | sha1sum | cut -c -14
) -
version
(1 byte) - For this first version it's 0x01. -
wordLengthsLen
- The number of unique word lengths. -
chainLen
- The number of elements in the chain. -
wordListLen
(not included) - The number of bytes in thewordList
block. Computed as the sum ofwordLength
*wordCount
for all elements inpairOfLengths
. -
hashTableLen
- The number of elements in the hash table. This is >= tochainLen
. -
chainBytesLen
- The number of bytes in thechain
block. -
wordSize
(not included) - This can be computed by counting the total number of words inpairOfLengths
and getting the number of bits required to store that number. -
wordTupleSize
(not included) - Double the size ofwordSize
. -
offsetSize
(not included) - The number of bits required to storechainBytesLen
* 8. -
contListSize
- The sizepairCount
. -
weightSize
- The sizecontWeight
.
pairOfLengths
A list of lengthPair
with wordLengthsLen
elements that represents the unique
sizes of the words.
lengthPair
-
wordLength
(4 bytes) - The size of the word as stored inwordList
in bytes. -
wordCount
(4 bytes) - How many such words are there.
wordList
A concatenation of all the unique words (only ASCII is supported for now). The words appear ordered by their length and then sorted alphanumerically. Words of length 0 can exist (obviously only with count 1).
hashTable
Has a size in bits of (wordTupleSize
+ offsetSize
) * hashTableLen
.
This is a hash table structure of wordTuple
to chain
offset. Offsets are
used instead of element number since chain
contains lists which have variable
lengths.
Since this is a hash table, not all values are used. Unused values are zeroed out.
chain
This is a lists of possible continuation words with their weight: a list of
contList
. So these are the values that the hash table points to.
contList
-
pairCount
(contListSize
bits) - The number ofpairs
that will follow. -
pairs
- The list ofcontListPair
.
pairs
-
contWord
(wordSize
bits) - The word in this continuation. -
contWeight
(contWeightSize
bits) - The weight of the word within this possible continuation.
License
MIT