divsufsort
node binding for Yuta Mori's libdivsufsort
This library provides a simple and an efficient C API to construct a suffix array and a Burrows-Wheeler transformed string from a given string over a constant-size alphabet. The algorithm runs in O(n log n) worst-case time using only 5n+O(1) bytes of memory space, where n is the length of the string.
See: https://github.com/y-256/libdivsufsort
Usage
var divsufsort = divsufsort divbwt = divbwt; var lodash = assert = ; /* Suffix array construction */var t = 'abracadabra' sa = 4 * tlength; // Must be >= 4x input length // "sa" receives the suffix array, packed as 32-bit integers. Will throw if // anything goes wrong. ;assert; /* Burroughs-Wheeler transform */var t = 'banana' u = tlength aux = 4 * tlength; // "u" receives the BWT transformed version of "t". "aux" is temporary storage.// Returns the primary index (index of original first char in t). Will throw if // anything goes wrong.var idx = ;var us = u result = us + '$' + us;assert; /* For testing */ { var ret = ; forvar i=0; i<buflength; i+=4 ret; return ret;}
Install
Ensure the libdivsufsort library and headers are installed as per instructions below. This addon uses whatever is installed on your system, not a vendored copy.
npm install divsufsort
libdivsufsort for OSX
brew install homebrew/science/libdivsufsort
libdivsufsort for Linux / Unix
- Requires cmake >= 2.4.2, make, C compiler.
git clone https://github.com/y-256/libdivsufsortmkdir libdivsufsort/buildcd libdivsufsort/buildcmake -DCMAKE_BUILD_TYPE="Release" -DCMAKE_INSTALL_PREFIX="/usr/local" .. make && sudo make install
Notes
The largest buffer I could allocate on 64-bit OSX builds of node:
var b = Math - 1; // Worksvar c = Math; // Fails
- 0.10.36 - RangeError: length > kMaxLength
- 0.12.0 - RangeError: Attempt to allocate Buffer larger than maximum size: 0x3fffffff bytes
So a bit less than 1GB in both cases. This means that the largest strings you can work with will be 256MB or so.
SA construction or BWT on a 256MB string blocks for about 2 secs on my laptop, so if you're not writing a console program you probably want to do it in an external process.
TODO
- Consider whether it is possible to do a 64-bit version.
- Currently, no effort is made to handle Unicode or UTF8-encoded strings. Part of the problem is that the lib uses some auxiliary space which is quadratic in the size of the alphabet.
- It would be possible to make the binding more "JavaScript-ey" by removing the requirement for pre-allocated buffers.