📮 compatto
Compatto is a tiny and fast compression library with Unicode support, that works well with small strings too
Compatto is based on antirez's smaz concept. It targets modern browsers and Node.js. For older browsers and Node.js versions, you will need to transpile and use a TextEncoder
and TextDecoder
polyfill.
Features
-
⏱ Very fast to compress, even faster to decompress -
🍯 Support for Unicode characters, like emojis -
🗄 User-definable dictionary
Compression ratio
Being a dictionary-based compression algorithm, the compression ratio is heavily influenced by the dictionary one uses.
With the default dictionary the compression ratio is around 1.67
for The Great Gatsby: it is compressed from 269,716
bytes in just 161,583
, in 70ms
. A simple string like this is a string
tho, is compressed from 16
bytes in 6
, so the compression ratio would be 2.66
... Results may vary, I guess
Install
$ npm install compatto
Or if you prefer using Yarn:
$ yarn add compatto
Usage
import { dictionary } from "compatto/dictionary"
import { compatto, DecompressError } from "compatto"
const { compress, decompress } = compatto({ dictionary })
const compressedString = compress("this is a string")
// => Uint8Array [ 155, 56, 172, 62, 195, 70 ]
const decompressedString = decompress(compressedString)
// => 'this is a string'
API
compatto(options)
Create a new object that implements the Compatto
interface, using the options you provide.
options
Type: object
dictionary
Type: string[]
A dictionary used to compress and decompress strings. If its length is greater than 254
a TypeError
will be thrown.
Please note that, as of v2.0
, this option has no default value, the user has to explicitly pass it.
Compatto
Compatto is an interface that has two methods: compress()
and decompress()
.
The returned value of compatto()
implements this interface.
compress(string)
Compress a string into an array of bytes, returned as an instance of Uint8Array
.
Throws a TypeError
if the argument is not the correct type.
string
Type: string
A string to compress.
decompress(bytes)
Decompress an instance of Uint8Array
to the original, uncompressed, string.
Throws a TypeError
if the argument is not the correct type.
Throws a DecompressError
if the buffer is not correctly encoded. It can be imported along with compatto()
if you want to check if the error thrown is an instance of this class.
bytes
Type: Uint8Array
An array of bytes representing a compressed string.
Please note that if the dictionary used to compress a string is not the same used to decompress the generated buffer, the result of the decompression will most likely not be correct.
dictionary
Type: string[]
This is compatto's standard dictionary. Remember that even if it is the standard one, it must be explicitly set by the user!
Performance
Since v2.0
, compatto generates a trie from the dictionary that is used to compress every string. Before v2.0
, compatto tried to get a substring as long as the longest word in the dictionary and see if that substring was in it. If it wasn't, it tried again with a substring that was one character shorter, and so on until the substring was one character.
For compressible strings it was not that slow, but if a word had characters that were not inside the dictionary that approach was really slow!
This implementation change gave compatto a big performance boost
In v2.1
the compress()
algorithm was simplified, thus leading to a performance improvement of about 20% compared to v2.0
Below is a little table that indicates compress()
's performance improvements over the various versions. The file used to test the library is /usr/share/dict/words
: in the first row, the file was split over \n
, while in the second row the whole file was used as a long piece of text.
Data | v1.0 | v2.0 | v2.1 |
---|---|---|---|
235,887 words | ~500ms |
~370ms |
~295ms |
2.5MB raw text | ~700ms |
~465ms |
~365ms |
As you can see the performance improved a lot: now compressing a lot of small words takes about 40% less time, and almost 50% less to compress a long piece of text if we keep v1.0
as reference!
Is there space for improvements? Absolutely! I guess that the compression algorithm can be further improved, and keep in mind that I didn't have time to do code profiling.
Browser support
The latest version of Chrome, Firefox, Safari, and Edge.
Node.js support
Compatto requires Node.js 11 or later.
Related
- hex-my-bytes - Display bytes sequences as strings of hexadecimal digits.