Uses the BuzHash library to chop files into feature-determined chunks, so chunk boundaries follow the file's content rather than fixed offsets. Good for deduplication.
const ChunkChunk = require('chunkchunk'),
fs = require('fs'),
file = fs.openSync('my/file');
const chunkee = new ChunkChunk(file, { max: 40000 });
// Create variable-size chunks of `my/file` with a maximum size
// of 40k; the default `min` chunk size is 50% of that.
const chunk = chunkee.nextChunk();
// { hash: 'sha256...', buffer: <Buffer ...> }
const chunks = chunkee.toEnd();
// [ {hash:.., buffer:<..>}, {...}]
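// The descriptor was opened synchronously, so close it the same way
// (`fs.close` without a callback throws in current Node.js).
fs.closeSync(file);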
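Because every chunk carries its own hash, deduplication can be as simple as storing each distinct chunk once and keeping an ordered list of hashes. The following is a minimal sketch of that idea, not part of ChunkChunk: `makeChunk` and `dedupe` are hypothetical helpers, the sample strings stand in for the result of `chunkee.toEnd()`, and the base64 sha256 encoding is an assumption based on the test output.

```javascript
const crypto = require('crypto');

// Hypothetical helper: builds an object shaped like the chunks shown above.
// Assumes the hash is a base64-encoded sha256 of the chunk's bytes.
function makeChunk(text) {
  const buffer = Buffer.from(text);
  const hash = crypto.createHash('sha256').update(buffer).digest('base64');
  return { hash, buffer };
}

// Hypothetical helper: stores each distinct chunk once, keyed by hash, and
// returns the ordered list of hashes needed to rebuild the original data.
function dedupe(chunks, store = new Map()) {
  const recipe = [];
  for (const { hash, buffer } of chunks) {
    if (!store.has(hash)) store.set(hash, buffer);
    recipe.push(hash);
  }
  return { store, recipe };
}

// Stand-in data; in real use this would be the array from `chunkee.toEnd()`.
const chunks = ['aaaa', 'bbbb', 'aaaa', 'cccc'].map(makeChunk);
const { store, recipe } = dedupe(chunks);

console.log(`${recipe.length} chunks, ${store.size} unique`); // 4 chunks, 3 unique

// Rebuild the original bytes by looking up each hash in order.
const rebuilt = Buffer.concat(recipe.map((h) => store.get(h)));
console.log(rebuilt.toString()); // aaaabbbbaaaacccc
```

Passing the same `store` into later `dedupe` calls lets several files share chunks.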
The following is an example run of npm test chunking a ~120 KB .jpg of Batman, with a max chunk setting of 40k and a min of 20k. The first column is the number of bytes in the chunk. The second column is the base64-encoded sha256 of that chunk. The last line is the total number of bytes.
[master] ~ npm test
> chunkchunk@0.2.0 test C:\code\experiments\chunkchunk
> node test/test
4 chunks in 0s 30.234033ms
39005 QQitkmhKkmWfrX+p59Nk49kctS1TrMHhpnFka08Bya4=
34113 aOsmGJhJUeWNPPVbsyfqMyRx1F28rwQnvWiwwN/qVDo=
21484 FX98NrA8OlKWzSGJpXIAslixSRU4QJBPBEVEkcc9EXA=
23960 BY970o+31e5szl0TIGuDtfnPbH41tzWq2WYcK0Pn+1c=
118562
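The varying sizes in the first column come from cutting wherever the content produces a boundary, bounded by the min and max settings. The sketch below illustrates that idea with a plain rolling sum over a sliding window. It is not BuzHash and not ChunkChunk's implementation: `chunkBuffer` and its `window` and `mask` options are hypothetical names chosen for this example.

```javascript
const crypto = require('crypto');

// Hypothetical illustration: cut `data` wherever a rolling sum over the last
// `window` bytes matches a bit pattern, never below `min` or above `max`.
function chunkBuffer(data, { min = 2048, max = 4096, window = 48, mask = 0x3ff } = {}) {
  const chunks = [];
  let start = 0; // index of the first byte of the current chunk
  let sum = 0;   // rolling sum of the most recent `window` bytes
  for (let i = 0; i < data.length; i++) {
    sum += data[i];
    if (i - start >= window) sum -= data[i - window]; // slide the window
    const size = i - start + 1;
    const atBoundary = size >= min && (sum & mask) === mask;
    if (atBoundary || size === max || i === data.length - 1) {
      const buffer = data.subarray(start, i + 1);
      const hash = crypto.createHash('sha256').update(buffer).digest('base64');
      chunks.push({ hash, buffer });
      start = i + 1;
      sum = 0;
    }
  }
  return chunks;
}

// Random bytes stand in for a real file.
const data = crypto.randomBytes(20000);
const parts = chunkBuffer(data);
for (const { hash, buffer } of parts) console.log(buffer.length, hash);
console.log(parts.reduce((n, c) => n + c.buffer.length, 0)); // 20000
```

Every chunk except possibly the last falls between `min` and `max`, which is the same pattern as the sizes in the run above. A real rolling hash such as BuzHash spreads boundaries more evenly than a plain sum.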
- [ ] Make ChunkChunk take a string instead of a file descriptor.
- [ ] This library screams to be made a Transform Stream.
- [ ] Add config option for feature 'uniqueness'