pipe-hash

An identity duplex stream that generates a rolling hash of its passthru. Use it for generating and verifying fingerprints of files and directories of arbitrary size.


Get it!

npm install --save pipe-hash

Usage

Simply pipe thru a PipeHash instance or call its .fingerprint method.

var fs = require('fs')
var pipeHash = require('pipe-hash')
 
var file = __filename
var selfie = fs.createReadStream(file)
var hashPipe = pipeHash()
 
// high-level way to get a fingerprint from a file or directory
hashPipe.fingerprint(file, { gzip: false }, function (err, fingerprintA) {
  if (err) return console.error(err)
 
  // another way - consuming a readable, net socket or similar
  selfie.pipe(hashPipe)//.pipe(somewhere_else)
 
  // get the fingerprint once the writable side of the hashPipe has finished
  hashPipe.on('fingerprint', function (fingerprintB) {
    console.log('fingerprints identical?', fingerprintB.equals(fingerprintA))
  })
 
})

Make sure not to engage a PipeHash instance in multiple fingerprinting operations simultaneously. That would lead to data races and wrong fingerprints.


API

var hashPipe = pipeHash([opts][, callback])

Create a new PipeHash instance. Options default to:

{
  hash: 'blake2b', // blake2b or any name of crypto's hash functions
  blake2bDigestLength: 64, // these get passed on to blake2b-wasm
  blake2bKey: null,
  blake2bSalt: null,
  blake2bPersonal: null,
  windowKiB: 64 // size of the sliding window/internal buffer in KiB
}

The callback will be called after the writable side of the stream has finished and has the standard signature callback(err, fingerprint). The fingerprint is a buffer.

The fabulous blake2b-wasm module by mafintosh implements blake2b in WebAssembly, which allows fast hashing.
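
For example, a minimal sketch of a non-default setup that passes options and a constructor callback (the sha512 choice and windowKiB value are just illustrative):

var pipeHash = require('pipe-hash')
 
// sketch: a sha512-backed instance with a smaller sliding window
var hashPipe = pipeHash({ hash: 'sha512', windowKiB: 32 }, function (err, fingerprint) {
  if (err) return console.error(err)
  // called once the writable side of the stream has finished
  console.log('fingerprint:', fingerprint.toString('hex'))
})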

hashPipe.fingerprint(filepath[, opts], callback)

Get a fingerprint from a file or directory. Options default to:

{
  gzip: true, // gzip file and tar-packed dir streams before hashing?
  dereference: false // follow symlinks when fs stating filepath?
}

The callback has the signature callback(err, fingerprint) and will be called once the entire file/directory has been hashed. The fingerprint is a buffer.
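
For instance, a sketch of fingerprinting a directory while dereferencing symlinks (the path is hypothetical; by default the directory gets tar-packed and gzipped before hashing):

var pipeHash = require('pipe-hash')
 
var hashPipe = pipeHash()
 
// sketch: hash a directory; it is tar-packed and gzipped before hashing by default
hashPipe.fingerprint('./some-dir', { dereference: true }, function (err, fingerprint) {
  if (err) return console.error(err)
  console.log('dir fingerprint:', fingerprint.toString('hex'))
})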

hashPipe.on('fingerprint', callback)

Emitted once the writable side of the stream has finished. The callback has the signature callback(fingerprint). The fingerprint is a buffer.


Details

Basically a PipeHash instance is just an identity duplex stream. You can pipe streams or write buffers to it and it will pass them thru unaltered.

Internally it processes written buffers stepwise. Every time the internal window buffer reaches opts.windowKiB, the window is hashed, the internal buffer is cleared, and the accumulator is updated as a byte-wise xor of itself with that window hash. Because only one window's worth of data is held in memory at any given time, streams of arbitrary size can be pipehashed.
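
The scheme roughly looks like the following toy sketch, which uses Node's built-in crypto for brevity instead of blake2b-wasm; it only illustrates the window-hash-and-xor idea, not the module's actual internals:

var crypto = require('crypto')
 
// illustrative only: accumulate a fingerprint over fixed-size windows
function hashWindow (window) {
  return crypto.createHash('sha512').update(window).digest()
}
 
function xorInto (accumulator, digest) {
  for (var i = 0; i < accumulator.length; i++) accumulator[i] ^= digest[i]
  return accumulator
}
 
var accumulator = Buffer.alloc(64) // sha512 digest length in bytes
var windows = [ Buffer.from('window one'), Buffer.from('window two') ]
 
windows.forEach(function (window) {
  accumulator = xorInto(accumulator, hashWindow(window))
})
 
console.log('toy fingerprint:', accumulator.toString('hex'))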

A PipeHash instance emits a fingerprint once its writable side has finished. Additionally, you can pass a callback to the constructor which likewise will be called once the writable side of the stream has finished.

You can use the stream for both generation and verification of fingerprints.

For convenience you can generate a fingerprint of a file or directory using the public PipeHash.prototype.fingerprint(filepath, opts, callback) method. It internally processes the input stream indicated by filepath and calls the callback with the resulting fingerprint. Note that by default files are gzipped and directories tossed into a tarball (packed as a tar archive and then gzipped) before the hashing stage. Set opts to { gzip: false } to prevent compression before hashing. Knowing whether a fingerprint refers to a compressed or uncompressed buffer is important for successfully verifying fingerprints. Generally, you should just stick to the defaults and use compression when juggling files.

Make sure not to engage one PipeHash instance in multiple hashing procedures simultaneously, as that would lead to data races in the internal buffer window. It is safe to use a single stream for multiple file hashes as long as the operations are performed one after the other.
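
A sketch of safe sequential reuse (the file paths are hypothetical):

var pipeHash = require('pipe-hash')
 
var hashPipe = pipeHash()
 
// sketch: hash two files one after the other, never concurrently
hashPipe.fingerprint('./a.txt', function (err, fingerprintA) {
  if (err) return console.error(err)
 
  hashPipe.fingerprint('./b.txt', function (err, fingerprintB) {
    if (err) return console.error(err)
    console.log('identical?', fingerprintA.equals(fingerprintB))
  })
})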


License

MIT
