node package manager
Loosely couple your services. Use Orgs to version and reuse your code. Create a free org »



forking content-addressed append-only historical key/value blob store over leveldb with multi-master replication

Conflicts are unavoidable, particularly when latency is high. Instead of hiding that fundamental fact or going into a conflict panic mode that demands an immediate resolution, forkdb anticipates and welcomes conflicts.

Interfaces built on forkdb should be honest about the underlying data model and embrace conflicts too.

For a lower-level version of just the link management for multi-master replication, check out fwdb, upon which this library is based.

build status


Here we'll create a new document with the contents beep boop under the key "blorp".

$ echo beep boop | forkdb create blorp

This document is now the singular head of the blorp key:

$ forkdb forks blorp

But now, we'll make a new document that links back to the document we just created and see that the head has updated to the new document's hash:

$ echo BEEP BOOP | forkdb create blorp \
$ forkdb forks blorp

But suppose that while we were making our BEEP BOOP update, somebody else was working on an edit to the same previous hash, 9c056451. In other words, a conflict!

$ echo BeEp BoOp | forkdb create blorp \

This is no problem for forkdb. There are just 2 forks of the blorp key now, which is completely fine:

$ forkdb forks blorp

A UI could show both (or more!) versions side by side or perhaps have a branch where the files diverge.

However, we can also merge these 2 documents back into 1 by creating a new document that points back at both forks:

$ echo BEEPITY BOOPITY | forkdb create blorp \
  --prev=6c0c881fad7adb3fec52b75ab0de8670391ceb8847c8e4c3a2dce9a56244b328 \

and now we're back to a single head:

$ forkdb forks blorp

However, all of the previous states of the blorp key were saved into the history, which we can inspect by picking a key (in this case, the new head e3bd9d14) and traversing back through the branches to end up at the tail:

$ forkdb history 058647fc544f70a96d5d083ae7e3c373b441fc3d55b993407254fcce3c732f1e
+- blorp :: 058647fc544f70a96d5d083ae7e3c373b441fc3d55b993407254fcce3c732f1e
 +- blorp :: 6c0c881fad7adb3fec52b75ab0de8670391ceb8847c8e4c3a2dce9a56244b328
 |- blorp :: 9c0564511643d3bc841d769e27b1f4e669a75695f2a2f6206bca967f298390a0
 +- blorp :: f5ff29843ef0658e2a1e14ed31198807ce8302936116545928756844be45fe41
 |- blorp :: 9c0564511643d3bc841d769e27b1f4e669a75695f2a2f6206bca967f298390a0


First, we'll populate two databases, /tmp/a and /tmp/b with some data:

$ echo beep boop | forkdb -d /tmp/a create msg
$ echo RAWR | forkdb -d /tmp/a create msg \
$ echo moo | forkdb -d /tmp/b create msg \

Now we can use dupsh to pipe replication endpoints for /tmp/a and /tmp/b together:

$ dupsh 'forkdb sync -d /tmp/a' 'forkdb sync -d /tmp/b'

dupsh is handy here because the sync command reads from stdin and writes to stdout. You can sync two forkdbs over the network with any duplex transport.

For example, with netcat we can create a server on port 5000:

$ dupsh 'forkdb sync -d /tmp/a' 'nc -l 5000'

and then elsewhere we can connect to port 5000 for replication:

$ dupsh 'forkdb sync -d /tmp/b' 'nc localhost 5000'

No matter how you get the data to each database, everything is now in sync!

$ forkdb -d /tmp/a forks msg
$ forkdb -d /tmp/b forks msg

If we make a merge update on /tmp/b:

$ echo woop | forkdb -d /tmp/b create msg \
  --prev=071f8d4403f88ca431023ec12a277b28bcd68ab41c5043a5bf7e690b23ba7184 \

and then merge again:

$ dupsh 'forkdb sync -d /tmp/a' 'forkdb sync -d /tmp/b'

now the data is merged on both databases:

$ forkdb -d /tmp/a forks msg
$ forkdb -d /tmp/b forks msg

Replication woo.

api example

Create a forkdb instance by passing in a leveldown or levelup handle and a path to where the blobs should go. Then you can use createWriteStream(meta) to save some data:

var db = require('level')('/tmp/edit.db');
var fdb = require('forkdb')(db, { dir: '/tmp/edit.blob' });
var meta = JSON.parse(process.argv[2]);
var w = fdb.createWriteStream(meta, function (err, id) {
    if (err) console.error(err)
    else console.log(id)

Now give the program some data on stdin:

$ echo beep boop | node create.js '{"key":"blorp"}'

data model

The data model is append-only. Each document operates under a key and may reference zero or more other documents by the hash of their content, which always point backward in time. That is, to have a link to a document, the document must first exist because the link is the hash of its content.

Each document can link back to zero, one, or many other documents from any other key. You can make these links mean semantically whatever you wish, but given how heads, tails, and forward indexes are generated from these backward links, this is a helpful semantic model to use:

  • n = 0 - a new document with no history
  • n = 1 - an update to an existing document
  • n >= 2 - merge multiple documents together

For each of these conditions, the heads and tails are updated:

  • n = 0 - a new head and new tail are added
  • n = 1 - a head is removed, a new head is added
  • n >= 2 - n heads are removed, one head is added

Each update with a backward link also generates a forward link to enable fast forward and backward traversals. Forward links, heads, and tails are all generated purely for performance reasons since these can all be computed, albeit slowly, from the backward links.


var forkdb = require('forkdb')

var fdb = forkdb(db, opts)

Create a new forkdb instance fdb from a levelup or leveldown db.

Optionally set:

  • - uniquely identify the current instance. This id is used to negotiate sequences for replication and MUST be unique to correctly identify this database.
  • opts.dir - directory to use for blob storage, default: './forkdb.blob'
  • - content-addressable abstract-blob-store to use instead of content-addressable-blob-store

To run both the command-line tool and the api over the same data simultaneously, use level-party to create the db.

var w = fdb.createWriteStream(meta, opts={}, cb)

Save the data written to the writable stream w into blob storage at meta.key. To link back to previous documents, specify an array of objects with key and hash properties as meta.prev. For example:

meta.key = 'blorp';
meta.prev = [
  { key: 'blorp', hash: 'fcbcbe4389433dd9652d279bb9044b8e570d7f033fab18189991354228a43e99' },
  { key: 'blorp', hash: 'c3122c908bf03bb8b36eaf3b46e27437e23827e6a341439974d5d38fb22fbdfc' }

cb(err, key) fires when an error occurs or when all the data has been written successfully to blob storage and leveldb under the key key.

Optionally, you can set an opts.prebatch(rows, key, fn) function that gets runs before db.batch() with the hash of the content, key. Your prebatch function should call fn(err, rows) with the rows to insert.

var r = fdb.createReadStream(hash)

Return a readable stream r with the blob content at hash.

var r = fdb.forks(key)

Return a readable object stream r that outputs an object with key and hash properties for every head of key.

If key is undefined, all heads from all the keys are output.

var r = fdb.tails(key)

Return a readable object stream r that outputs an object with key and hash properties for every tail of key.

If key is undefined, all tails from all the keys are output.

var r = fdb.list(opts)

Return a readable object stream r that outputs each metadata object for every document in the database.

Constrain the output stream by passing in,, or opts.limit.

var r = fdb.keys(opts)

Return a readable object stream r that outputs a record for every key in the database.

Constrain the output stream by passing in,, or opts.limit.

fdb.get(hash, cb)

Get the metadata for hash and call cb(err, meta) with the result.

var r = fdb.links(hash)

Return a readable object stream r that outputs an object with key and hash properties for every forward link of hash.

var r = fdb.history(hash)

Return a readable object stream r that traverses backward starting from hash, outputting a metadata object for each document in the history.

When the traversal comes to a branch, r ends and emits a 'branch' event with a b object for each branch. The branch object b has the same behavior as r and operates recursively.

var r = fdb.future(hash)

Return a readable object stream r that traverses forward starting from hash, outputting a metadata object for each document in the future history.

When the traversal comes to a branch, r ends and emits a 'branch' event with a b object for each branch. The branch object b has the same behavior as r and operates recursively.

var d = fdb.replicate(opts={}, cb)

Return a duplex stream d to replicate with another forkdb. Pipe the endpoints to each other, duplex stream style:

var d = fdb.replicate();

for some full-duplex stream, for example from a tcp connection.

Specify the replication strategy with opts.mode:

  • opts.mode === 'sync' - multi-master replication (default strategy)
  • opts.mode === 'push' - only send updates
  • opts.mode === 'pull' - only receive updates

Note that if both endpoints try to push or both endpoints try to pull from each other, nothing will happen.

forkdb saves the last sequence successfully replicated for each remote host to avoid sending hashes for sequences that remote hosts already know about.

Set to true to keep the stream open for continuous streaming replication as new documents are created.

fdb.concestor(hashes, cb)

Compute the concestor for hashes, an array of strings.

cb(err, cons) fires with cons, an array of concestors or an empty array if there is no common ancestor. If there is a tie for recency, cons will contain more than one ancestor.


usage: forkdb COMMAND OPTIONS
  Global options are:
    -d, --dir  directory to use for both db and blob storage
               If not specified, uses $FORKDB_DIR or ./forkdb
    --blobdir  directory to use for blob storage
    --dbdir    directory to use for db
forkdb create KEY {--prev=HASH ...}
  Create a new document with content from stdin under KEY.
  Set pointers to previous content with "--prev". To point back at multiple
  documents (a merge), use --prev more than once.
forkdb list {--lt=LT, --gt=GT, --limit=LIMIT}
  List all the document metadata in the database.
  Optionally set LT, GT, and LIMIT constraints on the output.
forkdb keys {--lt=LT, --gt=GT, --limit=LIMIT}
  List all the active keys in the database, one key per line.
forkdb read HASH
  Print the contents for HASH to stdout.
forkdb get HASH
  Print the metadata for HASH to stdout as json.
forkdb forks KEY
  Print the hash of every head for KEY on its own line.
forkdb tails KEY
  Print the hash of every tail for KEY on its own line.
forkdb links HASH
  Print newline-delimited json for the `key` and `hash` properties of each
  forward link back to HASH.
forkdb history HASH
  Print an ascii diagram to stdout tracing HASH back in time to its tails.
forkdb future
  Print an ascii diagram to stdout tracing HASH forward in time to its heads.
forkdb concestor HASH1 HASH2...
  Print the most recent common ancestor(s) for the list of HASHes.
forkdb sync {OPTIONS} # multi-master replication
forkdb push {OPTIONS} # push updates
forkdb pull {OPTIONS} # pull updates
  Replicate with another forkdb using a replication strategy.
  stdin and stdout are used for incoming and outgoing traffic.
    --live  Keep the connection open for additional updates after the initial
            replication phase.
forkdb help
  Show this message.


With npm, to get the forkdb command do:

npm install -g forkdb

and to get the library do:

npm install forkdb