node package manager

k-bucket

Kademlia DHT K-bucket implementation as a binary tree

k-bucket

Stability: 2 - Stable

Kademlia DHT K-bucket implementation as a binary tree.

@tristanls, @mikedeboer, @deoxxa, @feross, @nathanph, @allouis, @fanatid

npm install k-bucket
npm test
var KBucket = require('k-bucket');
 
var kBucket = new KBucket({
    localNodeId: new Buffer("my node id") // default: random data 
})

A Distributed Hash Table (DHT) is a decentralized distributed system that provides a lookup table similar to a hash table.

k-bucket is an implementation of a storage mechanism for keys within a DHT. It stores contact objects which represent locations and addresses of nodes in the decentralized distributed system. contact objects are typically identified by a SHA-1 hash, however this restriction is lifted in this implementation. Additionally, node ids of different lengths can be compared.

This Kademlia DHT k-bucket implementation is meant to be as minimal as possible. It assumes that contact objects consist only of id. It is useful, and necessary, to attach other properties to a contact. For example, one may want to attach ip and port properties, which allow an application to send IP traffic to the contact. However, this information is extraneous and irrelevant to the operation of a k-bucket.

This k-bucket implementation implements a conflict resolution mechanism using an arbiter function. The purpose of the arbiter is to choose between two contact objects with the same id but perhaps different properties and determine which one should be stored. As the arbiter function returns the actual object to be stored, it does not need to make an either/or choice, but instead could perform some sort of operation and return the result as a new object that would then be stored. See kBucket._update(node, index, contact) for detailed semantics of which contact (incumbent or candidate) is selected.

For example, an arbiter function implementing a vectorClock mechanism would look something like:

// contact example 
var contact = {
    id: new Buffer('contactId'),
    vectorClock: 0
};
 
function arbiter(incumbent, candidate) {
    if (incumbent.vectorClock > candidate.vectorClock) {
        return incumbent;
    }
    return candidate;
};

Alternatively, consider an arbiter that implements a Grow-Only-Set CRDT mechanism:

// contact example 
var contact = {
    id: new Buffer('workerService'),
    workerNodes: {
        '17asdaf7effa2': { host: '127.0.0.1', port: 1337 },
        '17djsyqeryasu': { host: '127.0.0.1', port: 1338 }
    }
};
 
function arbiter(incumbent, candidate) {
    // we create a new object so that our selection is guaranteed to replace 
    // the incumbent 
    var merged = {
        id: incumbent.id, // incumbent.id === candidate.id within an arbiter 
        workerNodes: incumbent.workerNodes
    };
 
    Object.keys(candidate.workerNodes).forEach(function (workerNodeId) {
        merged.workerNodes[workerNodeId] = candidate.workerNodes[workerNodeId];
    });
 
    return merged;
}

Notice that in the above case, the Grow-Only-Set assumes that each worker node has a globally unique id.

Implementation of a Kademlia DHT k-bucket used for storing contact (peer node) information.

For a step by step example of k-bucket operation you may find the following slideshow useful: Distribute All The Things.

KBucket starts off as a single k-bucket with capacity of k. As contacts are added, once the k+1 contact is added, the k-bucket is split into two k-buckets. The split happens according to the first bit of the contact node id. The k-bucket that would contain the local node id is the "near" k-bucket, and the other one is the "far" k-bucket. The "far" k-bucket is marked as don't split in order to prevent further splitting. The contact nodes that existed are then redistributed along the two new k-buckets and the old k-bucket becomes an inner node within a tree data structure.

As even more contacts are added to the "near" k-bucket, the "near" k-bucket will split again as it becomes full. However, this time it is split along the second bit of the contact node id. Again, the two newly created k-buckets are marked "near" and "far" and the "far" k-bucket is marked as don't split. Again, the contact nodes that existed in the old bucket are redistributed. This continues as long as nodes are being added to the "near" k-bucket, until the number of splits reaches the length of the local node id.

As more contacts are added to the "far" k-bucket and it reaches its capacity, it does not split. Instead, the k-bucket emits a "ping" event (register a listener: kBucket.on('ping', function (oldContacts, newContact) {...}); and includes an array of old contact nodes that it hasn't heard from in a while and requires you to confirm that those contact nodes still respond (literally respond to a PING RPC). If an old contact node still responds, it should be re-added (kBucket.add(oldContact)) back to the k-bucket. This puts the old contact on the "recently heard from" end of the list of nodes in the k-bucket. If the old contact does not respond, it should be removed (kBucket.remove(oldContact.id)) and the new contact being added now has room to be stored (kBucket.add(newContact)).

Public API

  • incumbent: Object Contact currently stored in the k-bucket.
  • candidate: Object Contact being added to the k-bucket.
  • Return: Object Contact to updated the k-bucket with.

Default arbiter function for contacts with the same id. Uses contact.vectorClock to select which contact to update the k-bucket with. Contact with larger vectorClock field will be selected. If vectorClock is the same, candidat will be selected.

  • firstId: Buffer Buffer containing first id.
  • secondId: Buffer Buffer containing second id.
  • Return: Integer The XOR distance between firstId and secondId.

Default distance function. Finds the XOR distance between firstId and secondId.

  • options:
    • arbiter: Function (Default: vectorClock arbiter) function (incumbent, candidate) { return contact; } An optional arbiter function that given two contact objects with the same id returns the desired object to be used for updating the k-bucket. For more details, see arbiter function.
    • distance: Function function (firstId, secondId) { return distance } An optional distance function that gets two id Buffers and return distance (as number) between them.
    • localNodeId: Buffer An optional Buffer representing the local node id. If not provided, a local node id will be created via crypto.randomBytes(20).
    • numberOfNodesPerKBucket: Integer (Default: 20) The number of nodes that a k-bucket can contain before being full or split.
    • numberOfNodesToPing: Integer (Default: 3) The number of nodes to ping when a bucket that should not be split becomes full. KBucket will emit a ping event that contains numberOfNodesToPing nodes that have not been contacted the longest.

Creates a new KBucket.

  • contact: Object The contact object to add.
    • id: Buffer Contact node id.
    • Any satellite data that is part of the contact object will not be altered, only id is used.
  • Return: Object The k-bucket itself.

Adds a contact to the k-bucket.

  • id: Buffer Contact node id.
  • n: Integer (Default: Infinity) The maximum number of closest contacts to return.
  • Return: Array Maximum of n closest contacts to the node id.

Get the n closest contacts to the provided node id. "Closest" here means: closest according to the XOR metric of the contact node id.

  • Return: Number The number of contacts held in the tree

Counts the total number of contacts in the tree.

  • id: Buffer The ID of the contact to fetch.
  • Return: Object The contact if available, otherwise null

Retrieves the contact.

  • id: Buffer The ID of the contact to remove.
  • Return: Object The k-bucket itself.

Removes contact with the provided id.

  • Return: Array All of the contacts in the tree, as an array

Traverses the tree, putting all the contacts into one array.

CAUTION: reserved for internal use

  • node: internal object that has 2 leafs: left and right
  • id: Buffer Id to compare localNodeId with.
  • bitIndex: Integer (Default: 0) The bit index to which bit to check in the id Buffer.
  • Return: Object left leaf if id at bitIndex is 0, right leaf otherwise.

CAUTION: reserved for internal use

  • node: internal object that has 2 leafs: left and right
  • id: Buffer Contact node id.
  • Return: Integer Index of contact with provided id if it exists, -1 otherwise.

Returns the index of the contact with provided id if it exists, returns -1 otherwise.

CAUTION: reserved for internal use

  • node: Object node for splitting
  • bitIndex: Integer (Default: 0) The bit index to which bit to check in the id Buffer.

Splits the node, redistributes contacts to the new nodes, and marks the node that was split as an inner node of the binary tree of nodes by setting self.contacts = null. Also, marks the "far away" node as dontSplit.

CAUTION: reserved for internal use

  • node: internal object that has 2 leafs: left and right
  • index: Integer The index in the bucket where contact exists (index has already been computed in previous calculation).
  • contact: Object The contact object to update.
    • id: Buffer Contact node id
    • Any satellite data that is part of the contact object will not be altered, only id is used.

Updates the contact by using the arbiter function to compare the incumbent and the candidate. If arbiter function selects the old contact but the candidate is some new contact, then the new contact is abandoned. If arbiter function selects the old contact and the candidate is that same old contact, the contact is marked as most recently contacted (by being moved to the right/end of the bucket array). If arbiter function selects the new contact, the old contact is removed and the new contact is marked as most recently contacted.

  • newContact: Object The new contact that was added.

Emitted only when newContact was added to bucket and it was not stored in the bucket before.

  • oldContacts: Array The array of contacts to ping.
  • newContact: Object The new contact to be added if one of old contacts does not respond.

Emitted every time a contact is added that would exceed the capacity of a don't split k-bucket it belongs to.

  • contact: Object The contact that was removed.

Emitted when contact was removed from the bucket.

  • oldContact: Object The contact that was stored prior to the update.
  • newContact: Object The new contact that is now stored after the update.

Emitted when a previously existing ("previously existing" means oldContact.id equals newContact.id) contact was added to the bucket and it was replaced with newContact.

Current releases.

We follow the semantic versioning policy (semver.org) with a caveat:

Given a version number MAJOR.MINOR.PATCH, increment the:

MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards-compatible manner, and
PATCH version when you make backwards-compatible bug fixes.

caveat: Major version zero is a special case indicating development version that may make incompatible API changes without incrementing MAJOR version.

The implementation has been sourced from: