multi-core-indexer
Index one or more hypercores
You can use this module to index one or more hypercores. The batch() function will be called with every downloaded entry in the hypercore(s), and will be called as new entries are downloaded or appended. The index state is persisted, so indexing will continue where it left off between restarts.
This module is useful if you want to create an index of items in multiple Hypercores. There is no guarantee of ordering, so the indexer needs to be able to index unordered data. Sparse hypercores are supported, and undownloaded entries in the hypercore will be indexed when they are downloaded.
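Because entries can arrive in any order, it helps if the index gives the same result whatever the arrival order. The following is a minimal sketch of one way to do that, not part of this module: `createMapIndex` and its in-memory `Map` are hypothetical names used for illustration, and only the `{ key, block, index }` entry shape comes from the API described below.

```javascript
// Hypothetical helper: builds an in-memory index plus a batch() function
// that can be passed as opts.batch.
function createMapIndex() {
  const items = new Map()

  async function batch(entries) {
    for (const { key, block, index } of entries) {
      // Each entry is stored under its own "<core key hex>/<block index>" id,
      // so the final contents do not depend on the order of arrival.
      items.set(`${key.toString('hex')}/${index}`, block)
    }
  }

  return { items, batch }
}
```

Keying each entry by core key and block index also means that writing the same entry twice leaves the index unchanged.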
Install
npm install multi-core-indexer
Usage
const MultiCoreIndexer = require('multi-core-indexer')
const raf = require('random-access-file')
const Hypercore = require('hypercore')
// Returns the storage used for the index state of the core with this hex key
function createStorage(key) {
  return raf(`./${key}`)
}
// Called with each batch of entries read from the cores
function batch(entries) {
  for (const { key, block, index } of entries) {
    console.log(`Block ${index} of core ${key.toString('hex')} is ${block}`)
  }
}
const cores = [
  new Hypercore('./core1'),
  new Hypercore('./core2'),
]
// Wait for every core to be ready so that its key is populated
// (await must run inside an async function)
await Promise.all(cores.map(c => c.ready()))
const indexer = new MultiCoreIndexer(cores, { storage: createStorage, batch })
API
const indexer = new MultiCoreIndexer(cores, opts)
cores
Required
Type: Array<Hypercore>
An array of Hypercores to index. All Hypercores must share the same value encoding (binary, utf-8 or json). All Hypercores must have a key property that is populated, either by waiting for all of them to be ready or by instantiating them with the key opt.
opts
Required
Type: object
opts.batch
Required
Type: (entries: Array<{ key: Buffer, block: Buffer | string | Json, index: number }>) => Promise<void>
Called with an array of entries as they are read from the hypercores. The next batch will not be called until batch() resolves. Entries will be queued (and batched) as fast as they can be read, up to opts.maxBatch. block is the block of data from the hypercore, key is the public key of the hypercore the block is from, and index is the index of the block within the hypercore.
Note: Currently if batch throws an error, things will break, and the entries will still be persisted as indexed. This will be fixed in a later release.
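Given the note above, one possible workaround is to make sure the function passed as opts.batch never rejects. The sketch below is an assumption about how an application might do that, not behaviour provided by this module: `createSafeBatch`, `handleEntry` and the `failed` list are hypothetical names.

```javascript
// Hypothetical wrapper: runs handleEntry for every entry and records
// failures instead of letting batch() reject.
function createSafeBatch(handleEntry) {
  const failed = []

  async function batch(entries) {
    for (const entry of entries) {
      try {
        await handleEntry(entry)
      } catch (error) {
        // The indexer will still mark this entry as indexed, so keep it
        // here for the application to retry or report later.
        failed.push({ entry, error })
      }
    }
  }

  return { batch, failed }
}
```

The returned batch can be passed as opts.batch, and the application can inspect failed to decide what to do with entries that could not be processed.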
opts.storage
Required
Type: (key: string) => RandomAccessStorage
A function that will be called with a hypercore public key as a hex-encoded string, that should return a random-access-storage instance. This is used to store the index state of each hypercore. (Index state is stored as a bitfield).
opts.maxBatch
Optional
Type: number
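The maximum size of each batch, as measured by opts.byteLength. By default this is in bytes for the binary and utf-8 value encodings, and in number of entries for json.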
opts.byteLength
Optional
Type: (entry: { key: Buffer, block: Buffer | string | Json, index: number }) => number
Optional function that calculates the byte size of input data. By default, if the value encoding of the underlying Hypercore is binary or utf-8, this will be the byte length of all the blocks in the batch. If the value encoding is json then this will be the number of entries in a batch.
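For json cores, a custom opts.byteLength can make opts.maxBatch reflect data size rather than entry count. The sketch below assumes the serialized JSON length is a good enough approximation of the size of each block; that assumption is illustrative and not something this module specifies.

```javascript
// Hypothetical opts.byteLength for json-encoded cores: approximates the
// size of an entry by the UTF-8 byte length of its serialized block.
function byteLength({ block }) {
  return Buffer.byteLength(JSON.stringify(block), 'utf8')
}
```

This function can be passed alongside the other options, for example { storage, batch, byteLength, maxBatch }.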
indexer.state
Type: IndexState: { current: 'idle' | 'indexing', remaining: number, entriesPerSecond: number }
A getter that returns the current IndexState, the same as the value emitted by the index-state event. This getter is useful for checking the state of the indexer before it has emitted any events.
indexer.addCore(core)
core
Required
Type: Hypercore
Add a hypercore to the indexer. Must have the same value encoding as other hypercores already in the indexer.
indexer.close()
Stop the indexer and flush index state to storage. This will not close the underlying storage - it is up to the consumer to do that.
indexer.on('index-state', onState)
onState
Required
Type: (indexState: { current: 'idle' | 'indexing', remaining: number, entriesPerSecond: number }) => void
Event listener for the current indexing state. entriesPerSecond is the current average number of entries being processed per second. This is calculated as a moving average with a decay factor of 5. remaining is the number of entries remaining to index. To estimate time remaining in seconds, use remaining / entriesPerSecond.
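The estimate above can be wrapped in a small helper that also covers the cases where nothing remains or no rate has been measured yet. `estimateSecondsRemaining` is a hypothetical name, and returning Infinity for an unknown rate is a choice made for this sketch.

```javascript
// Hypothetical helper: takes an IndexState and returns an estimate of the
// seconds left, using remaining / entriesPerSecond.
function estimateSecondsRemaining({ remaining, entriesPerSecond }) {
  // Nothing left to index
  if (remaining === 0) return 0
  // No usable rate yet, so no finite estimate can be made
  if (!(entriesPerSecond > 0)) return Infinity
  return remaining / entriesPerSecond
}
```

It accepts either the value passed to an index-state listener or the value of indexer.state.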
indexer.on('indexing', handler)
handler
Required
Type: () => void
Event listener for when the indexer re-starts indexing (e.g. when unindexed blocks become available, either through an append or a download).
indexer.on('idle', handler)
handler
Required
Type: () => void
Event listener for when the indexer has completed indexing of available data. Note: During sync this can be emitted before sync is complete because the indexer has caught up with currently downloaded data, and the indexer will start indexing again as new data is downloaded.
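A common need is to wait until the indexer has caught up with the data that is currently available. The sketch below combines the indexer.state getter with the idle event; `waitForIdle` is a hypothetical helper, and it only relies on indexer.state and indexer.on() as documented here.

```javascript
// Hypothetical helper: resolves once the indexer reports that it is idle.
function waitForIdle(indexer) {
  return new Promise((resolve) => {
    // Already idle: no event will necessarily follow, so resolve right away
    if (indexer.state.current === 'idle') return resolve()
    // Otherwise wait for the idle event. Later idle events call resolve()
    // again, which has no effect on an already settled promise.
    indexer.on('idle', () => resolve())
  })
}
```

As noted above, idle during sync only means the indexer has caught up with downloaded data, so this promise can resolve before a sync has finished.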
Maintainers
Contributing
PRs accepted.
Small note: If editing the README, please conform to the standard-readme specification.
License
MIT © 2022 Digital Democracy