hyperlolo
TypeScript icon, indicating that this package has built-in type declarations

0.4.0 • Public • Published

hyperlolo

Build npm version License

Description

A basic implementation of HyperLogLog, a probabalistic datastructure useful for approximating the number of distinct elements in a multiset. This approach allows for O(1) time complexity and O(log log n) space complexity w.r.t set size n.

Features

  • Custom Backends: Implement the Hasher interface to use the hash function of your choice. A 32-bit Jenkins hash backend is provided by default.
  • Serialization
  • Type Definitions
  • Lower-Bound Correction: 'Linear Counting' is used when the estimated cardinality is low relative to the number of configured registers.
  • Upper-Bound Correction: Estimated cardinality is adjusted when found to be high relative to the size of the hash space.
  • An additional correction is made to compensate for 'systematic multiplicative bias' resulting from hash collisions.

Usage

Basic

import { HyperLogLog } from 'hyperlolo'

// Initialize counter
const counter = new HyperLogLog({ hasherId: 'jenkins32', precision: 12 }) // 12-bit register index = 4096 registers

// Perform 50 million insertions of 15 million distinct values
const inserts = 50000000
const distincts = 15000000

// Count values
for (var i=0; i<inserts; i++)
  counter.add(i % distincts)

// Approximate count
const count = counter.count()
console.log(`[count] estimate = ${count}, expected = ${distincts}, error = ${count - distincts}`)

Merging

Counters can be merged, allowing for tracking cardinality in distributed scenarios. Here we merge the counter created above with another whose set has a 50% overlap with the first. The estimated cardinality of the merged counter will be 1.5x the original.

// Another counter, somewhere else
const another = new HyperLogLog({ hasherId: 'jenkins32', precision: 12 })

// Count values with only half the range overlapping the the original set
const offset = Math.round(distincts / 2)
for (var i=0; i<inserts; i++)
  another.add(i % distincts + offset)

// Merge first counter with the second
const merged = counter.merge(another)
const mergedCount = merged.count()
console.log(`[merged] estimate = ${mergedCount}, expected = ${count + offset}, error = ${mergedCount - count - offset}`)

Serialization

Counters can be converted to a Buffer for persistance or distribution:

const buffer = counter.serialize()

And deserialized somewhere else:

const counter = HyperLogLog.deserialize(buffer)

The deserialized counter will inherit the options of the original.

If using a custom backend, the Hasher implementation of the counter being deserialized must be registered with the local HasherFactory.

References

  • "HyperLogLog" Wikipedia. Wikimedia Foundation, n.d. Mon. 31 Jul. 2023.

License

Everything in this repo is ISC License unless otherwise specified

Package Sidebar

Install

npm i hyperlolo

Weekly Downloads

183

Version

0.4.0

License

ISC

Unpacked Size

104 kB

Total Files

53

Last publish

Collaborators

  • weston