Naivete Precedes Misrepresentation


    1.1.3 • Public • Published

    Mongo HyperLogLog

    A 32 bit implementation of the hyperloglog algorithm using a MongoDB collection for storage. Each counted key will take 16kb of memory, and will be accurate with minimal error up to cardinalities of 2^27.

    Basic Usage

    import MongoHyperLogLog from "mongo-hyperloglog";
    // mongo collection
    const collection: Collection = myCollection;
    const hyperloglog = new MongoHyperLogLog(collection);
    hyperloglog.on("error", console.error);
    async function trackPurchase (userId: string) {
      await hyperloglog.add("bought_thing", userId);
    async function trackPageView (userId: string) {
      await hyperloglog.add("page_view", userId);
    async function getCounts () {
      return {
        boughtCount: await hyperloglog.count("bought_thing"),
        viewedCount: await hyperloglog.count("page_view"),
        boughtViewedUnion = await hyperloglog.countUnion(["bought_thing", "page_view"]),
        boughtViewedIntersection = await hyperloglog.countIntersection(["bought_thing", "page_view"]),


    const hll = new MonogHyperLogLog(collection, {
      immediateFlush: false,
      // should this immediately send all updates to mongodb? queueing is the default
      syncIntervalMS: 5000, 
      // how often this flushes the queued updates to mongodb. ignored if immediateFlush is set to `true`
      hash: (s: string) => myCustomHashingFunction(s),
      // mongologlog uses sha256 for hashing, but you're welcome to provide your own hash function


    add(key: string, value: string): Promise<void>; // queues value to add to sketch for key
    count(key: string): Promise<number>; // estimated count for key
    countUnion(keys: string[]): Promise<number>; // estimated count for union of keys
    countIntersection(keys: string[]): Promise<number>; // estimated count for intersection of keys
    close (): Promise<void> // flushes data & removes listeners

    Details & Caveats

    If you need a hyperloglog counter & can use redis's hyperloglog implementation instead, you should. The redis implementation is much faster because it's supported natively by redis (and because redis is simply much faster than mongo), and uses hyperloglog++ optimizations for sparse data sets & forumalae from New cardinality estimation algorithms for HyperLogLog sketches for increased accuracy.

    • this is a 32 bit implementation of hyperloglog, and will be slightly inaccurate for cardinalities above 2 ^ 32 / 30
    • this does not include hyperloglog++ bias correction when switching from linear counting to hyperloglog or other estimation algorithm improvements
    • WiredTiger's snapshotting model is well suited to this type of counting, but it's still not as fast as it would be if these counting operations were natively supported
    • by default, a MongoHyperLogLog will send updates to mongo every 1000ms (configurable by syncIntervalMS). To flush manually before exiting: await hyperloglog.close().
    • Mongo HyperLogLog registers store single characters to represent estimates between 0-31, which makes each register use 1 byte of memory.

    Further reading


    npm i mongo-hyperloglog

    DownloadsWeekly Downloads






    Unpacked Size

    17.4 kB

    Total Files


    Last publish


    • kelwill