Naivete Precedes Misrepresentation

    mongo-hyperloglog

    1.1.3 • Public • Published

    Mongo HyperLogLog

    A 32 bit implementation of the hyperloglog algorithm using a MongoDB collection for storage. Each counted key will take 16kb of memory, and will be accurate with minimal error up to cardinalities of 2^27.

    Basic Usage

    import MongoHyperLogLog from "mongo-hyperloglog";
    // mongo collection
    const collection: Collection = myCollection;
    const hyperloglog = new MongoHyperLogLog(collection);
    hyperloglog.on("error", console.error);
    
    async function trackPurchase (userId: string) {
      await hyperloglog.add("bought_thing", userId);
    }
    async function trackPageView (userId: string) {
      await hyperloglog.add("page_view", userId);
    }
    
    async function getCounts () {
      return {
        boughtCount: await hyperloglog.count("bought_thing"),
        viewedCount: await hyperloglog.count("page_view"),
        boughtViewedUnion = await hyperloglog.countUnion(["bought_thing", "page_view"]),
        boughtViewedIntersection = await hyperloglog.countIntersection(["bought_thing", "page_view"]),
      }
    }

    Options

    const hll = new MonogHyperLogLog(collection, {
      immediateFlush: false,
      // should this immediately send all updates to mongodb? queueing is the default
    
      syncIntervalMS: 5000, 
      // how often this flushes the queued updates to mongodb. ignored if immediateFlush is set to `true`
    
      hash: (s: string) => myCustomHashingFunction(s),
      // mongologlog uses sha256 for hashing, but you're welcome to provide your own hash function
    });

    Methods

    add(key: string, value: string): Promise<void>; // queues value to add to sketch for key
    count(key: string): Promise<number>; // estimated count for key
    countUnion(keys: string[]): Promise<number>; // estimated count for union of keys
    countIntersection(keys: string[]): Promise<number>; // estimated count for intersection of keys
    
    close (): Promise<void> // flushes data & removes listeners

    Details & Caveats

    If you need a hyperloglog counter & can use redis's hyperloglog implementation instead, you should. The redis implementation is much faster because it's supported natively by redis (and because redis is simply much faster than mongo), and uses hyperloglog++ optimizations for sparse data sets & forumalae from New cardinality estimation algorithms for HyperLogLog sketches for increased accuracy.

    • this is a 32 bit implementation of hyperloglog, and will be slightly inaccurate for cardinalities above 2 ^ 32 / 30
    • this does not include hyperloglog++ bias correction when switching from linear counting to hyperloglog or other estimation algorithm improvements
    • WiredTiger's snapshotting model is well suited to this type of counting, but it's still not as fast as it would be if these counting operations were natively supported
    • by default, a MongoHyperLogLog will send updates to mongo every 1000ms (configurable by syncIntervalMS). To flush manually before exiting: await hyperloglog.close().
    • Mongo HyperLogLog registers store single characters to represent estimates between 0-31, which makes each register use 1 byte of memory.

    Further reading

    Install

    npm i mongo-hyperloglog

    DownloadsWeekly Downloads

    9

    Version

    1.1.3

    License

    ISC

    Unpacked Size

    17.4 kB

    Total Files

    4

    Last publish

    Collaborators

    • kelwill