
    word2vec-stream

    1.0.1 • Public • Published

    Process your massive word2vec binary model file as a readable stream of records.

    Purpose

    Word2vec models are typically distributed as massive binary files (for instance, the standard GoogleNews set is several gigabytes once unzipped). In some cases, you may wish to process such a model and persist all or part of its contents to a database or other store, without paying the considerable memory cost of reading the whole file at once.

    This tiny library is a single function that parses the binary format and offers a readable stream of objects, each containing a word and its values (the vector, as an array of floats).
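To make the parsing step concrete, here is a minimal sketch of reading the binary layout written by the original word2vec C tool: an ASCII header line with the vocabulary size and dimension count, then for each word the word itself, a space, and that many little-endian 32-bit floats, usually followed by a newline. The helper name `parseWord2vecBuffer` is hypothetical and is not part of word2vec-stream. It also reads from a buffer already in memory, so it illustrates the format only, not the streaming approach this library takes.

```javascript
'use strict';

// Hypothetical helper (not part of word2vec-stream): parses a complete
// word2vec binary buffer into an array of { word, values } records.
function parseWord2vecBuffer(buffer) {
  const headerEnd = buffer.indexOf(0x0a); // the first newline ends the header
  if (headerEnd === -1) throw new Error('missing header line');
  const [vocabSize, dimensions] = buffer
    .toString('ascii', 0, headerEnd)
    .trim()
    .split(/\s+/)
    .map(Number);

  const records = [];
  let offset = headerEnd + 1;
  for (let i = 0; i < vocabSize; i++) {
    // Skip the newline that usually separates records.
    while (offset < buffer.length && buffer[offset] === 0x0a) offset++;

    const wordEnd = buffer.indexOf(0x20, offset); // a space terminates the word
    if (wordEnd === -1) throw new Error('truncated record');
    const word = buffer.toString('utf8', offset, wordEnd);
    offset = wordEnd + 1;

    const values = [];
    for (let d = 0; d < dimensions; d++) {
      values.push(buffer.readFloatLE(offset)); // 4 bytes per float32
      offset += 4;
    }
    records.push({ word, values });
  }
  return records;
}

// Build a tiny two-word, three-dimension model in memory to try it out.
function encodeRecord(word, values) {
  const floats = Buffer.alloc(values.length * 4);
  values.forEach((v, i) => floats.writeFloatLE(v, i * 4));
  return Buffer.concat([Buffer.from(word + ' ', 'utf8'), floats, Buffer.from('\n')]);
}

const sample = Buffer.concat([
  Buffer.from('2 3\n', 'ascii'),
  encodeRecord('runs', [0.5, -0.25, 1]),
  encodeRecord('walks', [2, 0, -1.5]),
]);
const parsed = parseWord2vecBuffer(sample);
console.log(parsed);
```

A streaming parser has the extra job of handling records that are split across chunk boundaries, which this sketch does not attempt.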

    Usage

    The function exported by word2vec-stream returns a promise, which resolves to a readable stream:

    const word2vecStream = require('word2vec-stream');
     
    word2vecStream('./path-to-your-model.bin').then((vectorStream) => { 
      const myRecords = [];
     
      // Drain everything currently buffered; read() returns null once the buffer is empty.
      vectorStream.on('readable', () => {
        let nextWord;
        while ((nextWord = vectorStream.read()) !== null) {
          myRecords.push(nextWord);  // just pushing onto an array here, but normally you'd write to a db, etc
        }
      });
     
      vectorStream.on('end', () => {
        // you've processed all words now
      });
    });
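When the records are headed for a database rather than an array, one option is to pipe the stream into an object-mode `Writable` that saves records in batches. The sketch below assumes the stream returned by word2vec-stream can be piped like any object-mode readable. `createBatchSink` and `saveBatch` are hypothetical names: `saveBatch` stands in for your own storage call and is assumed to return a promise.

```javascript
'use strict';
const { Writable } = require('stream');

// Hypothetical helper (not part of word2vec-stream): collects records into
// batches and hands each batch to saveBatch, which must return a promise.
function createBatchSink(saveBatch, batchSize = 1000) {
  let batch = [];

  const flush = (done) => {
    const pending = batch;
    batch = [];
    // Signal completion (or the error) only after the batch has been saved.
    saveBatch(pending).then(() => done(), done);
  };

  return new Writable({
    objectMode: true,
    write(record, _encoding, done) {
      batch.push(record);
      if (batch.length >= batchSize) flush(done);
      else done();
    },
    // Called once after end(): save whatever is left in the last partial batch.
    final(done) {
      if (batch.length > 0) flush(done);
      else done();
    },
  });
}

// Usage sketch (commented out because it needs the package, a model file,
// and a real storage layer; `db.insertMany` is a placeholder):
//
// const word2vecStream = require('word2vec-stream');
// word2vecStream('./path-to-your-model.bin').then((vectorStream) => {
//   vectorStream
//     .pipe(createBatchSink((records) => db.insertMany(records)))
//     .on('finish', () => console.log('all words stored'));
// });
```

Because `pipe()` applies backpressure, the model file is read only about as fast as the batches are saved, which keeps memory use bounded.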

    A single word object looks like this:

    { word: 'runs',
      values: [
         -0.03380169719457626,
         0.05194384977221489,
         -0.03704818710684776,
         0.016614392399787903,
         0.0660756304860115,
         0.030364234000444412,
         -0.028072593733668327,
         -0.16270646452903748,
         -0.038575947284698486,
         0.12756797671318054,
         // ... as many floats as vector dimensions here
      ]
    }

    Alternatively, examine and run the included demo.js file for a quick example that dumps records to the console. The included tests also demonstrate basic invocation.

    $ node demo.js
    

    Compatibility

    This library targets Node.js v8. It may work on somewhat older versions, but some necessary parts of the stream.Readable API may not be supported there.

    Acknowledgements

    Thanks to node-word2vec for illustrating the basic approach to parsing the binary format in Node.js.

    Install

    npm i word2vec-stream

    License

    MIT
