@lolojs/htmlindexer

Version 1.1.0 • Public

This is a library for indexing a document: it extracts the unique tokens that are not stopwords and counts how often each one occurs.
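The idea of extracting non-stopword tokens with their frequencies can be sketched in plain JavaScript as follows. This is not the library's implementation. The function name countTerms, the tiny STOPWORDS set and the tokenization rule (lowercase, then split on anything that is not a letter or digit) are assumptions made for illustration, and the input is assumed to be plain text with any HTML markup already removed.

```javascript
// Hypothetical stopword list for illustration only; the library's own list may differ.
const STOPWORDS = new Set(['a', 'an', 'and', 'is', 'of', 'the', 'to']);

// Count how often each non-stopword token appears in plain text.
// Returns a Map of term -> frequency, the same kind of structure as indexer.tokens.
function countTerms(text) {
  const tokens = new Map();
  // Assumed tokenization: lowercase, then split on runs of non-alphanumeric ASCII characters.
  for (const word of text.toLowerCase().split(/[^a-z0-9]+/)) {
    if (word === '' || STOPWORDS.has(word)) continue; // skip empty pieces and stopwords
    tokens.set(word, (tokens.get(word) || 0) + 1);
  }
  return tokens;
}
```

For example, countTerms('The test is a test of the indexer') returns a Map with test => 2 and indexer => 1, because "the", "is", "a" and "of" are in the assumed stopword set.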

To index a document, call IndexDocument with the path of the file and listen for the indexFinished event, which is emitted when indexing has completed. The extracted tokens are then available through the tokens property, a Map from each term to its frequency.

    const HtmlIndexer = require('@lolojs/htmlindexer');
    const indexer = new HtmlIndexer();

    // Register the listener before starting, so the event cannot be missed
    indexer.on('indexFinished', () => {
        // indexer.tokens is a Map of term -> frequency
        for (const [term, freq] of indexer.tokens) {
            console.log(`Term : ${term}    Frequency : ${freq}`);
        }
    });

    indexer.IndexDocument('tests/test.html');
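Because tokens is an ordinary Map, it can be post-processed with standard JavaScript once indexFinished has fired. As an illustration, the sketch below returns the n most frequent terms. The helper name topTerms and its tie-breaking rule (equal frequencies sorted alphabetically) are assumptions and not part of the library; a hand-built Map stands in for indexer.tokens.

```javascript
// Return the n most frequent [term, freq] pairs from a Map of term -> frequency.
// Hypothetical helper: not part of @lolojs/htmlindexer.
function topTerms(tokens, n) {
  return [...tokens.entries()]
    // Higher frequency first; equal frequencies fall back to alphabetical order.
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, n);
}

// A hand-built Map standing in for indexer.tokens.
const sample = new Map([['html', 3], ['index', 5], ['token', 3], ['stream', 1]]);
console.log(topTerms(sample, 3)); // [['index', 5], ['html', 3], ['token', 3]]
```

Inside the indexFinished handler, the same call would be topTerms(indexer.tokens, 10).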

You can also access the generated tokens as a stream by calling getOutPutStream with a chunk size, that is, the number of tokens per chunk. The output is JSON-based, with the format { term: 'test', freq: 1, isFirstChunk: true, isLastChunk: true }.

    // Read the tokens as a stream, 2 tokens per chunk
    const stream = indexer.getOutPutStream(2);
    stream.on('data', (data) => console.log(data));
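The exact grouping of tokens into chunks is not spelled out here, so the following is only a sketch, under stated assumptions, of how a chunked object stream of that shape could be produced with Node's built-in stream.Readable.from. The generator chunkTokens is hypothetical: it splits a term -> frequency Map into arrays of at most size entries and, as an assumption, tags every item with whether its chunk is the first or the last one.

```javascript
const { Readable } = require('stream');

// Hypothetical generator: yields arrays of at most `size` objects shaped like
// { term, freq, isFirstChunk, isLastChunk }. This grouping is an assumption.
function* chunkTokens(tokens, size) {
  const entries = [...tokens.entries()];
  for (let start = 0; start < entries.length; start += size) {
    const isFirstChunk = start === 0;
    const isLastChunk = start + size >= entries.length;
    yield entries.slice(start, start + size).map(([term, freq]) => ({
      term, freq, isFirstChunk, isLastChunk,
    }));
  }
}

// Readable.from builds an object-mode stream from any iterable, including a generator.
const tokens = new Map([['test', 1], ['html', 2], ['index', 4]]);
const stream = Readable.from(chunkTokens(tokens, 2));
stream.on('data', (chunk) => console.log(chunk));
```

With a single token and a chunk size of 2, the only chunk is both the first and the last, which gives items like the { term: 'test', freq: 1, isFirstChunk: true, isLastChunk: true } example. Whether getOutPutStream emits arrays or individual objects should be checked against the library itself.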

Install with npm:

    npm i @lolojs/htmlindexer

License: ISC

Collaborators: lolojs