document-tfidf

0.2.1 • Public

Getting Started

Install package with:

  npm install document-tfidf

Features:

  • countTermFrequencies
  • storeTermFrequencies
  • normalizeTermFrequencies
  • identifyUniqueTerms
  • fullTFIDFAnalysis

Documentation

  • Term Frequency - Inverse Document Frequency (TFIDF) Module:
    • countTermFrequencies: function(text [, options])
      • Counts the number of times each token appears in the input text.
      • Current options include tokenLength, which dictates the number of words that comprise each token. tokenLength defaults to 1.
      • Depends on the nGrams module, which can extract tokens of any length.
    • storeTermFrequencies: function(tokenSet, TFStorage)
      • Adds the counts in tokenSet to TFStorage, so the collection grows and later analyses improve over time.
      • Saving TFStorage in a persistent data store is recommended but optional.
      • If TFStorage is not provided, a new object is created and returned.
    • normalizeTermFrequencies: function(tokenSet, TFStorage)
      • For each token in tokenSet, normalizeTermFrequencies divides the token's count by that token's total count in TFStorage, and returns the token set with these normalized counts.
    • identifyUniqueTerms: function(normalizedTokenSet [, options])
      • From the input normalizedTokenSet, identifyUniqueTerms returns the most distinctive tokens, i.e. those with the highest TFIDF scores.
      • Current options include uniqueThreshold. If specified, identifyUniqueTerms returns all terms with a TFIDF score greater than or equal to uniqueThreshold.
    • fullTFIDFAnalysis: function(text [, options])
      • Runs all of the above TFIDF steps in sequence.
      • Options correspond to the options for each step of the analysis.
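The package's source isn't reproduced here, so the following is a minimal, self-contained JavaScript sketch of the pipeline described above, written under stated assumptions rather than taken from document-tfidf. It uses a simple lowercase ASCII word tokenizer in place of the nGrams module, scores each token as its count divided by its stored total (the ratio described for normalizeTermFrequencies, not the classic tf × log(N/df) weighting), and, when no uniqueThreshold is given, assumes "most unique" means the tokens tied for the top score. The helper getCount and the TFStorage option passed to fullTFIDFAnalysis are hypothetical.

```javascript
// Hypothetical re-implementation sketch; not the document-tfidf package's own code.

// Read a count from a storage object, treating missing tokens as 0.
function getCount(obj, key) {
  return Object.prototype.hasOwnProperty.call(obj, key) ? obj[key] : 0;
}

// Assumed tokenizer: lowercase ASCII words, joined into n-grams of `tokenLength` words.
function nGrams(text, tokenLength = 1) {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  const grams = [];
  for (let i = 0; i + tokenLength <= words.length; i++) {
    grams.push(words.slice(i, i + tokenLength).join(" "));
  }
  return grams;
}

// Count how many times each token occurs in `text`.
function countTermFrequencies(text, options = {}) {
  const counts = Object.create(null); // null prototype: tokens like "constructor" are safe keys
  for (const gram of nGrams(text, options.tokenLength || 1)) {
    counts[gram] = (counts[gram] || 0) + 1;
  }
  return counts;
}

// Merge a token set into the collection; create a new collection if none is given.
function storeTermFrequencies(tokenSet, TFStorage = Object.create(null)) {
  for (const [token, count] of Object.entries(tokenSet)) {
    TFStorage[token] = getCount(TFStorage, token) + count;
  }
  return TFStorage;
}

// Divide each token's count by that token's total count in the collection.
function normalizeTermFrequencies(tokenSet, TFStorage) {
  const normalized = Object.create(null);
  for (const [token, count] of Object.entries(tokenSet)) {
    const total = getCount(TFStorage, token);
    // Assumption: a token never stored is treated as fully unique (score 1).
    normalized[token] = total > 0 ? count / total : 1;
  }
  return normalized;
}

// Return tokens scoring >= uniqueThreshold, highest first; without a threshold,
// return the tokens tied for the highest score (assumed default).
function identifyUniqueTerms(normalizedTokenSet, options = {}) {
  const entries = Object.entries(normalizedTokenSet);
  if (entries.length === 0) return [];
  const cutoff = options.uniqueThreshold !== undefined
    ? options.uniqueThreshold
    : entries.reduce((max, [, score]) => Math.max(max, score), -Infinity);
  return entries
    .filter(([, score]) => score >= cutoff)
    .sort((a, b) => b[1] - a[1])
    .map(([token]) => token);
}

// Chain the steps: count, store, normalize, then pick the unique terms.
function fullTFIDFAnalysis(text, options = {}) {
  const tokenSet = countTermFrequencies(text, options);
  const storage = storeTermFrequencies(tokenSet, options.TFStorage);
  const normalized = normalizeTermFrequencies(tokenSet, storage);
  return identifyUniqueTerms(normalized, options);
}

// Usage: build a small collection from two earlier documents (illustrative data).
const storage = storeTermFrequencies(countTermFrequencies("the cat sat on the mat"));
storeTermFrequencies(countTermFrequencies("the dog sat on the log"), storage);

// Analyze a new document against that collection.
const doc = countTermFrequencies("the cat chased the dog");
storeTermFrequencies(doc, storage);
console.log(identifyUniqueTerms(normalizeTermFrequencies(doc, storage))); // → [ 'chased' ]
```

In this sketch, "chased" scores 1 because it has only ever appeared in the new document, while "the" scores 2/6 because it is common across the whole collection. It also shows why storing frequencies over time matters: with a freshly created, single-document collection, every token scores 1 and nothing stands out.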

View the full specs and check out more text analysis in my Text Analysis Suite.

Package Sidebar

Version

0.2.1

License

ISC

Collaborators

  • syeoryn