word-freq

0.0.8 • Public • Published

Calculates the word frequency of a text document by tokenising the string, or by tokenising and stemming it.

Version

  • 0.0.7 Converts all text to lowercase.
  • 0.0.6 Messed up npm versioning.
  • 0.0.5 Moved stemmer into its own module. Removed direct dependency on tokeniser.
  • 0.0.4 Moved tokeniser into its own module.
  • 0.0.3 Added stop words removal feature.
  • 0.0.2 Improvements; added tests.
  • 0.0.1 Initial release.

Usage

Frequency (wf.freq(text, noStopWords, shouldStem))

Returns an object containing the frequency of terms in the text provided.

  • text is the string (text document) on which the calculations are performed.
  • noStopWords defaults to true. Set it to false to include stop words, e.g. "I" and "the".
  • shouldStem defaults to true. Set it to false to leave words unstemmed.
var wf = require('word-freq');

var str = "@waltercfilho tweeted about houses: housing is the most expensive thing ever f#!*";

// noStopWords and shouldStem both default to `true`
var frequency = wf.freq(str);
// => {
//      "waltercfilho" : 1,
//      "tweet" : 1,
//      "hous" : 2,
//      "expens" : 1
//    }
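Conceptually, the last step of wf.freq is a tally of the terms left after tokenising and stemming. The sketch below illustrates that step only, assuming the terms have already been produced; countTerms is a hypothetical helper, not part of word-freq's API, and the library's internals may differ.

```javascript
// Hypothetical helper (not part of word-freq): tally an array of terms into
// a plain object mapping each term to its number of occurrences.
function countTerms(terms) {
  var counts = new Map(); // a Map avoids clashes with Object.prototype keys
  for (var term of terms) {
    counts.set(term, (counts.get(term) || 0) + 1);
  }
  // Convert to a plain object, the same shape wf.freq returns
  return Object.fromEntries(counts);
}

// Terms matching the stemmed output in the example above
var freq = countTerms(['waltercfilho', 'tweet', 'hous', 'hous', 'expens']);
// freq is { waltercfilho: 1, tweet: 1, hous: 2, expens: 1 }
```

Counting into a Map means a term such as "constructor" cannot be confused with an inherited object property; Object.fromEntries then turns the result into an ordinary object.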

Tokenising (wf.tokenise(text, noStopWords))

Returns an array of terms with punctuation removed.

  • text is the string (text document) on which the calculations are performed.
  • noStopWords defaults to true. Set it to false to include stop words, e.g. "I" and "the".
var wf = require('word-freq');

var str = "you're simply a test, a mere test";
var tokenised = wf.tokenise(str);
// => ['simply', 'test', 'mere', 'test']
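For intuition, tokenising with stop-word removal can be approximated in a few lines. This is a minimal sketch, not word-freq's actual tokeniser (which lives in its own module): simpleTokenise and the tiny STOP_WORDS list are hypothetical, and the real stop-word list is much longer. It lowercases the text, in line with the 0.0.7 behaviour.

```javascript
// Hypothetical stop-word list for illustration only; word-freq's own list differs.
var STOP_WORDS = new Set(['a', 'i', 'the', "you're", 'is', 'for']);

// Hypothetical sketch: lowercase the text, replace anything that is not a
// letter, digit, apostrophe or whitespace with a space, split on whitespace,
// then optionally drop stop words.
function simpleTokenise(text, noStopWords) {
  if (noStopWords === undefined) noStopWords = true; // mirror the documented default
  var terms = text
    .toLowerCase()
    .replace(/[^a-z0-9'\s]/g, ' ')
    .split(/\s+/)
    .filter(function (t) { return t.length > 0; });
  return noStopWords
    ? terms.filter(function (t) { return !STOP_WORDS.has(t); })
    : terms;
}

var tokens = simpleTokenise("you're simply a test, a mere test");
// tokens is ['simply', 'test', 'mere', 'test']
```

Keeping apostrophes lets contractions such as "you're" survive as single tokens so they can be matched against the stop-word list. The ASCII-only character class is a simplification; non-English text would need a Unicode-aware pattern.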
 

Stemming (wf.stem(text, noStopWords))

Returns an array of terms, stemmed and without punctuation.

  • text is the string (text document) on which the calculations are performed.
  • noStopWords defaults to true. Set it to false to include stop words, e.g. "I" and "the".

Note: the stemming itself is delegated to the stem-porter library by kastor.

var wf = require('word-freq');

var str = "you're simply a simplistic house, made for housing";
var stemmed = wf.stem(str);
// => ["simpli", "simplist", "hous", "hous"]
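To see why "house" and "housing" collapse to the same term, here is a deliberately crude suffix stripper. It is a toy for illustration only, not the Porter algorithm that stem-porter implements: toyStem and its suffix list are hypothetical and will mis-stem many words that Porter handles correctly.

```javascript
// Toy suffix list, longest suffixes first; NOT the Porter stemmer's rules.
var SUFFIXES = ['ing', 'es', 's', 'e'];

// Strip the first matching suffix, keeping at least three characters of stem
function toyStem(word) {
  for (var suffix of SUFFIXES) {
    if (word.endsWith(suffix) && word.length - suffix.length >= 3) {
      return word.slice(0, word.length - suffix.length);
    }
  }
  return word;
}

var stems = ['house', 'houses', 'housing'].map(toyStem);
// stems is ['hous', 'hous', 'hous']
```

Because all three forms map to "hous", a frequency count over stems adds them together, which is why "houses" and "housing" give "hous" : 2 in the wf.freq example.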

Install

npm i word-freq

Version

0.0.8

License

MIT

Collaborators

  • waltercarvalho