Provides a list of the words found in a text, along with a few statistics about each of them.
Getting started
- Install the package
$ npm i atext-wordz
- Require its functions
const { getWStatsList , getWStatsObj , getWordList } = require( "atext-wordz" );
- Call its functions
Depending on your needs, pick the format in which you want to get the result.
NOTE: You have 3 choices.
const result = getWStatsList( text , options );
// ==> [ {}:wordstats(1), {}, {}, ... , {}:wordstats(N) ]
// OR
const result = getWStatsObj( text , options );
// ==> { wordstats1, wordstats2, ..., wordstatsN }
// OR
const result = getWordList( text , options );
// ==> ["word 1", "word 2", ... "word N"]
Light demo
Assuming you have a demo.txt file in a demo folder at the same level as this .js file and you want to get word stats:
const fs = require('fs');
const { getWStatsList , getWStatsObj , getWordList } = require( "../atext-wordz" );
fs.readFile( "./demo/demo.txt" , "utf8" , ( err , text ) => {
if ( err ) throw err; // stop early if the file could not be read
const sortString = ` by number of a > than b's `;
const cbOnNewWord = ( word ) => {
// TODO: handle each newly found word here
};
const options = { sortString , cbOnNewWord };
const result = getWStatsList( text , options );
console.log( result );
// outputs :
// < an array of word statistics sorted by most used words >
});
Options
There are a few options available at this time. They are defined in the table below.
option | type | default |
---|---|---|
sortString | string | "" |
minimumLength | number | 2 |
cbOnNewWord | function | (word:string) => {} |
- sortString: You can sort your words and stats before the service wraps everything up, thanks to the integrated byStr~Sort npm module. You may find it useful to check its sortString section.
In this instance, an example of a valid sortString could be:
const sortString = ` by order of a greater than b's then by number of a < than b's `;
NOTE: Every sort sentence starts with "by" and can end with "then" to chain another sort sentence.
- minimumLength: Defines the minimum length a word must have to be taken into account during the analysis phase.
- cbOnNewWord: A callback function that is called whenever a new word is encountered, which means only once per word.
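The three options can be combined in a single object passed as the second argument. The following is a minimal sketch: the option names come from the table above, while the values and the `seenWords` array are illustrative assumptions. The callback is invoked by hand at the end only to show its signature; in real use the package calls it.

```javascript
// Illustrative collector; the name seenWords is an assumption, not part of the package.
const seenWords = [];

const options = {
  // Sort sentence taken from the demo above.
  sortString: ` by number of a > than b's `,
  // Illustrative value; the documented default is 2.
  minimumLength: 3,
  // Receives each new word as a string, once per distinct word.
  cbOnNewWord: (word) => {
    seenWords.push(word);
  },
};

// Called manually here only to show the callback's signature.
options.cbOnNewWord("example");
console.log(seenWords);
```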
Stats
The service returns stats matching an instance of IStatsOfWords or IStatsOfWordsObject, or a simple array of strings.
Here are the definitions for each of them:
IStatsOfWords
field | type | notes |
---|---|---|
word | string | the word |
order | number | the order of appearance |
number | number | the number of appearances |
length | number | the word's length |
IStatsOfWordsObject
Each word will be a key and its stats will be the value of that pair.
field | order | number | length |
---|---|---|---|
type | number | number | number |
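To make the relationship between the three shapes concrete, here is a small sketch built on hand-written sample data. The values are illustrative only, and the conversion code is not the package's implementation; it just shows how the same stats look in each format.

```javascript
// Hand-written sample shaped like IStatsOfWords (values are illustrative).
const statsList = [
  { word: "the", order: 1, number: 3, length: 3 },
  { word: "cat", order: 2, number: 1, length: 3 },
];

// IStatsOfWordsObject shape: each word is a key, its stats are the value.
const statsObj = {};
for (const { word, order, number, length } of statsList) {
  statsObj[word] = { order, number, length };
}

// Plain word list: only the words themselves.
const wordList = statsList.map((stats) => stats.word);

console.log(statsObj);
console.log(wordList);
```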
Word detection
Detecting words in a large, noisy text is not easy. It is not as simple as splitting on every space, because normal text also relies on punctuation.
Luckily, French and English punctuation differ very little, if at all.
Therefore, anything other than a "special" character is considered part of a word. Things become much more complicated with languages that do not strictly separate words, such as Japanese or Chinese, to name only a few.
Here is the regex used to match those special characters; anything it does not match is treated as part of a word:
const special =
/[�\d\s\\[\]\x20-\x40\-`{-~\xA0-\xBF×Ø÷øʹ͵ͺ;!?♪╚-╬┘-▀\uFF3B\uFF40\uFF5B-\uFF65¥・()]/i;
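As an illustration of the idea, the sketch below splits a text on runs of special characters and drops words that are too short. It is one possible way to use such a regex, not necessarily how the package does it: `simplifiedSpecial` is a reduced stand-in for the regex above (ASCII and Latin-1 ranges only), and `splitWords` is a hypothetical helper.

```javascript
// Reduced stand-in for the regex above: whitespace, \x20-\x40 (which covers
// digits and most ASCII punctuation), brackets, backslash, backtick, {-~ and \xA0-\xBF.
const simplifiedSpecial = /[\s\x20-\x40\[\\\]`{-~\xA0-\xBF]+/;

// Hypothetical helper: split on runs of special characters,
// then keep only the words that reach the minimum length.
function splitWords(text, minimumLength = 2) {
  return text
    .split(simplifiedSpecial)
    .filter((word) => word.length >= minimumLength);
}

console.log(splitWords("Hello, world! It's 2 easy."));
// → [ 'Hello', 'world', 'It', 'easy' ]
```

Note that the apostrophe falls inside the `\x20-\x40` range, so "It's" is cut into "It" and "s", and the single-letter "s" is then removed by the length filter.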