WAT
Simply put: predict the next word a user will write.
HOWTO
installation
git clone git@github.com:syzer/distributedNgram.git && cd distributedNgram
npm install
npm install --save-dev
The file nGram.js offers a more compact version of the code:
npm start
testing basic distributed task
var jsSpark = require('js-spark')({workers: 16});
var task = jsSpark.jsSpark;
var q = jsSpark.q;
// this is executed on client side
tests
npm test
Tasks
clone https://github.com/syzer/distributedNgram.git
./index.js
load:
- dracula
- lodash
- load helpers
(gist)
// helpers ./lib/index.js
make function prepare()
// remove special characters
//=> "listen to them the children of the night what music they make"
(gist)
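One possible shape for prepare() is sketched below. It is only an assumption about what "remove special characters" means here: lowercase the text, replace everything that is not a letter with a space, and collapse repeated whitespace. The parameter name `text` and the sample sentence are illustrative and not taken from the repository.

```javascript
// Hypothetical sketch of prepare(): normalise raw text for bigram counting.
function prepare(text) {
  return text
    .toLowerCase()              // "Listen" and "listen" count as one word
    .replace(/[^a-z\s]/g, ' ')  // drop punctuation, digits and other symbols
    .replace(/\s+/g, ' ')       // collapse runs of whitespace and newlines
    .trim();
}

console.log(prepare('Listen to them, the children of the night. What music they make!'));
```

Note that this simple rule also splits contractions ("don't" becomes "don t"); whether that matters depends on the text being trained on.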
./index.js
make bigramText()
//=> {to: {listen: 1, them: 1}, listen: {to: 1}, the: {children: 1}} ...
{ return arr; }
(gist)
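A minimal sketch of bigramText() follows. It assumes the input is already prepared (lowercase words separated by spaces) and that a bigram table maps each word to the counts of the words that directly follow it. The helper `has` is a hypothetical name used to keep words such as "constructor" from colliding with built-in object properties.

```javascript
// Hypothetical helper: own-property check that ignores the prototype chain.
function has(obj, key) {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Count, for every word, how often each other word comes right after it.
function bigramText(text) {
  const words = text.split(/\s+/).filter(Boolean);
  const bigrams = {};
  for (let i = 0; i < words.length - 1; i++) {
    const word = words[i];
    const next = words[i + 1];
    if (!has(bigrams, word)) {
      bigrams[word] = {};
    }
    const soFar = has(bigrams[word], next) ? bigrams[word][next] : 0;
    bigrams[word][next] = soFar + 1;
  }
  return bigrams;
}

console.log(bigramText('listen to them the children'));
```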
./index.js
function mergeSmall()
- create 2 tasks, ch01 and ch02
- use tasks to bigram those chapters
- reduce response with _.merge
(gist)
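The reduce step can be sketched in plain JavaScript. One thing worth keeping in mind: lodash's _.merge overwrites numeric leaves rather than adding them, so if both chapters contain the same bigram, the counts need to be summed explicitly. The sketch below assumes each task has already returned its bigram table; `mergeCounts` and the parameter names are hypothetical.

```javascript
// Hypothetical helper: own-property check that ignores the prototype chain.
function has(obj, key) {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Add every count from `source` into `target` and return `target`.
function mergeCounts(target, source) {
  Object.keys(source).forEach(function (word) {
    if (!has(target, word)) {
      target[word] = {};
    }
    Object.keys(source[word]).forEach(function (next) {
      const soFar = has(target[word], next) ? target[word][next] : 0;
      target[word][next] = soFar + source[word][next];
    });
  });
  return target;
}

// Combine the bigram tables of two chapters without mutating either one.
function mergeSmall(ch01Bigrams, ch02Bigrams) {
  return [ch01Bigrams, ch02Bigrams].reduce(function (acc, table) {
    return mergeCounts(acc, table);
  }, {});
}

console.log(mergeSmall({ the: { night: 1 } }, { the: { night: 2, count: 1 } }));
```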
./index.js
function mergeBig(texts)
- load [ch1, ch2, ch3] or texts
- make distinct tasks to bigram this text
- reduce with _.mergeObjectsInArr
- cache result
- return result
(gist)
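The control flow of mergeBig() (one task per text, reduce, cache, return) might look roughly like the sketch below. Promises stand in for js-spark tasks here, and the worker and reducer are passed in rather than hard-coded; `makeMergeBig`, `runTask` and `mergeAll` are hypothetical names, not part of js-spark or this repository.

```javascript
// Build a mergeBig(texts) function around a task runner and a reducer.
//   runTask(text)      -> Promise of one bigram table (e.g. computed on a client)
//   mergeAll(tables)   -> one combined table
function makeMergeBig(runTask, mergeAll) {
  const cache = new Map();

  return function mergeBig(texts) {
    // The serialised texts are the cache key: fine for a sketch,
    // a hash would be kinder to memory for whole books.
    const key = JSON.stringify(texts);
    if (!cache.has(key)) {
      const tasks = texts.map(function (text) {
        return runTask(text);            // one distinct task per text
      });
      cache.set(key, Promise.all(tasks).then(mergeAll));
    }
    return cache.get(key);               // cached promise of the result
  };
}

// Usage with stand-in functions: "bigramming" is replaced by word counting.
const mergeBig = makeMergeBig(
  function (text) { return Promise.resolve(text.split(' ').length); },
  function (counts) { return counts.reduce(function (a, b) { return a + b; }, 0); }
);
mergeBig(['listen to them', 'the children']).then(console.log);
```

Caching the promise rather than the resolved value means concurrent callers share one computation; a failed run would also stay cached, which a real implementation would probably want to evict.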
./index.js
function predict(word)
- load appropriate key/word from cache
- calc total hits
- sort all hits in order (may use helper function objToSortedArr(obj))
- calc frequency/probability of next word
(gist)
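The steps above can be sketched as follows. This version takes the cached bigram table as an explicit first argument, which differs from the predict(word) signature in the task, and the return shape of objToSortedArr() (an array of {word, count} entries, highest count first) is an assumption.

```javascript
// Assumed behaviour of the helper: turn {them: 3, listen: 1} into
// [{word: 'them', count: 3}, {word: 'listen', count: 1}].
function objToSortedArr(obj) {
  return Object.keys(obj)
    .map(function (word) { return { word: word, count: obj[word] }; })
    .sort(function (a, b) { return b.count - a.count; });
}

// Return the candidate next words for `word`, most likely first.
function predict(cache, word) {
  if (!Object.prototype.hasOwnProperty.call(cache, word)) {
    return [];                                   // word never seen in training
  }
  const sorted = objToSortedArr(cache[word]);
  const total = sorted.reduce(function (sum, entry) {
    return sum + entry.count;                    // total hits for this word
  }, 0);
  return sorted.map(function (entry) {
    return {
      word: entry.word,
      count: entry.count,
      probability: entry.count / total           // relative frequency
    };
  });
}

console.log(predict({ to: { listen: 1, them: 3 } }, 'to'));
```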
./index.js
function train(fileName, splitter)
- load file
- prepare
- use splitter(string) to create separate tasks
- calculate tasks on clients using mergeBig()
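As an illustration of what could be passed as splitter, the sketch below cuts prepared text into chunks of a fixed number of words. Neighbouring chunks share one word, so the bigram that straddles a chunk boundary is still counted exactly once. `makeSplitter` and `chunkSize` are hypothetical names.

```javascript
// Build a splitter(string) that returns word chunks overlapping by one word.
function makeSplitter(chunkSize) {
  if (chunkSize < 2) {
    throw new RangeError('chunkSize must be at least 2');
  }
  return function splitter(text) {
    const words = text.split(/\s+/).filter(Boolean);
    const chunks = [];
    // Advance by chunkSize - 1 so each chunk starts on the previous chunk's last word.
    for (let start = 0; start < words.length - 1; start += chunkSize - 1) {
      chunks.push(words.slice(start, start + chunkSize).join(' '));
    }
    return chunks;
  };
}

console.log(makeSplitter(3)('listen to them the children'));
```

Each returned chunk can then become one task, so the chunk size controls how much work a single client receives.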
TODO
[ ] git checkout
[ ] js-spark adventure