timothy: Node.js library for writing Hadoop MapReduce jobs in JS
Timothy's primary goal is to make Hadoop's Yellow Elephant rich and famous.
npm install timothy
// require timothy
// basic configuration for the job: hadoop conf, input, output, name, etc.
// map function: one (line) or two (key, value) arguments
// reduce function: two arguments (key, value)
// run function: creates the job, uploads it and blocks until
// the execution has finished
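The job functions come in two shapes. A map takes either a single line or a key and a value. A reduce takes a key and its values. The sketch below is hypothetical and is not timothy's implementation. It assumes three things: the functions emit pairs through `this.emit`, reduce receives every value for a key as an array, and a caller can tell the two map shapes apart from `Function.prototype.length`. `makeContext` and `callMap` are illustrative names.

```javascript
// Hypothetical sketch, not timothy's implementation: the `emit` collector,
// the `callMap` dispatcher and the array-of-values reduce input are assumptions.

// Word-count map in the one-argument (line) form.
function mapLine(line) {
  for (const word of line.split(/\s+/).filter(Boolean)) {
    this.emit(word, 1);
  }
}

// A map in the two-argument (key, value) form, e.g. for tab-separated input.
function mapKeyValue(key, value) {
  this.emit(key, value.length);
}

// Reduce: receives a key and, in this sketch, every value emitted for it.
function reduce(word, counts) {
  this.emit(word, counts.reduce((sum, n) => sum + n, 0));
}

// Builds a context whose emit() records [key, value] pairs.
function makeContext() {
  const out = [];
  return {
    out,
    emit(key, value) {
      out.push([key, value]);
    },
  };
}

// Picks the calling convention from the declared parameter count.
function callMap(mapFn, ctx, record) {
  if (mapFn.length >= 2) {
    // Split a "key\tvalue" record at the first tab.
    const tab = record.indexOf("\t");
    const key = tab === -1 ? record : record.slice(0, tab);
    const value = tab === -1 ? "" : record.slice(tab + 1);
    mapFn.call(ctx, key, value);
  } else {
    mapFn.call(ctx, record);
  }
}

const a = makeContext();
callMap(mapLine, a, "to be or not to be");
console.log(a.out); // six [word, 1] pairs

const b = makeContext();
callMap(mapKeyValue, b, "doc1\thello world");
console.log(b.out); // [ [ 'doc1', 11 ] ]

const c = makeContext();
reduce.call(c, "to", [1, 1]);
console.log(c.out); // [ [ 'to', 2 ] ]
```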
Testing on the local machine
// runLocal can be used instead of run to simulate the job execution
// from the command line
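Simulating a job locally means running the MapReduce data flow in a single process. The process maps every input line, sorts the emitted pairs by key, groups equal keys, and reduces each group. The sketch below is a hypothetical stand-in written for illustration. `simulateJob` is not a timothy function, and the sketch assumes the `this.emit` convention.

```javascript
// Hypothetical local simulation of a MapReduce job (not timothy's runLocal):
// map each line, sort pairs by key, group equal keys, reduce each group.
function simulateJob(lines, map, reduce) {
  // Map phase: collect every [key, value] pair emitted by map.
  const mapped = [];
  const mapCtx = { emit: (key, value) => mapped.push([String(key), value]) };
  for (const line of lines) map.call(mapCtx, line);

  // Sort phase: order pairs by key, mirroring Hadoop's sort before reducing.
  mapped.sort((x, y) => (x[0] < y[0] ? -1 : x[0] > y[0] ? 1 : 0));

  // Reduce phase: call reduce once per distinct key with all of its values.
  const output = [];
  const reduceCtx = { emit: (key, value) => output.push(`${key}\t${value}`) };
  let i = 0;
  while (i < mapped.length) {
    const key = mapped[i][0];
    const values = [];
    while (i < mapped.length && mapped[i][0] === key) values.push(mapped[i++][1]);
    reduce.call(reduceCtx, key, values);
  }
  return output; // tab-separated "key\tvalue" lines
}

// Word count, using the one-argument map form. Plain function expressions
// (not arrows) are used so that `this` is the emit context.
const result = simulateJob(
  ["the quick fox", "the lazy dog"],
  function (line) {
    for (const w of line.split(" ")) this.emit(w, 1);
  },
  function (word, counts) {
    this.emit(word, counts.length);
  }
);
console.log(result.join("\n"));
```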
Initialising a job
// global variables and functions will be available in the map and reduce functions
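The job functions are shipped as templates. One way to make initialisation code visible to them is to write the initialisation source and the function source into the same script. The sketch below assumes that approach purely for illustration. `init`, `buildScript` and the stop-word list are hypothetical, and `new Function` stands in for the script file a real job would run.

```javascript
// Hypothetical sketch: concatenate initialisation source and map source into
// one script, so that globals defined during initialisation are in scope for map.

// init is never called here; only its source text is reused.
function init() {
  var stopWords = ["a", "the", "of"];
  function isStopWord(w) { return stopWords.indexOf(w) !== -1; }
}

// map refers to isStopWord, which exists only inside the generated script.
function map(line) {
  line.split(" ").filter((w) => !isStopWord(w)).forEach((w) => this.emit(w, 1));
}

// Take the body of initFn and append an expression returning mapFn.
function buildScript(initFn, mapFn) {
  const src = initFn.toString();
  const body = src.slice(src.indexOf("{") + 1, src.lastIndexOf("}"));
  return `${body}\nreturn (${mapFn.toString()});`;
}

const compiledMap = new Function(buildScript(init, map))();

const pairs = [];
compiledMap.call({ emit: (k, v) => pairs.push([k, v]) }, "the end of a story");
console.log(pairs); // [ [ 'end', 1 ], [ 'story', 1 ] ]
```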
Passing environment variables
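Hadoop Streaming passes environment variables to map and reduce processes through its `-cmdenv name=value` option. A Node.js process reads them from `process.env`. The sketch below assumes that timothy's `cmdenv` setting uses the same `name=value` form, and it simulates the hand-off inside one process. `parseCmdenv` and `THRESHOLD` are hypothetical names.

```javascript
// Hypothetical sketch: how a cmdenv-style "name=value" setting could reach a
// map function as an environment variable. In a real streaming job Hadoop
// sets the variable for the mapper process; here it is set by hand.
function parseCmdenv(spec) {
  const eq = spec.indexOf("=");
  if (eq <= 0) throw new Error(`expected name=value, got "${spec}"`);
  return [spec.slice(0, eq), spec.slice(eq + 1)];
}

const [name, value] = parseCmdenv("THRESHOLD=3"); // hypothetical variable
process.env[name] = value;

// Inside map, read the setting from the environment, not from a closure.
function map(line) {
  const threshold = Number(process.env.THRESHOLD); // env values are strings
  for (const word of line.split(" ")) {
    if (word.length > threshold) this.emit(word, 1);
  }
}

const emitted = [];
map.call({ emit: (k) => emitted.push(k) }, "a long list of words");
console.log(emitted); // [ 'long', 'list', 'words' ]
```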
Using Node libraries
// Libraries can be added using the same syntax as
// in an NPM package.json file
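In package.json, dependencies form an object that maps package names to semver ranges. The sketch below assumes the job's library list has that shape and turns it into the `name@range` specs that `npm install` accepts. The package names, ranges, and the `toInstallArgs` helper are illustrative only and are not part of timothy.

```javascript
// Hypothetical sketch: a package.json-style dependency map and the
// "name@range" specs that `npm install` accepts for it.
const dependencies = {
  underscore: "1.3.x", // illustrative package and range
  "node-uuid": "1.4.x", // illustrative package and range
};

// Object.entries keeps insertion order for string keys.
function toInstallArgs(deps) {
  return Object.entries(deps).map(([pkg, range]) => `${pkg}@${range}`);
}

const args = ["install", ...toInstallArgs(dependencies)];
console.log(`npm ${args.join(" ")}`);
```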
Status and counters
Status and counters for the job can be updated using the this.updateStatus and this.updateCounter functions.
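Hadoop Streaming lets a map or reduce process report progress by writing specially formatted lines to stderr. `reporter:status:<message>` sets the task status, and `reporter:counter:<group>,<counter>,<amount>` increments a counter. The sketch below shows one plausible shape for `updateStatus` and `updateCounter` built on that protocol. The argument order, the default amount of 1, `makeReporter`, and the injectable `write` target are assumptions, not timothy's documented signatures.

```javascript
// Hypothetical status/counter helpers built on the Hadoop Streaming stderr
// protocol. `write` defaults to stderr but can be swapped out (here, for testing).
function makeReporter(write = (s) => process.stderr.write(s)) {
  return {
    updateStatus(message) {
      write(`reporter:status:${message}\n`);
    },
    // Group and counter names must not contain commas in this format.
    updateCounter(group, counter, amount = 1) {
      write(`reporter:counter:${group},${counter},${amount}\n`);
    },
  };
}

// Usage inside a map function, with the helpers available on `this`.
const lines = [];
const ctx = makeReporter((s) => lines.push(s));
function map(line) {
  if (line.trim() === "") {
    this.updateCounter("WordCount", "EmptyLines");
    return;
  }
  this.updateStatus("processing non-empty line");
}
map.call(ctx, "");
map.call(ctx, "hello");
console.log(lines.join(""));
```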
The map, reduce and setup functions are used as templates for the job functions. Values captured from the closures in which these functions are defined will not be available, so code that relies on them fails when the actual job runs. Use the 'cmdenv' configuration to pass values to the job instead.
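When functions are used as templates, only their source text reaches the job. The sketch below imitates that by rebuilding a function from its `toString()` output with `new Function`, which discards the enclosing closure. `ship` is a hypothetical stand-in for the serialization step. The `JOB_PREFIX` environment variable (also hypothetical) models a value passed through `cmdenv`.

```javascript
// Hypothetical sketch: rebuild a function from its source text, as happens
// when job functions are shipped as templates. Closure variables are lost.
function ship(fn) {
  return new Function(`return (${fn.toString()});`)();
}

function makeMap() {
  const prefix = "closure:"; // captured by the closure, not by the source text
  return function map(line) {
    return prefix + line;
  };
}

function mapFromEnv(line) {
  return process.env.JOB_PREFIX + line; // read at run time from the environment
}

const local = makeMap();
console.log(local("x")); // "closure:x" (the closure is still attached)

const shipped = ship(local);
let error = null;
try {
  shipped("x");
} catch (e) {
  error = e; // ReferenceError: prefix is not defined
}
console.log(error && error.name);

process.env.JOB_PREFIX = "env:"; // stands in for a cmdenv setting
console.log(ship(mapFromEnv)("x")); // "env:x"
```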
At the moment, the setup function does not wait for asynchronous operations to complete. If setup starts one of these operations, the script goes on to execute the map/reduce function before the asynchronous callback runs.
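The ordering problem can be reproduced in plain Node.js. A callback scheduled by an asynchronous call only runs after the current synchronous code has finished. A map call made right after setup therefore sees state that setup has not filled in yet. `setup`, `map` and `config` below are illustrative names, and `setTimeout` stands in for any asynchronous operation.

```javascript
// Illustration of the limitation: setup starts an asynchronous operation,
// but nothing waits for it before map runs.
const log = [];
let config; // meant to be initialised by setup

function setup() {
  log.push("setup started");
  // Simulated async work (e.g. reading a file): the callback is only queued.
  setTimeout(() => {
    config = { ready: true };
    log.push("async callback ran");
  }, 0);
}

function map(line) {
  log.push(`map ran, config is ${config === undefined ? "undefined" : "set"}`);
}

setup();
map("first input line"); // runs before the setTimeout callback

// Print the log once the event loop has processed the first timer.
setTimeout(() => console.log(log), 10);
```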
Forward Internet Group (c) 2012. Available under the LGPL V3 license.