couch-worker
This module provides shared functionality for CouchDB workers. It manages the connection to CouchDB, listens for changes, posts the resulting updates back to CouchDB and logs updates.
Defining a worker
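var createWorker = require('couch-worker').createWorker;

module.exports = createWorker(/* your worker definition: ignored, migrated and migrate, described below */);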
ignored(doc)
This should be a predicate that returns true if the worker should ignore the document, and false otherwise. For example, you might restrict the worker to a specific document type and exclude design documents.
Important: This function must be self-contained and must not use surrounding scope, so that it can be converted to a string and sent to CouchDB. That means no Node-specific code and no references to anything outside the function body.
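A minimal sketch of an ignored() predicate, assuming a hypothetical document type of 'page'. It skips design documents and anything that is not a page, and reads nothing but its own argument, so its source text (from Function.prototype.toString()) can be rebuilt into a working function elsewhere. The new Function round-trip below only imitates shipping the source to CouchDB; it is not how couch-worker itself does it.

```javascript
// Hypothetical predicate: ignore design docs and anything that is not a 'page'.
// It references only its argument, so it survives being serialised to a string.
function ignored(doc) {
  if (doc._id && doc._id.indexOf('_design/') === 0) {
    return true;
  }
  return doc.type !== 'page';
}

// Imitate sending the function as source text and re-creating it elsewhere.
var source = ignored.toString();
var rebuilt = new Function('return (' + source + ');')();

console.log(rebuilt({ _id: '_design/app' }));          // true
console.log(rebuilt({ _id: 'abc', type: 'page' }));    // false
console.log(rebuilt({ _id: 'def', type: 'comment' })); // true
```

If the predicate instead compared against a module-level variable such as `var TYPE = 'page'`, the rebuilt copy would fail with a ReferenceError when run this way in Node, which is exactly what the self-contained rule guards against.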
migrated(doc)
This should be a predicate which returns true if the doc has already been
migrated, false otherwise. All documents returned from the migrate()
function must pass this predicate.
Important: This function must be self-contained and must not use surrounding scope, so that it can be converted to a string and sent to CouchDB. That means no Node-specific code and no references to anything outside the function body.
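A sketch of a migrated() predicate, assuming a hypothetical schema_version field that the migration sets to 2. Like any predicate sent to CouchDB, it reads only its argument.

```javascript
// Hypothetical: a document counts as migrated once schema_version reaches 2.
function migrated(doc) {
  return typeof doc.schema_version === 'number' && doc.schema_version >= 2;
}

console.log(migrated({ _id: 'a', type: 'page' }));                    // false
console.log(migrated({ _id: 'b', type: 'page', schema_version: 2 })); // true
```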
migrate(doc, callback)
This is the migration function. It can perform whatever effects are required
to update the document, then passes the updated document back to the
callback. You can return multiple documents in an array if you like, but you
must include the original document as one of them (modified so that it
passes the migrated() predicate).
This function will always be called from Node.js, so you can use surrounding scope in the module and require other Node modules.
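A sketch of a migrate() function that returns the updated original plus one extra document. It assumes a Node-style callback(err, result) signature, and the schema_version field, the audit document and its ID format are all hypothetical. A matching migrated() predicate is included so the block runs on its own and can check that every returned document passes it.

```javascript
// Hypothetical predicate: migrated once schema_version reaches 2.
function migrated(doc) {
  return typeof doc.schema_version === 'number' && doc.schema_version >= 2;
}

// Hypothetical migration: bump schema_version on a copy of the original doc and
// emit an extra audit document. Assumes a Node-style callback(err, result).
function migrate(doc, callback) {
  setImmediate(function () {
    var updated = Object.assign({}, doc, { schema_version: 2 });
    var audit = {
      _id: 'audit:' + doc._id,
      type: 'audit',
      source_id: doc._id,
      schema_version: 2 // so the extra doc also passes migrated()
    };
    callback(null, [updated, audit]);
  });
}

migrate({ _id: 'page-1', type: 'page' }, function (err, docs) {
  if (err) throw err;
  console.log(docs.every(migrated)); // true
  console.log(docs[0]._id);          // page-1
});
```

The sketch copies the input rather than mutating it; the important part is that the first returned document keeps the original _id and now passes migrated().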
Starting a worker
// require the worker definition (see above section)
var myworker = require('./myworker'); // illustrative path to your worker module

var config = {
    name: 'My Worker',
    database: 'http://admin:password@localhost:5984/database',
    log_database: 'http://admin:password@localhost:5984/workers',
    concurrency: 4
};

// start the worker
var w = myworker.start(config);

// stop the worker
w.stop();
Common configuration options
Your worker can use additional configuration properties as required (for API keys, etc.), but all workers using couch-worker have the following options available:
- name (required) - String - The unique name for this worker instance
- database (required) - String - The database URL (with credentials) to migrate documents in
- log_database (required) - String - The database URL (with credentials) used to store worker-related data (error logs, priority queues, etc.)
- concurrency - Number - Maximum number of documents to process in parallel
- timeout - Number - Time to wait, in milliseconds, for migrate() calls to return before raising a timeout error and discarding any later result
- checkpoint_size - Number - The number of documents to process before recording a checkpoint (the sequence ID the worker will resume processing from after a restart)
- retry_attempts - Number - Number of times to retry a migrate() call that returns an error, before recording the error in the log_database and moving on to the next change
- retry_interval - Number - Number of milliseconds to wait before retrying
- bucket - Object - An object with start and/or end properties. This causes the worker to hash every document ID with md5 to distribute documents into fair buckets. The worker only processes a document if the hex digest of its md5 hash is greater than or equal to start and less than end; all other documents are ignored. This lets you run multiple instances of the same worker to split up processing of documents. The start and end properties should be strings in the hex range ('0000...' to 'ffff...'). Omitting start means "process everything up until end"; omitting end means "process everything from start onwards".
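A sketch of the bucket rule as described above, assuming start and end are compared as plain strings against the lowercase hex md5 digest. The helpers inBucket(), boundary() and makeBuckets() are hypothetical and not part of couch-worker; makeBuckets() shows one way to derive non-overlapping bucket objects for n worker instances.

```javascript
var crypto = require('crypto');

// Sketch of the bucket rule: md5 the document ID and compare the lowercase hex
// digest, as a string, against an optional start (inclusive) and end (exclusive).
function inBucket(id, bucket) {
  var digest = crypto.createHash('md5').update(id).digest('hex');
  if (bucket.start !== undefined && digest < bucket.start) return false;
  if (bucket.end !== undefined && digest >= bucket.end) return false;
  return true;
}

// Hypothetical: 8-hex-digit boundary i/n of the way through the md5 space.
function boundary(i, n) {
  return Math.floor((i * 0x100000000) / n).toString(16).padStart(8, '0');
}

// Hypothetical helper: n contiguous buckets covering the whole space.
// The first bucket has no start and the last has no end.
function makeBuckets(n) {
  var buckets = [];
  for (var i = 0; i < n; i++) {
    var b = {};
    if (i > 0) b.start = boundary(i, n);
    if (i < n - 1) b.end = boundary(i + 1, n);
    buckets.push(b);
  }
  return buckets;
}

var buckets = makeBuckets(4);
console.log(buckets.map(function (b) { return JSON.stringify(b); }).join('\n'));
['page-1', 'page-2', 'page-3'].forEach(function (id) {
  var owners = buckets.filter(function (b) { return inBucket(id, b); });
  console.log(id, owners.length); // each ID matches exactly one bucket: 1
});
```

With n = 4 the boundaries come out as '40000000', '80000000' and 'c0000000', and each object could be passed as the bucket option of one worker instance. String comparison only behaves correctly if start and end use lowercase hex, matching the output of digest('hex').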
Viewing worker progress / logs
You can push the couch-worker-dashboard CouchApp to the log_database to get a
web interface for workers using this module. It includes progress
information, error logs and the ability to process specific documents on
demand.