gridfs-locks
gridfs-locks implements distributed and fair read/write locking based on MongoDB, and is specifically designed to make MongoDB's GridFS file-store safe for concurrent access. It is a node.js npm package built on top of the native mongodb driver, and is compatible with the native GridStore implementation.
NOTE: if you use gridfs-stream and need the locking capabilities of this package (and you probably do... see the "Why?" section at the bottom of this README), you should check out gridfs-locking-stream. It is basically gridfs-stream + gridfs-locks.
What's new in v1.x
Following the semantic versioning convention, version 1.x contains a few breaking changes from the prototype v0.0.x of gridfs-locks. The main difference is that v1.x Lock and LockCollection objects are now event-emitters. There are three primary impacts of these changes:
- All async callbacks have been eliminated from the API method parameter lists and replaced with events
- A much richer set of async events (e.g. lock expirations) can now be observed and handled in a more intuitive way
- Locks for removed resources can also be removed so they don't clutter up the lock collection
Installation
Requires node.js and npm, and uses the native node.js mongo driver.
npm install gridfs-locks
To run unit tests (requires mongodb server on localhost:27017):
npm test
Use
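var Db = require('mongodb').Db;
var Server = require('mongodb').Server;
var db = new Db('test', new Server('127.0.0.1', 27017));
var LockCollection = require('gridfs-locks').LockCollection;
var Lock = require('gridfs-locks').Lock;

// Open the database
db.open(function (err, db) {
  // ... create a LockCollection and Lock objects here (see the API section)
});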
How it works (briefly)
GridFS itself creates two collections based off a root name (default root is fs) called e.g. fs.files and fs.chunks. gridfs-locks takes the same root name and creates a third collection (e.g. fs.locks) that contains documents used to provide robust locking. Internally, it uses the MongoDB findAndModify() operation to guarantee the atomicity of lock updates. gridfs-locks does not touch or even know about the .files and .chunks collections, and so it doesn't interfere with (or even require) a GridFS store to work. As an aside, for this reason it is completely general and can be used as a distributed locking scheme for any purpose, not just for making GridFS concurrency-safe.
It uses a multiple-reader/exclusive-writer model for locking, with a fair write-request scheme to prevent blocking of writes by a continuous stream of readers. There is optional support for lock expiration, attaching metadata to locks (for debugging distributed applications), and waiting to obtain locks with timeout. When waiting for locks, the polling interval is also configurable. All of the above options can be configured globally, or on a per-lock basis. As a bonus, gridfs-locks also tracks the number of successful read and write locks granted for each resource.
As with any locking scheme, care must be taken to avoid creating deadlocks, and the built-in lock expiration pattern may be helpful in doing so. The default configuration is that locks never expire, and attempts to obtain unavailable locks emit the 'timed-out' event immediately without waiting for a lock to become available. These behaviors may be changed using the lockExpiration, timeOut and pollingInterval options.
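To make the interaction between timeOut and pollingInterval concrete, here is a simplified simulation on a virtual clock. It is only a sketch of the behavior described above, not the library's implementation: simulateAcquire and its availableAt parameter are hypothetical names, and the exact moment at which the real package gives up may differ.

```javascript
// Hypothetical simulation (not part of gridfs-locks): model polling for a
// lock that becomes free `availableAt` seconds from now, on a virtual clock.
function simulateAcquire(availableAt, options) {
  var timeOut = options.timeOut || 0;                  // default: do not poll
  var pollingInterval = options.pollingInterval || 5;  // default: 5 secs
  var attempts = 0;

  // One attempt at t = 0, then one every pollingInterval secs until timeOut.
  for (var t = 0; t <= timeOut; t += pollingInterval) {
    attempts += 1;
    if (t >= availableAt) {
      return { event: 'locked', at: t, attempts: attempts };
    }
  }
  return { event: 'timed-out', attempts: attempts };
}

// With the defaults, an unavailable lock fails after a single attempt.
console.log(simulateAcquire(12, {}).event);                                // 'timed-out'
// With polling enabled, the lock is obtained on the first poll after t = 12.
console.log(simulateAcquire(12, { timeOut: 30, pollingInterval: 5 }).at);  // 15
```

In this model a shorter pollingInterval reduces the delay between a lock becoming free and it being obtained, at the cost of more attempts against the database.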
API
LockCollection(db, options)
Create a new lock collection.
// using 'new' is optional
var lockColl = LockCollection(
  db,  // Must be an open mongodb connection object
  {    // Options: All except 'root' can be overridden
    root: 'fs',           // root name for the collection. Default: 'fs'
    lockExpiration: 300,  // secs until a lock expires in the database. Default: 0 (Never expire)
    timeOut: 30,          // secs to poll for an unavailable lock. Default: 0 (Do not poll)
    pollingInterval: 5,   // secs between attempts to acquire a lock. Default: 5 sec
    metaData: null,       // any metadata to store in the lock documents. Default: null
    w: 1                  // mongodb write-concern. Default: 1
  }
);

// Emits events:

// event: 'ready' - emitted when the collection is ready to use
lockColl.on('ready', function () { /* ... */ });

// event: 'error' - emitted in the case of a database or other unrecoverable
// error. 'ready' will not be emitted. No listener for 'error' events will
// result in throws in case of errors (node.js default behavior)
lockColl.on('error', function (err) { /* ... */ });
Lock()
Create a new Lock object. Lock objects may be reused, but are tied to a single Id for their lifetime.
// using 'new' is optional
var lock = Lock(
  Id,        // Unique identifier for resource being locked.
             // Type must be compatible with mongodb `_id`
  lockColl,  // A valid LockCollection object
  {          // Options:
    lockExpiration: 300,  // secs until a lock expires in the database. Default: 0 (Never expire)
    timeOut: 30,          // secs to poll for an unavailable lock. Default: 0 (Do not poll)
    pollingInterval: 5,   // secs between attempts to acquire a lock. Default: 5 sec
    metaData: null        // any metadata to store in the lock document. Default: null
  }
);

// Emits events:

// event: 'error' - emitted in the case of a database or other unrecoverable
// error. No listener for 'error' events will result in throws in case of
// errors, which is the node.js default behavior.
lock.on('error', function (err) { /* ... */ });

// event: 'locked' - A lock has been obtained. Supplies the current lock
// document. See obtainReadLock() and obtainWriteLock() methods below
lock.on('locked', function (lockDoc) { /* ... */ });

// event: 'timed-out' - A timeout occurred while waiting to obtain an
// unavailable lock. This event only occurs when timeOut != 0
// See obtainReadLock() and obtainWriteLock() methods below
lock.on('timed-out', function () { /* ... */ });

// event: 'released' - A held lock was successfully released
// see releaseLock() method below
lock.on('released', function () { /* ... */ });

// event: 'removed' - A held write lock was successfully removed from the
// lock collection. See removeLock() method below
lock.on('removed', function () { /* ... */ });

// The following three events only occur when lockExpiration != 0

// event: 'expires-soon' - warning ~90% of the lifetime of this lock has
// passed. Either release or renew the lock.
// See releaseLock() and renewLock() methods below
lock.on('expires-soon', function () { /* ... */ });

// event: 'renewed' - A held lock was successfully renewed
// see renewLock() method below
lock.on('renewed', function () { /* ... */ });

// event: 'expired' - the lifetime of this lock has passed.
// It is no longer safe to use the underlying resource without obtaining
// a new lock
lock.on('expired', function () { /* ... */ });
lock.obtainReadLock()
Attempt to obtain a non-exclusive lock on the resource. There can be multiple simultaneous readers of a resource.
lock.obtainReadLock();
lock.obtainWriteLock()
Attempt to obtain an exclusive lock on the resource. When a write lock is obtained, there can be no other readers or writers.
lock.obtainWriteLock();
lock.releaseLock()
Release a held lock, either read or write.
lock.releaseLock();
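The obtain and release methods are typically used as a pair around a critical section. The following is a minimal sketch of that pattern, assuming only the 'locked', 'timed-out' and 'released' events and the obtainWriteLock() and releaseLock() methods described in this README. withWriteLock is a hypothetical helper, and FakeLock is a stand-in that emits its events synchronously so the sketch runs without a MongoDB server; neither is part of gridfs-locks.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: run `work` while holding a write lock, then release
// the lock and report the outcome through `done(err, result)`.
function withWriteLock(lock, work, done) {
  function onLocked(lockDoc) {
    lock.removeListener('timed-out', onTimedOut);
    var result;
    var failure = null;
    try {
      result = work(lockDoc);            // the critical section
    } catch (e) {
      failure = e;                       // still release the lock below
    }
    lock.once('released', function () { done(failure, result); });
    lock.releaseLock();
  }
  function onTimedOut() {
    lock.removeListener('locked', onLocked);
    done(new Error('timed out waiting for write lock'));
  }
  // Register listeners before asking for the lock.
  lock.once('locked', onLocked);
  lock.once('timed-out', onTimedOut);
  lock.obtainWriteLock();
}

// Stand-in for a real Lock object (assumption, for illustration only).
class FakeLock extends EventEmitter {
  constructor(available) { super(); this.available = available; }
  obtainWriteLock() {
    if (this.available) this.emit('locked', { write_lock: true });
    else this.emit('timed-out');
  }
  releaseLock() { this.emit('released'); }
}

withWriteLock(new FakeLock(true), function (lockDoc) {
  return 'wrote while write_lock=' + lockDoc.write_lock;
}, function (err, result) {
  console.log(err ? err.message : result);
});
```

Releasing inside the helper, even when the critical section throws, keeps a failed writer from holding the lock until it expires (or forever, with the default lockExpiration of 0).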
lock.removeLock()
Remove a held write lock from the lock collection. Appropriate to use when the write lock is obtained to delete a resource.
lock.removeLock();
lock.renewLock()
Need more time? Reset the lock expiration time to lockExpiration seconds from now.
lock.renewLock();
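For long-running work under a lock with lockExpiration != 0, one possible pattern is to renew whenever 'expires-soon' fires and to stop once the lock is released or has expired. The sketch below assumes only the events and the renewLock() method described above. keepLockAlive and FakeExpiringLock are hypothetical names; the fake stands in for a real Lock so the example runs without MongoDB.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: renew `lock` on every 'expires-soon' warning until it
// is released, or call `onLost` if it expires anyway. Returns a stop function.
function keepLockAlive(lock, onLost) {
  function renew() { lock.renewLock(); }
  function lost() { stop(); onLost(); }
  function stop() {
    lock.removeListener('expires-soon', renew);
    lock.removeListener('expired', lost);
    lock.removeListener('released', stop);
  }
  lock.on('expires-soon', renew);
  lock.once('expired', lost);
  lock.once('released', stop);
  return stop;
}

// Stand-in for a real Lock object (assumption, for illustration only).
class FakeExpiringLock extends EventEmitter {
  constructor() { super(); this.renewals = 0; }
  renewLock() { this.renewals += 1; this.emit('renewed'); }
  releaseLock() { this.emit('released'); }
}

var demoLock = new FakeExpiringLock();
keepLockAlive(demoLock, function () { console.log('lock lost'); });
demoLock.emit('expires-soon');   // simulate the ~90% lifetime warning
console.log(demoLock.renewals);  // 1
demoLock.releaseLock();          // renewal stops once the lock is released
```

Once 'expired' has fired the resource is no longer protected, so the onLost callback should abandon the work and not retry the renewal.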
Lock Document Data Model
{
  files_id: Id,             // The id of the resource being locked
  expires: lockExpireTime,  // Date(), when this lock will expire
  read_locks: 0,            // Number of current read locks granted
  write_lock: false,        // Is there currently a write lock granted?
  write_req: false,         // Are there one or more write requests?
  reads: 0,                 // Successful read counter
  writes: 0,                // Successful write counter
  meta: null                // Application metadata
}
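To illustrate how these fields could support the multiple-reader/exclusive-writer model with fair write requests, here is an in-memory sketch of the state transitions. It is an assumption about how the fields relate, not the package's code: the real library applies its updates atomically with findAndModify() against MongoDB, and the function names below are hypothetical. Expiration is ignored.

```javascript
// Hypothetical in-memory model of a single lock document.
function newLockDoc(id) {
  return {
    files_id: id, expires: null,   // null: this sketch ignores expiration
    read_locks: 0, write_lock: false, write_req: false,
    reads: 0, writes: 0, meta: null
  };
}

// Readers are admitted only if no writer holds or is waiting for the lock.
function tryReadLock(doc) {
  if (doc.write_lock || doc.write_req) return false;
  doc.read_locks += 1;
  doc.reads += 1;
  return true;
}

// A writer needs exclusive access. If it cannot get it, it records a write
// request, which blocks new readers until the current ones have drained.
function tryWriteLock(doc) {
  if (doc.write_lock || doc.read_locks > 0) {
    doc.write_req = true;
    return false;
  }
  doc.write_lock = true;
  doc.write_req = false;   // other waiting writers set it again on retry
  doc.writes += 1;
  return true;
}

// Release whichever kind of lock is held.
function releaseLock(doc) {
  if (doc.write_lock) doc.write_lock = false;
  else if (doc.read_locks > 0) doc.read_locks -= 1;
}

var doc = newLockDoc('file-1');
console.log(tryReadLock(doc));   // true:  first reader gets in
console.log(tryWriteLock(doc));  // false: reader active, write request recorded
console.log(tryReadLock(doc));   // false: new readers wait behind the writer
releaseLock(doc);                // the reader finishes
console.log(tryWriteLock(doc));  // true:  writer now has exclusive access
```

The write_req flag is what makes the scheme fair to writers: without it, a continuous stream of overlapping readers could keep read_locks above zero indefinitely.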
Why?
I know what you're thinking:
- why does there need to be yet another locking library for node?
- why not do this using Redis, or better yet use one of the existing Redis solutions?
- wait, safe concurrent access isn't already baked into MongoDB GridFS?
I'll answer these in reverse order... GridFS is MongoDB's file store technology; really it's just a bunch of "data model" conventions making it possible to store binary blobs of arbitrarily large non-JSON data in MongoDB collections. And it's totally useful.
However, the GridFS data model says nothing about how to safely synchronize attempted concurrent read/write access to stored files. This is a problem because GridFS uses two separate collections to store file metadata and data chunks, respectively. And since MongoDB has no native support for atomic multi-operation transactions, this turns out to be a critical omission for almost any real-world use of GridFS.
The official node.js native mongo driver's GridStore library is only "safe" (won't throw errors and/or corrupt GridFS data files) under two possible scenarios:
- Once created, files are strictly read-only. After the initial write, they can never be changed or deleted.
- An application never attempts to access a file when any kind of write or delete is also in progress.
Neither of these constraints is acceptable for most real applications likely to be built with node.js using MongoDB. The solution is an efficient and robust locking mechanism to enforce condition #2 above by properly synchronizing read/write accesses. That is what this package provides.
Redis is an amazing tool and this task could certainly be done using Redis, but in this case we are already using MongoDB and it also has the capability to get the job done, so adding an unnecessary dependency on another server technology is undesirable.
I tailored this library to use MongoDB and mirror the GridFS data model in the hopes that it may inspire the MongoDB team to add official concurrency features to a future version of the GridFS specification. In the meantime, this library will hopefully suffice in making GridFS generally useful for real world applications. I welcome all feedback.