Mongo powered queue for SimpleCrawler

NOTE: This code is very early in development. Please try it and let everybody know what you think (and what bugs you have found) via GitHub issues and pull requests.


It's still alpha-quality software, so I haven't pushed it to npm yet. Please install it from the GitHub repository.

git clone ./simplecrawler-queue-mongo
cd simplecrawler-queue-mongo
npm install
npm run-script prepublish

There is also npm run-script develop, which watches the code and rebuilds it. Please try it if you would like to hack on this code.


Crawler  = require "simplecrawler"
Queue    = require "./simplecrawler-queue-mongo"
mongoose = require "mongoose"
mongoose.connect "localhost/test"
crawler       = Crawler.crawl ""
crawler.name  = 'radzimy-co' # You don't need this if you only run one crawler.
crawler.queue = new Queue mongoose.connections[0], crawler

which compiles to:

var Crawler, Queue, crawler, mongoose;
Crawler   = require("simplecrawler");
Queue     = require("./simplecrawler-queue-mongo");
mongoose  = require("mongoose");
mongoose.connect("localhost/test");
crawler       = Crawler.crawl("");
crawler.name  = 'radzimy-co';
crawler.queue = new Queue(mongoose.connections[0], crawler);


At the moment it relies on a Mongoose connection that the application provides. In the future I'd like to decouple it, so that the application could provide a native MongoDB connection or a connection string instead.

If you want to use multiple crawlers with one database (e.g. for crawling multiple domains), set a unique name property on each crawler (as in the example). It will be used to distinguish queues in a collection.
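
To illustrate the point about names, here is a minimal sketch of a helper that gives each crawler a unique name before attaching a queue. attachNamedQueues and its crawlersByName argument are hypothetical and not part of this module. The only call taken from the usage example is new Queue(connection, crawler), and the sketch is plain JavaScript rather than CoffeeScript.

```javascript
// Hypothetical helper, NOT part of simplecrawler-queue-mongo.
// `Queue` is assumed to be the constructor exported by this module and
// `connection` a Mongoose connection (e.g. mongoose.connections[0]).
// Both are passed in so the sketch itself has no hard dependencies.
function attachNamedQueues(Queue, connection, crawlersByName) {
  Object.keys(crawlersByName).forEach(function (name) {
    var crawler = crawlersByName[name];
    // Object keys cannot repeat, so every crawler ends up with a distinct name.
    crawler.name = name;
    // Same constructor call as in the usage example; the name is set first,
    // in the same order as in that example.
    crawler.queue = new Queue(connection, crawler);
  });
  return crawlersByName;
}
```

Assuming crawlerA and crawlerB were created with Crawler.crawl, calling attachNamedQueues(Queue, mongoose.connections[0], { "site-a": crawlerA, "site-b": crawlerB }) would leave each crawler with its own name and its own queue object on the shared connection.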


Contributions are much welcome :)

