flexible

Flexible Web-Crawler Module

Easily build flexible, scalable, and distributed web crawlers for Node.js.

var flexible = require('flexible');
 
// Initiate a crawler. Chainable. 
var crawl = flexible.crawl('http://www.example.com');
crawl
    .use(flexible.querystring)
    .use(flexible.router)
 
    .route('/users/:name', function (req, res, body, queue_item) {
        crawl.navigate('http://www.example.com/search?q=' + req.params.name);
    })
    .route('/search', function (req, res, body, queue_item) {
        console.log('Search document handled for:', req.params.q);
    })
    .route('*', function (req, res, body, queue_item) {
        console.log('Every document is handled by this route.');
    })
 
    .on('complete', function () {console.log('Finished!');})
    .on('error', function (error) {console.error(error);});
  • An asynchronous-friendly, evented API for building flexible, scalable, and distributed web crawlers.
  • An array-based queue for small crawls, and a fully SQLite-based queue for quickly crawling billions of pages.
  • A middleware system that includes router middleware (wildcards, placeholders, etc.) and querystring middleware.
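
To make the router's placeholders and wildcards more concrete, here is a minimal sketch of how a pattern such as '/users/:name' or '*' could be matched against a URL path. The helpers compileRoute and matchRoute are hypothetical names used only for illustration and are not part of flexible's API; the sketch also assumes matching is done on the path alone, which may differ from what the module does internally.

```javascript
// Hypothetical helper, not part of flexible: turns a route pattern with
// ':name' placeholders and '*' wildcards into a regular expression.
function compileRoute(pattern) {
    var keys = [];
    var source = pattern
        // Escape regex metacharacters; ':' and '*' are handled below.
        .replace(/[.+?^${}()|[\]\\\/]/g, '\\$&')
        // A placeholder captures one path segment (anything except '/').
        .replace(/:(\w+)/g, function (_, key) {
            keys.push(key);
            return '([^/]+)';
        })
        // A wildcard matches any run of characters, including '/'.
        .replace(/\*/g, '.*');
    return {regex: new RegExp('^' + source + '$'), keys: keys};
}

// Hypothetical helper: returns an object of placeholder values when the
// path matches the pattern, or null when it does not.
function matchRoute(pattern, path) {
    var route = compileRoute(pattern);
    var match = route.regex.exec(path);
    if (!match) {
        return null;
    }
    var params = {};
    route.keys.forEach(function (key, i) {
        params[key] = decodeURIComponent(match[i + 1]);
    });
    return params;
}

console.log(matchRoute('/users/:name', '/users/alice'));
console.log(matchRoute('/users/:name', '/search'));
console.log(matchRoute('*', '/anything/at/all'));
```

In this sketch a placeholder never spans a '/', so '/users/:name' does not match '/users/alice/posts', while '*' matches every path and yields no parameters.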
npm install flexible

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.