gretel

0.0.7 • Public • Published

Gretel

Follows and collects breadcrumbs across the web.

Heavily relies on Christopher Giffard's node-simplecrawler

Usage

CLI

gretel [options]

Options:

  -h, --help                  output usage information
  -V, --version               output the version number
  -s, --startUri [uri]        Uri to start crawling from
  -q, --queuePath [filePath]  File path to load / save queue from

Module

var gretel = require('gretel')('www.example.com');

gretel.start();

Optionally load / save breadcrumb queue state

gretel.load('./breadcrumbs.json', function(error){
    if(error){
        return console.log(error.stack || error);
    }

    gretel.start();
});

gretel.queue.freeze("./breadcrumbs.json", function(error){
    if(error){
        console.log(error.stack || error);
    }
});

Other settings on gretel are the same as node-simplecrawler (she is actually an instance of Crawler) for more info and examples see the readme for node-simplecrawler

// sync processing
gretel.on('fetchcomplete', function(queueItem, data, response) {
    console.log(queueItem.url);
});

// async processing
gretel.on("fetchcomplete", function(queueItem, data, response) {
    var continue = this.wait();
    doSomethingAsync(data, function(){
        console.log(queueItem.url);
        continue();
    });
});

Dependencies (2)

Dev Dependencies (1)

Package Sidebar

Install

npm i gretel

Weekly Downloads

7

Version

0.0.7

License

MIT

Last publish

Collaborators

  • mauricebutler