    scrapebp

    This package has been deprecated

    Author message:

    With modern libraries, there is no need for any wrapper code.

    node-scrapebp

    Installation

    npm install scrapebp

    This module can be forked or depended upon for future scraping projects.
    The caller only needs to specify opts and implement a custom scraper and a scrape callback function.
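
    As a rough illustration of what the caller supplies, here is a minimal sketch of a custom scraper and a scrape callback. The names DemoScraper and scrapeCallback come from the usage example in this README; the fields collected (title, linkCount) and the node-style (err, result) callback signature are assumptions for illustration, not something the package prescribes.

```javascript
// Hypothetical scraper: only the scrape(url, $, callback) call shape is taken
// from the usage example; everything collected here is illustrative.
var DemoScraper = {
  // url: URL of the response, $: cheerio object, callback: assumed (err, result)
  scrape: function (url, $, callback) {
    var result;
    try {
      result = {
        url: url,
        title: $('title').text().trim(), // text of the <title> element
        linkCount: $('a').length         // number of <a> elements
      };
    } catch (err) {
      return callback(err);
    }
    callback(null, result);
  }
};

// Hypothetical scrape callback: report the error or print the result as JSON.
function scrapeCallback(err, result) {
  if (err) {
    console.error(err);
    return;
  }
  console.log(JSON.stringify(result, null, 2));
}
```

    The result is built inside the try block and the callback is invoked outside it, so an exception thrown by the callback itself is not reported back to it as a scraping error.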

    Usage

    See bin/scrapebp for details.

    var ScrapeBp = require('scrapebp');
     
    // DemoScraper, scrapeCallback, opts and argv (the parsed command-line
    // options) are assumed to be defined already; see bin/scrapebp
     
    var scrapebp = ScrapeBp(opts);
     
    scrapebp.on('headers', function (headers) {
      console.log("- %s headers ready", opts.method);
      if (argv.dumpHeader) {
        console.log(headers);
      }
    });
     
    scrapebp.on('redirect', function (url, remaining) {
      console.log("- redirects to: %s (%d remaining)", url, remaining);
    });
     
    scrapebp.on('error', function (err) {
      console.error(err);
    });
     
    scrapebp.on('$ready', function(url, $) {
      console.log("- $ ready");
      // $ is the cheerio object
      // use $.html() to get the response body
      // useful if the response is not html/xml
     
      if (argv.dumpBody) {
        console.log("body:");
        console.log($.html());
      }
     
      // invoke our scraper
      DemoScraper.scrape(url, $, scrapeCallback);
    });

    Debug

    Like needle, scrapebp uses visionmedia/debug for debug output. Set the DEBUG environment variable to enable it:

    DEBUG=scrapebp bin/scrapebp www.yahoo.com

    Design choice

    Originally, hyperquest, hyperdirect and hyperzip were used as the HTTP stack. I then switched to tomas/needle, which covers what all three provided and adds iconv conversion.

    Reference for dependencies

    cheeriojs/cheerio

    tomas/needle

    TODO

    write tests that cover:

    • GET with query string
    • POST with payload
    • redirects
    • use of compression (-z and check response header and decoded body)
    • error handling

    features:

    • character set detection with aadsm/jschardet? (in case neither the HTTP header nor the HTML meta tag signals the charset)
    • promisify?
    • browserify
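
    One way the "promisify?" item might look is a small helper that wraps the '$ready' and 'error' events in a Promise. whenReady is a hypothetical name and is not part of scrapebp; a plain EventEmitter stands in for the scrapebp instance so the sketch runs on its own.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical helper, not part of scrapebp: resolves on the first '$ready'
// event and rejects on the first 'error' event, whichever comes first.
function whenReady(emitter) {
  return new Promise(function (resolve, reject) {
    emitter.once('$ready', function (url, $) {
      resolve({ url: url, $: $ });
    });
    // Once the promise has settled, a later 'error' has no further effect.
    emitter.once('error', reject);
  });
}

// Stand-in emitter so the sketch runs without scrapebp installed.
var fake = new EventEmitter();
whenReady(fake).then(function (res) {
  console.log('ready: %s', res.url);
});
fake.emit('$ready', 'http://example.com/', function () {});
```

    Intermediate events such as 'headers' and 'redirect' do not fit a single-value Promise, so a caller would still subscribe to them directly on the emitter.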

    bug:

    • multi-byte cut-off (https://github.com/tomas/needle/issues/88)

    version

    0.5.0

    license

    MIT
