node-scrapebp
Installation
npm install scrapebp
This module can be forked or depended upon for future scraping projects.
The caller only needs to specify opts and implement the custom scraper and the scrape callback function.
Usage
See bin/scrapebp for details.
var ScrapeBp = ;
// DemoScraper and scrapeCallback are defined
// opts for ScrapeBp is prepared
var scrapebp = ;
scrapebp;
scrapebp;
scrapebp;
scrapebp;
Debug
Following needle, scrapebp uses visionmedia/debug.
DEBUG=scrapebp bin/scrapebp www.yahoo.com
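To illustrate what the DEBUG variable does, here is a minimal stdlib-only sketch of namespace-gated logging. It is a simplified stand-in, not the debug module's implementation: isEnabled and createDebug are hypothetical names, and only comma- or space-separated namespaces with a `*` wildcard are handled.

```javascript
var util = require('util');

// Simplified stand-in for namespace-gated logging; NOT the debug module itself.
// Returns true when `namespace` matches one of the patterns in `debugEnv`.
function isEnabled(namespace, debugEnv) {
  return (debugEnv || '')
    .split(/[\s,]+/)
    .filter(Boolean)
    .some(function (pattern) {
      // Escape regex metacharacters, then turn '*' into a wildcard.
      var source = pattern
        .replace(/[.+?^${}()|[\]\\]/g, '\\$&')
        .replace(/\*/g, '.*');
      return new RegExp('^' + source + '$').test(namespace);
    });
}

// Returns a logger that stays silent unless its namespace is enabled.
// `sink` is a hypothetical hook for capturing output; it defaults to stderr.
function createDebug(namespace, debugEnv, sink) {
  var enabled = isEnabled(namespace, debugEnv);
  return function () {
    if (!enabled) return;
    var message = util.format.apply(util, arguments);
    (sink || console.error)(namespace + ' ' + message);
  };
}

// Usage: only the logger whose namespace appears in DEBUG prints anything.
var log = createDebug('scrapebp', process.env.DEBUG);
log('fetching %s', 'www.yahoo.com');
```

Run with DEBUG=scrapebp set, the last line writes to stderr; without it, the call is a no-op.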
Design choice
Originally, hyperquest, hyperdirect and hyperzip were used as the HTTP stack.
I then switched to tomas/needle, which supports all of the above plus iconv
conversion.
Reference for dependencies
TODO
write tests that cover:
- GET with query string
- POST with payload
- redirects
- use of compression (-z, and check response header and decoded body)
- error handling
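As a rough idea of what the compression check in the list above might look like, the sketch below simulates a gzip-compressed response with Node's zlib module and verifies both the header and the decoded body. It does not call scrapebp or needle; decodeBody and the response object are hypothetical and exist only for illustration.

```javascript
var zlib = require('zlib');

// Hypothetical helper: decode a response body according to its
// Content-Encoding header. Unknown encodings are rejected explicitly.
function decodeBody(headers, body) {
  var encoding = (headers['content-encoding'] || 'identity').toLowerCase();
  if (encoding === 'gzip') return zlib.gunzipSync(body);
  if (encoding === 'deflate') return zlib.inflateSync(body);
  if (encoding === 'identity') return body;
  throw new Error('Unsupported Content-Encoding: ' + encoding);
}

// Simulated compressed response (an assumption; no network request is made).
var html = '<html><body>hello</body></html>';
var response = {
  headers: { 'content-encoding': 'gzip' },
  body: zlib.gzipSync(Buffer.from(html, 'utf8'))
};

// The two things the test should check: the header and the decoded body.
var sawGzipHeader = response.headers['content-encoding'] === 'gzip';
var decoded = decodeBody(response.headers, response.body).toString('utf8');
```

A real test would replace the simulated response with one fetched from a local HTTP server, keeping the same two checks.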
features:
- character set detection with aadsm/jschardet? (in case the HTTP header and HTML meta do not signal the charset)
- promisify?
- browserify
bug:
- multi-byte cut-off (https://github.com/tomas/needle/issues/88)
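For context on this class of bug, the sketch below shows how a multi-byte character gets corrupted when a byte stream is decoded chunk by chunk, and how Node's StringDecoder avoids it. It assumes the linked issue concerns characters split across chunk boundaries, which has not been verified here; decodeChunks is a hypothetical helper and no needle code is involved.

```javascript
var StringDecoder = require('string_decoder').StringDecoder;

// The euro sign is three bytes in UTF-8 (0xE2 0x82 0xAC).
// Split the buffer so that its last byte lands in a second chunk.
var original = 'price: 5€';
var whole = Buffer.from(original, 'utf8');
var chunks = [
  whole.slice(0, whole.length - 1),
  whole.slice(whole.length - 1)
];

// Naive approach: decode each chunk on its own. The split character
// turns into U+FFFD replacement characters.
var naive = chunks
  .map(function (chunk) { return chunk.toString('utf8'); })
  .join('');

// Boundary-aware approach: StringDecoder buffers an incomplete sequence
// until the remaining bytes arrive in a later chunk.
function decodeChunks(parts, encoding) {
  var decoder = new StringDecoder(encoding);
  var text = '';
  parts.forEach(function (chunk) { text += decoder.write(chunk); });
  return text + decoder.end();
}

var safe = decodeChunks(chunks, 'utf8');
```

Here `naive` differs from `original`, while `safe` matches it.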