Scrapyard makes scraping websites easy. It's a wrapper for most of the things you need, comes with optional caching and retries, and opens as many concurrent connections as you configure.
npm install scrapyard
var scrapyard = require('scrapyard');
var scraper = new scrapyard({
  debug: true,
  retries: 5,
  connections: 10,
  cache: './storage',
  bestbefore: "5min"
});
retries: number of times the scraper attempts to fetch a URL before giving up (default: 5).
connections: number of concurrent connections a scraper will make. Setting this too high could be considered a DDoS attack, so be polite and keep it reasonable.
cache: a folder where scraped content is cached. Caching is off by default.
bestbefore: how long cached content stays valid, either an integer number of milliseconds or a duration string such as "5min". When set to 0, the cache never expires.
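The bestbefore rules can be modelled in a few lines. This is a hypothetical sketch, not scrapyard's code: parseBestBefore and isFresh are illustrative names, and the accepted unit suffixes (ms, s, sec, min, h, d, or none for milliseconds) are assumptions, since only "5min" appears in this README.

```javascript
// Hypothetical helpers, not part of scrapyard's API.

// Assumed unit suffixes; a bare number string is read as milliseconds.
const UNIT_MS = new Map([
  ["", 1],
  ["ms", 1],
  ["s", 1000],
  ["sec", 1000],
  ["min", 60 * 1000],
  ["h", 60 * 60 * 1000],
  ["d", 24 * 60 * 60 * 1000],
]);

// Convert a bestbefore value (number of ms, or a string like "5min") to ms.
function parseBestBefore(value) {
  if (typeof value === "number") return value;
  const match = /^\s*(\d+(?:\.\d+)?)\s*([a-z]*)\s*$/i.exec(String(value));
  const unit = match ? match[2].toLowerCase() : null;
  if (unit === null || !UNIT_MS.has(unit)) {
    throw new Error("unrecognized bestbefore value: " + value);
  }
  return Number(match[1]) * UNIT_MS.get(unit);
}

// Is a cache entry written at `cachedAt` (ms since epoch) still valid at `now`?
function isFresh(cachedAt, bestbefore, now = Date.now()) {
  const maxAge = parseBestBefore(bestbefore);
  if (maxAge === 0) return true; // 0 means the cache never expires
  return now - cachedAt < maxAge;
}
```

For example, an entry cached at time 0 with bestbefore "5min" is still fresh at 299999 ms and stale at 300000 ms, while a bestbefore of 0 keeps it fresh indefinitely.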
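To make the retries and connections settings concrete, here is a sketch of a bounded-concurrency fetch queue. It is not scrapyard's implementation: scrapeAll, withRetries and the fetchOnce parameter (any function returning a Promise) are hypothetical, and retries is treated as the total number of attempts, as described above.

```javascript
// Hypothetical sketch, not scrapyard's implementation.

// Try `task` up to `retries` times (at least once); rethrow the last error.
async function withRetries(task, retries) {
  const attempts = Math.max(1, retries);
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}

// Fetch every URL with at most `connections` requests in flight at once.
// `fetchOnce(url)` is a hypothetical function that returns a Promise.
async function scrapeAll(urls, fetchOnce, { connections = 10, retries = 5 } = {}) {
  const results = new Array(urls.length);
  let next = 0; // index of the next URL no worker has claimed yet

  // Each worker claims URLs one by one until the list is exhausted.
  async function worker() {
    while (next < urls.length) {
      const i = next++;
      try {
        const data = await withRetries(() => fetchOnce(urls[i]), retries);
        results[i] = { url: urls[i], data };
      } catch (error) {
        results[i] = { url: urls[i], error };
      }
    }
  }

  const poolSize = Math.max(1, Math.min(connections, urls.length));
  await Promise.all(Array.from({ length: poolSize }, () => worker()));
  return results;
}
```

Failures are recorded per URL instead of aborting the whole batch, which is one reasonable design choice; a real scraper would probably also wait between attempts, whereas this sketch retries immediately to stay short.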
The first argument can be either a URL string or an options object; url is the only required option.
url: a string containing the HTTP URL
method: the HTTP method (default: …)
form: an object containing your form data
encoding: passed to …
callback(err, data): the callback function
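The argument rules above can be sketched as a small normalization step. normalizeRequest is a hypothetical name, not part of scrapyard; it deliberately applies no defaults and passes any other options through unchanged.

```javascript
// Hypothetical helper: accept a URL string or an options object and
// return an options object, enforcing that `url` is present.
function normalizeRequest(arg) {
  // A bare string is shorthand for { url: string }; objects are shallow-copied.
  const options = typeof arg === "string" ? { url: arg } : { ...arg };
  if (typeof options.url !== "string" || options.url === "") {
    throw new TypeError("url is required");
  }
  return options; // method, form, encoding, etc. are left untouched
}
```

Copying the object rather than mutating it keeps the caller's options reusable across several requests.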
Although scrapyard has only been tested with these 6 options, you can try setting any option supported by the request module.
It's possible to use scrapyard with Tor using the …:
var scrapyard = require('scrapyard');
var scraper = …;
var Agent = …;
…;