punch-scraper
Config
- proxyManagerConfig - punch proxy manager config
- maxTry - how many times the scraper will try to fetch a link before reporting an error
- strategy - scraper strategy
  - name - strategy name; valid values: CASPERJS, HTTP, PHANTOMJS
  - proxy - proxy IP
  - lambda - AWS Lambda settings, for CASPERJS or PHANTOMJS only
    - aws_key - AWS key
    - aws_secret - AWS secret key
    - region - AWS region
    - lambda_name - AWS Lambda function name
- eval - code to be evaluated, for CASPERJS or PHANTOMJS only
- services
  - include - array of proxy services to use
  - exclude - array of proxy services not to use
  - valid values: GIMMI_PROXY, HIDE_MY_ASS, IN_CLOCK, PROXY_SERVER_LIST, UK_PROXY, US_PROXY
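The valid values listed above lend themselves to a quick sanity check before a config is handed to the scraper. The following is a minimal sketch, not part of punch-scraper: `validateConfig` is a hypothetical helper, and both the nesting of the keys and the case-insensitive matching of the strategy name are assumptions.

```javascript
'use strict';

// Valid values as listed in the Config section.
const STRATEGY_NAMES = ['CASPERJS', 'HTTP', 'PHANTOMJS'];
const PROXY_SERVICES = [
  'GIMMI_PROXY', 'HIDE_MY_ASS', 'IN_CLOCK',
  'PROXY_SERVER_LIST', 'UK_PROXY', 'US_PROXY'
];

// Hypothetical helper (not part of punch-scraper): returns a list of
// human-readable problems found in a config object, empty if none.
function validateConfig(config) {
  const problems = [];
  const strategy = config.strategy || {};
  // Assumption: strategy names are matched case-insensitively.
  const name = String(strategy.name || '').toUpperCase();

  if (!STRATEGY_NAMES.includes(name)) {
    problems.push(`unknown strategy name: ${strategy.name}`);
  }
  // The lambda settings only apply to the browser-based strategies.
  if (strategy.lambda && name === 'HTTP') {
    problems.push('lambda is only used with CASPERJS or PHANTOMJS');
  }
  if (config.maxTry !== undefined &&
      !(Number.isInteger(config.maxTry) && config.maxTry > 0)) {
    problems.push('maxTry must be a positive integer');
  }
  const services = config.services || {};
  for (const key of ['include', 'exclude']) {
    for (const service of services[key] || []) {
      if (!PROXY_SERVICES.includes(service)) {
        problems.push(`unknown proxy service in ${key}: ${service}`);
      }
    }
  }
  return problems;
}

// Example config; the placement of each key is an assumption.
const exampleConfig = {
  maxTry: 3,
  strategy: { name: 'HTTP' },
  services: { include: ['US_PROXY', 'UK_PROXY'], exclude: ['IN_CLOCK'] }
};

console.log(validateConfig(exampleConfig)); // no problems expected
```

Returning a list of problems rather than throwing on the first one makes it easier to report every misconfigured key at once.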
Methods
- scrape - scrape urls
- start - start the scraper manager
- stop - stop the scraper manager
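The `maxTry` option under Config implies that a fetch is retried a bounded number of times before an error is reported. How punch-scraper does this internally is not documented here. The sketch below only illustrates the general idea, and `fetchWithRetry` and `flakyFetch` are hypothetical names introduced for the example.

```javascript
'use strict';

// Hypothetical helper (not punch-scraper's code): call fetchFn(url) up to
// maxTry times and resolve with the first successful result.
async function fetchWithRetry(url, fetchFn, maxTry) {
  let lastError;
  for (let attempt = 1; attempt <= maxTry; attempt++) {
    try {
      return await fetchFn(url, attempt);
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  const reason = lastError ? lastError.message : 'no attempts made';
  throw new Error(`giving up on ${url} after ${maxTry} tries: ${reason}`);
}

// Hypothetical fetcher that fails twice before succeeding.
let calls = 0;
async function flakyFetch(url) {
  calls += 1;
  if (calls < 3) throw new Error('proxy timeout');
  return `<html>content of ${url}</html>`;
}

fetchWithRetry('http://www.google.com/', flakyFetch, 3)
  .then((body) => console.log(body))
  .catch((err) => console.error(err.message));
```

A real scraper would likely also switch proxies or wait between attempts. The sketch leaves that out to keep the retry bound itself visible.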
Usage
    'use strict';

    const ScrapeManager = ;
    const scrapeManager = ;

    const config = {
      eval: "response.write(page.content);response.close();",
      strategy: {
        name: 'phantomjs',
        lambda: {
          aws_key: 'XXX-XXX-XXX',
          aws_secret: 'XXX-XXX-XXX',
          lambda_name: 'node-phantomjs-aws-lambda-server-development',
          region: 'us-west-2'
        }
      }
    };

    let links = ['http://www.google.com/', 'http://www.google.com/'];

    scrapeManager.start();