punch-scraper

Config

  • proxyManagerConfig - configuration for the punch proxy manager
  • maxTry - number of times the scraper tries to fetch a link before reporting an error
  • strategy - scraper strategy settings
      • name - strategy name; valid values: CASPERJS, HTTP, PHANTOMJS
      • proxy - proxy IP
      • lambda - AWS Lambda settings (CASPERJS or PHANTOMJS only)
          • aws_key - AWS key
          • aws_secret - AWS secret key
          • region - AWS region
          • lambda_name - AWS Lambda function name
  • eval - code to be evaluated (CASPERJS or PHANTOMJS only)
  • services - proxy services
      • include - array of proxy services to use
      • exclude - array of proxy services not to use
      • valid values: GIMMI_PROXY, HIDE_MY_ASS, IN_CLOCK, PROXY_SERVER_LIST, UK_PROXY, US_PROXY
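
The options above can be checked before they are handed to the scraper. The following is a minimal sketch, not part of punch-scraper: validateConfig is a hypothetical helper, and it assumes that strategy names are case-insensitive (the usage example passes 'phantomjs') and that services sits at the top level of the config object.

```javascript
'use strict';

// Strategy and proxy-service names listed in the Config section.
const STRATEGIES = ['CASPERJS', 'HTTP', 'PHANTOMJS'];
const PROXY_SERVICES = [
    'GIMMI_PROXY', 'HIDE_MY_ASS', 'IN_CLOCK',
    'PROXY_SERVER_LIST', 'UK_PROXY', 'US_PROXY'
];

// Hypothetical helper (not part of punch-scraper): returns a list of problems
// found in a config object, or an empty array if none were found.
function validateConfig(config) {
    const problems = [];
    const strategy = config.strategy || {};
    // Assumption: names are compared case-insensitively.
    const name = String(strategy.name || '').toUpperCase();

    if (!STRATEGIES.includes(name)) {
        problems.push(`unknown strategy name: ${strategy.name}`);
    }
    // lambda and eval are documented for CASPERJS and PHANTOMJS only.
    if (name === 'HTTP' && (strategy.lambda || config.eval)) {
        problems.push('lambda and eval apply to CASPERJS or PHANTOMJS only');
    }
    // Assumption: services is a top-level key holding include/exclude arrays.
    const services = config.services || {};
    for (const key of ['include', 'exclude']) {
        for (const service of services[key] || []) {
            if (!PROXY_SERVICES.includes(service)) {
                problems.push(`unknown proxy service in ${key}: ${service}`);
            }
        }
    }
    if (config.maxTry !== undefined &&
        !(Number.isInteger(config.maxTry) && config.maxTry > 0)) {
        problems.push('maxTry should be a positive integer');
    }
    return problems;
}

const exampleConfig = {
    maxTry: 3,
    eval: 'response.write(page.content);response.close();',
    strategy: {
        name: 'PHANTOMJS',
        lambda: {
            aws_key: 'XXX-XXX-XXX',
            aws_secret: 'XXX-XXX-XXX',
            lambda_name: 'node-phantomjs-aws-lambda-server-development',
            region: 'us-west-2'
        }
    },
    services: { include: ['US_PROXY', 'UK_PROXY'] }
};

console.log(validateConfig(exampleConfig));
```

Returning a list of problems rather than throwing on the first one lets a caller report every issue in a config at once.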

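Methods

  • scrape(links, config) - scrape an array of URLs with the given config; returns a promise that resolves with the results
  • start() - start the scraper manager; returns a promise
  • stop() - stop the scraper manager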
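
When scrape fetches a link, the maxTry option limits how many attempts are made before an error is reported. The sketch below illustrates that kind of retry loop under stated assumptions; fetchWithRetry and flakyFetch are hypothetical names, and this is not the package's actual implementation.

```javascript
'use strict';

// Hypothetical helper, not the package's code: calls fetchFn(url) until it
// succeeds or maxTry attempts have failed, then rethrows the last error.
async function fetchWithRetry(fetchFn, url, maxTry) {
    let lastError;
    for (let attempt = 1; attempt <= maxTry; attempt++) {
        try {
            return await fetchFn(url);
        } catch (err) {
            lastError = err;
        }
    }
    throw lastError || new Error('maxTry must be at least 1');
}

// Stand-in fetcher that fails twice before succeeding.
let calls = 0;
const flakyFetch = async (url) => {
    calls += 1;
    if (calls < 3) {
        throw new Error(`attempt ${calls} failed`);
    }
    return `content of ${url}`;
};

fetchWithRetry(flakyFetch, 'http://www.google.com/', 3)
    .then((body) => console.log(body, `after ${calls} tries`));
```

A real scraper would likely switch proxies or wait between attempts; the loop here retries immediately to keep the example short.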

Usage

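'use strict';

// Path as used in the package source; adjust it to where the module is installed.
const ScrapeManager = require('./scraper-manager/');

const scrapeManager = new ScrapeManager();

// Scrape through PhantomJS running on AWS Lambda.
const config = {
    // Code evaluated by the strategy: write the page content to the response.
    eval: 'response.write(page.content);response.close();',
    strategy: {
        name: 'phantomjs',
        lambda: {
            aws_key: 'XXX-XXX-XXX',
            aws_secret: 'XXX-XXX-XXX',
            lambda_name: 'node-phantomjs-aws-lambda-server-development',
            region: 'us-west-2'
        }
    }
};

const links = [
    'http://www.google.com/',
    'http://www.google.com/'
];

scrapeManager.start()
    .then(() => scrapeManager.scrape(links, config))
    .then((results) => {
        console.log(results);
        console.log('done');
    })
    .catch((err) => console.error(err))
    // Stop the manager whether or not the scrape succeeded.
    .then(() => scrapeManager.stop());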

Install

npm i punch-scraper

Version

0.0.16

License

none

Collaborators

  • punchagency