Nacho Portmanteau Meltdown
    Share your code. npm Orgs help your team discover, share, and reuse code. Create a free org »

    scrapperspublic

    Scrappers.js

    A set of utility classes for node.js to make scrapping the web easier.

    There is support for custom browser headers, encodings and compression.

    Install

    npm install --save scrapper

    Scrapper options

    url

    The url of the target page

    parser

    An object with a public "parse" method.

    Example:
    var hnParser = {
      //$ is cheerio (jquery) instance of the parsed page
      parse:function($){
        //get the text of the third link in a page
        return $('a').eq(3).text();
      }
    };

    encoding

    The encoding of the target html page. This parameter is optional and defaults to "utf-8"

    headers

    An object containing key-value pairs of headers. Defaults to:

    {
      'User-Agent': "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36"
    }
     

    gzip

    A flag to enable disable the gzip compressing. By default it is enabled (set to true.

    You will probably not want to disable this, if the page is not compressed, it will still be parsed correctly (see request)

    Options can be passed on instantiation:

    var scrapper = new PageScrapper({
      url: HACKER_NEWS_HOME,
      parser: hnParser
    });
     

    Or on the get request:

    scrappers.get(options, done);

    Options passed in the get request, will extend the options passed on instantiation for the duration of the request.

    Page

    A base class for scrapping a web page.

    Example:

    Get the third link from hacker news home page.

    Import scrapper object
     
    var PageScrapper = require('scrappers').PageScrapper;
     
    Write a parser

    The parse functin will rescive a cheerio instance with hn html.

    var hnParser = {
      //$ is cheerio (jquery) instance of the parsed page
      parse:function($){
        //get the text of the third link in a page
        return $('a').eq(3).text();
      }
    };
     
    Instantiate a scraper object
     
    var HACKER_NEWS_HOME = "https://news.ycombinator.com/";
    var scrapper = new PageScrapper({
      url: HACKER_NEWS_HOME,
      parser: hnParser
    });
    Parse!
     
    scrapper.get(function(err,parsed){
      console.log('Third link on hacker news page is:", parsed);
    });
     
    Result:
    Third link on hacker news page is: comments
    

    Rss

    A base class for scrapping an rss feed.

    Example:

    Get a list of article titles for ask hacker news rss.

    Import scrapper object
    var RssScrapper = require('scrappers').RssScrapper;
     
    Write a parser

    The parse functin will rescive a javascript object representing a single rss article.

    var hnParser = {
      //gets a parsed rss articale in an object
      parse:function(article){
        return article.title;
      }
    };
     
    Instantiate a scraper object
     
    var HACKER_NEWS_RSS = "http://hnrss.org/ask";
    var scrapper = new RssScrapper({
      url: HACKER_NEWS_RSS,
      parser: hnParser
    });
    Parse!
     
    scrapper.get(function(err,parsed){
      //print all articles on an rss
      console.log("Ask:Hn titles", parsed);
    });
     
    Result:
    Ask:HN titles:
    [
      'Ask HN: Do you like the idea of social network and learning?★',
      'Ask HN: How does Saved stories feature work?',
      'Ask HN: AGPL on a Code Generator App',
      'Ask HN: How do you read your programming books?',
      'Ask HN: Is OpenGL Worth Learning?',
      'Ask HN: How to produce vnc like Browserling?',
      'Ask HN: How do I solve problems/code outside of the book I used to learn python?',
      'Ask HN: Self Study Learning Path',
      'Ask HN: How to build quality software in a fast paced startup enviorment?',
      'Ask HN: Is Agar.io currently making or losing money?',
      'Ask HN: Any success with Toastmasters?',
      'Ask HN: Has anyone else found Angular to be destroying their productivity?',
      'Ask HN: How to survive a horrible tech job while looking for a new one?',
      'Ask HN: How can a successful startup adopt a strong testing workflow?',
      'Ask HN: What kind of software will be used to develop VR applications?',
      'Ask HN: How do you prepare for a Technical Interview',
      'Ask HN: Recommend one Business/Startup book',
      'Ask HN: Should I branch off my startup\'s technology into a separate company?',
      'Ask HN: Test/Play with 3D Printing Library',
      'Ask HN: What database storage engine do you use, and why?'
    ]
    
    

    Development

    To run tests use:

    npm test

    install

    npm i scrappers

    Downloadsweekly downloads

    5

    version

    2.0.0

    license

    MIT

    repository

    githubgithub

    last publish

    collaborators

    • avatar