Wondering what’s next for npm?Check out our public roadmap! »

    great-reaper

    0.5.3 • Public • Published

    Great reaper

    great-reaper is targeted to scrap collections of data from web pages with usage of friendly jquery-like (css) selectors for describing scrap strategy.

    Installation

    npm install great-reaper
    

    Examples

    Get top 3 hacker news:

    reap('https://news.ycombinator.com/')
        .group('table tr:nth-child(3) table tr')
        .map({
            title: '.title a',
            url: '.title a@href'
        })
        .limit(3)
        .then(console.log);

    results

    [ { title: 'Engineer Anti-Patterns',
        url: 'http://dtrace.org/blogs/eschrock/2012/08/14/engineer-anti-patterns/' },
      { title: 'Hotel Wi-Fi blocking: Marriott is bad, and should feel bad',
        url: 'http://www.economist.com/blogs/gulliver/2015/01/hotel-wi-fi-blocking' },
      { title: 'Can\'t you just turn up the volume?',
        url: 'https://medium.com/@Amp/cant-you-just-turn-up-the-volume-4ecb7fc422a' } ]

    Use transforms

    Get hot questions from stackoverflow with urls.

    Initially question links are relative so we should make them absolute to get correct urls.

    reap('http://stackoverflow.com/?tab=hot')
        .group('.question-summary')
        .map({
            question: '.question-hyperlink',
            url: '.question-hyperlink@href',
            views: '.views .mini-counts'
        })
        .transform({
            question: reap.t().lowercase(),
            url: reap.t.().prefix('http://stackoverflow.com'),
            views: reap.t.().int()
        })
        .then(console.log);

    results

    [ { question: 'program breaks from switch java',
        url: 'http://stackoverflow.com/questions/27840619/program-breaks-from-switch-java',
        views: 49 },
      { question: 'what is the z at the end of date',
        url: 'http://stackoverflow.com/questions/27840670/what-is-the-z-at-the-end-of-date',
        views: 28 },
      { question: 'convert array of objects into object',
        url: 'http://stackoverflow.com/questions/27840109/convert-array-of-objects-into-object',
        views: 18 }, .... ]

    Also you can chain transforms

        ...
        .transform({
            summary: reap.t().lowercase().trim()
        })
        ...

    Transforms

    reap.transforms contains basic transforms functions

    reap.t().tream()

    Tream field value

    reap.t().prefix(string)

    Prepend string to field value

    reap.t().postfix(string)

    Append string to field value

    reap.t().lowercase()

    Lowercase field value

    reap.t().slice(from, to)

    Slices field value same as string.slice

    reap.t().split(separator)

    Split string using given separator and returns array

    reap.t().join(glue)

    Joins array using given glue and returns string

    reap.t().int()

    Typecase field value to int

    reap.t().float()

    Typecase field value to float

    Custom transforms

    You can use custom transform function:

        ...
        .transform(function (item) {
            if (item.type === 'good') {
                item.status = 'good item';
            }
     
            return item;
        })
        ...

    Or apply transform for specific field

        ...
        .transform({
            status: function (val) {
                return 'status: ' + val.toLowerCase();
            }
        })
        ...

    Filter results

    Filters allows you to filter out redundant items from collection

        ...
        .filter(function (item) {
            return item.type === 'good';
        })
        ...

    property specific filters:

        ...
        .filter({
            type: function (type) {
                return type === 'good';
            }
        })
        ...

    LICENSE

    MIT

    Install

    npm i great-reaper

    DownloadsWeekly Downloads

    6

    Version

    0.5.3

    License

    MIT

    Last publish

    Collaborators

    • avatar