Necesito Programar Más

    @fabrix/spool-scraper

    1.5.1 • Public • Published

    spool-scraper

    📦 Scraper Spool

    A Spool that makes scraping the web easy by integrating Crawler.

    Install

    $ npm install --save @fabrix/spool-scraper

    Configure

    // config/main.ts
    import { ScraperSpool } from '@fabrix/spool-scraper'
    export const main = {
      spools: [
        // ... other spools
        ScraperSpool
      ]
    }

    Configuration

    // config/scraper.ts
    export const scraper = {
      max_connections: 10,
      rate_limit: 1000,
      encoding: null,
      jQuery: true,
      force_UTF8: true,
      retries: 3,
      retry_timeout: 10000,
      incoming_encoding: null,
      skip_duplicates: false,
      // Boolean: if true, user_agent should be an array and the crawler rotates it (default: false)
      rotate_UA: false,
      // String|Array: if rotate_UA is false but user_agent is an array, the crawler uses the first entry
      user_agent: [],
      // String: if truthy, sets the HTTP Referer header
      referer: null,
      // Object: raw key-value map of HTTP headers
      headers: null,
      pre_request: (opts, done) => {
        // 'opts' here is not the 'options' you pass to 'c.queue';
        // it is the options object that will be passed to the 'request' module
        console.log(opts)
        // the request starts once done() is called
        done()
      }
    }
    

    For more information about these options, please see the scraper documentation.
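
    The comments on rotate_UA and user_agent describe two behaviours: rotate through the list, or fall back to the first entry. The sketch below models that rule in isolation. The UserAgentOptions interface and the pickUserAgent helper are hypothetical names used for illustration and are not exported by the package; round-robin order is also an assumption, since the config comments only say the list is rotated.

```typescript
// Illustrative only: a local model of the user-agent options described above.
// Neither this interface nor pickUserAgent is part of @fabrix/spool-scraper.
interface UserAgentOptions {
  rotate_UA: boolean
  user_agent: string | string[]
}

// Returns the user agent for the n-th request (0-based),
// or undefined when no user agent is configured.
function pickUserAgent(cfg: UserAgentOptions, requestIndex: number): string | undefined {
  const agents = Array.isArray(cfg.user_agent) ? cfg.user_agent : [cfg.user_agent]
  if (agents.length === 0) {
    return undefined
  }
  // Assumption: rotation is round-robin. Without rotation, the first entry is always used.
  return cfg.rotate_UA ? agents[requestIndex % agents.length] : agents[0]
}

const exampleAgents = ['agent-a/1.0', 'agent-b/1.0']
console.log(pickUserAgent({ rotate_UA: true, user_agent: exampleAgents }, 3))  // agent-b/1.0
console.log(pickUserAgent({ rotate_UA: false, user_agent: exampleAgents }, 3)) // agent-a/1.0
```

    With the default empty user_agent array, this helper returns undefined regardless of rotate_UA.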

    Usage

    For the best results, create a Scrape class and override its default process method.

      import { Scrape } from '@fabrix/spool-scraper'
      
      export class AmazonScrape extends Scrape {
        process(res): Promise<any> {
          const $ = res.$
          const amazon = $('.nav-logo-base').text()
          return Promise.resolve(amazon)
        }
      }
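
    Because process only reads res.$, it can be exercised without any network access by passing a fake response. The following is a minimal sketch of that idea. The Scrape base class, the ScrapeResponse type and the HeadingScrape subclass are local, hypothetical stand-ins so the example runs without the package installed; the real base class and response object may carry more than is modelled here.

```typescript
// Hypothetical stand-ins so the sketch runs on its own.
// The real Scrape class and response object come from the spool and may differ.
type Selector = (query: string) => { text(): string }
interface ScrapeResponse { $: Selector }

abstract class Scrape {
  abstract process(res: ScrapeResponse): Promise<any>
}

class HeadingScrape extends Scrape {
  process(res: ScrapeResponse): Promise<string> {
    const $ = res.$
    // Trim surrounding whitespace so callers receive a clean value
    return Promise.resolve($('h1').text().trim())
  }
}

// A fake selector function that serves canned text in place of a fetched page
const fakePage: Record<string, string> = { h1: '  Example Domain \n' }
const fakeRes: ScrapeResponse = {
  $: (query) => ({ text: () => fakePage[query] ?? '' })
}

new HeadingScrape().process(fakeRes).then((title) => console.log(title)) // Example Domain
```

    Keeping process free of side effects, as here, makes each scrape class straightforward to test in isolation.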

    Then you can either queue your scrape or run it directly:

    // Scrape directly and get the result back (see the config for options)
    const direct = this.app.scrapes.AmazonScrape.direct('https://amazon.com', options, preRequest)
    
    // Add the scrape to the queue (see the config for options)
    this.app.scrapes.AmazonScrape.queue('https://amazon.com', options, preRequest)
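
    The preRequest argument follows the same contract as pre_request in the config: it receives the outgoing request options and a done callback, and the request starts only once done is called. The sketch below illustrates that contract. The RequestOptions and PreRequest types, the X-Trace header and the simulateRequest driver are all assumptions made for illustration; simulateRequest merely stands in for the crawler and is not part of the package.

```typescript
// Assumed hook shape, inferred from the config sample; not the package's published types.
type RequestOptions = { headers?: Record<string, string>; [key: string]: unknown }
type PreRequest = (opts: RequestOptions, done: () => void) => void

// A hook that adds a header and then releases the request.
const addTraceHeader: PreRequest = (opts, done) => {
  opts.headers = { ...opts.headers, 'X-Trace': 'demo' }
  done()
}

// A hook that never calls done, so the request is never released.
const blockRequest: PreRequest = () => {}

// Hypothetical driver standing in for the crawler: it records the request
// as started only when the hook calls done().
function simulateRequest(hook: PreRequest, opts: RequestOptions): string[] {
  const log: string[] = ['hook called']
  hook(opts, () => {
    log.push('request started with headers ' + JSON.stringify(opts.headers ?? {}))
  })
  return log
}

console.log(simulateRequest(addTraceHeader, { uri: 'https://example.com' }))
console.log(simulateRequest(blockRequest, { uri: 'https://example.com' }))
```

    The second call logs only the "hook called" entry, which shows why every hook should call done on each code path.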

    Homepage: fabrix.app
    Weekly Downloads: 5
    Version: 1.5.1
    License: MIT
    Unpacked Size: 19.6 kB
    Total Files: 27
    Collaborators: scottbwyatt