norch-crawlers


A Node.js crawler library for quickly and easily building versatile crawlers. It aims to make working with request and cheerio a little easier, so you don't have to write the same standard code over and over again.

Features

  • Play nice with servers: wait between each request.
  • Get the "next" and "last" URLs for pagination scenarios.
  • Write the list synchronously to a file at the end.
  • Serve header info.
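
The waiting and file-writing behaviour listed above can be sketched with Node.js built-ins alone. This is a minimal illustration, not the library's actual API: `crawlPolitely`, `writeListSync`, `fetchPage` and the default delay of 1000 ms are all hypothetical names and values chosen for the example.

```javascript
const fs = require('node:fs');

// Resolve after `ms` milliseconds; used to pause between requests.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Fetch each URL in turn, waiting `delayMs` between requests.
// `fetchPage` is a hypothetical caller-supplied async function (url) => result.
async function crawlPolitely(urls, fetchPage, delayMs = 1000) {
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) await sleep(delayMs); // no wait before the first request
    results.push(await fetchPage(urls[i]));
  }
  return results;
}

// Write the collected list to disk in one synchronous call, one entry per line.
function writeListSync(filePath, list) {
  fs.writeFileSync(filePath, list.join('\n') + '\n', 'utf8');
}
```

Requests run strictly one after another, so the server never sees more than one request per `delayMs` from this loop. Writing the whole list once at the end keeps the code simple, at the cost of losing the results if the process stops midway.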

Examples

  • List crawling: Crawl paginated lists for URLs
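
As a rough sketch of what list crawling involves, the loop below follows "next" links until it reaches the "last" page and collects the item URLs along the way. It does not use this package's API. `crawlList`, `getPage` and `maxPages` are hypothetical names, and `getPage` stands in for whatever request and cheerio code extracts the links from a page.

```javascript
// Follow "next" links from `startUrl` until there is none, the "last" page
// has been visited, a page repeats, or `maxPages` pages have been fetched.
// `getPage` is a hypothetical async function:
//   (url) => ({ urls: string[], next?: string, last?: string })
async function crawlList(startUrl, getPage, maxPages = 50) {
  const found = [];
  const visited = new Set();
  let current = startUrl;
  while (current && !visited.has(current) && visited.size < maxPages) {
    visited.add(current);
    const page = await getPage(current);
    // Resolve relative item links against the page they were found on.
    for (const href of page.urls) found.push(new URL(href, current).href);
    const last = page.last ? new URL(page.last, current).href : null;
    if (last === current) break; // reached the last page
    current = page.next ? new URL(page.next, current).href : null;
  }
  return found;
}
```

The `visited` set and the `maxPages` cap are safety nets for lists whose "next" link loops back on itself or never ends.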

Planned functionality

  • Item crawling
  • Pagination iteration, second version
  • Define which domain(s) to crawl
  • Site-crawl - Add found URLs to crawl queue
  • Write content asynchronously (append to file) throughout crawling.
  • Follow robots.txt
  • Check if new content
  • Check if updated content
  • Override the crawler header and set the "from" field.
  • Crawl with headless browser.
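
None of the items above exist in the package yet. As an illustration of how two of them might fit together, restricting the crawl to chosen domains and adding found URLs to a crawl queue, here is a small sketch using only the built-in `URL` class. The `CrawlQueue` class and its methods are hypothetical and say nothing about how the package will eventually implement this.

```javascript
// A crawl queue that only accepts URLs on allowed hosts and never
// queues the same URL twice. Hypothetical helper, not part of the package.
class CrawlQueue {
  constructor(allowedHosts) {
    this.allowedHosts = new Set(allowedHosts);
    this.seen = new Set();
    this.pending = [];
  }

  // Queue `href` (optionally relative to `base`). Returns true if it was added.
  add(href, base) {
    let url;
    try {
      url = new URL(href, base);
    } catch {
      return false; // not a valid URL
    }
    url.hash = ''; // fragments point at the same document
    if (!this.allowedHosts.has(url.hostname)) return false;
    if (this.seen.has(url.href)) return false;
    this.seen.add(url.href);
    this.pending.push(url.href);
    return true;
  }

  // Take the oldest queued URL, or undefined when the queue is empty.
  next() {
    return this.pending.shift();
  }
}
```

A site crawl would call `add` for every link found on a page and keep calling `next` until it returns `undefined`.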


Install

    npm i norch-crawlers

Package details

  • Version: 0.0.3
  • License: MIT
  • Unpacked size: 7.6 kB
  • Total files: 8
  • Collaborators: eklem