norch-crawlers

0.0.3 • Public • Published 6 years ago

A Node.js crawler library for quickly and easily building versatile crawlers. It makes working with request and cheerio a little easier, so you don't have to write the same standard code over and over again.

Functions

Play nice with servers: wait between each request.
Get the 'next' and 'last' URLs for pagination scenarios.
Write the list synchronously to file at the end.
Serve header info.
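
As a rough illustration of the first and third points above, here is a minimal sketch of waiting between requests and writing the collected list in one synchronous call. It is not taken from the library's source: the names `wait`, `crawlPolitely`, `writeListSync` and the `fetchPage` callback are hypothetical, and only Node.js built-ins are used.

```javascript
const fs = require('fs');

// Resolve after `ms` milliseconds; used to pause between requests.
function wait(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Visit each URL in turn, pausing `delayMs` between requests.
// `fetchPage` is a hypothetical async callback supplied by the caller
// (for example a thin wrapper around an HTTP client).
async function crawlPolitely(urls, fetchPage, delayMs) {
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    results.push(await fetchPage(urls[i]));
    // No need to wait after the final request.
    if (i < urls.length - 1) await wait(delayMs);
  }
  return results;
}

// Write the collected list to disk in one synchronous call, one item per line.
function writeListSync(filePath, items) {
  fs.writeFileSync(filePath, items.join('\n') + '\n', 'utf8');
}
```

Running the requests strictly one after another keeps the load on the target server predictable, at the cost of a slower crawl.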

Examples

List crawling: Crawl paginated lists for URLs
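
One way to handle the pagination part of list crawling is sketched below. It assumes the 'next' and 'last' hrefs have already been read from the page (for instance with cheerio) and only shows how they might be resolved to absolute URLs and used to decide whether to continue. `resolvePagination` and `hasMorePages` are hypothetical names, not the library's API.

```javascript
// Resolve possibly relative 'next' and 'last' hrefs against the current page URL.
// Missing hrefs are returned as null.
function resolvePagination(pageUrl, hrefs) {
  const toAbsolute = (href) => (href ? new URL(href, pageUrl).href : null);
  return { next: toAbsolute(hrefs.next), last: toAbsolute(hrefs.last) };
}

// Keep going while a next page exists and the current page is not the last one.
function hasMorePages(pageUrl, pagination) {
  return pagination.next !== null && pageUrl !== pagination.last;
}

// Usage with made-up example URLs:
const current = 'https://example.com/list?page=2';
const pagination = resolvePagination(current, { next: '?page=3', last: '?page=9' });
console.log(pagination.next, hasMorePages(current, pagination));
```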

Planned functionality

Item crawling
Pagination iteration, second version
Define which domain(s) to crawl
Site-crawl - Add found URLs to crawl queue
Write content asynchronously (append to file) throughout crawling.
Follow robots.txt
Check if new content
Check if updated content
Override the crawler header and set the 'from' field.
Crawl with headless browser.
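
None of the items above are implemented yet, so the following is only a sketch of how two of them (restricting the crawl to given domains and adding found URLs to a crawl queue) could fit together. `createQueue` and its methods are hypothetical and use only the built-in `URL` class.

```javascript
// Create a crawl queue that only accepts URLs on the given hostnames
// and ignores URLs it has already seen.
function createQueue(allowedHosts) {
  const allowed = new Set(allowedHosts);
  const seen = new Set();
  const pending = [];
  return {
    // Add a found URL (absolute, or relative to `baseUrl`).
    // Returns true if it was queued, false if it was rejected.
    add(url, baseUrl) {
      let parsed;
      try {
        parsed = new URL(url, baseUrl);
      } catch (err) {
        return false; // not a valid URL
      }
      parsed.hash = ''; // fragments point at the same document
      if (!allowed.has(parsed.hostname) || seen.has(parsed.href)) return false;
      seen.add(parsed.href);
      pending.push(parsed.href);
      return true;
    },
    // Take the next URL to crawl, or undefined when the queue is empty.
    next() {
      return pending.shift();
    },
    get size() {
      return pending.length;
    },
  };
}
```

Tracking seen URLs separately from the pending list means a page is not queued again after it has been crawled.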

Dependents (1)

wikipedia-stopword-crawler

Install

npm i norch-crawlers

Repository

github.com/eklem/norch-crawlers

Homepage

github.com/eklem/norch-crawlers#readme

License

MIT

Unpacked Size

7.6 kB

Total Files

8
