easy_web_crawler

1.0.6 • Public • Published

A web crawler built around Puppeteer for crawling AJAX/JavaScript-enabled pages. Check out the example folder for how to use it.

Features

  • Supports crawling of JavaScript/AJAX pages
  • URL filter
  • Avoids duplicate URLs
  • Delay before page load
  • Custom data extraction
  • Built-in spider
  • Stop and resume crawling
  • Fast image download

Documentation

Read full documentation here

Usage

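const Scraper = require("easy_web_crawler");

async function main() {
    const scraper = new Scraper();

    // Replace "start_url" with the URL to begin crawling from.
    scraper.startWithURLs("start_url");

    // URL filter: return true to crawl a URL, false to skip it.
    // Placeholder logic that allows every URL; substitute your own rule.
    scraper.allowIfMatches(function (url) {
        return true;
    });

    scraper.enableAutoCrawler(true);
    scraper.saveProgressInFile("hello.db");
    scraper.waitBetweenPageLoad(0);

    // Called for each loaded page; put custom data extraction here.
    scraper.callbackOnPageLoad(async function (page) {
        // <<logic here>>
    });

    // Called once crawling has finished.
    scraper.callbackOnFinish(function (result) {
        console.log(JSON.stringify(result, null, 4));
    });

    await scraper.start();
}

main().catch(console.error);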
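
The true/false logic passed to allowIfMatches can be as simple as a host and path check. The following is a minimal sketch, assuming the callback receives each candidate URL as an absolute URL string (the package documentation should confirm this). The names makeUrlFilter, allowedHost and pathPrefix are hypothetical helpers for illustration and are not part of easy_web_crawler; example.com is a stand-in domain.

```javascript
// Hypothetical helper (not part of easy_web_crawler): builds a predicate
// that returns true only for http(s) URLs on one host under one path prefix.
function makeUrlFilter(allowedHost, pathPrefix = "/") {
    return function (url) {
        let parsed;
        try {
            // WHATWG URL class, available as a global in Node.js.
            parsed = new URL(url);
        } catch (err) {
            // Not an absolute URL (assumption: such values should be skipped).
            return false;
        }
        const isHttp = parsed.protocol === "http:" || parsed.protocol === "https:";
        return isHttp &&
            parsed.hostname === allowedHost &&
            parsed.pathname.startsWith(pathPrefix);
    };
}

// Example predicate: only pages under https://example.com/docs/.
const allowDocs = makeUrlFilter("example.com", "/docs/");

console.log(allowDocs("https://example.com/docs/intro"));  // true
console.log(allowDocs("https://other.example.org/docs/")); // false
```

Under the same assumption, the predicate could be wired in with scraper.allowIfMatches(allowDocs) in place of the placeholder function shown in the usage example.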

License

MIT

Package details

  • Install: npm i easy_web_crawler
  • Version: 1.0.6
  • Dependencies: 4
  • Dev dependencies: 0
  • Unpacked size: 1.6 MB
  • Total files: 35
  • Weekly downloads: 1
  • Collaborators: vivek13186