easy_web_crawler

    1.0.6 • Public • Published

    A web crawler built around Puppeteer for crawling AJAX/JavaScript-enabled pages. Check out the example folder for how to use it.

    Features!

    • Supports crawling of JavaScript/AJAX pages
    • URL filter
    • Avoids duplicate URLs
    • Delay before page load
    • Custom data extraction
    • Built-in spider
    • Stop and resume crawling
    • Fast image download
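
    To make the "avoids duplicate URLs" feature concrete, here is a minimal sketch of how a crawler can skip URLs it has already seen. It is not taken from easy_web_crawler's source: `normalizeUrl` and `SeenUrls` are hypothetical names, and the normalization rule (dropping the fragment) is an assumption chosen for illustration.

```javascript
// Hypothetical helper, not part of easy_web_crawler's API.
// Normalizes a URL so trivially different spellings compare equal.
function normalizeUrl(rawUrl) {
    const u = new URL(rawUrl);   // the URL parser lowercases the host
    u.hash = "";                 // fragments do not change the fetched page
    return u.href;
}

// Tracks which URLs have already been queued (hypothetical class).
class SeenUrls {
    constructor() {
        this.seen = new Set();
    }

    // Returns true the first time a URL is offered, false afterwards.
    addIfNew(rawUrl) {
        const key = normalizeUrl(rawUrl);
        if (this.seen.has(key)) return false;
        this.seen.add(key);
        return true;
    }
}

const seen = new SeenUrls();
console.log(seen.addIfNew("https://example.com/a"));      // true
console.log(seen.addIfNew("https://example.com/a#top"));  // false
console.log(seen.addIfNew("https://EXAMPLE.com/a"));      // false
```

    A `Set` keeps the membership check constant-time on average, which matters once a crawl has visited many pages.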

    Documentation

    Read full documentation here

    USAGE

    var Scraper = require("easy_web_crawler")

    async function main() {

        var scraper = new Scraper();
        // Replace "start_url" with the URL to begin crawling from
        scraper.startWithURLs("start_url")
        // URL filter: return true to allow a URL, false to skip it
        scraper.allowIfMatches(function (url) {
            return true // placeholder: put your true/false logic here
        })
        scraper.enableAutoCrawler(true)
        scraper.saveProgressInFile("hello.db")
        scraper.waitBetweenPageLoad(0)
        scraper.callbackOnPageLoad(async function (page) {
            // placeholder: put your per-page logic here
        });
        scraper.callbackOnFinish(function (result) {
            console.log(JSON.stringify(result, null, 4))
        })
        await scraper.start()
    }

    main()
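
    The true/false logic passed to `allowIfMatches` could look like the following sketch, which keeps the crawl on one host and under one path prefix. `sameHostFilter` is a hypothetical helper, not part of the package, and the sketch assumes the callback receives the URL as a string.

```javascript
// Hypothetical predicate factory; the names here are illustrative only.
function sameHostFilter(startUrl, pathPrefix = "/") {
    const start = new URL(startUrl);
    return function (url) {
        let candidate;
        try {
            candidate = new URL(url, start);   // also resolves relative links
        } catch (err) {
            return false;                      // unparseable URL: skip it
        }
        return candidate.protocol.startsWith("http") &&
            candidate.hostname === start.hostname &&
            candidate.pathname.startsWith(pathPrefix);
    };
}

const allow = sameHostFilter("https://example.com/docs/", "/docs/");
console.log(allow("https://example.com/docs/intro"));   // true
console.log(allow("https://example.com/blog/post"));    // false
console.log(allow("https://other.example.org/docs/"));  // false
console.log(allow("mailto:someone@example.com"));       // false
```

    Under that assumption it would be wired in as `scraper.allowIfMatches(sameHostFilter("https://example.com/docs/", "/docs/"))`.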
    
    

    License

    MIT

    Install

    npm i easy_web_crawler

    Weekly Downloads

    0

    Unpacked Size

    1.6 MB

    Total Files

    35

    Collaborators

    • vivek13186