Search results

162 packages found

A set of shared utilities that can be used by crawlers

published version 3.12.1, 2 months ago, 16 dependents, licensed under Apache-2.0
91,570

A library to recursively retrieve and serialize Notion pages with customization for machine learning applications.

published version 1.0.0, a year ago, 7 dependents, licensed under MIT
74,717

Dependency free module for scraping and crawling websites using [Crawlbase](https://crawlbase.com) API

published version 1.0.2, 7 months ago, 0 dependents, licensed under Apache-2.0
2,674

Web crawler for Node.js

published version 0.3.21, 7 years ago, 9 dependents, licensed under MIT
1,698

Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on demand.

published version 1.0.17, a month ago, 0 dependents, licensed under MIT
1,833

Crawler is a web spider written with Node.js. It gives you the full power of jQuery on the server to parse a large number of pages asynchronously as they are downloaded.

published version 0.8.0, 8 years ago, 6 dependents, licensed under ISC
716

Node SDK for Hyperbrowser API

published version 0.19.0, 7 days ago, 0 dependents, licensed under MIT
894

Paginator extends the ability to paginate over pages in Goose Parser.

published version 1.0.2, 7 years ago, 0 dependents, licensed under SEE LICENSE IN LICENSE
634

A web crawler for Node.js.

published version 0.8.2, 10 years ago, 1 dependent, licensed under MIT
531

Crawler (spider) for a site's web pages, by domain name

published version 1.2.3, 3 years ago, 0 dependents, licensed under MIT
517

Distributed web crawler powered by Headless Chrome

published version 1.8.0, 7 years ago, 7 dependents, licensed under MIT
547

Crawlyx is an open-source command-line interface (CLI) based web crawler built using Node.js. It is designed to crawl websites and extract useful information like links, images, and text. It is lightweight, fast, and easy to use.

published version 2.2.3, a year ago, 0 dependents, licensed under MIT
423

Priority based Semantic Web Crawler.

published version 0.0.2, 7 years ago, 0 dependents, licensed under MIT
357

A `URL` parser for crawling purposes.

published version 2.0.5, 7 years ago, 0 dependents, licensed under MIT
348

Real transparent HTTP-Proxy-Server. Upstream your requests whatever you want!

published version 1.15.3, 8 months ago, 1 dependent, licensed under ISC
261

A damn simple tool to extract JSON-LD metadata from a webpage using a jQuery-like API (jQuery, Cheerio, CashDOM, ...).

published version 0.0.8, 3 years ago, 2 dependents, licensed under MIT
284

Collects torrents from various sources (dump, RSS, HTML pages) and associates the video files within them with an IMDb ID

published version 0.8.6, 9 years ago, 1 dependent, licensed under MIT
208

JS client for WecrawlerAPI

published version 1.0.7, 4 days ago, 0 dependents, licensed under MIT
177

Sample website text content over time.

published version 4.0.5, 8 years ago, 0 dependents, licensed under MIT
161

Simple, polite crawling of the web.

published version 5.1.2, 9 years ago, 0 dependents, licensed under MIT
147