@web-master/node-web-crawler
TypeScript icon, indicating that this package has built-in type declarations

0.10.0 • Public • Published

😎 @web-master/node-web-crawler 😎

Crawl web as easy as possible

Package License (MIT)

Description

It crawls the target page, collects links and scrapes data on each page :)

Installation

$ npm install --save @web-master/node-web-crawler

Usage

Basic

import crawl from '@web-master/node-web-crawler';

// crawl data on each link
const data = await crawl({
  target: {
    url: 'https://news.ycombinator.com',
    iterator: {
      selector: 'span.age > a',
      convert: (x) => `https://news.ycombinator.com/${x}`,
    },
  },
  fetch: () => ({
    title: '.title > a',
  }),
});

console.log(data);
// [
//   { title: 'An easiest crawling and scraping module for NestJS' },
//   { title: 'A minimalistic boilerplate on top of Webpack, Babel, TypeScript and React' },
//   ...
//   ...
//   { title: '[Experimental] React SSR as a view template engine' }
// ]

Waitable (by using puppeteer)

import crawl from '@web-master/node-web-crawler';

// crawl data on each link
const data = await crawl({
  target: {
    url: 'https://news.ycombinator.com',
    iterator: {
      selector: 'span.age > a',
      convert: (x) => `https://news.ycombinator.com/${x}`,
    },
  },
  waitFor: 3 * 1000, // wait for the content loaded! (like single page apps)
  fetch: () => ({
    title: '.title > a',
  }),
});

console.log(data);
// [
//   { title: 'An easiest crawling and scraping module for NestJS' },
//   { title: 'A minimalistic boilerplate on top of Webpack, Babel, TypeScript and React' },
//   ...
//   ...
//   { title: '[Experimental] React SSR as a view template engine' }
// ]

TypeScript Support

import crawl from '@web-master/node-web-crawler';

interface HackerNewsPage {
  title: string;
}

const pages: HackerNewsPage[] = await crawl({
  target: {
    url: 'https://news.ycombinator.com',
    iterator: {
      selector: 'span.age > a',
      convert: (x) => `https://news.ycombinator.com/${x}`,
    },
  },
  fetch: () => ({
    title: '.title > a',
  }),
});

console.log(pages);
// [
//   { title: 'An easiest crawling and scraping module for NestJS' },
//   { title: 'A minimalistic boilerplate on top of Webpack, Babel, TypeScript and React' },
//   ...
//   ...
//   { title: '[Experimental] React SSR as a view template engine' }
// ]

Related

Package Sidebar

Install

npm i @web-master/node-web-crawler

Weekly Downloads

106

Version

0.10.0

License

MIT

Unpacked Size

11.4 kB

Total Files

11

Last publish

Collaborators

  • saltyshiomix