simple-html-scraper

1.0.2 • Public • Published

Simple Html Scraper

Simple Html Scraper is a small scraper built on Puppeteer that fetches the HTML content and image URLs of a web page.

Installation

Use the package manager npm to install Simple Html Scraper.

npm i simple-html-scraper

Usage

import { Scraper } from 'simple-html-scraper';

const scraper = new Scraper(/* { options } */);

(async () => {
  const result = await scraper.get('url');
  console.log(result.content); // HTML content of the page
  console.log(result.images); // array of image URLs
})();

/* options
{
  scroll?: boolean; // enable scrolling
  maxScroll?: number | 'MAX'; // number of scroll iterations
  scrollWait?: number; // time to wait after each scroll
  resources?: string[]; // resources to accept during the page load
  puppeteer?: LaunchOptions; // options passed to Puppeteer
}
*/
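
The `images` array returned by `scraper.get` may need some cleanup before use. The following is a minimal sketch of one way to do that, assuming the result has the `content` and `images` fields shown in the usage example. The `ScrapeResult` interface and the `normalizeImages` helper are hypothetical names introduced here for illustration; they are not part of simple-html-scraper. The sample data is hand-written so that the snippet runs without launching a browser.

```typescript
// Local description of the result shape shown in the usage example;
// declared here so the sketch compiles without the package installed.
interface ScrapeResult {
  content: string; // HTML content of the page
  images: string[]; // image URLs
}

// Hypothetical helper (not provided by simple-html-scraper): resolves each
// image URL against the page URL, skips values that do not parse or are not
// http(s), and removes duplicates while keeping first-seen order.
function normalizeImages(result: ScrapeResult, pageUrl: string): string[] {
  const seen = new Set<string>();
  for (const src of result.images) {
    let resolved: URL;
    try {
      resolved = new URL(src, pageUrl);
    } catch {
      continue; // value cannot be parsed as a URL
    }
    if (resolved.protocol !== 'http:' && resolved.protocol !== 'https:') {
      continue; // skip data:, blob:, and similar schemes
    }
    seen.add(resolved.href);
  }
  return Array.from(seen);
}

// Hand-written sample standing in for a real scrape result.
const sample: ScrapeResult = {
  content: '<html></html>',
  images: [
    '/a.png',
    'https://example.com/a.png',
    'data:image/png;base64,AAAA',
    'b.jpg',
  ],
};

const normalized = normalizeImages(sample, 'https://example.com/gallery/');
console.log(normalized);
```

In real use, the value returned by `await scraper.get(pageUrl)` would take the place of `sample`, with the same `pageUrl` passed as the second argument.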

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT

Collaborators

  • aminekamal