scrappy

0.6.0 • Public • Published

Scrappy

Extract rich metadata from URLs.

Installation

npm install scrappy --save

Usage

Scrappy attempts to parse and extract rich structured metadata from URLs.

import { scraper, urlScraper } from "scrappy";
import * as plugins from "scrappy/dist/plugins";

Scraper

Accepts a request function and a list of plugins to use. The request function is expected to return a "page" object, which has the same shape as the input to scrape(page).

const scrape = scraper({
  request, // Your function that fetches a URL and resolves to a "page" object.
  plugins: [plugins.htmlmetaparser, plugins.exifdata],
});

// E.g. `popsicle`'s fetch. A plain WHATWG Response has no asObject() or stream() method.
const res = await fetch("http://example.com");

await scrape({
  url: res.url,
  status: res.status,
  headers: res.headers.asObject(),
  body: res.stream(), // Stream the response body instead of buffering it, so large responses are supported.
});
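The example above leaves request undefined. Below is a minimal sketch of one, assuming Node.js 18+ (for the global fetch and Response) and assuming scrappy accepts a Node.js readable stream as the page body. The README only says the body must be streamed, so that second assumption may not hold. The Page interface and the toPage and request names are hypothetical. They are inferred from the fields passed to scrape() above, not taken from scrappy's type declarations.

```typescript
import { Readable } from "node:stream";
import type { ReadableStream as WebReadableStream } from "node:stream/web";

// Hypothetical page shape, inferred from the fields passed to scrape() in the README.
interface Page {
  url: string;
  status: number;
  headers: Record<string, string>;
  body: Readable;
}

// Convert a WHATWG fetch Response into a page object without buffering the body.
function toPage(res: Response): Page {
  return {
    url: res.url,
    status: res.status,
    // Flatten Headers into a plain object; header names come out lower-cased.
    headers: Object.fromEntries(res.headers.entries()),
    // Wrap the web stream in a Node.js stream. The global and node:stream/web
    // ReadableStream types are declared separately, hence the cast.
    body: res.body
      ? Readable.fromWeb(res.body as unknown as WebReadableStream<Uint8Array>)
      : Readable.from([]), // No body: hand over an empty stream.
  };
}

// Hypothetical request function: fetch the URL and resolve to a page object.
async function request(url: string): Promise<Page> {
  return toPage(await fetch(url));
}
```

With this design the body is never read into memory by the adapter itself. Whatever consumes page.body decides how much of it to pull.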

URL Scraper

A simpler wrapper around scraper that automatically calls request(url) to fetch the page before scraping it.

const scrape = urlScraper({ request });

await scrape("http://example.com");
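Conceptually, a URL scraper composes two steps. It fetches the page with request(url) and then passes the result to a page scraper. The sketch below shows that composition using generic, hypothetical names (makeUrlScraper, RequestFn, ScrapeFn). It illustrates the idea described above and is not scrappy's actual implementation.

```typescript
// Hypothetical types: a request turns a URL into a page, a scrape turns a page into a result.
type RequestFn<P> = (url: string) => Promise<P>;
type ScrapeFn<P, R> = (page: P) => Promise<R>;

// Build a URL scraper by composing a request function with a page scraper.
function makeUrlScraper<P, R>(
  request: RequestFn<P>,
  scrape: ScrapeFn<P, R>,
): (url: string) => Promise<R> {
  return async (url) => scrape(await request(url));
}
```

Because the request function is injected, tests can pass a stub instead of hitting the network, which is also why the README asks you to supply request yourself.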

License

Apache 2.0

Collaborators

  • blakeembrey