partsley

- A tool for parsing the web

Usage

Install from npm/yarn

$ npm install partsley

Use a "parselet" as a recipe/filter to parse a website.

Parselets are just plain JS objects, so they can be serialized as YAML or JSON, for example. The examples here are shown in YAML for brevity.

Here is an example of a parselet for grabbing business data from a Yelp page:

name: h1
phone: .biz-phone
address: address
reviews(.review):
- date: meta[itemprop=datePublished] @content
  name: .user-name a
  comment: .review-content p
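
Since parselets are plain objects, the YAML above could equally be written inline in TypeScript. This is a minimal sketch, assuming the value under `reviews(.review)` is a one-element array holding the per-review template (as described under "Grabbing a list of data"); the variable names are only illustrative.

```typescript
// Hypothetical inline equivalent of the YAML parselet above.
const yelpParselet = {
  name: 'h1',
  phone: '.biz-phone',
  address: 'address',
  // Reference selector in the key, array with one template object as the value.
  'reviews(.review)': [
    {
      date: 'meta[itemprop=datePublished] @content',
      name: '.user-name a',
      comment: '.review-content p',
    },
  ],
};

// Plain data, so it survives a JSON round trip unchanged.
const roundTripped = JSON.parse(JSON.stringify(yelpParselet));
console.log(Object.keys(roundTripped));
```

Keeping the parselet as data rather than code means it can be stored in a file or database and loaded at runtime.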

As a module

You can also use partsley as a module:

import { partsley } from 'partsley';
 
const opts = {};
const data = partsley(html, parselet, opts);

Tips

This is a very general-purpose, flexible tool. Here are some tips for getting started.

Grabbing a list of data

Put a reference selector in parentheses in the key and use an array as the value. The array's single item describes what to extract from each matching element.

users(.user):
- name: .name
  age: .age

Use transformation functions on data

Add a pipe (|) and the transformation name after the data selector.

user:
  name: .name
  age: .age|parseInt
  worth: .age|parseFloat
  someNumber: .age|Math.floor
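
To make the pipe syntax concrete, here is a rough sketch of how a value such as `.age|parseInt` could be split into a selector and a transform name, and the transform then applied to extracted text. This is not partsley's actual implementation; `splitTransform`, `applyTransform`, and the `transforms` table are hypothetical names used only for illustration.

```typescript
type Transform = (input: string) => unknown;

// Hypothetical lookup table; partsley resolves transform names in its own way.
const transforms: Record<string, Transform> = {
  parseInt: (s) => parseInt(s, 10),
  parseFloat: (s) => parseFloat(s),
};

// Split "selector|transform" at the first pipe; the transform part is optional.
function splitTransform(value: string): { selector: string; transform?: string } {
  const i = value.indexOf('|');
  if (i === -1) return { selector: value.trim() };
  return { selector: value.slice(0, i).trim(), transform: value.slice(i + 1).trim() };
}

// Apply the named transform to already-extracted text, if one was given.
function applyTransform(text: string, transform?: string): unknown {
  if (transform === undefined) return text;
  const fn = transforms[transform];
  if (!fn) throw new Error(`Unknown transform: ${transform}`);
  return fn(text);
}

const parsed = splitTransform('.age|parseInt');
console.log(parsed.selector, applyTransform('42', parsed.transform)); // prints ".age 42"
```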

By default, the functions in scope include standard library functions. However, you're encouraged to bring your own functions into scope as well. Curried libraries such as Ramda or Lodash FP are one option, since they expose transforms like toLower and split(','):

import { partsley } from 'partsley';
import * as R from 'ramda';
 
const opts = {
  transforms: R,
};
const data = partsley(html, parselet, opts);
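
If a full library is more than you need, a small hand-written object might serve the same purpose. The sketch below assumes that `opts.transforms` accepts any object mapping transform names to functions, which the Ramda example suggests but which should be checked against the package. The names `myTransforms`, `toLower`, `trim`, and `splitCommas` are hypothetical.

```typescript
// Hypothetical transform table: each entry maps a name to a plain function.
const myTransforms = {
  toLower: (s: string): string => s.toLowerCase(),
  trim: (s: string): string => s.trim(),
  splitCommas: (s: string): string[] => s.split(',').map((part) => part.trim()),
};

// Assumed usage, mirroring the Ramda example above:
// const data = partsley(html, parselet, { transforms: myTransforms });
const opts = { transforms: myTransforms };
console.log(opts.transforms.splitCommas('a, b,c')); // prints [ 'a', 'b', 'c' ]
```

Under that assumption, a parselet value could then reference one of these by name, e.g. `.tags|splitCommas`.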

Grabbing an attribute

Use the @ symbol after a selector to reference an attribute.

user:
  name: .name
  nickname: .name@data-nickname
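
The examples in this README write the attribute both with and without a space before the @ sign. As an illustration only (not partsley's own parser), a value could be split into its selector and attribute parts like this; `splitAttribute` is a hypothetical helper.

```typescript
// Split "selector@attribute" at the last "@"; the attribute part is optional.
function splitAttribute(value: string): { selector: string; attribute?: string } {
  const i = value.lastIndexOf('@');
  if (i === -1) return { selector: value.trim() };
  return { selector: value.slice(0, i).trim(), attribute: value.slice(i + 1).trim() };
}

// Both spellings from this README yield a clean selector/attribute pair.
console.log(splitAttribute('.name@data-nickname'));
console.log(splitAttribute('meta[itemprop=datePublished] @content'));
```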

Have fun!

Version: 3.0.1

License: MIT

Collaborators: tycho01