Itemize

A lazy, fluent web crawler with an async/await API.

$ yarn add itemize

Quickstart

Itemize lists all of the linked files and pages underneath the specified root URL.

// Get a quick Hacker News sitemap
const urls = itemize('https://news.ycombinator.com', { depth: 2 })

while (!urls.done()) {
  console.log(await urls.next())
}

This is useful for writing mirrors, monitoring a page for new content, and similar tasks. Itemize starts at the root URL provided and automatically spiders through it to find connected pages. It takes a lazy approach to I/O: requests are made only when you ask for more content with next().
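
Because requests happen only inside next(), stopping early means the remaining pages are never fetched. Here is a minimal sketch of capping a crawl. It assumes only the documented next() and done() methods; takeUrls and its limit parameter are hypothetical names, not part of Itemize.

```javascript
// Hypothetical helper, not part of Itemize: collect at most `limit` URLs.
// `crawler` is any object exposing the documented next() and done() methods.
async function takeUrls(crawler, limit) {
  const urls = []
  while (urls.length < limit && !crawler.done()) {
    const url = await crawler.next()
    // next() resolves to undefined once nothing remains
    if (url === undefined) break
    urls.push(url)
  }
  return urls
}
```

For example, takeUrls(itemize('https://news.ycombinator.com', { depth: 2 }), 10) would resolve to at most ten URLs. Since that leaves the crawl unfinished, call close() on the instance afterwards.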

API

itemize(url, options)

Returns an Itemize instance.

url: String, the root URL from which to crawl
options: Object
- depth: Number, how many layers of links deep to crawl (default: 0)

const items = itemize('https://nodejs.org/download/release/', { depth: 1 })

.next()

Returns a Promise for a String, the next linked URL.

If no URLs remain, returns a Promise for undefined.

const url = await items.next()
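
Since next() resolves to undefined when the crawl is exhausted, an instance can be adapted to an async iterator. The sketch below assumes only that contract; eachUrl is a hypothetical name, not an Itemize export.

```javascript
// Hypothetical adapter, not part of Itemize: expose a crawler as an
// async iterator so it can be consumed with `for await`.
async function* eachUrl(crawler) {
  while (true) {
    const url = await crawler.next()
    // The documented end-of-crawl signal is a Promise for undefined.
    if (url === undefined) return
    yield url
  }
}
```

Inside an async function, this allows for await (const url of eachUrl(items)) console.log(url). The crawl stays lazy, because each loop iteration triggers exactly one next() call.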

.done()

Returns a Boolean indicating whether all spidering routes have been exhausted.

if (items.done()) console.log('crawl complete')

.all()

Returns a Promise for an Array of Strings, all of the previously traversed items.

const all = await items.all()
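
For the monitoring use case mentioned in the Quickstart, the array from all() can be compared against a list saved from an earlier crawl. A minimal sketch follows; findNewUrls is a hypothetical helper, and how the earlier list is stored is left to the caller.

```javascript
// Hypothetical helper, not part of Itemize: return the URLs present in
// `current` but absent from `previous`, preserving order and dropping repeats.
function findNewUrls(previous, current) {
  const seen = new Set(previous)
  const fresh = []
  for (const url of current) {
    if (!seen.has(url)) {
      seen.add(url) // also guards against duplicates within `current`
      fresh.push(url)
    }
  }
  return fresh
}
```

A monitor could call findNewUrls(savedUrls, await items.all()) after each crawl, where savedUrls is whatever list the previous run persisted.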

.close()

Itemize uses a keepalive HTTP/HTTPS agent. Use close() to destroy the existing underlying socket and create a new Agent with no existing connections.

You should use this to clean up after Itemize instances that haven't completed their crawls.

items.close()
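
One way to make sure close() always runs, even when a crawl is abandoned or a handler throws, is a try/finally wrapper. This is a sketch under assumptions: crawlAndClose and the visit callback are hypothetical, and only the documented next(), done(), and close() methods are used.

```javascript
// Hypothetical helper, not part of Itemize: run `visit` on each URL and
// always call close(), even if `visit` throws or asks to stop early.
async function crawlAndClose(crawler, visit) {
  try {
    while (!crawler.done()) {
      const url = await crawler.next()
      if (url === undefined) break
      // Returning false from `visit` is this sketch's own stop convention.
      if ((await visit(url)) === false) break
    }
  } finally {
    crawler.close()
  }
}
```

For example, crawlAndClose(items, (url) => console.log(url)) prints every URL and then releases the agent's connections.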

Tests and Examples

$ yarn test

$ node --harmony examples/hackernews.js
$ node --harmony examples/nodes.js
