krake

1.0.2 • Public • Published

  • a simple base library for crawl jobs, built on osmosis.
  • crawls a website recursively and emits events so that you can take custom actions.
  • reports broken links (for static pages).
  • can be used to build a search index of a static website (see example).

why

to make crawl jobs easier to write and more robust.

how

install

npm install --save krake

use

var Crawler = require('krake')
var crawler = new Crawler()
crawler
  // a page was crawled
  .on('page', function (pageData) {
    console.log('page', pageData)
  })
  // a link was found
  .on('link', function (linkData) {
    console.log('link', linkData)
  })
  // something went wrong while crawling a page or link
  .on('error', function (err, pageData, linkData) {
    console.log('error', err.errorType, err.errorMessage)
  })
  // the crawl has finished; `err` is set if broken links were found
  .on('done', function (err) {
    if (err) console.log('broken links', err.brokenLinks)
  })
  // start crawling at this uri
  .crawl('http://localhost:8080/')

see also example
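
The events above could be combined into a small crawl report. The following is a minimal sketch, not part of krake: `collectCrawlReport` is a hypothetical helper, and it only assumes that the crawler emits the `page`, `error` and `done` events shown in the usage example and that `.on()` can be chained. The `errorType`, `errorMessage` and `brokenLinks` fields are taken from that example; nothing else about the event payloads is assumed.

```javascript
// Hypothetical helper, not part of krake: gathers crawl results into one report.
// ASSUMPTION: `crawler` emits 'page', 'error' and 'done' as in the usage example.
function collectCrawlReport (crawler, callback) {
  var report = { pages: 0, errors: [], brokenLinks: [] }

  crawler
    .on('page', function () {
      // count every page event
      report.pages += 1
    })
    .on('error', function (err) {
      // keep only the two error fields used in the usage example
      report.errors.push({ type: err.errorType, message: err.errorMessage })
    })
    .on('done', function (err) {
      // `err` is only set when broken links were found
      if (err && err.brokenLinks) report.brokenLinks = err.brokenLinks
      callback(report)
    })

  // return the crawler so that .crawl(...) can be chained
  return crawler
}
```

With a real crawler this might be used as `collectCrawlReport(new Crawler(), function (report) { console.log(report) }).crawl('http://localhost:8080/')`.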

options

these are the default options:

var crawler = new Crawler({

  // osmosis options: http://rchipka.github.io/node-osmosis/Osmosis.html
  osmosis: {
    ignore_http_errors: false,
    tries: 1
  },

  // krake options
  uri: 'http://localhost:8080',
  followExternalLinks: false,
  timeout: 500,

  pageDataSelectors: {
    title: 'head title',
    body: 'body'
  },
  linkSelectors: {
    url: '@href,@src'
  },
  linkTags: 'a,img,svg',
  linkIgnores: ':starts-with(javascript)'

})
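
The default `pageDataSelectors` suggest that each `page` event carries a `title` and a `body`. A search index for a static site, as mentioned in the introduction, might be built from those two fields. This is only a sketch under stated assumptions: `tokenize` and `buildSearchIndex` are hypothetical helpers, not part of krake, and the `url` field on `pageData` is an assumption that the source does not confirm.

```javascript
// Hypothetical helpers, not part of krake.

// splits text into lowercase tokens (ASCII letters and digits only)
function tokenize (text) {
  return String(text || '').toLowerCase().match(/[a-z0-9]+/g) || []
}

// builds an inverted index: word -> list of page urls that contain the word
// ASSUMPTION: pageData has `url`, `title` and `body` fields
function buildSearchIndex (crawler, callback) {
  // no prototype, so words like 'constructor' cannot collide with built-ins
  var index = Object.create(null)

  crawler
    .on('page', function (pageData) {
      var words = tokenize(pageData.title).concat(tokenize(pageData.body))
      words.forEach(function (word) {
        if (!index[word]) index[word] = []
        // list every page only once per word
        if (index[word].indexOf(pageData.url) === -1) index[word].push(pageData.url)
      })
    })
    .on('done', function () {
      callback(index)
    })

  // return the crawler so that .crawl(...) can be chained
  return crawler
}
```

A lookup is then a plain property access, for example `index['krake'] || []`. A real index would probably also want stemming and ranking, which this sketch leaves out.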

author

Andi Neck | @andineck | andi.neck@intesso.com | intesso

license

MIT

style

js-standard-style
