krake

simple base library for crawl jobs, based on osmosis.
it crawls a website recursively and emits events so you can take custom actions.
it reports broken links (on static pages).
it can be used to create a search index of a static website.
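
as a rough illustration of the search index use case, the sketch below collects page data into a tiny in-memory index. `createIndex` is a hypothetical helper, not part of krake. it assumes each `page` event delivers an object with `title` and `body` fields (matching the default `pageDataSelectors`) and a `url` field, which is an assumption.

```javascript
// hypothetical helper, not part of krake: a tiny in-memory search index
function createIndex () {
  var docs = []
  return {
    // store a normalized copy of the page data
    add: function (pageData) {
      docs.push({
        url: pageData.url, // assumption: pageData carries the page url
        title: String(pageData.title || '').trim(),
        body: String(pageData.body || '').replace(/\s+/g, ' ').trim()
      })
    },
    // case-insensitive substring match on title and body
    search: function (term) {
      var needle = term.toLowerCase()
      return docs.filter(function (doc) {
        return (doc.title + ' ' + doc.body).toLowerCase().indexOf(needle) !== -1
      })
    }
  }
}

// wiring it to the crawler (requires krake to be installed):
// var Crawler = require('krake')
// var index = createIndex()
// new Crawler()
//   .on('page', index.add)
//   .crawl('http://localhost:8080/')
```

a real index would more likely feed the collected documents into a dedicated search library. the substring match here only keeps the sketch self-contained.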

why

to make crawl jobs easier and more robust.

how

install

npm install --save krake

use

var Crawler = require('krake')
var crawler = new Crawler()
crawler
  .on('page', function (pageData) {
    console.log('page', pageData)
  })
  .on('link', function (linkData) {
    console.log('link', linkData)
  })
  .on('error', function (err, pageData, linkData) {
    console.log('error', err.errorType, err.errorMessage)
  })
  .on('done', function (err) {
    if (err) console.log('broken links', err.brokenLinks)
  })
  .crawl('http://localhost:8080/')
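
to turn the `error` event into a broken link report, the errors can be collected and summarized once the crawl finishes. the sketch below is one possible approach. `createBrokenLinkReport` is a hypothetical helper, and the only things it relies on are the `errorType` and `errorMessage` fields and the handler arguments shown above.

```javascript
// hypothetical helper, not part of krake: collects crawl errors
function createBrokenLinkReport () {
  var entries = []
  return {
    entries: entries,
    // same signature as the 'error' handler above
    onError: function (err, pageData, linkData) {
      entries.push({
        type: err && err.errorType,
        message: err && err.errorMessage,
        link: linkData
      })
    },
    // count collected errors per error type
    summary: function () {
      var counts = {}
      entries.forEach(function (entry) {
        var key = entry.type || 'unknown'
        counts[key] = (counts[key] || 0) + 1
      })
      return counts
    }
  }
}

// wiring it to the crawler (requires krake to be installed):
// var Crawler = require('krake')
// var report = createBrokenLinkReport()
// new Crawler()
//   .on('error', report.onError)
//   .on('done', function () {
//     console.log(report.summary())
//     if (report.entries.length) process.exitCode = 1 // fail a CI job
//   })
//   .crawl('http://localhost:8080/')
```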

options

these are the default options:

var crawler = new Crawler({

  // osmosis options: http://rchipka.github.io/node-osmosis/Osmosis.html
  osmosis: {
    ignore_http_errors: false,
    tries: 1
  },

  // krake options
  uri: 'http://localhost:8080',
  followExternalLinks: false,
  timeout: 500,

  pageDataSelectors: {
    title: 'head title',
    body: 'body'
  },
  linkSelectors: {
    url: '@href,@src'
  },
  linkTags: 'a,img,svg',
  linkIgnores: ':starts-with(javascript)'

})
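
a customized setup might look like the sketch below. the values are illustrative only. it is an assumption that options passed to the constructor override the defaults above. nested objects are written out in full because it is not stated whether partial objects are merged with the defaults.

```javascript
// illustrative values only, not recommendations
var customOptions = {
  osmosis: {
    ignore_http_errors: false,
    tries: 3 // retry failed requests a few times
  },
  timeout: 2000, // longer than the default of 500
  pageDataSelectors: {
    title: 'head title',
    body: 'main' // assumption: the site wraps its content in <main>
  }
}

// passing it to the crawler (requires krake to be installed):
// var Crawler = require('krake')
// var crawler = new Crawler(customOptions)
```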

author

Andi Neck | @andineck | andi.neck@intesso.com | intesso

license

MIT
