webcheck-crawl-once

1.0.0 • Public • Published

webcheck-crawl-once

A plugin for webcheck to prevent multiple downloads of the same resource.

How to install

npm install --save webcheck-crawl-once

How to use

var Webcheck = require('webcheck');
var CrawlOncePlugin = require('webcheck-crawl-once');
 
var plugin = CrawlOncePlugin({
    // filterUrl: /.html/,
    ignoreQuery: false
});
 
var webcheck = new Webcheck();
webcheck.addPlugin(plugin);
 
plugin.enable();
 
// now continue with your code...
 

Options

  • filterUrl: Filter urls that should only crawled once (default all urls).
  • ignoreQuery: Ignore query in url.

Note for filters

Filters are regular expressions, but the plugin uses only the .test(str) method to proof. You are able to write your own and much complexer functions by writing the logic in the test method of an object like this:

opts = {
   filterSomething: {
       test: function (val) {
           return false || true;
       }
   }
}

Methods

  • reset(undefined | url): Reset a specific url, or the complete ignore list
  • ignore(url): Add a resource to ignore list
  • check(url): Check if a resource is ignored

Package Sidebar

Install

npm i webcheck-crawl-once

Weekly Downloads

0

Version

1.0.0

License

ISC

Unpacked Size

36.4 kB

Total Files

15

Last publish

Collaborators

  • atd