webcheck-robots

0.1.0 • Public • Published


A webcheck plugin that implements the Robots Exclusion Standard (robots.txt).

How to install

npm install --save webcheck-robots

How to use

var Webcheck = require('webcheck');
var RobotsPlugin = require('webcheck-robots');

var plugin = RobotsPlugin();

var webcheck = new Webcheck();
webcheck.addPlugin(plugin);

plugin.enable();

// now continue with your code...

Options

  • filterUrl: Filter for URLs that should be crawled only once (default: all URLs).
  • userAgent: User agent to match against robots.txt (default: the webcheck user agent).
  • sitemapLookup: Whether the plugin should crawl the sitemap when robots.txt references one (default: true).
  • respectDelay: Whether the plugin should automatically respect the crawl delay given in robots.txt (default: true).
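The options above presumably map onto three parts of robots.txt: the rule group selected by a user agent (userAgent), the Crawl-delay directive (respectDelay), and Sitemap lines (sitemapLookup). As a rough illustration only, here is a minimal parser sketch that extracts those pieces. It is not the plugin's own parser; parseRobots and groupFor are hypothetical names, and user-agent matching is simplified to a case-insensitive substring check. Note that Crawl-delay is a widely used but non-standard extension.

```javascript
// Hypothetical sketch, not the webcheck-robots implementation.
// Groups Allow/Disallow rules by user agent and collects Crawl-delay and Sitemap lines.
function parseRobots(text) {
  const groups = []; // [{ agents: [...], rules: [{ allow, path }], crawlDelay }]
  const sitemaps = [];
  let current = null;
  let lastWasAgent = false;

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // strip comments
    const idx = line.indexOf(':');
    if (idx === -1) continue; // skip blank or malformed lines
    const key = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();

    if (key === 'user-agent') {
      // Consecutive User-agent lines share one group.
      if (!lastWasAgent) {
        current = { agents: [], rules: [], crawlDelay: null };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
      continue;
    }
    lastWasAgent = false;

    if (key === 'sitemap') {
      sitemaps.push(value); // Sitemap lines do not belong to any group
    } else if (current && (key === 'allow' || key === 'disallow')) {
      current.rules.push({ allow: key === 'allow', path: value });
    } else if (current && key === 'crawl-delay') {
      const seconds = Number(value);
      if (!Number.isNaN(seconds)) current.crawlDelay = seconds;
    }
  }
  return { groups, sitemaps };
}

// Select the group for a user agent: a group naming the agent, else the "*" group.
function groupFor(parsed, userAgent) {
  const ua = userAgent.toLowerCase();
  return (
    parsed.groups.find((g) => g.agents.some((a) => a !== '*' && ua.includes(a))) ||
    parsed.groups.find((g) => g.agents.includes('*')) ||
    null
  );
}
```

With a robots.txt that has a "webcheck" group containing Crawl-delay: 2 and a catch-all "*" group, groupFor(parsed, 'webcheck/0.1') returns the first group, while any other agent falls back to the "*" group.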

Note on filters

Filters are regular expressions, but the plugin only calls their .test(str) method to check a value. You can therefore write more complex filters by putting the logic in the test method of a plain object, like this:

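var opts = {
    // Any object with a test(value) method can be used as a filter.
    filterUrl: {
        test: function (val) {
            // Return true if the URL matches the filter, false otherwise.
            // Example logic: only match URLs on example.com.
            return val.indexOf('://example.com/') !== -1;
        }
    }
};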
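Because only .test(value) is called, a RegExp and a custom object are interchangeable. The sketch below shows that duck typing with a hypothetical passesFilter helper (not part of the plugin's documented API); treating a missing filter as "accept everything" mirrors the documented default of all URLs.

```javascript
// Hypothetical helper: apply a filter only through filter.test(value),
// so a RegExp and a plain object with a test() method behave the same.
function passesFilter(filter, value) {
  if (!filter) return true; // no filter configured: accept everything
  return Boolean(filter.test(value));
}

// A regular expression filter (no /g flag, see note below).
const onlyExampleCom = /^https?:\/\/example\.com\//;

// A custom object filter with arbitrary logic in test().
const skipPdfs = {
  test: function (url) {
    return !url.toLowerCase().endsWith('.pdf');
  },
};
```

One design caveat: avoid the /g or /y flags on regular expression filters. With those flags, RegExp.prototype.test updates lastIndex between calls, so the same filter can return different results for the same URL.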

Properties

  • hosts: Object holding the robots.txt information, keyed by host.
  • userAgent: User-agent string used to select the matching rules in robots.txt.
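To illustrate how host-keyed robots.txt data can answer "may this URL be crawled?", here is a sketch of a lookup. The shape { rules: [{ allow, path }] } per host, already narrowed to the selected user agent, is an assumption; the README does not document the real structure of hosts. isAllowed is a hypothetical name, and matching uses plain path prefixes (no * or $ wildcards) with the RFC 9309 precedence: the longest matching path wins, and Allow wins a tie.

```javascript
// Hypothetical sketch; the per-host shape { rules: [{ allow, path }] } is assumed.
function isAllowed(hosts, url) {
  const { host, pathname, search } = new URL(url);
  const entry = hosts[host];
  if (!entry) return true; // no robots.txt information for this host: allow

  const target = pathname + search;
  let best = null;
  for (const rule of entry.rules) {
    // An empty path (e.g. "Disallow:") matches nothing.
    if (rule.path === '' || !target.startsWith(rule.path)) continue;
    // Longest matching path wins; on equal length, Allow beats Disallow.
    if (
      best === null ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.allow)
    ) {
      best = rule;
    }
  }
  return best === null ? true : best.allow;
}
```

For example, with Disallow: /private/ and Allow: /private/public/ for example.com, /private/secret is blocked while /private/public/page is allowed, because the longer Allow rule is the more specific match.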



License

ISC


Collaborators

  • atd