Nanoprogrammed Penultimate Musicianship

    webcheck-robots
    0.1.0 • Public • Published

    A webcheck plugin that implements the Robots Exclusion Standard (robots.txt).

    How to install

    npm install --save webcheck-robots

    How to use

    var Webcheck = require('webcheck');
    var RobotsPlugin = require('webcheck-robots');
     
    var plugin = RobotsPlugin();
     
    var webcheck = new Webcheck();
    webcheck.addPlugin(plugin);
     
    plugin.enable();
     
    // now continue with your code...
     

    Options

    • filterUrl: Filter for URLs that should only be crawled once (default: all URLs).
    • userAgent: User agent to match against robots.txt (defaults to the webcheck user agent).
    • sitemapLookup: Whether the plugin should crawl the sitemap if robots.txt references one (default: true).
    • respectDelay: Whether the plugin should respect the crawl delay automatically (default: true).
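
    A minimal sketch of an options object built from the list above. The example.com host and the my-crawler/1.0 user agent string are made-up values, and passing the object as the argument to RobotsPlugin() is an assumption, so that call is left commented out.

```javascript
// Sketch only: option names come from the Options list; the values are examples.
var options = {
    filterUrl: /^https?:\/\/example\.com\//, // hypothetical host
    userAgent: 'my-crawler/1.0',             // hypothetical user agent string
    sitemapLookup: false,                    // skip sitemap crawling
    respectDelay: true                       // keep the default behaviour
};

// Assumption: the plugin factory takes the options object as its argument.
// var RobotsPlugin = require('webcheck-robots');
// var plugin = RobotsPlugin(options);
```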

    Note for filters

    Filters are regular expressions, but the plugin only ever calls their .test(str) method. You can therefore supply your own, more complex logic by passing any object that has a test method, like this:

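    var opts = {
       filterSomething: {
           // called with the value to check, just like RegExp.prototype.test
           test: function (val) {
               // return true if val matches, false otherwise
               return true;
           }
       }
    };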
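
    As a rough illustration of the note above, the sketch below uses a regular expression and a hand-written object interchangeably through .test(str). The example.com host, the no-query-string rule, and the applyFilter helper are hypothetical and not part of the plugin.

```javascript
// A regular expression and a custom object both expose .test(str),
// so either one can be used where a filter is expected.
var regexFilter = /^https?:\/\/example\.com\//;

var customFilter = {
    // Hypothetical rule: match example.com URLs that carry no query string.
    test: function (val) {
        var url;
        try {
            url = new URL(val); // global URL class, available in Node.js 10+
        } catch (e) {
            return false; // not a parseable absolute URL
        }
        return url.hostname === 'example.com' && url.search === '';
    }
};

// Hypothetical helper standing in for a caller that only relies on .test().
function applyFilter(filter, value) {
    return filter.test(value);
}

var results = [
    applyFilter(regexFilter, 'https://example.com/a?x=1'),
    applyFilter(customFilter, 'https://example.com/a?x=1'),
    applyFilter(customFilter, 'https://example.com/a')
];
```

    The custom object can express rules that would be awkward as a single pattern, while the calling code stays the same.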

    Properties

    • hosts: Object holding robots.txt information, keyed by host.
    • userAgent: User Agent string to identify corresponding settings in robots.txt.

    Version

    0.1.0

    License

    ISC
