    sitemapper_mos
    3.2.9 • Public • Published

    Sitemap-parser

    Parse a sitemap's XML to get all the URLs for your crawler.

    This repository is useful if you need to control the rejectUnauthorized option, which is broken in the original project.

    Version 2

    Installation

    npm install sitemapper_mos --save

    Simple Example

    const Sitemapper = require('sitemapper_mos');
    
    const sitemap = new Sitemapper();
    
    sitemap.fetch('https://wp.seantburke.com/sitemap.xml').then(function(sites) {
      console.log(sites);
    });

    Examples in ES6

    import Sitemapper from 'sitemapper_mos';
    
    (async () => {
      const Google = new Sitemapper({
        url: 'https://www.google.com/work/sitemap.xml',
        timeout: 15000, // 15 seconds
      });
    
      try {
        const { sites } = await Google.fetch();
        console.log(sites);
      } catch (error) {
        console.log(error);
      }
    })();
    
    // or
    
    const sitemapper = new Sitemapper();
    sitemapper.timeout = 5000;
    
    sitemapper.fetch('https://wp.seantburke.com/sitemap.xml')
      .then(({ url, sites }) => console.log(`url:${url}`, 'sites:', sites))
      .catch(error => console.log(error));
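
    The resolved value above carries the sitemap url and a sites array. As a minimal sketch of what a crawler might do next, the hypothetical helper below (selectSites, not part of sitemapper_mos) removes duplicate URLs and keeps only those under a given path prefix. It assumes sites is an array of absolute URL strings, as the examples above suggest, and uses a hand-written result object so it runs without network access.

```javascript
// Hypothetical helper, not part of sitemapper_mos.
// Assumes `sites` is an array of absolute URL strings.
function selectSites(sites, pathPrefix) {
  const unique = new Set(sites); // drop exact duplicates, keep first-seen order
  return [...unique].filter((site) => {
    try {
      return new URL(site).pathname.startsWith(pathPrefix);
    } catch (error) {
      return false; // skip entries that are not valid absolute URLs
    }
  });
}

// Hand-written object shaped like the { url, sites } result shown above.
const result = {
  url: 'https://example.com/sitemap.xml',
  sites: [
    'https://example.com/blog/first-post',
    'https://example.com/about',
    'https://example.com/blog/first-post',
    'not a url',
  ],
};

const blogSites = selectSites(result.sites, '/blog/');
console.log(blogSites);
```

    In real use, result would come from await sitemapper.fetch() instead of the literal above.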

    Options

    You can pass options to the Sitemapper constructor when instantiating it.

    • requestHeaders: (Object) - Additional Request Headers (e.g. User-Agent)
    • timeout: (Number) - Maximum timeout in ms for a single URL. Default: 15000 (15 seconds)
    • url: (String) - Sitemap URL to crawl
    • debug: (Boolean) - Enables/Disables debug console logging. Default: False
    • concurrency: (Number) - Sets the maximum number of concurrent sitemap crawling threads. Default: 10
    • retries: (Number) - Sets the maximum number of retries to attempt in case of an error response (e.g. 404 or Timeout). Default: 0
    • rejectUnauthorized: (Boolean) - If true, it will throw on invalid certificates, such as expired or self-signed ones. Default: True

    const sitemapper = new Sitemapper({
      url: 'https://art-works.community/sitemap.xml',
      rejectUnauthorized: true,
      timeout: 15000,
      requestHeaders: {
        'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0'
      }
    });

    An example using all available options:

    const sitemapper = new Sitemapper({
      url: 'https://art-works.community/sitemap.xml',
      rejectUnauthorized: true,
      timeout: 15000,
      requestHeaders: {
        'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0'
      },
      debug: true,
      concurrency: 2,
      retries: 1,
    });
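
    Since rejectUnauthorized is the option this package exists to expose, one possible pattern is to relax it only for hosts you explicitly trust, for example a staging server with a self-signed certificate. The sketch below is an assumption about usage, not something the package provides: buildOptions and trustedInsecureHosts are hypothetical names, and only the keys of the returned object come from the options list above.

```javascript
// Hypothetical helper, not part of sitemapper_mos: builds a constructor
// options object and disables certificate checks only for allow-listed hosts.
function buildOptions(sitemapUrl, trustedInsecureHosts = []) {
  const { hostname } = new URL(sitemapUrl);
  return {
    url: sitemapUrl,
    timeout: 15000, // documented default, in ms
    retries: 1,
    // Keep the safe default (true) unless the host is explicitly allow-listed.
    rejectUnauthorized: !trustedInsecureHosts.includes(hostname),
  };
}

const stagingOptions = buildOptions('https://staging.example.test/sitemap.xml', [
  'staging.example.test',
]);
const publicOptions = buildOptions('https://example.com/sitemap.xml');

console.log(stagingOptions.rejectUnauthorized); // false
console.log(publicOptions.rejectUnauthorized); // true
```

    The returned object could then be passed to the constructor, as in new Sitemapper(stagingOptions).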

    Examples in ES5

    var Sitemapper = require('sitemapper_mos');
    
    var Google = new Sitemapper({
      url: 'https://www.google.com/work/sitemap.xml',
      timeout: 15000 // 15 seconds
    });
    
    Google.fetch()
      .then(function (data) {
        console.log(data);
      })
      .catch(function (error) {
        console.log(error);
      });
    
    
    // or
    
    
    var sitemapper = new Sitemapper();
    
    sitemapper.timeout = 5000;
    sitemapper.fetch('https://wp.seantburke.com/sitemap.xml')
      .then(function (data) {
        console.log(data);
      })
      .catch(function (error) {
        console.log(error);
      });

    Version 1

    npm install sitemapper@1.1.1 --save

    Simple Example

    var Sitemapper = require('sitemapper');
    
    var sitemapper = new Sitemapper();
    
    sitemapper.getSites('https://wp.seantburke.com/sitemap.xml', function(err, sites) {
      if (!err) {
        console.log(sites);
      }
    });
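
    If you are on Version 1 but prefer promises, the callback style above can be wrapped. This is a minimal sketch, assuming getSites calls back with (err, sites) as shown. getSitesAsync is a hypothetical helper, and the fake object below stands in for a real Version 1 Sitemapper instance so the snippet runs without network access.

```javascript
// Hypothetical wrapper: turns the Version 1 callback API into a promise.
function getSitesAsync(sitemapper, url) {
  return new Promise(function (resolve, reject) {
    sitemapper.getSites(url, function (err, sites) {
      if (err) {
        reject(err);
      } else {
        resolve(sites);
      }
    });
  });
}

// Stand-in for a Version 1 Sitemapper instance (assumption for illustration).
const fakeSitemapper = {
  getSites: function (url, callback) {
    callback(null, [url.replace('sitemap.xml', 'page-1')]);
  },
};

const pending = getSitesAsync(fakeSitemapper, 'https://example.com/sitemap.xml');
pending.then(function (sites) {
  console.log(sites);
});
```

    Unlike the callback example above, an error is not silently ignored here: it rejects the promise, so the caller can handle it with .catch.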

    Package details

    • Weekly downloads: 244
    • Version: 3.2.9
    • License: MIT
    • Unpacked size: 79.1 kB
    • Total files: 15
    • Collaborators: zijua