sitemap-crawler

Sitemap Crawler

Generate a sitemap from any link you give it.

Intro

sitemap-crawler collects the directly accessible URLs of a page by resolving the href values of its links.
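To make the idea of resolving href values concrete, here is a minimal sketch. It is not the package's actual source: the function name `collectLinks`, the regular expression, and the choice to drop fragments and non-HTTP schemes are all assumptions made for illustration. It relies only on Node's built-in `URL` class.

```javascript
// Illustrative sketch only; sitemap-crawler's real implementation may differ.
// collectLinks is a hypothetical helper: it finds href attributes in an HTML
// string and resolves each one against the URL of the page it came from.
function collectLinks(html, pageUrl) {
  const hrefPattern = /href\s*=\s*["']([^"']+)["']/gi;
  const found = new Set(); // a Set removes duplicate URLs
  let match;
  while ((match = hrefPattern.exec(html)) !== null) {
    try {
      // new URL(relative, base) turns "/features" into an absolute URL
      const resolved = new URL(match[1], pageUrl);
      if (resolved.protocol === 'http:' || resolved.protocol === 'https:') {
        resolved.hash = ''; // treat "page#top" and "page" as the same URL
        found.add(resolved.href);
      }
    } catch (e) {
      // Ignore href values that cannot be parsed as a URL
    }
  }
  return [...found];
}

const sample =
  '<a href="/features">Features</a>' +
  '<a href="pricing">Pricing</a>' +
  '<a href="mailto:someone@example.com">Mail</a>';
console.log(collectLinks(sample, 'https://npmjs.com/'));
```

In this sketch the `mailto:` link is skipped because it is not reachable over HTTP, which is one possible reading of "directly accessible".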

Basic Usage

const siteMap = require('sitemap-crawler');
const link = 'http://www.npmjs.com';
 
siteMap(link, (err, res) => {
  console.log('error:', err);
  console.log('siteMap:', res); // Print the siteMap from link
});

Result

[
  "https://npmjs.com/features",
  "https://npmjs.com/pricing",
  "https://npmjs.com/support",
  "https://npmjs.com/signup",
  "https://npmjs.com/signup?next=/org/create",
  "https://npmjs.com/get-npm",
  "https://npmjs.com/enterprise",
  ...
]

Multiple Links

You can also crawl from an array of link strings.

In this case, the crawler responds with an object instead of an array.

const siteMap = require('sitemap-crawler');
const links = [
  'http://www.npmjs.com',
  'http://github.com',
  'www.amazon.com'
];
 
siteMap(links, (err, res) => {
  console.log('error:', err);
  console.log('siteMap:', res); // Print the siteMap for each link
});

Result

{
  "count": 3,
  "siteMap": {
    "http://www.npmjs.com": [...],
    "http://www.amazon.com": [...],
    "http://github.com": [...]
  }
}
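The object form can be walked with `Object.entries`. The sketch below assumes only the `count` and `siteMap` fields shown in the result above; the helper name `summarize` and the sample data are hypothetical.

```javascript
// Hypothetical helper: reports how many URLs were collected for each link.
// It assumes the { count, siteMap } shape shown in the result above.
function summarize(result) {
  return Object.entries(result.siteMap).map(([site, urls]) => ({
    site,
    total: urls.length,
  }));
}

// Made-up sample data in the documented shape, not real crawl output
const example = {
  count: 2,
  siteMap: {
    'http://www.npmjs.com': ['https://npmjs.com/features', 'https://npmjs.com/pricing'],
    'http://github.com': ['https://github.com/pricing'],
  },
};

for (const { site, total } of summarize(example)) {
  console.log(`${site}: ${total} URL(s)`);
}
```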

Options

The following options are supported.

  • isProgress Boolean : If true, shows CLI progress while crawling.
  • isLog Boolean : If true, prints the request error log.

const siteMap = require('sitemap-crawler');
const link = 'http://www.npmjs.com';
 
siteMap(link, {isProgress : true, isLog : true}, (err, res) => {
  console.log('error:', err);
  console.log('siteMap:', res); // Print the siteMap from link
});
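Because the callback takes `(err, res)`, the function can probably be wrapped with Node's `util.promisify` for use with `async`/`await`. The sketch below uses a hypothetical stand-in, `fakeSiteMap`, that only mimics the call shape shown above so the example runs without network access; with the real package you would pass `require('sitemap-crawler')` to `promisify` instead, assuming it follows the same error-first convention.

```javascript
const { promisify } = require('util');

// Hypothetical stand-in that mimics the documented call shape:
// siteMap(link, [options], callback). It performs no real crawling.
function fakeSiteMap(link, options, callback) {
  if (typeof options === 'function') {
    callback = options; // options were omitted
    options = {};
  }
  setImmediate(() => callback(null, [`${link}/features`]));
}

// promisify appends the error-first callback as the last argument
const crawl = promisify(fakeSiteMap);

async function main() {
  const res = await crawl('http://www.npmjs.com', { isProgress: false, isLog: true });
  console.log('siteMap:', res);
  return res;
}

main().catch((err) => console.log('error:', err));
```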

Authors

tinyjin - Github, Blog

License

This project is licensed under the MIT License.

Install

npm i sitemap-crawler

Version

1.0.0
