phantom-sitemap

0.1.2 • Public • Published

Crawls a site, extracts its links, and returns a promise that resolves to either a sitemap or a plain list of links.

If a URL contains a hashbang (#!) or the page contains the fragment meta tag (<meta name="fragment" content="!">), the HTML to parse is produced by rendering the page with PhantomJS.
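The rule above can be sketched as a small predicate. This is only an illustration, not the package's actual code: the function name needsRendering is hypothetical, and the tag matching is a simple regular-expression check rather than a real HTML parse.

```javascript
// Hypothetical helper, not part of phantom-sitemap's API.
// Returns true when a page would need PhantomJS rendering under the rule above.
function needsRendering(url, html) {
  // A hashbang in the URL signals client-side routing.
  if (url.indexOf('#!') !== -1) {
    return true;
  }
  // Look for <meta name="fragment" content="!">, attributes in any order.
  var metaTags = html.match(/<meta\b[^>]*>/gi) || [];
  return metaTags.some(function (tag) {
    return /\bname\s*=\s*["']?fragment\b/i.test(tag) &&
           /\bcontent\s*=\s*["']?!/i.test(tag);
  });
}
```

A page that matches neither condition could then be handled by the static crawler.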

var defaultOptions = {
    maxDepth: 1,
    maxFollow: 0,
    verbose: false,
    silent: false,
    // timeout for a request:
    timeout: 60000,
    // interval before trying again:
    retryTimeout: 10000,
    retries: 3,
    ignore: ['xls', 'png', 'jpg', 'js', 'css'],
    // other crawlable assets to include in the list:
    include: ['pdf', 'doc'],
    cacheDir: './cache',
    sitemap: true,
    out: 'sitemap.xml',
    replaceHost: 'www.example.com'
};
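A minimal sketch of how caller-supplied options might be layered over defaults like these. Whether the package merges options this way is an assumption; the names defaults and withDefaults are hypothetical, and only a subset of the keys is shown.

```javascript
// Hypothetical illustration; phantom-sitemap may merge options differently.
var defaults = {
  maxDepth: 1,
  retries: 3,
  sitemap: true,
  out: 'sitemap.xml'
};

// Returns a new object: defaults first, then caller overrides on top.
// The merge is shallow, so array values are replaced, not concatenated.
function withDefaults(options) {
  return Object.assign({}, defaults, options || {});
}

var opts = withDefaults({ sitemap: false, maxDepth: 2 });
console.log(opts);
```

Copying into a fresh object keeps the defaults themselves unchanged between calls.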

Set options.sitemap to false to return just a list of links.
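To show how the two result shapes relate, here is a sketch that turns a list of links into a sitemap document following the sitemaps.org urlset format. The helper names escapeXml and toSitemap are hypothetical, and the package's real output may carry more fields or differ in layout.

```javascript
// Hypothetical helpers, not part of phantom-sitemap's API.

// Escape the five characters that are special in XML text.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Wrap each link in a <url><loc> entry inside a <urlset> root.
function toSitemap(links) {
  var entries = links.map(function (link) {
    return '  <url><loc>' + escapeXml(link) + '</loc></url>';
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  ].concat(entries, ['</urlset>']).join('\n');
}

var xml = toSitemap(['http://www.example.com/', 'http://www.example.com/a?x=1&y=2']);
console.log(xml);
```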

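// Test
// `options` is an object shaped like defaultOptions above.
// Assumes require('phantom-sitemap') returns the factory that the
// module's own test calls as module.exports.
var crawl = require('phantom-sitemap')(options);
crawl('http://localhost:9000').when(
    function(data) {
        console.log('RESULT:\n', data);
    },
    function(err) {
        console.log('ERROR', err);
    }
);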

Static pages are crawled with node-crawler.

TODO: create html map

Install

npm i phantom-sitemap

License

none

Collaborators

  • michieljoris