This package has been deprecated

Author message:

Package no longer supported. Contact Support at https://www.npmjs.com/support for more info.

readme-crawler

1.0.5 • Public • Published

README Crawler (npm package)

MIT License

A Node.js web crawler that downloads README files and recursively follows the GitHub repository links they contain.

Fetch the default README file displayed at a GitHub repository URL.
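One way to locate the default README for a repository is GitHub's REST endpoint GET /repos/{owner}/{repo}/readme, which returns whichever README GitHub displays for that repository (as JSON with the file content base64-encoded). This README does not say whether readme-crawler uses that endpoint or scrapes the repository page. The sketch below only builds the request URL from a URL in the 'https://github.com/user/repo' format; toReadmeApiUrl is a hypothetical helper, not part of the package.

```javascript
// Hypothetical helper (not part of readme-crawler): turn a repository URL of the
// form https://github.com/user/repo into GitHub's "get a repository README" endpoint.
function toReadmeApiUrl(repoUrl) {
  const url = new URL(repoUrl);
  if (url.hostname !== 'github.com') {
    throw new Error(`Not a GitHub URL: ${repoUrl}`);
  }
  // Keep only the first two path segments: owner and repository name.
  const [owner, repo] = url.pathname.split('/').filter(Boolean);
  if (!owner || !repo) {
    throw new Error(`Expected https://github.com/user/repo, got: ${repoUrl}`);
  }
  return `https://api.github.com/repos/${owner}/${repo.replace(/\.git$/, '')}/readme`;
}

console.log(toReadmeApiUrl('https://github.com/jnv/lists'));
// prints https://api.github.com/repos/jnv/lists/readme
```

Extra path segments such as /tree/master are ignored, so links to a subpage of a repository still resolve to the repository's root README.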

Installation

npm install --save readme-crawler

Usage

Create a new crawler instance and pass it a configuration object. Then call the run method to download the README at startUrl and, if followReadMeLinks is true, the READMEs of the repositories it links to.

  import ReadMeCrawler from 'readme-crawler';

  const crawler = new ReadMeCrawler({
    startUrl: 'https://github.com/jnv/lists',
    followReadMeLinks: true,
    outputFolderPath: './output/'
  });

  // -> fetch https://github.com/jnv/lists
  // -> download README in project root directory
  // -> export to new folder in root/output/repositories
  // -> generate list of other repository links
  // -> repeat steps on each link
  crawler.run();
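The steps in the comments above describe a breadth-first traversal: download a README, collect the GitHub repository links it contains, and repeat for each new link. A minimal in-memory sketch of that loop follows, assuming a caller-supplied async fetchReadme(repoUrl) function that resolves to the README text. The names extractRepoLinks, crawl and maxRepos are hypothetical and not readme-crawler's API, and the link pattern is deliberately simple: it matches plain https://github.com/owner/repo URLs and will also accept non-repository paths such as https://github.com/topics/x.

```javascript
// Hypothetical sketch of the crawl loop; not readme-crawler's implementation.
// Matches https://github.com/<owner>/<repo> links inside README text.
const REPO_LINK = /https:\/\/github\.com\/([\w.-]+)\/([\w.-]+)/g;

// Return unique, normalized repository URLs found in a README.
function extractRepoLinks(readmeText) {
  const links = new Set();
  for (const [, owner, repo] of readmeText.matchAll(REPO_LINK)) {
    // Drop trailing sentence punctuation first, then a ".git" suffix.
    const name = repo.replace(/\.+$/, '').replace(/\.git$/, '');
    links.add(`https://github.com/${owner}/${name}`);
  }
  return [...links];
}

// Breadth-first crawl: visit each repository at most once, stop after maxRepos.
async function crawl(startUrl, fetchReadme, maxRepos = 50) {
  const queue = [startUrl];
  const visited = new Set();
  const readmes = new Map(); // repository URL -> README text, in visiting order
  while (queue.length > 0 && visited.size < maxRepos) {
    const url = queue.shift();
    if (visited.has(url)) continue;
    visited.add(url);
    const text = await fetchReadme(url);
    readmes.set(url, text);
    for (const link of extractRepoLinks(text)) {
      if (!visited.has(link)) queue.push(link);
    }
  }
  return readmes;
}

// Usage with a stubbed fetcher, so no network access is needed.
const pages = {
  'https://github.com/a/one': 'See https://github.com/b/two and https://github.com/c/three.',
  'https://github.com/b/two': 'Back to https://github.com/a/one',
};
crawl('https://github.com/a/one', async (url) => pages[url] || '')
  .then((readmes) => console.log([...readmes.keys()])); // logs the three URLs in visiting order
```

The visited set is what keeps the recursion finite when repositories link back to each other; the maxRepos cap is an extra safeguard, since link lists such as the example startUrl can fan out to a very large number of repositories.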

Configuration Properties

Name                Type      Description
startUrl            string    GitHub repository URL, formatted 'https://github.com/user/repo'
followReadMeLinks   boolean   Recursively follow README links and export data for each repository
outputFolderPath    string    Folder for README downloads, relative to the project root
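Before starting a crawl it can help to check the configuration object against the table above. The sketch below is hypothetical (validateConfig is not exported by readme-crawler): it checks only the types listed in the table and the 'https://github.com/user/repo' shape of startUrl, and it treats all three properties as required, which the table does not state.

```javascript
// Hypothetical validator for the configuration object described in the table.
const START_URL = /^https:\/\/github\.com\/[\w.-]+\/[\w.-]+\/?$/;

// Returns a list of problems; an empty list means the config looks usable.
function validateConfig(config) {
  const errors = [];
  if (typeof config.startUrl !== 'string' || !START_URL.test(config.startUrl)) {
    errors.push("startUrl must look like 'https://github.com/user/repo'");
  }
  if (typeof config.followReadMeLinks !== 'boolean') {
    errors.push('followReadMeLinks must be a boolean');
  }
  if (typeof config.outputFolderPath !== 'string' || config.outputFolderPath === '') {
    errors.push('outputFolderPath must be a non-empty string');
  }
  return errors;
}

console.log(validateConfig({
  startUrl: 'https://github.com/jnv/lists',
  followReadMeLinks: true,
  outputFolderPath: './output/',
}));
// prints []
```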

Crawler Error

Issue: each repository link is written to a file named linkQueue.txt. Because this file is written asynchronously while the crawler is running, the crawler can try to read it before a write has finished.

Solution: restart the crawler by calling crawler.run() again. The link queue should already contain links to use; the error occurred only because the crawler read the file before it had finished being written.
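The error above is a read-before-write race. A minimal sketch of avoiding it with Node's promise-based fs API is to await every write to the queue file before reading it back. It assumes linkQueue.txt stores one repository URL per line (the file format is not documented here); appendLinks and readLinks are hypothetical helpers, not part of readme-crawler.

```javascript
const fs = require('fs').promises;
const os = require('os');
const path = require('path');

// Hypothetical helpers (not readme-crawler's API); they assume the queue file
// stores one repository URL per line.
async function appendLinks(queuePath, links) {
  if (links.length === 0) return;
  // The returned promise settles only after the write has completed.
  await fs.appendFile(queuePath, links.map((link) => `${link}\n`).join(''), 'utf8');
}

async function readLinks(queuePath) {
  try {
    const text = await fs.readFile(queuePath, 'utf8');
    return text.split('\n').filter((line) => line.trim() !== '');
  } catch (err) {
    if (err.code === 'ENOENT') return []; // no queue file has been written yet
    throw err;
  }
}

// Usage: awaiting each write before reading removes the read-before-write race.
async function main() {
  const queuePath = path.join(os.tmpdir(), `linkQueue-demo-${process.pid}.txt`);
  await appendLinks(queuePath, ['https://github.com/jnv/lists']);
  console.log(await readLinks(queuePath));
  await fs.unlink(queuePath);
}

main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
```

Keeping the queue in memory and writing the file only as a record would sidestep the race entirely; the file-based version is shown because it matches the linkQueue.txt behavior described above.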


spencerlepine.com  ·  GitHub @spencerlepine  ·  Twitter @spencerlepine

Package Details

Version: 1.0.5
License: MIT
Unpacked size: 11 kB
Total files: 9
Collaborators: spencerlepine