readme-crawler

    1.0.5 • Public • Published

    README Crawler (npm package)

    MIT License

    A Node.js web crawler that downloads README files and recursively follows the GitHub repository links they contain.

    It fetches the default README file displayed at a GitHub repository URL.

    Installation

    npm install --save readme-crawler

    Usage

    Create a new crawler instance and pass in a configuration object. Call the run method to download the README at the given URL.

      import ReadMeCrawler from 'readme-crawler';
    
      const crawler = new ReadMeCrawler({
        startUrl: 'https://github.com/jnv/lists',
        followReadMeLinks: true,
        outputFolderPath: './output/'
      });
    
      // -> fetch https://github.com/jnv/lists
      // -> download the README in the project root directory
      // -> export to a new folder in root/output/repositories
      // -> generate a list of other repository links
      // -> repeat these steps on each link
      crawler.run();

    Configuration Properties

    Name Type Description
    startUrl string GitHub repository URL, formatted 'https://github.com/user/repo'
    followReadMeLinks boolean Recursively follow README links and export data at each repo
    outputFolderPath string Folder for README downloads, relative to the project root

    Crawler Error

    Issue: each repository link is written to a file named linkQueue.txt. Because the crawler writes to this file asynchronously while it runs, a read can occur before a write has finished.

    Solution: restart the crawler by calling crawler.run() again. The link queue already contains the links to use; the error occurs because the crawler read the file before writing had finished.
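    The race described above can be avoided in application code by making sure a write to linkQueue.txt has completed before reading the file back. Below is a minimal, hypothetical sketch (not part of readme-crawler's API) using Node's promise-based fs module; the queue file name comes from the section above, and the link entries are placeholders:

```javascript
import { writeFile, readFile, unlink } from 'node:fs/promises';

const QUEUE_FILE = 'linkQueue.txt';

// Placeholder entries -- real ones come from crawled READMEs.
const links = [
  'https://github.com/user/repo-a',
  'https://github.com/user/repo-b',
];

// Awaiting writeFile ensures the file is fully written
// before the read below, avoiding a partial-read race.
await writeFile(QUEUE_FILE, links.join('\n'), 'utf8');

const queued = (await readFile(QUEUE_FILE, 'utf8'))
  .split('\n')
  .filter(Boolean);

console.log(queued.length); // 2

await unlink(QUEUE_FILE); // clean up the example file
```

    If the queue file still comes up short, restarting the crawler as described above remains the documented workaround.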


    spencerlepine.com  ·  GitHub @spencerlepine  ·  Twitter @spencerlepine
