
    broken-links-inspector

    1.4.0 • Public • Published

    Broken Links Inspector


    This project is heavily inspired by stevenvachon/broken-link-checker.

    If you want to use this tool and need any help (instructions, bug fixes, features), open an issue!

    Features:

    • inspects a web page and all its URLs and reports broken ones
    • can run recursively, inspecting all pages within a domain
    • makes requests in parallel and shows a "work in progress" indicator
    • does not check the same URL twice
    • reports OK, TIMEOUT, ERROR CODE or a generic error
    • supports a configurable timeout
    • supports GET and HEAD methods (double-checks with GET if HEAD fails)
    • supports a list of excluded URLs (glob matching) and/or excluded prefixes (e.g. mailto:)
    • can define OK codes, such as 999 for LinkedIn
    • supports different reporters, such as colored console or JUnit file
    • JUnit report is best used with CI (tested with GitLab)
    • need a feature? Open an issue
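
    The "does not check the same URL twice" feature can be pictured with a small set-based filter. This is only a sketch: the class name SeenUrls and the choice to treat URLs that differ only by fragment as identical are assumptions for illustration, not the package's actual code.

```typescript
// Hypothetical helper (not part of broken-links-inspector):
// remembers which URLs were already offered for checking.
class SeenUrls {
  private readonly seen = new Set<string>();

  // Returns true the first time a URL is offered, false on repeats.
  add(rawUrl: string): boolean {
    const key = SeenUrls.normalize(rawUrl);
    if (this.seen.has(key)) {
      return false;
    }
    this.seen.add(key);
    return true;
  }

  // Assumption: a "#fragment" does not change the resource being requested,
  // so it is dropped before comparing.
  static normalize(rawUrl: string): string {
    const url = new URL(rawUrl);
    url.hash = "";
    return url.href;
  }
}
```

    A crawler would call add() before scheduling a request and skip the URL whenever it returns false.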

    How to install and run

    npm i -g broken-links-inspector
    
    bli inspect https://dbogatov.org -r -t 2000 -s linkedin --reporters console
    
    # or
    # bli inspect file://links.txt
    # with a URL per line in a file links.txt
    
    See output
    ................................................................................
    ................................................................................
    ........................
    original request
    	OK      : https://dbogatov.org/
    	OK: 1, skipped: 0, broken: 0
    https://dbogatov.org/
    	OK      : https://scholar.google.com/citations?user=Mq8ButkAAAAJ
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/resume.pdf
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/cv.pdf
    	OK      : https://twitter.com/Dima4ka007
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/vendor/css/merged.css
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/vendor/js/merged.js
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/dmytro-bogatov.jpg
    	OK      : https://dbogatov.org/contact
    	OK      : https://dbogatov.org/research
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/favicon.ico
    	OK      : https://dbogatov.org/publications
    	OK      : https://www.googletagmanager.com/gtag/js?id=UA-65293382-4
    	OK      : https://stackpath.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css
    	OK      : https://git.dbogatov.org/dbogatov/research-website/commit/39ecd1a9
    	OK      : https://dbogatov.org/projects
    	OK      : https://www.facebook.com/dkbogatov
    	OK      : https://dbogatov.org/education
    	OK      : https://github.com/dbogatov
    	OK: 18, skipped: 3, broken: 0
    https://dbogatov.org/education
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/config/grades.yml
    	OK: 1, skipped: 21, broken: 0
    https://dbogatov.org/projects
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/projects/mandelbrot.png
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/projects/matters-proj.png
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/projects/shevastream.png
    	OK      : https://github.com/WPIMHTC
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/projects/status-site.png
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/projects/bu-logo.png
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/projects/fabric.png
    	OK      : https://github.com/dbogatov/shevastream
    	OK      : https://legacy.dbogatov.org/Project/Mandelbrot
    	OK      : https://github.com/dbogatov/legacy-website
    	OK      : https://github.com/IBM/dac-lib
    	OK      : https://github.com/dbogatov/status-site
    	OK      : https://github.com/dbogatov/ore-benchmark
    	OK      : https://shevastream.com/
    	OK      : https://status.dbogatov.org/
    	OK      : https://ore.dbogatov.org/
    	OK      : http://matters.mhtc.org/
    	OK      : https://dbogatov.org/assets/docs/dac-fabric.pdf
    	OK: 18, skipped: 21, broken: 0
    https://dbogatov.org/publications
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/mqp-paper.pdf
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/econ-paper.pdf
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/ore-presentation.pdf
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/ore-poster.pdf
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/ore-benchmark.pdf
    	OK      : http://dispot.korkinlab.org/
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/dac-fabric.pdf
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/dispot.pdf
    	OK      : https://hub.docker.com/r/korkinlab/dispot
    	OK      : https://github.com/korkinlab/dispot
    	OK      : https://digitalcommons.wpi.edu/cgi/viewcontent.cgi?article=2915&context=iqp-all
    	OK      : https://dl.acm.org/doi/10.14778/3324301.3324309
    	OK      : https://doi.org/10.14778/3324301.3324309
    	OK      : https://doi.org/10.1093/bioinformatics/btz587
    	OK      : https://academic.oup.com/bioinformatics/article/35/24/5374/5539863
    	OK: 15, skipped: 21, broken: 0
    https://dbogatov.org/research
    	OK      : http://people.cs.georgetown.edu/~kobbi/
    	OK      : https://arxiv.org/abs/1706.01552
    	OK      : https://www.cs.bu.edu/~reyzin/
    	OK      : http://www.cs.bu.edu/~gkollios/
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/collaborators/bjoern.png
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/collaborators/kobi.jpg
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/collaborators/kellaris.jpeg
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/collaborators/lorenzo.png
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/collaborators/leo.png
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/collaborators/adam.jpg
    	OK      : http://www.cs.bu.edu/fac/gkollios/
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/collaborators/kollios.png
    	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/collaborators/pixel.jpg
    	OK      : https://www.icloud.com/sharedalbum/
    	OK      : https://www.cics.umass.edu/people/oneill-adam
    	OK      : https://computerscience.uchicago.edu/people/profile/lorenzo-orecchia/
    	OK      : https://midas.bu.edu/
    	OK      : https://dblp.org/pers/t/Tackmann:Bj=ouml=rn.html
    	OK      : https://dbogatov.org/assets/docs/ore-benchmark.pdf
    	OK      : https://dbogatov.org/assets/docs/dac-fabric.pdf
    	OK: 20, skipped: 22, broken: 0
    https://dbogatov.org/contact
    	OK: 0, skipped: 23, broken: 0
    OK: 73, skipped: 111, broken: 0
    

    How to use

    $ bli inspect -h
    
    Usage: index inspect [options] <url> <file://>
    
    Check links in the given URL or a text file
    
    Options:
      -r, --recursive                             recursively check all links in all URLs within supplied host (ignored for file://) (default: false)
      -t, --timeout <number>                      timeout in ms after which the link will be considered broken (default: 2000)
      -g, --get                                   use GET request instead of HEAD (default: false)
      -s, --skip <globs>                          URLs to skip defined by globs, like '*linkedin*' (default: [])
      --reporters <coma-separated-strings>        Reporters to use in processing the results (junit, console) (default: ["console"])
      --retries <number>                          The number of times to retry TIMEOUT URLs (default: 3)
      --user-agent <string>                       The User-Agent header (default: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15
                                                  (KHTML, like Gecko) Version/14.1 Safari/605.1.15")
      --ignore-prefixes <coma-separated-strings>  prefix(es) to ignore (without ':'), like mailto: and tel: (default: ["javascript","data","mailto","sms","tel","geo"])
      --accept-codes <coma-separated-numbers>     HTTP response code(s) (beyond 200-299) to accept, like 999 for linkedin (default: [999])
      --ignore-skipped                            Do not report skipped URLs (default: false)
      --single-threaded                           Do not enable parallelization (default: false)
      -v, --verbose                               log progress of checking URLs (default: false)
      -h, --help                                  display help for command
    

    Return code is 1 if at least one broken link detected, 0 otherwise.
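
    The relationship between the final summary line and the return code can be sketched as follows. The type and function names here are hypothetical and chosen for illustration; only the rule itself (1 if at least one broken link, 0 otherwise) comes from the text above.

```typescript
// Hypothetical types for illustration; the real tool's internals may differ.
type LinkStatus = "OK" | "SKIPPED" | "BROKEN";

interface Summary {
  ok: number;
  skipped: number;
  broken: number;
  exitCode: 0 | 1;
}

// Counts each status and derives the process return code:
// 1 if at least one link is broken, 0 otherwise.
function summarize(statuses: LinkStatus[]): Summary {
  const count = (wanted: LinkStatus): number =>
    statuses.filter((status) => status === wanted).length;
  const broken = count("BROKEN");
  return {
    ok: count("OK"),
    skipped: count("SKIPPED"),
    broken,
    exitCode: broken > 0 ? 1 : 0,
  };
}
```

    In a CI job, a non-zero return code is what fails the pipeline step.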

    -r, --recursive instructs the inspector to keep checking all URLs within the original domain. This is useful for checking an entire website, such as a personal blog. For example, bli inspect https://yoursite.com -r checks yoursite.com and, if it finds a link such as yoursite.com/contact, checks that page as well and keeps going. It checks all URLs on all pages, but does not parse "external" pages.
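
    The distinction between pages that are parsed further and "external" pages that are only checked can be sketched as a host comparison. The function name shouldParse and the exact rule (identical host, so subdomains count as external) are assumptions; the tool may decide this differently.

```typescript
// Hypothetical helper: decides whether a discovered link belongs to the
// site being crawled and should therefore be parsed for more links.
// Assumption: "same domain" means an identical host component.
function shouldParse(link: string, startUrl: string): boolean {
  try {
    return new URL(link).host === new URL(startUrl).host;
  } catch {
    // Anything that is not a valid absolute URL is never parsed.
    return false;
  }
}
```

    Under this rule, every link is still checked once, but only links for which shouldParse returns true are fetched and scanned for further URLs.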

    -t, --timeout <number> given in milliseconds sets a timeout for a request. If this timeout is exceeded, the check fails with TIMEOUT.

    -g, --get instructs the inspector to use GET requests instead of the default HEAD requests. If a HEAD request fails, the URL is retried with GET.
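
    The HEAD-then-GET behavior can be sketched with an injected request function, so the logic stays independent of any HTTP library. The names checkUrl and Requester are hypothetical, and treating any non-2xx HEAD response as a failure is an assumption.

```typescript
type Method = "HEAD" | "GET";

// Hypothetical abstraction: performs a request and resolves to the HTTP status.
type Requester = (url: string, method: Method) => Promise<number>;

interface CheckOutcome {
  method: Method; // the method that produced the final answer
  status: number;
}

// Tries HEAD first unless forceGet is set (as with -g); if HEAD throws or
// returns a non-2xx status, the URL is tried again with GET.
async function checkUrl(
  url: string,
  request: Requester,
  forceGet = false
): Promise<CheckOutcome> {
  if (!forceGet) {
    try {
      const status = await request(url, "HEAD");
      if (status >= 200 && status < 300) {
        return { method: "HEAD", status };
      }
    } catch {
      // Ignore the HEAD failure and fall through to GET.
    }
  }
  return { method: "GET", status: await request(url, "GET") };
}
```

    The fallback matters because some servers reject or mishandle HEAD while serving the same URL correctly via GET.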

    -s, --skip <comma-separated-globs> is a list of globs or parts of URLs to skip. For example, -s *linkedin* -s hello skips all URLs that contain either linkedin or hello.
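
    One way to read the skip rule is as an unanchored pattern match in which * stands for any run of characters. The sketch below follows that reading; the function names and the unanchored matching are assumptions inferred from the example above, not the package's actual matcher.

```typescript
// Converts a simple glob into a regular expression: every regex
// metacharacter except "*" is escaped, and "*" becomes ".*".
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(escaped.replace(/\*/g, ".*"));
}

// Hypothetical helper: true if any pattern matches somewhere in the URL.
// Assumption: matching is unanchored, so "hello" behaves like "*hello*".
function shouldSkip(url: string, globs: string[]): boolean {
  return globs.some((glob) => globToRegExp(glob).test(url));
}
```

    With the patterns from the example, shouldSkip(url, ["*linkedin*", "hello"]) is true for any URL containing linkedin or hello.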

    --reporters <comma-separated-strings> is a list of reporters that process the results. Currently there are two: console and junit. console prints a colored report to the console. junit produces a junit-report.xml file in the current directory. The JUnit file treats pages as test suites and URLs in a page as test cases.
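
    The mapping of pages to test suites and URLs to test cases can be sketched as below. The interfaces, the function toJUnit and the exact attributes are assumptions based on the commonly used JUnit XML layout; the file the junit reporter really writes may contain different or additional fields.

```typescript
// Hypothetical result shapes for illustration.
interface LinkResult {
  url: string;
  ok: boolean;
  message?: string; // e.g. "TIMEOUT" or an error code, for broken links
}

interface PageResult {
  page: string;
  links: LinkResult[];
}

// Escapes characters that are not allowed inside XML attribute values.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// One <testsuite> per page, one <testcase> per URL found on that page.
function toJUnit(pages: PageResult[]): string {
  const suites = pages.map((page) => {
    const failures = page.links.filter((link) => !link.ok).length;
    const cases = page.links.map((link) => {
      const name = escapeXml(link.url);
      if (link.ok) {
        return `    <testcase name="${name}"/>`;
      }
      const message = escapeXml(link.message ?? "broken");
      return `    <testcase name="${name}">\n      <failure message="${message}"/>\n    </testcase>`;
    });
    const open = `  <testsuite name="${escapeXml(page.page)}" tests="${page.links.length}" failures="${failures}">`;
    return [open, ...cases, "  </testsuite>"].join("\n");
  });
  return ['<?xml version="1.0" encoding="UTF-8"?>', "<testsuites>", ...suites, "</testsuites>"].join("\n");
}
```

    CI systems that understand JUnit reports can then show each broken URL as a failed test under the page where it was found.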

    --retries <number> sets the number of times to retry a URL that timed out before declaring it failed.

    --user-agent <string> sets the User-Agent header to send (some websites reply with 401 Unauthorized to "bots").

    --ignore-prefixes <comma-separated-strings> is a list of prefixes (URL schemes) to skip, such as mailto. The provided list should not include colons.
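
    A prefix check of this kind can be sketched as follows. The default list is copied from the help output above; the function name hasIgnoredPrefix and the case-insensitive comparison are assumptions.

```typescript
// Default prefixes as listed in the help output (given without colons).
const DEFAULT_IGNORED_PREFIXES = ["javascript", "data", "mailto", "sms", "tel", "geo"];

// Hypothetical helper: true if the link starts with "<prefix>:" for any
// ignored prefix. The colon is appended here, which is why the
// configured list must not contain it.
function hasIgnoredPrefix(
  link: string,
  prefixes: string[] = DEFAULT_IGNORED_PREFIXES
): boolean {
  const lowered = link.trim().toLowerCase();
  return prefixes.some((prefix) => lowered.startsWith(`${prefix.toLowerCase()}:`));
}
```

    Appending the colon during the comparison keeps a host such as data.example.com from being mistaken for a data: link.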

    --accept-codes <comma-separated-numbers> is a list of HTTP codes to consider successful, like 999 for LinkedIn.
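
    The acceptance rule described in the help output (200-299 plus any extra codes) can be sketched in a few lines. The function name isAccepted is hypothetical; the default of [999] is taken from the help output.

```typescript
// Hypothetical helper: a status is successful if it is in the 2xx range
// or explicitly listed in acceptCodes (default [999], per the help output).
function isAccepted(status: number, acceptCodes: number[] = [999]): boolean {
  const isSuccessRange = status >= 200 && status <= 299;
  return isSuccessRange || acceptCodes.includes(status);
}
```

    Passing an empty list would make only 2xx responses count as OK.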

    --ignore-skipped excludes skipped URLs from reports.

    --single-threaded forces sequential execution (intended for debugging).

    -v, --verbose is currently unused.

    How to build

    npm install # to install dependencies
    
    npm run build # to compile TS (result in ./dist/index.js)
    
    npm run coverage # to run tests and coverage
    

    Install

    npm i broken-links-inspector

    Version

    1.4.0

    License

    MIT

    Collaborators

    • dbogatov