broken-links-inspector

1.4.0 • Public • Published

Broken Links Inspector

This project is heavily inspired by stevenvachon/broken-link-checker.

If you want to use this tool and need any help (instructions, bug fixes, features), open an issue!

Features:

  • inspects a web-page and all its URLs, reports broken ones
  • can go recursively, inspecting all pages within a domain
  • makes requests in parallel, shows indication of "work in progress"
  • does not check the same URL twice
  • reports OK, TIMEOUT, ERROR CODE or generic error
  • supports a configurable timeout
  • supports GET and HEAD methods (double checks with GET if HEAD fails)
  • supports a list of excluded URLs (glob matching) and/or excluded prefixes (e.g. mailto:)
  • can define OK codes, such as 999 for linkedin
  • supports different reporting, such as colored console or JUnit file
  • JUnit report is best used with CI (tested with GitLab)
  • need a feature, go to issues

How to install and run

npm i -g broken-links-inspector

bli inspect https://dbogatov.org -r -t 2000 -s linkedin --reporters console

# or
# bli inspect file://links.txt
# with a URL per line in a file links.txt
See output
................................................................................
................................................................................
........................
original request
	OK      : https://dbogatov.org/
	OK: 1, skipped: 0, broken: 0
https://dbogatov.org/
	OK      : https://scholar.google.com/citations?user=Mq8ButkAAAAJ
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/resume.pdf
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/cv.pdf
	OK      : https://twitter.com/Dima4ka007
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/vendor/css/merged.css
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/vendor/js/merged.js
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/dmytro-bogatov.jpg
	OK      : https://dbogatov.org/contact
	OK      : https://dbogatov.org/research
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/favicon.ico
	OK      : https://dbogatov.org/publications
	OK      : https://www.googletagmanager.com/gtag/js?id=UA-65293382-4
	OK      : https://stackpath.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css
	OK      : https://git.dbogatov.org/dbogatov/research-website/commit/39ecd1a9
	OK      : https://dbogatov.org/projects
	OK      : https://www.facebook.com/dkbogatov
	OK      : https://dbogatov.org/education
	OK      : https://github.com/dbogatov
	OK: 18, skipped: 3, broken: 0
https://dbogatov.org/education
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/config/grades.yml
	OK: 1, skipped: 21, broken: 0
https://dbogatov.org/projects
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/projects/mandelbrot.png
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/projects/matters-proj.png
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/projects/shevastream.png
	OK      : https://github.com/WPIMHTC
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/projects/status-site.png
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/projects/bu-logo.png
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/projects/fabric.png
	OK      : https://github.com/dbogatov/shevastream
	OK      : https://legacy.dbogatov.org/Project/Mandelbrot
	OK      : https://github.com/dbogatov/legacy-website
	OK      : https://github.com/IBM/dac-lib
	OK      : https://github.com/dbogatov/status-site
	OK      : https://github.com/dbogatov/ore-benchmark
	OK      : https://shevastream.com/
	OK      : https://status.dbogatov.org/
	OK      : https://ore.dbogatov.org/
	OK      : http://matters.mhtc.org/
	OK      : https://dbogatov.org/assets/docs/dac-fabric.pdf
	OK: 18, skipped: 21, broken: 0
https://dbogatov.org/publications
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/mqp-paper.pdf
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/econ-paper.pdf
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/ore-presentation.pdf
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/ore-poster.pdf
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/ore-benchmark.pdf
	OK      : http://dispot.korkinlab.org/
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/dac-fabric.pdf
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/docs/dispot.pdf
	OK      : https://hub.docker.com/r/korkinlab/dispot
	OK      : https://github.com/korkinlab/dispot
	OK      : https://digitalcommons.wpi.edu/cgi/viewcontent.cgi?article=2915&context=iqp-all
	OK      : https://dl.acm.org/doi/10.14778/3324301.3324309
	OK      : https://doi.org/10.14778/3324301.3324309
	OK      : https://doi.org/10.1093/bioinformatics/btz587
	OK      : https://academic.oup.com/bioinformatics/article/35/24/5374/5539863
	OK: 15, skipped: 21, broken: 0
https://dbogatov.org/research
	OK      : http://people.cs.georgetown.edu/~kobbi/
	OK      : https://arxiv.org/abs/1706.01552
	OK      : https://www.cs.bu.edu/~reyzin/
	OK      : http://www.cs.bu.edu/~gkollios/
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/collaborators/bjoern.png
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/collaborators/kobi.jpg
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/collaborators/kellaris.jpeg
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/collaborators/lorenzo.png
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/collaborators/leo.png
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/collaborators/adam.jpg
	OK      : http://www.cs.bu.edu/fac/gkollios/
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/collaborators/kollios.png
	OK      : https://d3g9eenuvjhozt.cloudfront.net/assets/img/collaborators/pixel.jpg
	OK      : https://www.icloud.com/sharedalbum/
	OK      : https://www.cics.umass.edu/people/oneill-adam
	OK      : https://computerscience.uchicago.edu/people/profile/lorenzo-orecchia/
	OK      : https://midas.bu.edu/
	OK      : https://dblp.org/pers/t/Tackmann:Bj=ouml=rn.html
	OK      : https://dbogatov.org/assets/docs/ore-benchmark.pdf
	OK      : https://dbogatov.org/assets/docs/dac-fabric.pdf
	OK: 20, skipped: 22, broken: 0
https://dbogatov.org/contact
	OK: 0, skipped: 23, broken: 0
OK: 73, skipped: 111, broken: 0

How to use

$ bli inspect -h

Usage: index inspect [options] <url> <file://>

Check links in the given URL or a text file

Options:
  -r, --recursive                             recursively check all links in all URLs within supplied host (ignored for file://) (default: false)
  -t, --timeout <number>                      timeout in ms after which the link will be considered broken (default: 2000)
  -g, --get                                   use GET request instead of HEAD (default: false)
  -s, --skip <globs>                          URLs to skip defined by globs, like '*linkedin*' (default: [])
  --reporters <coma-separated-strings>        Reporters to use in processing the results (junit, console) (default: ["console"])
  --retries <number>                          The number of times to retry TIMEOUT URLs (default: 3)
  --user-agent <string>                       The User-Agent header (default: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15
                                              (KHTML, like Gecko) Version/14.1 Safari/605.1.15")
  --ignore-prefixes <coma-separated-strings>  prefix(es) to ignore (without ':'), like mailto: and tel: (default: ["javascript","data","mailto","sms","tel","geo"])
  --accept-codes <coma-separated-numbers>     HTTP response code(s) (beyond 200-299) to accept, like 999 for linkedin (default: [999])
  --ignore-skipped                            Do not report skipped URLs (default: false)
  --single-threaded                           Do not enable parallelization (default: false)
  -v, --verbose                               log progress of checking URLs (default: false)
  -h, --help                                  display help for command

The return code is 1 if at least one broken link is detected, and 0 otherwise.

-r, --recursive instructs the inspector to keep checking all URLs in the original domain. This is very useful for checking an entire website, such as a personal blog. For example, bli inspect https://yoursite.com -r will check yoursite.com, and if it finds a link such as yoursite.com/contact, it will check that page as well and keep going. It checks all URLs on all pages, but does not parse "external" pages.
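The recursive behavior described above can be sketched as a breadth-first walk. This is a minimal illustration, not the tool's actual implementation: the in-memory `LinkGraph`, the `crawl` function, and the assumption that "original domain" means "same hostname as the start URL" are all hypothetical.

```typescript
// Hypothetical in-memory "web": page URL -> links found on that page.
type LinkGraph = Record<string, string[]>;

// Assumption: two URLs are in the same domain when their hostnames are equal.
function sameHost(a: string, b: string): boolean {
  try {
    return new URL(a).hostname === new URL(b).hostname;
  } catch {
    return false; // not an absolute URL
  }
}

// Returns the pages that get parsed and every URL that gets checked.
function crawl(start: string, web: LinkGraph): { parsed: string[]; checked: string[] } {
  const parsed: string[] = [];
  const checked = new Set<string>([start]);
  const queue: string[] = [start];

  while (queue.length > 0) {
    const page = queue.shift() as string;
    parsed.push(page);
    for (const link of web[page] ?? []) {
      if (checked.has(link)) continue; // never check the same URL twice
      checked.add(link); // every discovered URL is checked once
      if (sameHost(link, start)) queue.push(link); // only internal pages are parsed
    }
  }
  return { parsed, checked: Array.from(checked) };
}
```

In this sketch, an external link is still checked, but the links found on that external page are never visited.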

-t, --timeout <number> sets the timeout for a request, in milliseconds. If the timeout is exceeded, the check fails with TIMEOUT.

-g, --get instructs the inspector to use GET requests instead of the default HEAD requests. Without this flag, a URL whose HEAD request fails is retried with GET.
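A rough sketch of the HEAD-then-GET fallback follows. The `Requester` type and `checkUrl` function are hypothetical names for illustration; the HTTP call is injected so the logic can be read (and tried) without a network, and the real tool may decide "failure" differently.

```typescript
type Method = "HEAD" | "GET";

// Hypothetical requester: resolves to the HTTP status code for a URL.
type Requester = (url: string, method: Method) => Promise<number>;

const isSuccess = (status: number): boolean => status >= 200 && status <= 299;

// Tries HEAD first (unless forceGet is set, as with --get) and falls back to GET.
async function checkUrl(
  url: string,
  request: Requester,
  forceGet = false
): Promise<{ method: Method; status: number }> {
  if (!forceGet) {
    try {
      const status = await request(url, "HEAD");
      if (isSuccess(status)) return { method: "HEAD", status };
    } catch {
      // Network-level failure on HEAD: fall through and double-check with GET.
    }
  }
  return { method: "GET", status: await request(url, "GET") };
}
```

HEAD is cheaper because no body is transferred, but some servers reject or mishandle it, which is why a second attempt with GET avoids false positives.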

-s, --skip <comma-separated-globs> is a list of globs or URL fragments to skip. For example, -s *linkedin* -s hello skips all URLs that contain either linkedin or hello.
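One way to picture the skip matching is shown below. This is a sketch under stated assumptions, not the tool's matcher: `globToRegExp` and `isSkipped` are hypothetical helpers, only `*` is treated as a wildcard, and patterns are matched unanchored so that a plain fragment such as hello behaves like "contains hello".

```typescript
// Convert a glob to a RegExp: '*' matches any run of characters,
// every other character is matched literally.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(escaped.replace(/\*/g, ".*"));
}

// Assumption: a URL is skipped when any pattern matches somewhere inside it.
function isSkipped(url: string, globs: string[]): boolean {
  return globs.some((glob) => globToRegExp(glob).test(url));
}
```

With the patterns from the example above, `isSkipped("https://www.linkedin.com/in/someone", ["*linkedin*", "hello"])` evaluates to `true`.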

--reporters <comma-separated-strings> is a list of reporters used to process the results. Currently there are two: console and junit. console prints a colored report to the console. junit produces a junit-report.xml file in the current directory. The JUnit file treats pages as test suites and the URLs on a page as test cases.
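To illustrate the page-as-suite, URL-as-case mapping, here is a minimal sketch that renders results as JUnit-style XML. The `UrlResult` and `PageResults` shapes and the `toJUnit` function are assumptions made for this example; the attributes the actual junit reporter emits may differ.

```typescript
// Hypothetical result shapes for illustration only.
interface UrlResult {
  url: string;
  ok: boolean;
  message?: string; // e.g. "TIMEOUT" or an HTTP error code
}
type PageResults = Record<string, UrlResult[]>;

const escapeXml = (s: string): string =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;");

// Each page becomes a <testsuite>, each URL on it a <testcase>.
function toJUnit(pages: PageResults): string {
  const suites = Object.keys(pages).map((page) => {
    const results = pages[page];
    const cases = results.map((r) =>
      r.ok
        ? `    <testcase name="${escapeXml(r.url)}"/>`
        : `    <testcase name="${escapeXml(r.url)}"><failure message="${escapeXml(r.message ?? "broken")}"/></testcase>`
    );
    const failures = results.filter((r) => !r.ok).length;
    const open = `  <testsuite name="${escapeXml(page)}" tests="${results.length}" failures="${failures}">`;
    return [open, ...cases, "  </testsuite>"].join("\n");
  });
  return ['<?xml version="1.0" encoding="UTF-8"?>', "<testsuites>", ...suites, "</testsuites>"].join("\n");
}
```

CI systems that understand JUnit reports can then show each broken URL as a failed test case under the page where it was found.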

--retries <number> sets the number of times to retry a URL that timed out before declaring it failed.
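The retry behavior might look roughly like the loop below. The `Checker` type and `checkWithRetries` function are hypothetical, and the sketch assumes that only TIMEOUT outcomes are retried and that `retries` counts attempts made in addition to the first one.

```typescript
type Outcome = "OK" | "TIMEOUT" | "ERROR";

// Hypothetical checker: performs one request and reports its outcome.
type Checker = (url: string) => Promise<Outcome>;

// Repeats the check while it times out, up to `retries` extra attempts.
async function checkWithRetries(
  url: string,
  check: Checker,
  retries = 3
): Promise<{ outcome: Outcome; attempts: number }> {
  let outcome: Outcome = await check(url);
  let attempts = 1;
  while (outcome === "TIMEOUT" && attempts <= retries) {
    outcome = await check(url);
    attempts++;
  }
  return { outcome, attempts };
}
```

Other failures (for example an HTTP error code) are returned immediately in this sketch, since repeating them is unlikely to change the result.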

--user-agent <string> sets the User-Agent header sent with each request (some websites reply with 401 Unauthorized to "bots").

--ignore-prefixes <comma-separated-strings> is a list of prefixes (URL schemes) to skip, such as mailto:. The provided list should not include the colons.
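As a small sketch of how such a prefix check could work: the default list below is copied from the help output, while the `hasIgnoredPrefix` helper and its case-insensitive comparison are assumptions for illustration.

```typescript
// Defaults as listed in the help output; note that colons are not part of the entries.
const DEFAULT_IGNORED_PREFIXES = ["javascript", "data", "mailto", "sms", "tel", "geo"];

// Returns true when the URL starts with one of the prefixes followed by ':'.
// Assumption: the comparison ignores case and surrounding whitespace.
function hasIgnoredPrefix(url: string, prefixes: string[] = DEFAULT_IGNORED_PREFIXES): boolean {
  const normalized = url.trim().toLowerCase();
  return prefixes.some((prefix) => normalized.startsWith(prefix.toLowerCase() + ":"));
}
```

Appending the colon inside the check is what keeps a URL such as https://telegram.org/ from being mistaken for a tel: link.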

--accept-codes <comma-separated-numbers> is a list of HTTP response codes (beyond 200-299) to consider successful, like 999 for linkedin.
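The following sketch shows how a comma-separated list of codes could be parsed and applied. `parseCodes` and `isAccepted` are hypothetical helpers; only the 200-299 range and the default of 999 come from the help output.

```typescript
// Parse a value such as "999,403" into a list of status codes.
function parseCodes(list: string): number[] {
  return list
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => {
      const code = Number(part);
      if (!Number.isInteger(code)) throw new Error(`not an HTTP status code: ${part}`);
      return code;
    });
}

// A response is OK if it is in the 200-299 range or explicitly accepted.
function isAccepted(status: number, acceptCodes: number[] = [999]): boolean {
  return (status >= 200 && status <= 299) || acceptCodes.indexOf(status) !== -1;
}
```

For instance, `isAccepted(403, parseCodes("999, 403"))` evaluates to `true`, while `isAccepted(403)` with the default list evaluates to `false`.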

--ignore-skipped excludes skipped URLs from reports.

--single-threaded forces sequential execution (intended for debugging).

-v, --verbose currently unused.

How to build

npm install # to install dependencies

npm run build # to compile TS (result in ./dist/index.js)

npm run coverage # to run tests and coverage

Version

1.4.0

License

MIT

Collaborators

  • dbogatov