
    html-urls

    2.4.22 • Public • Published


    Get all URLs from HTML markup. It's based on the W3C Link Checker.

    Install

    $ npm install html-urls --save

    Usage

    const got = require('got')
    const htmlUrls = require('html-urls')
    
    ;(async () => {
      const url = process.argv[2]
      if (!url) throw new TypeError('Need to provide a URL as the first argument.')
      const { body: html } = await got(url)
      const links = htmlUrls({ html, url })
    
      links.forEach(({ url }) => console.log(url))
    
      // => [
      //   'https://microlink.io/component---src-layouts-index-js-86b5f94dfa48cb04ae41.js',
      //   'https://microlink.io/component---src-pages-index-js-a302027ab59365471b7d.js',
      //   'https://microlink.io/path---index-709b6cf5b986a710cc3a.js',
      //   'https://microlink.io/app-8b4269e1fadd08e6ea1e.js',
      //   'https://microlink.io/commons-8b286eac293678e1c98c.js',
      //   'https://microlink.io',
      //   ...
      // ]
    })()

    For every value detected in the HTML markup, it returns an object with the following structure:

    value

    Type: <string>

    The original value.

    url

    Type: <string|undefined>

    The normalized URL, if the value can be considered a URL.

    uri

    Type: <string|undefined>

    The normalized value as a URI.
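
As a rough illustration of consuming this structure, the sketch below filters a hand-written sample down to the entries that produced a `url`. The sample entries are hypothetical and are not real output of the library; only the field names `value`, `url`, and `uri` come from the description above.

```javascript
// Hypothetical sample shaped like the entries described above.
// These values are hand-written for illustration, not real library output.
const sampleLinks = [
  { value: '/about', url: 'https://example.com/about', uri: 'https://example.com/about' },
  { value: 'mailto:hello@example.com', url: undefined, uri: 'mailto:hello@example.com' }
]

// Keep only the entries whose value could be treated as a URL.
const onlyUrls = sampleLinks
  .filter(link => typeof link.url === 'string')
  .map(link => link.url)

console.log(onlyUrls)
```

Checking `url` before using it avoids passing `undefined` to code that expects a string.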


    See examples for more!

    API

    htmlUrls([options])

    options

    html

    Type: string
    Default: ''

    The HTML markup.

    url

    Type: string
    Default: ''

    The URL associated with the HTML markup.

    It is used to resolve relative links that may be present in the HTML markup.
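
To show what resolving a relative link against a page URL means, here is a minimal sketch using the standard WHATWG `URL` constructor that ships with Node.js. It is not taken from the library's source, and the base URL and hrefs are made-up example values.

```javascript
// Base URL of the page the markup came from (example value).
const base = 'https://example.com/blog/post'

// hrefs as they might appear in the markup (example values).
const hrefs = ['/about', 'page2', '../img/logo.png', 'https://other.org/x']

// The WHATWG URL constructor resolves each href against the base;
// an href that is already absolute is returned unchanged.
const resolved = hrefs.map(href => new URL(href, base).href)

console.log(resolved)
// => [
//   'https://example.com/about',
//   'https://example.com/blog/page2',
//   'https://example.com/img/logo.png',
//   'https://other.org/x'
// ]
```

Without a base URL, a relative href such as `page2` cannot be turned into an absolute URL, which is why passing `url` alongside `html` matters.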

    whitelist

    Type: array
    Default: []

    A list of links to be excluded from the final output. It supports regex patterns.

    See matcher to learn more.
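
Conceptually, excluding links amounts to filtering the extracted URLs against a list of patterns. The sketch below does this with plain strings and `RegExp` objects. It is only an illustration under that assumption: the library delegates matching to matcher, whose pattern syntax may differ, and the names `urls`, `excluded`, and `isExcluded` are hypothetical.

```javascript
// Hypothetical list of extracted URLs.
const urls = [
  'https://example.com/',
  'https://example.com/app.js',
  'https://cdn.example.com/lib.js'
]

// Hypothetical exclusion patterns: one exact string and one regular expression.
const excluded = ['https://example.com/app.js', /^https:\/\/cdn\./]

// A URL is excluded if it equals a string pattern or matches a RegExp pattern.
const isExcluded = url =>
  excluded.some(pattern =>
    pattern instanceof RegExp ? pattern.test(url) : pattern === url
  )

const kept = urls.filter(url => !isExcluded(url))

console.log(kept) // => [ 'https://example.com/' ]
```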

    removeDuplicates

    Type: boolean
    Default: true

    Remove duplicate links detected across all the HTML tags.
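
The effect of removing duplicates can be sketched as follows. This assumes two entries count as duplicates when their normalized `url` is equal; the key the library actually compares is not stated here, and the sample entries are hypothetical.

```javascript
// Hypothetical entries: the first two point at the same normalized URL.
const links = [
  { value: '/about', url: 'https://example.com/about' },
  { value: 'https://example.com/about', url: 'https://example.com/about' },
  { value: '/contact', url: 'https://example.com/contact' }
]

// Track the URLs already seen and keep only the first entry for each one.
const seen = new Set()
const unique = links.filter(({ url }) => {
  if (seen.has(url)) return false
  seen.add(url)
  return true
})

console.log(unique.map(link => link.value)) // => [ '/about', '/contact' ]
```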

    Related

    • xml-urls – Get all URLs from Feed/Atom/RSS/Sitemap XML markup.
    • css-urls – Get all URLs referenced from stylesheet files.

    License

    html-urls © Kiko Beats, released under the MIT License.
    Authored and maintained by Kiko Beats with help from contributors.

    kikobeats.com · GitHub @Kiko Beats · Twitter @Kikobeats
