url-inspector

    3.3.3 • Public • Published

    url-inspector

    Get metadata about any URL.

    Limited memory and network usage.

    This is a node.js module.

    It returns and normalizes information found in http headers or in the resource itself using exiftool (which knows almost everything about files but html), or a sax parser to read oembed, opengraph, twitter cards, schema.org attributes or standard html tags.

    Both tools stop inspection when they gathered enough tags, or stop when a max number of bytes (depending on media type) have been downloaded.

    A demo using this module is available, with url-inspector-daemon

    • url url of the inspected resource

    • title title of the resource, or filename, or last component of pathname with query

    • description optional longer description, without title in it, and only the first line.

    • site the name of the site, or the domain name

    • mime RFC 7231 mime type of the resource (defaults to Content-Type) The inspected mime type could be more accurate than the http header.

    • ext the extension matching the mime type (not the file extension)

    • type what the resource represents image, video, audio, link, file, embed, archive

    • html a canonical html representation of the full resource, depending on the type and mime, could be an image, anchor, video, audio, or iframe.

    • script url of a script to install along with the html representation Breaking change: used to be in the html representation, but that required special handling of html to make it work.

    • date (YYYY-MD-DD format) creation or modification date

    • author optional credit, author (without the @ prefix and with _ replaced by spaces)

    • keywords optional array of collected keywords (lowercased words that are not in title words).

    • size (number) optional Content-Length; discarded when type is embed

    • icon optional link to the favicon of the site

    • width, height (number) optional dimensions

    • duration (hh:mm:ss string) optional

    • thumbnail optional a URL to a thumbnail, could be a data-uri for embedded images

    • source optional a URL that can go in a 'src' attribute; for example a resource can be an html page representing an image type. The URL of the image itself would be stored here; same thing for audio, video, embed types.

    • error optional an http error code, or string

    • all an object with all additional metadata that was found

    Installation

    npm install url-inspector

    Add -g switch to install the executable.

    exiftool executable must be available:

    • a package is available for debian/ubuntu: libimage-exiftool-perl and for fedora: perl-Image-ExifTool.
    • Otherwise it is installable from exiftool

    API

    const inspector = require('url-inspector');
    
    // options and their defaults
    const opts = {
     all: false, // return all available non-normalized metadata
     ua: "Mozilla/5.0", // some oembed providers might not answer otherwise
     nofavicon: false, // disable any favicon-related additional request
     nosource: false, // disable any sub-source inspection for audio, video, image types
     providers: [{ // an array of custom OEmbed providers, or path to a module exporting such an array
      provider_name: "Custom OEmbed provider",
      endpoints: [{
       schemes: ["http:\/\/video\.com\/*"],
       builder(urlObj, obj) {
        // can see current obj and override arbitrary props
        obj.embed = "custom embed url";
       },
       redirect(urlObj, ret) {
        // can change inspected url
        urlObj.path = "/another/path";
       }
      }]
     }],
     // new in version 2.3.0
     file: true
    };
    
    inspector(url, opts, function(err, obj) {
    
    });
    
    // or simply
    
    inspector(url, function(err, obj) {...});

    Command-line client

    inspector-url <url>
    inspector-url <filepath>

    Some options are available through cli, like --ua to test user agents.

    Low resource usage

    network:

    • a maximum of several hundreds of kilobytes (depending on resource type) is downloaded but it is usually much less, depending on connection speed.
    • inspection stops as soon as enough metadata is gathered

    memory: html is inspected using a sax parser, without building a full DOM.

    exiftool: runs using streat module, which keeps exiftool always open for performance

    Since version 2.3.0, file:// protocol is supported through cli by default, or setting "file" flag to true (false by default) through api.

    License

    See LICENSE.

    See also

    https://github.com/kapouer/url-inspector-daemon

    https://github.com/kapouer/node-streat

    Install

    npm i url-inspector

    DownloadsWeekly Downloads

    13

    Version

    3.3.3

    License

    MIT

    Unpacked Size

    43.2 kB

    Total Files

    12

    Last publish

    Collaborators

    • kapouer