ReURL

ReUrl is a library for parsing and manipulating URLs. It supports relative and non-normalized URLs and a number of operations on them. It can parse, resolve, normalize and serialize URLs in separate phases, in a way that conforms to the WHATWG URL Standard.

Motivation

I wrote this library because I needed support for non-normalized and relative URLs, but I also wanted to be certain that it followed the specification completely.

The WHATWG URL Standard defines URLs in terms of a parser algorithm that resolves, normalizes and serializes URL components in one pass. To implement a library that follows the standard but also supports a versatile set of operations on relative and non-normalized URLs, I had to disentangle these phases from the specification and, to some extent, rephrase the specification in more elementary terms.

Eventually I came up with a small 'theory' of URLs that I found very helpful and I based the library on that. Over time, this theory has become thoroughly documented in this new URL Specification.

API

Overview

The ReUrl library exposes an Url class and a RawUrl class with an identical API. Their only difference is in their handling of percent escape sequences.

Url

For Url objects the URL parser decodes percent escape sequences, getters report percent-decoded values and the set method assumes that its input is percent-decoded unless explicitly specified otherwise.

var url = new Url ('//host/%61bc')
url.file // => 'abc'
url = url.set ({ query:'%def' })
url.query // => '%def'
url.toString () // => '//host/abc?%25def'

RawUrl

For RawUrl objects the parser preserves percent escape sequences, getters report values with percent escape sequences preserved, and the set method expects values in which % signs start a percent escape sequence.

var url = new RawUrl ('//host/%61bc')
url.file // => '%61bc'
url = url.set ({ query:'%25%64ef' })
url.query // => '%25%64ef'
url.toString () // => '//host/%61bc?%25%64ef'
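
The difference between the two views can be illustrated without ReUrl at all. The snippet below is only a sketch: it uses the built-in decodeURIComponent and a plain string replacement as stand-ins for the library's own percent coding, which may differ in detail.

```javascript
// Illustration only: built-ins stand in for ReUrl's own percent coding.
const rawFile = '%61bc'                          // as written in '//host/%61bc'
const decodedFile = decodeURIComponent(rawFile)  // what a decoded getter would report

const decodedQuery = '%def'                      // a decoded value with a literal % sign
const escapedQuery = decodedQuery.replace(/%/g, '%25') // the % is escaped on output

console.log(decodedFile)   // 'abc'
console.log(escapedQuery)  // '%25def'
```

The decoded view is convenient for reading values, while the raw view keeps the exact escape sequences of the input.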

Url and RawUrl objects are immutable. Modifying URLs is accomplished through methods that return new Url and/or RawUrl objects, such as the url.set (patch) method described below.

Constructors

new Url (string [, conf])

Construct a new Url object from an URL-string. The optional conf argument, if present, must be a configuration object as described below.

var url = new Url ('sc:/foo/bar')
console.log (url)
// => Url { scheme: 'sc', root: '/', dirs: [ 'foo' ], file: 'bar' }

new Url (object [, conf])

Construct a new Url object from any object, possibly an Url object itself. The optional conf argument, if present, must be a configuration object as described below. Throws an error if the object cannot be coerced into a valid URL.

var url = new Url ({ scheme:'file', dirs:['foo', 'buzz'], file:'abc' })
console.log (url.toString ())
// => 'file:foo/buzz/abc'

conf.parser

You can pass a configuration object with a parser property to the Url constructor to trigger scheme-specific parsing behaviour for relative, scheme-less URL-strings.

The scheme determines support for windows drive-letters and backslash separators. Drive-letters are only supported in file URL-strings, and backslash separators are limited to file, http, https, ws, wss and ftp URL-strings.

var url = new Url ('/c:/foo\\bar', { parser:'file' })
console.log (url)
// => Url { drive: 'c:', root: '/', dirs: [ 'foo' ], file: 'bar' }
var url = new Url ('/c:/foo\\bar', { parser:'http' })
console.log (url)
// => Url { root: '/', dirs: [ 'c:', 'foo' ], file: 'bar' }
var url = new Url ('/c:/foo\\bar')
console.log (url)
// => Url { root: '/', dirs: [ 'c:', 'foo' ], file: 'bar' }
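
To make the scheme-dependent rules concrete, here is a much simplified sketch of how a path string could be split into drive, root, dirs and file. The function splitPathSketch and the BACKSLASH_SCHEMES set are hypothetical names for this illustration, not part of ReUrl, and the sketch only covers the cases where a parser is given explicitly.

```javascript
// Hypothetical helper, not ReUrl's parser: splits a path string into components.
const BACKSLASH_SCHEMES = new Set(['file', 'http', 'https', 'ws', 'wss', 'ftp'])

function splitPathSketch (input, parser) {
  // Backslashes only count as separators for the schemes listed above.
  const separator = BACKSLASH_SCHEMES.has(parser) ? /[\/\\]/ : /\//
  let segments = input.split(separator)
  const result = {}

  // A leading separator marks a rooted path.
  let rooted = segments[0] === '' && segments.length > 1
  if (rooted) segments = segments.slice(1)

  // Drive letters are only recognised when parsing as a file URL.
  if (parser === 'file' && /^[A-Za-z][:|]$/.test(segments[0])) {
    result.drive = segments[0]
    rooted = segments.length > 1      // a root requires a separator after the drive
    segments = segments.slice(1)
  }
  if (rooted) result.root = '/'

  // The last segment is the file; an empty one means a trailing separator.
  const file = segments.pop()
  if (segments.length) result.dirs = segments
  if (file) result.file = file
  return result
}

console.log(splitPathSketch('/c:/foo\\bar', 'file'))
// { drive: 'c:', root: '/', dirs: [ 'foo' ], file: 'bar' }
console.log(splitPathSketch('/c:/foo\\bar', 'http'))
// { root: '/', dirs: [ 'c:', 'foo' ], file: 'bar' }
```

The sketch ignores schemes, authorities, queries and hashes entirely; it only shows why the same input yields a drive under the file parser and a plain directory under the http parser.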

Properties

Url and RawUrl objects have the following optional properties.

url.scheme

The scheme of an URL as a string. This property is absent if no scheme part is present, e.g. in scheme-relative URLs.

new Url ('http://foo?search#baz') .scheme
// => 'http'
new Url ('/abc/?') .scheme
// => undefined

url.user

The username of an URL as a string. This property is absent if the URL does not have an authority or does not have credentials.

new Url ('http://joe@localhost') .user
// => 'joe'
new Url ('//host/abc') .user
// => undefined

url.pass

A property for the password of an URL as a string. This property is absent if the URL does not have an authority, credentials or password.

new Url ('http://joe@localhost') .pass
// => undefined
new Url ('http://host') .pass
// => undefined
new Url ('http://joe:pass@localhost') .pass
// => 'pass'
new Url ('http://joe:@localhost') .pass
// => ''

url.host

A property for the hostname of an URL as a string. This property is absent if the URL does not have an authority.

new Url ('http://localhost') .host
// => 'localhost'
new Url ('http:foo') .host
// => undefined
new Url ('/foo') .host
// => undefined

url.port

The port of (the authority part of) an URL: either a number, or the empty string if the port is present but empty. The property is absent if the URL does not have an authority or a port.

new Url ('http://localhost:8080') .port
// => 8080
new Url ('foo://host:/foo') .port
// => ''
new Url ('foo://host/foo') .port
// => undefined

url.root

A property for the path-root of an URL. Its value is '/' if the URL has an absolute path. The property is absent otherwise.

new Url ('foo://localhost?q') .root
// => undefined
new Url ('foo://localhost/') .root
// => '/'
new Url ('foo/bar')
// => Url { dirs: [ 'foo' ], file: 'bar' }
new Url ('/foo/bar')
// => Url { root: '/', dirs: [ 'foo' ], file: 'bar' }

It is possible for file URLs to have a drive, but not a root.

new Url ('file:/c:')
// => Url { scheme: 'file', drive: 'c:' }
new Url ('file:/c:/')
// => Url { scheme: 'file', drive: 'c:', root: '/' }

url.drive

A property for the drive of an URL as a string, if present. Note that the presence of drives depends on the parser settings and/or URL scheme.

new Url ('file://c:') .drive
// => 'c:'
new Url ('http://c:') .drive
// => undefined
new Url ('/c:/foo/bar', { parser:'file' }) .drive
// => 'c:'
new Url ('/c:/foo/bar') .drive
// => undefined

url.dirs

If present, a nonempty array of strings. Note that the trailing slash determines whether a component is part of the dirs or set as the file property.

new Url ('/foo/bar/baz/').dirs
// => [ 'foo', 'bar', 'baz' ]
new Url ('/foo/bar/baz').dirs
// => [ 'foo', 'bar' ]

url.file

If present, a non-empty string.

new Url ('/foo/bar/baz') .file
// => 'baz'
new Url ('/foo/bar/baz/') .file
// => undefined

url.query

A property for the query part of url as a string, if present.

new Url ('http://foo?search#baz') .query
// => 'search'
new Url ('/abc/?') .query
// => ''
new Url ('/abc/') .query
// => undefined

url.hash

A property for the hash part of url as a string, if present.

new Url ('http://foo#baz') .hash
// => 'baz'
new Url ('/abc/#') .hash
// => ''
new Url ('/abc/') .hash
// => undefined

Setting Properties

Url and RawUrl objects are immutable, therefore setting and removing components is achieved via a set method that takes a patch object.

url.set (patch)

The patch object may contain one or more of the keys scheme, user, pass, host, port, drive, root, dirs, file, query and hash. To remove a component, set its value in the patch to null.

If present:

  • port must be null, a string, or a number
  • dirs must be an array of strings
  • root may be anything; it is converted to '/' if truthy and interpreted as null otherwise
  • all others must be null or a string.

new Url ('//host/dir/file')
  .set ({ host:null, query:'q', hash:'h' })
  .toString ()
// => '/dir/file?q#h'
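
The patch rules can be pictured on a plain object. The sketch below is an illustration under assumptions, not ReUrl's code: applyPatchSketch is a hypothetical helper that applies a patch to a frozen record of components and returns a new frozen record, leaving the original untouched. It does not validate types.

```javascript
// Hypothetical helper, not part of ReUrl: applies a patch to a plain record.
function applyPatchSketch (components, patch) {
  const result = { ...components }
  for (const [key, value] of Object.entries(patch)) {
    if (key === 'root') {
      // root is coerced: any truthy value becomes '/', anything else removes it
      if (value) result.root = '/'
      else delete result.root
    }
    else if (value === null) delete result[key]   // null removes a component
    else result[key] = value
  }
  return Object.freeze(result)                    // the result is immutable as well
}

const before = Object.freeze({ host: 'host', root: '/', dirs: ['dir'], file: 'file' })
const after = applyPatchSketch(before, { host: null, query: 'q', hash: 'h' })

console.log('host' in after, after.query, after.hash) // false 'q' 'h'
console.log(before.host)                              // 'host'
```

Returning a new object on every change is what makes it safe to share Url objects between callers.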

Resets

For security reasons, setting the user will remove pass, unless a value is supplied for it as well. Setting the host will remove user, pass and port, unless values are supplied for them as well.

new Url ('http://joe:secret@example.com')
  .set ({ user:'jane' })
  .toString ()
// => 'http://jane@example.com'
new Url ('http://joe:secret@localhost:8080')
  .set ({ host:'example.com' })
  .toString ()
// => 'http://example.com'
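
One way to think about the resets is as a preprocessing step on the patch itself. The helper below, withResetsSketch, is a hypothetical name used for illustration only; ReUrl may implement the rule differently.

```javascript
// Hypothetical helper: adds the implied removals to a patch.
function withResetsSketch (patch) {
  const result = { ...patch }
  // Setting the user removes pass, unless pass is supplied too.
  if ('user' in patch && !('pass' in patch)) result.pass = null
  // Setting the host removes user, pass and port, unless supplied too.
  if ('host' in patch)
    for (const key of ['user', 'pass', 'port'])
      if (!(key in patch)) result[key] = null
  return result
}

console.log(withResetsSketch({ user: 'jane' }))
// { user: 'jane', pass: null }
console.log(withResetsSketch({ host: 'example.com' }))
// { host: 'example.com', user: null, pass: null, port: null }
```

This keeps credentials from silently carrying over to a different user or host.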

patch.percentCoded

The patch may have an additional key percentCoded with a boolean value to indicate that strings in the patch contain percent escape sequences.

This means that you can pass percent-encoded values to Url.set by explicitly setting percentCoded to true. The values will then be decoded.

var url = new Url ('//host/')
url = url.set ({ file:'%61bc-%25-sign', percentCoded:true })
url.file // => 'abc-%-sign'
url.toString () // => '//host/abc-%25-sign'

You can pass percent-decoded values to RawUrl.set by explicitly setting percentCoded to false. Percent characters in values will then be encoded; specifically, they will be replaced with %25.

var rawUrl = new RawUrl ('//host/')
rawUrl = rawUrl.set ({ file:'abc-%-sign', percentCoded:false })
rawUrl.file // => 'abc-%25-sign'
rawUrl.toString () // => '//host/abc-%25-sign'

Note that if no percentCoded value is specified, then Url.set assumes percentCoded to be false whilst RawUrl.set assumes percentCoded to be true.

var url = new Url ('//host/') .set ({ file:'%61bc' })
url.file // => '%61bc'
url.toString () // => '//host/%2561bc'
var rawUrl = new RawUrl ('//host/') .set ({ file:'%61bc' })
rawUrl.file // => '%61bc'
rawUrl.toString () // => '//host/%61bc'
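
The two defaults can be summarised as a pair of small functions. These are hypothetical stand-ins (storeInUrlSketch and storeInRawUrlSketch are not ReUrl functions) that use decodeURIComponent and a string replacement to approximate what happens to a value before it is stored.

```javascript
// Hypothetical stand-ins for what Url.set and RawUrl.set do with a value.
function storeInUrlSketch (value, percentCoded = false) {
  // Url stores decoded values: decode only if the input was percent-coded.
  return percentCoded ? decodeURIComponent(value) : value
}

function storeInRawUrlSketch (value, percentCoded = true) {
  // RawUrl stores coded values: escape % signs only if the input was decoded.
  return percentCoded ? value : value.replace(/%/g, '%25')
}

console.log(storeInUrlSketch('%61bc-%25-sign', true))   // 'abc-%-sign'
console.log(storeInRawUrlSketch('abc-%-sign', false))   // 'abc-%25-sign'
console.log(storeInUrlSketch('%61bc'))                  // '%61bc' (kept as literal text)
console.log(storeInRawUrlSketch('%61bc'))               // '%61bc' (kept as an escape)
```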

Conversions

url.toString ()

Converts an Url object to a string. Percent encodes only a minimal set of codepoints. The resulting string may contain non-ASCII codepoints.

var url = new Url ('http://🌿🌿🌿/{braces}/hʌɪ')
url.toString ()
// => 'http://🌿🌿🌿/%7Bbraces%7D/hʌɪ'

url.toASCII (), url.toJSON (), url.href

Converts an Url object to a string that contains only ASCII code points. Non-ASCII code points in components will be percent-encoded and/or punycoded.

var url = new Url ('http://🌿🌿🌿/{braces}/hʌɪ')
url.toASCII ()
// => 'http://xn--8h8haa/%7Bbraces%7D/h%CA%8C%C9%AA'
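
The difference between the two conversions can be sketched for a single path segment. The function percentEncodeSketch below is hypothetical and uses an illustrative set of characters to escape; the exact sets used by ReUrl follow the standard and vary per component. Host names are not covered, since those are punycoded rather than percent-encoded.

```javascript
// Hypothetical helper: percent-encodes one decoded path segment.
const utf8 = new TextEncoder()

function percentEncodeSketch (segment, { ascii = false } = {}) {
  let output = ''
  for (const char of segment) {                 // iterates by code point
    const isAscii = char.codePointAt(0) < 0x80
    // Illustrative escape set for ASCII; non-ASCII is escaped only on request.
    const mustEscape = isAscii ? /[\x00-\x20"#%<>?`{}\x7F]/.test(char) : ascii
    output += mustEscape
      ? [...utf8.encode(char)]
          .map(byte => '%' + byte.toString(16).toUpperCase().padStart(2, '0'))
          .join('')
      : char
  }
  return output
}

console.log(percentEncodeSketch('{braces}'))            // '%7Bbraces%7D'
console.log(percentEncodeSketch('hʌɪ'))                 // 'hʌɪ'
console.log(percentEncodeSketch('hʌɪ', { ascii: true })) // 'h%CA%8C%C9%AA'
```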

url.toURI ()

Uses url.toASCII () to convert url to an RFC3986 URI. Throws an error if url does not have a scheme, because URIs must always have a scheme.

Normalisation

url.normalize (), url.normalise ()

Returns a new Url object by normalizing url. Among other things, this interprets . and .. segments within the path and removes default ports and trivial usernames and passwords from the authority of url.

new Url ('http://foo/bar/baz/./../bee') .normalize () .toString ()
// => 'http://foo/bar/bee'
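
The treatment of . and .. segments can be sketched on a dirs array. normalizeDirsSketch is a hypothetical helper, and keeping leading .. segments for non-rooted paths is an assumption of this sketch rather than documented ReUrl behaviour. A file named . or .. is not handled.

```javascript
// Hypothetical helper: interprets '.' and '..' within a dirs array.
function normalizeDirsSketch (dirs, rooted = true) {
  const output = []
  for (const dir of dirs) {
    if (dir === '.') continue                     // '.' refers to the current directory
    if (dir !== '..') output.push(dir)
    else if (output.length && output[output.length - 1] !== '..') output.pop()
    else if (!rooted) output.push('..')           // assumption: keep leading '..' if relative
  }
  return output
}

// dirs of 'http://foo/bar/baz/./../bee'; the file 'bee' is left alone
console.log(normalizeDirsSketch(['bar', 'baz', '.', '..']))   // [ 'bar' ]
console.log(normalizeDirsSketch(['..', 'a'], false))          // [ '..', 'a' ]
```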

Percent Coding

url.percentEncode ()

Returns a RawUrl object by percent-encoding the properties of url according to the Standard. Prevents double escaping of percent-encoded-bytes in the case of RawUrl objects.

url.percentDecode ()

Returns an Url object by percent-decoding the properties of url if it is a RawUrl, and leaving them as-is otherwise.

Goto

url.goto (url2)

Returns a new Url object by 'extending' url with url2, where url2 may be a string, an Url or a RawUrl object.

new Url ('/foo/bar') .goto ('baz/index.html') .toString ()
// => '/foo/baz/index.html'
new Url ('/foo/bar') .goto ('//host/path') .toString ()
// => '//host/path'
new Url ('http://foo/bar/baz/') .goto ('./../bee') .toString ()
// => 'http://foo/bar/baz/./../bee'
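
The examples above are consistent with a simple rule: keep the components of url that come before the first component of url2, then continue with url2, concatenating the dirs when url2 starts with a relative path. The sketch below implements that reading on plain objects. It is an interpretation, not ReUrl's code: gotoSketch and ORDER are hypothetical names, and credentials, ports and empty references are not treated specially.

```javascript
// Hypothetical sketch of 'goto' on plain component records.
const ORDER = ['scheme', 'host', 'drive', 'root', 'dirs', 'file', 'query', 'hash']

function gotoSketch (base, ref) {
  const first = ORDER.findIndex(key => key in ref)
  const keep = first < 0 ? ORDER.length : first
  const result = {}
  // Keep what the base has before the first component of the reference.
  for (const key of ORDER.slice(0, keep))
    if (key in base) result[key] = base[key]
  Object.assign(result, ref)
  // A reference that starts with dirs extends the base directories.
  if (ORDER[first] === 'dirs' && base.dirs)
    result.dirs = [...base.dirs, ...ref.dirs]
  return result
}

const extended = gotoSketch(
  { root: '/', dirs: ['foo'], file: 'bar' },
  { dirs: ['baz'], file: 'index.html' })
console.log(extended) // { root: '/', dirs: [ 'foo', 'baz' ], file: 'index.html' }
```

Note that, as in the third example above, nothing is normalized: . and .. segments are carried over as they are.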

If url2 is a string, it will be parsed with the scheme of url as a fallback scheme. TODO: if url has no scheme then …

new Url ('file://host/dir/') .goto ('c|/dir2/') .toString ()
// => 'file://host/c|/dir2/'
new Url ('http://host/dir/') .goto ('c|/dir2/') .toString ()
// => 'http://host/dir/c|/dir2/'

Base URLs

url.isBase ()

Returns a boolean, indicating if url is a base-URL. What is and is not a base-URL, depends on the scheme of an URL. For example, http- and file-URLs that do not have a host are not base-URLs.

url.force ()

Forcibly convert an Url to a base-URL according to this URL Specification, in accordance with the WHATWG Standard.

  • In file URLs without hostname, the hostname will be set to ''.
  • For URLs that have a scheme being one of http, https, ws, wss or ftp and an absent or empty authority, the authority will be 'stolen from the first nonempty path segment'.
  • In the latter case, an error is thrown if url cannot be forced. This happens if it has no scheme, or if it has an empty host and no non-empty path segment.

new Url ('http:foo/bar') .force () .toString ()
// => 'http://foo/bar'
new Url ('http:/foo/bar') .force () .toString ()
// => 'http://foo/bar'
new Url ('http://foo/bar') .force () .toString ()
// => 'http://foo/bar'
new Url ('http:///foo/bar') .force () .toString ()
// => 'http://foo/bar'
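
The 'stealing' of the authority can be sketched on plain component records. forceSketch and SPECIAL_SCHEMES are hypothetical names, the sketch covers only the http, https, ws, wss and ftp case, and details such as always adding a root after the stolen host are assumptions.

```javascript
// Hypothetical sketch of forcing, for http-like schemes only.
const SPECIAL_SCHEMES = new Set(['http', 'https', 'ws', 'wss', 'ftp'])

function forceSketch (url) {
  if (!SPECIAL_SCHEMES.has(url.scheme))
    throw new Error('This sketch only covers http, https, ws, wss and ftp')
  if (url.host) return { ...url }               // already has a non-empty host

  // Skip leading empty path segments, then take the first one as the host.
  const dirs = [...(url.dirs ?? [])]
  while (dirs.length && dirs[0] === '') dirs.shift()

  const forced = { scheme: url.scheme }
  if (dirs.length) {
    forced.host = dirs.shift()
    forced.root = '/'
    if (dirs.length) forced.dirs = dirs
    if (url.file) forced.file = url.file
  }
  else if (url.file) forced.host = url.file
  else throw new Error('Cannot force: no non-empty path segment')

  for (const key of ['query', 'hash'])
    if (key in url) forced[key] = url[key]
  return forced
}

// Components of 'http:foo/bar'
console.log(forceSketch({ scheme: 'http', dirs: ['foo'], file: 'bar' }))
// { scheme: 'http', host: 'foo', root: '/', file: 'bar' }
```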

Reference Resolution

url.genericResolve (base) — RFC3986 - strict

Resolve an Url object url against a base URL base according to the strict reference resolution algorithm as defined in RFC3986.

url.legacyResolve (base) — RFC 3986 - non-strict

Resolve an Url object url against a base URL base according to the non-strict reference resolution algorithm as defined in RFC3986.
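
In RFC3986 the strict and non-strict algorithms differ in one step: a non-strict resolver ignores the scheme of the reference when it is identical to the scheme of the base. The sketch below shows that step on plain component records; prepareReferenceSketch is a hypothetical helper and says nothing about how ReUrl structures its own implementation.

```javascript
// Hypothetical helper: the one step that separates strict from non-strict.
function prepareReferenceSketch (ref, base, { strict = true } = {}) {
  if (!strict && ref.scheme !== undefined && ref.scheme === base.scheme) {
    // Non-strict: drop the scheme, so that the reference is treated as relative.
    const { scheme, ...rest } = ref
    return rest
  }
  return ref
}

// Components of the RFC3986 example base 'http://a/b/c/d;p?q' and reference 'http:g'
const base = { scheme: 'http', host: 'a', root: '/', dirs: ['b', 'c'], file: 'd;p', query: 'q' }
const ref = { scheme: 'http', file: 'g' }

console.log(prepareReferenceSketch(ref, base))                    // { scheme: 'http', file: 'g' }
console.log(prepareReferenceSketch(ref, base, { strict: false })) // { file: 'g' }
```

With the scheme kept, the reference resolves to 'http:g'; with the scheme dropped, it resolves against the base path to 'http://a/b/c/g', which matches the abnormal examples listed in RFC3986.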

url.WHATWGResolve (base), aka. url.resolve

Resolve an Url object url against a base URL base in a way that is compatible with the error-correcting, forcing reference resolution algorithm as defined in the WHATWG Standard.

Changelog

Version 1.0.0-rc.2

  • Converted the project from a CommonJS Module to an ES Module.
  • Updated the core to use spec-url version 2.0.0-dev.1
  • Changes to the API for reference resolution.

ReUrl now exposes three methods for reference resolution:

  • url.genericResolve (base)
  • url.legacyResolve (base)
  • url.WHATWGResolve (base), also known as url.resolve (base)

License

MIT.

Enjoy!
