epub-cfi-resolver

1.0.2 • Public • Published

A simple parser, resolver and generator for the EPUB-CFI format.

Usage

To install:

npm install epub-cfi-resolver

Then:

var CFI = require('epub-cfi-resolver');

var testCFI = "epubcfi(/6/4[chap01ref]!/4[body01]/10[para05]/3:5)";

// Parse the CFI
var cfi = new CFI(testCFI);

// Resolve the CFI
var bookmark = await cfi.resolve('test.xml');

// bookmark now contains:
{
  node: <reference to Node>,
  offset: 5 // integer offset into text
}

You can then use e.g. scrollTo() like so:

var range = document.createRange(); // createRange() takes no arguments
range.setStart(bookmark.node, bookmark.offset);
range.setEnd(bookmark.node, bookmark.offset + 1);

var bookmarkY = range.getBoundingClientRect().top + window.scrollY;
scrollTo(0, bookmarkY);

API

new CFI(uri, opts)

Parse the CFI string contained in uri and create a CFI object. An error will be thrown if parsing fails.

If uri contains URI Escape Sequences then it should be unescaped using e.g. decodeURIComponent() before being handed to the constructor.
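As a minimal sketch of that unescaping step, the helper below pulls the fragment out of a link target and decodes it before it would be handed to the constructor. The name cfiFromHref and the sample href are hypothetical and not part of this library.

```javascript
// Hypothetical helper: extract and unescape an EPUB-CFI from a link target.
// Returns null if the href has no epubcfi(...) fragment.
function cfiFromHref(href) {
  const hashAt = href.indexOf('#');
  if (hashAt === -1) return null;

  // Turn escape sequences such as %5B and %5D back into '[' and ']'
  const fragment = decodeURIComponent(href.slice(hashAt + 1));

  return fragment.startsWith('epubcfi(') ? fragment : null;
}

const decoded = cfiFromHref('chapter.xhtml#epubcfi(/6/4%5Bchap01ref%5D!/4/10:3)');
console.log(decoded);
```

The decoded string is what you would then pass to new CFI().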

Opts:

  • flattenRange: If true and CFI is a range, pretend it isn't by parsing only the start of the range and ignoring the end. Default is false.

.get()

Return a copy of the parsed data.

Given this CFI:

epubcfi(/2/4[node-id]!/6/7:5[pre,post;s=b])

The parser will output:

[
  [ // first part of CFI
    {
      "nodeIndex": 2
    },
    {
      "nodeIndex": 4,
      "nodeID": "node-id"
    }
  ],
  [ // part after first '!'
    {
      "nodeIndex": 6
    },
    {
      "nodeIndex": 7,
      "offset": 5,
      "textLocationAssertion": {
        "pre": "pre",
        "post": "post"
      },
      "sideBias": "before"
    }
  ]
]

Or an example with temporal and spatial coordinates:

epubcfi(/2/4~3.14@4:2)

outputs:

[
  [
    {
      "nodeIndex": 2
    },
    {
      "nodeIndex": 4,
      "temporal": 3.14,
      "spatial": {
        "x": 4,
        "y": 2
      }
    }
  ]
]

If the CFI is a Range then it will output:

{
  from: ...,
  to: ...,
  isRange: true
}

Where from and to each are objects with nodeIndex etc. like the one shown above.

If the CFI is a Range and flattenRange: true is given to the constructor then the output will be the location of the beginning of the range as normal non-range output.
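To make the non-range structure concrete, here is a small sketch that walks an array shaped like the .get() output and rebuilds a path string from it. The function partsToPath is hypothetical, handles only nodeIndex, nodeID and offset, and does no escaping of special characters.

```javascript
// Hypothetical helper: rebuild a CFI path from data shaped like .get() output.
// Each outer entry is one part of the CFI; parts are joined with '!'.
function partsToPath(parts) {
  return parts
    .map(function (steps) {
      return steps
        .map(function (step) {
          let out = '/' + step.nodeIndex;
          if (step.nodeID !== undefined) out += '[' + step.nodeID + ']';
          if (step.offset !== undefined) out += ':' + step.offset;
          return out;
        })
        .join('');
    })
    .join('!');
}

// Same shape as the first parser output shown above, minus the assertions
const parsed = [
  [{ nodeIndex: 2 }, { nodeIndex: 4, nodeID: 'node-id' }],
  [{ nodeIndex: 6 }, { nodeIndex: 7, offset: 5 }]
];

const path = partsToPath(parsed);
console.log(path);
```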

.resolveURI(index, doc)

Locate the node referenced by the specified part of the CFI, where index refers to the part index counting from zero, e.g. for the CFI /2!/4!/6 an index of 1 refers to the /4 part of the CFI.

doc is a Document or XMLDocument object. In the browser this could be window.document.

If the node pointed to by the specified part of the CFI refers to a URI in one of the ways allowed by the EPUB-CFI standard then that URI will be returned. If no URI is found then an error is thrown.

.resolveLast(doc, opts)

Assuming that doc is a Document or XMLDocument object for the URI referenced by the last part of the CFI (e.g. for /2!/4!/6 the last part is /6), return an object referencing the node (or an

If the CFI is not a range, the resolver will output an object like:

{
  node: <reference to node>,
  offset: <positive integer>,
  ...
}

including any additional properties from the parser relevant to position within the node, e.g. .sideBias. See section on .get().

Note that the returned offset may be adjusted from the one given in the CFI. This is because the CFI standard specifies that adjacent text and CDATA nodes should be treated as a single node, and the resolver translates from the CFI way of counting to the one used by the DOM standard.
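The idea behind this adjustment can be sketched as follows. This is an illustration only, not the resolver's actual code: it takes the lengths of a run of adjacent DOM text/CDATA nodes and a CFI offset into the merged text, and reports which DOM node the offset lands in. The name locateInTextRun and the choice to keep an offset at a node boundary in the earlier node are assumptions.

```javascript
// Hypothetical helper: map an offset into merged text onto one node of the run.
// lengths: character lengths of adjacent text/CDATA nodes, in document order.
function locateInTextRun(lengths, cfiOffset) {
  let remaining = cfiOffset;
  for (let i = 0; i < lengths.length; i++) {
    // Assumption: an offset exactly at a boundary stays in the earlier node
    if (remaining <= lengths[i]) {
      return { index: i, offset: remaining };
    }
    remaining -= lengths[i];
  }
  throw new RangeError('offset ' + cfiOffset + ' is past the end of the text run');
}

// Two adjacent text nodes of 3 and 4 characters; CFI offset 5 into the merged text
const hit = locateInTextRun([3, 4], 5);
console.log(hit.index, hit.offset);
```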

In addition, the property .relativeToNode will be present if the CFI location was before or after a node (rather than at or inside a node). .relativeToNode can have the values "before" or "after".

If the CFI is a range then the output will be:

{
  from: {
    node: ...
    offset: ...
    ...
  },
  to: {
    node: ...
    offset: ...
    ...
  },
  isRange: true
}

unless the option range: true is given, in which case the output will be a proper Range object.

async .resolve(uriOrDoc, [fetchCB], [opts])

Resolve an entire CFI, fetching and parsing URIs as they are encountered. If successful returns an object as documented for the .resolveLast() API call.

uriOrDoc is the initial URI or Document/XMLDocument object where parsing should begin.

fetchCB is an optional async function that takes a URI as its sole argument, retrieves the HTML/XHTML/XML at that URI, parses it into a Document/XMLDocument object and returns it.

If fetchCB is not supplied, then a built-in function will be used which relies on XMLHttpRequest. Note that this will not work if XMLHttpRequest is not available (e.g. in node.js); an exception will be thrown instead.
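A custom fetchCB for the browser might look roughly like the sketch below. It assumes the global fetch() and DOMParser are available; the names guessMimeType and fetchDocument are hypothetical, and guessing the MIME type from the file extension is a simplification (the response's Content-Type header may be a better source).

```javascript
// Hypothetical helper: pick a DOMParser MIME type from the file extension.
function guessMimeType(uri) {
  const path = uri.split(/[?#]/)[0].toLowerCase();
  return path.endsWith('.html') || path.endsWith('.htm')
    ? 'text/html'
    : 'application/xhtml+xml';
}

// Takes a URI, retrieves it and returns a parsed Document/XMLDocument.
async function fetchDocument(uri) {
  const response = await fetch(uri);
  if (!response.ok) {
    throw new Error('Failed to fetch ' + uri + ': HTTP ' + response.status);
  }
  const text = await response.text();
  return new DOMParser().parseFromString(text, guessMimeType(uri));
}
```

Such a function would be passed as the second argument, e.g. cfi.resolve('test.xml', fetchDocument).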

CFI.generate(node, offset, extra)

Static function to generate a CFI string for a node reference and optional offset into a text node. The offset will be adjusted to conform to the CFI specification if needed. If present the extra string will be appended at the end of the CFI before the closing bracket.

Also see the alternate calling convention below.

CFI.generate(array, extra)

Same as CFI.generate(node, offset, extra) except takes an array of objects containing {node: <nodeRef>, offset: <number>} pairs and outputs a CFI that includes ! indirection steps, assuming the array has more than one entry.

CFI.sort()

Sort-in-place an array of CFI objects based on when they appear in the document.

Based on CFI.compare(). Note that this is based on parsing the CFI strings only. No resolving is done by .sort().

CFI.compare(a, b)

Static function that compares two CFI objects, e.g:

const a = new CFI("epubcfi(/2/4!/6)");
const b = new CFI("epubcfi(/2/4!/7)");

const diff = CFI.compare(a, b);
  • If a comes first in the document then a value < 0 is returned
  • If b comes first in the document then a value > 0 is returned
  • If they are equal then 0 is returned

If only one CFI is a range, only the beginning of the range is used for comparison. If both CFIs are ranges, the beginnings are compared first and, if they are equal, the ends are compared.
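The ordering rules above can be illustrated with a simplified, self-contained sketch. It is not the library's implementation: locations are reduced to plain arrays of node indices, the object shapes ({ path } and { isRange, from, to }) are hypothetical, and treating a shorter prefix path as coming first is an assumption.

```javascript
// Compare two arrays of node indices in document order.
function compareIndexPaths(a, b) {
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  // Assumption: if one path is a prefix of the other, the shorter one is first
  return a.length - b.length;
}

// Apply the range rules: compare beginnings first, then ends if both are ranges.
function compareLocations(a, b) {
  const startA = a.isRange ? a.from : a.path;
  const startB = b.isRange ? b.from : b.path;
  const diff = compareIndexPaths(startA, startB);
  if (diff !== 0 || !(a.isRange && b.isRange)) return diff;
  return compareIndexPaths(a.to, b.to);
}

const point = { path: [2, 4, 6] };
const later = { path: [2, 4, 7] };
const range = { isRange: true, from: [2, 4, 6], to: [2, 4, 8] };
const longer = { isRange: true, from: [2, 4, 6], to: [2, 4, 10] };

console.log(compareLocations(point, later));  // negative: point comes first
console.log(compareLocations(point, range));  // 0: only the range's beginning is used
console.log(compareLocations(range, longer)); // negative: same beginning, earlier end
```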

CFI.compareParts(a, b)

Static function that takes two parsed path parts and compares them, e.g:

const a = new CFI("epubcfi(/2/4/4~5)").get()[0];
const b = new CFI("epubcfi(/2/2/4~2.3)").get()[0];

const diff = CFI.compareParts(a, b);

Return values are the same as for CFI.compare().

Example

To build the example:

npm run build

Then open example/example.html in a browser. It should be loaded through a web server, not from the filesystem.

Using with node.js

For use with node.js, instead of using DOMParser and .parseFromString() you can use the jsdom library like so:

var JSDOM = require('jsdom').JSDOM;

var jsdom = new JSDOM(str, {
  contentType: mimetype
});

var doc = jsdom.window.document;

Where mimetype is one of 'application/xhtml+xml' (for XHTML and XML data) or 'text/html' (for HTML data). Note that 'text/xml' and 'application/xml' are not supported; use 'application/xhtml+xml' instead, which works for XML data.

The doc variable will contain a Document equivalent to the output from the browser's DOMParser .parseFromString() function.

Unit tests

Run using: npm run test

About EPUB-CFI

A CFI, or Canonical Fragment Identifier, is like a more complicated #anchor as used in HTML href= attributes. Think of them as ebook bookmarks. You'd encounter them like so:

<a href="some/uri#epubcfi(/6/7:42)">a link</a>

CFIs allow specifying a precise location or range inside any XML/XHTML/HTML document by specifying how to traverse the document tree, e.g:

epubcfi(/6/7:42)

The above CFI specifies the 42nd character of the 7th child node of the 6th child node of the root node. It's important to note that CFI doesn't count nodes in the same way as the DOM standard: Elements always have even numbers, Text and CDATA nodes always have odd numbers, adjacent text/CDATA nodes are combined and have the same number, two adjacent elements are assumed to have a text node of zero length between them and the node index 0 means "before the first node".
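The counting rules can be sketched in a few lines. The function below is an illustration, not code from this library: it takes the nodeType values of a parent's children (1 for Element, 3 for Text, 4 for CDATA, as in the DOM standard) and returns the CFI index of each. It assumes the list contains only those three node types.

```javascript
// Hypothetical helper: CFI indexes for a list of child node types.
// Elements get 2, 4, 6, ...; text after k elements gets 2k + 1,
// so adjacent text/CDATA nodes share one odd index.
function cfiIndexes(nodeTypes) {
  let elementsSeen = 0;
  return nodeTypes.map(function (type) {
    if (type === 1) {
      elementsSeen += 1;
      return elementsSeen * 2;
    }
    return elementsSeen * 2 + 1;
  });
}

// text, <el>, text, CDATA, <el>, <el>
const indexes = cfiIndexes([3, 1, 3, 4, 1, 1]);
console.log(indexes.join(','));
```

In the example, the last two elements are directly adjacent, so the index 5 between them belongs to an implied zero-length text node that has no DOM counterpart.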

The above CFI can be embellished with additional info e.g:

epubcfi(/6[foo]/7:42[don't panic])

This lets us know that the id= of the 6th child node is "foo" and the text at the 42nd character is "don't panic".

Additionally, CFIs can traverse multiple documents, e.g:

epubcfi(/2/4!/6[foo]/7:42[don't panic])

The ! marks the beginning of a new document, so this CFI tells us to go to the 4th child node of the 2nd child node of the current document, then look for an attribute in that node that references another document (e.g. href=) and continue resolving the rest of the CFI in the referenced document.

CFIs can specify ranges using commas:

epubcfi(<common base path>,<from>,<to>)

which is useful for specifying highlighted text in an ebook.
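As a rough sketch of how the three components relate, the function below expands a range CFI into two standalone CFIs by prepending the common base path to the from and to components. The name expandRange is hypothetical and this is not how the library represents ranges internally; commas inside [...] and characters escaped with ^ are left alone.

```javascript
// Hypothetical helper: expand epubcfi(base,from,to) into two full CFIs.
function expandRange(cfi) {
  const match = /^epubcfi\((.*)\)$/.exec(cfi);
  if (!match) throw new Error('not an epubcfi(...) string');

  const body = match[1];
  const pieces = [''];
  let inBrackets = false;
  for (let i = 0; i < body.length; i++) {
    const ch = body[i];
    if (ch === '^' && i + 1 < body.length) {
      // Escaped character: copy it verbatim together with the '^'
      pieces[pieces.length - 1] += ch + body[++i];
      continue;
    }
    if (ch === '[') inBrackets = true;
    if (ch === ']') inBrackets = false;
    if (ch === ',' && !inBrackets) pieces.push('');
    else pieces[pieces.length - 1] += ch;
  }

  if (pieces.length !== 3) throw new Error('not a range CFI');
  return {
    from: 'epubcfi(' + pieces[0] + pieces[1] + ')',
    to: 'epubcfi(' + pieces[0] + pieces[2] + ')'
  };
}

const ends = expandRange('epubcfi(/6/4!/2,/1:0,/3:5)');
console.log(ends.from, ends.to);
```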

ToDo

  • Add option for generator to generate Text Location Assertions
  • Implement options (e.g. spatial and temporal) for generator
  • Add unit tests for expected-to-fail data for resolver

Other similar projects

readium-cfi-js

Pros of using this project over readium-cfi-js:

  • ~27 kB vs. ~400 kB dist file (both numbers are for un-minified js)
  • Documented API and example code vs. no API nor usage docs
  • No dependencies vs. depends on jquery and lodash
  • Works with node.js vs. requires browser

Pros of using readium-cfi-js over this project:

  • Older more mature project vs. newer unproven codebase
  • Has more unit tests

Other differences. This project vs. readium-cfi-js:

  • AGPLv3 vs. BSD-3-Clause
  • Hand-written state machine vs. uses a parser generator (pegjs)
  • Not so strict parsing/resolving vs. strict parsing/resolving

epubcfi from epub.js

Even a quick glance at the code shows that this implementation will fail in several plausible real-world scenarios. It relies on simple .split() calls on characters whose meaning depends on where they appear: '!' and '/' can occur inside element id= attributes and Text Location Assertions, and this is not taken into account.
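To illustrate the difference, here is a minimal sketch of a context-aware split. It is not the state machine used by this project; the name splitOutsideBrackets is hypothetical. It splits only on separators found outside [...] and skips over characters escaped with ^, the escape character defined by the EPUB-CFI standard.

```javascript
// Hypothetical helper: split a CFI path on a separator, ignoring
// separators inside [...] and characters escaped with '^'.
function splitOutsideBrackets(path, separator) {
  const pieces = [];
  let current = '';
  let inBrackets = false;

  for (let i = 0; i < path.length; i++) {
    const ch = path[i];
    if (ch === '^' && i + 1 < path.length) {
      // Escaped character: keep it and never treat it as syntax
      current += ch + path[i + 1];
      i++;
    } else if (ch === separator && !inBrackets) {
      pieces.push(current);
      current = '';
    } else {
      if (ch === '[') inBrackets = true;
      if (ch === ']') inBrackets = false;
      current += ch;
    }
  }
  pieces.push(current);
  return pieces;
}

const tricky = '/6/4[a!b]!/4[x/y]/10';
console.log(tricky.split('!').length);                    // naive split: 3 pieces
console.log(splitOutsideBrackets(tricky, '!').join(' | ')); // 2 pieces
```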

License and copyright

  • Copyright 2020 Marc Juul Christoffersen
  • License: AGPLv3

See the LICENSE file for full license.
