Streampub

A streaming EPUB3 writer.

EXAMPLE

var Streampub = require('streampub')
var fs = require('fs')
var epub = new Streampub({title: 'My Example'})
epub.setAuthor('Example User')
epub.pipe(fs.createWriteStream('example.epub'))
epub.write(Streampub.newChapter('Chapter 1', '<b>doc content</b>', 0, 'chapter-1.xhtml'))
epub.end()

USAGE

var epub = new Streampub(opts)

opts is an object that optionally has the following properties:

  • id String - Default: url:source or a UUID. A unique identifier for this work. Note that URLs for this field must be prefixed by "url:".
  • title String - Default: "Untitled". The title of the epub.
  • author String - Optional. The name of the author of the epub.
  • authorUrl String - Optional. Only used if an author name is set as well. Adds a related foaf:homepage link to the author record. It is also written as a Calibre link_map but, as yet, Calibre seems unwilling to import this.
  • modified Date - Default: new Date(). When the epub was last modified.
  • published Date - Optional. When the source material was published.
  • source String - Optional. The original URL or URN of the source material. "The described resource may be derived from the related resource in whole or in part. Recommended best practice is to identify the related resource by means of a string conforming to a formal identification system."
  • language String - Default: "en". Identifies the language used in the book content. The value has to comply with RFC 3066.
  • description String - Optional. A brief description or summary of the material.
  • publisher String - Optional. "An entity responsible for making the resource available."
  • subject String - Optional. Calibre treats this field as a comma-separated list of tag names. "Typically, the subject will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary."
  • includeTOC Boolean - Optional. If true, generate a separate Table of Contents page distinct from the one the ereader uses for navigation.
  • numberTOC Boolean - Optional. If true, suppress the ol-based list numbering and insert the numbers as text instead. This is necessary to get numbers in front of each TOC entry with most readers.
  • calibre Object - Optional. If set, an object containing Calibre user fields which will be filled in on import to Calibre.

A note on the calibre object: The format of this object requires a little discussion. In order for Calibre to import into a custom field, you have to provide both a matching name and a matching type. This is best explained via example. Let's assume you have a Calibre field named #words that contains the number of words in a work. To make that available to Calibre you'd pass in an object like:

{words: {'#value#': wordCount, datatype: 'int'}}

Other useful datatypes are text and enumeration.
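
As a rough sketch of how such an object might be assembled, the helper below wraps plain values in the '#value#'/datatype shape described above. The names calibreFields and countWords are hypothetical and not part of streampub, and the rule "whole numbers become int, everything else becomes text" is an assumption made for illustration.

```javascript
// Hypothetical helper, not part of streampub: wraps plain values in the
// {'#value#': ..., datatype: ...} shape described above.
function calibreFields (fields) {
  var result = {}
  Object.keys(fields).forEach(function (name) {
    var value = fields[name]
    // Assumption: whole numbers map to 'int', everything else to 'text'.
    var datatype = Number.isInteger(value) ? 'int' : 'text'
    result[name] = {'#value#': value, datatype: datatype}
  })
  return result
}

// Naive word count for illustration only: splits on whitespace.
function countWords (text) {
  var trimmed = text.trim()
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length
}

var calibre = calibreFields({words: countWords('It was a dark and stormy night')})
// calibre is {words: {'#value#': 7, datatype: 'int'}}
```

The result could then be passed as the calibre option, provided a matching #words column of the same type already exists in Calibre.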

All of the options can be set after object creation with obvious setters:

  • epub.setId(id)
  • epub.setTitle(title)
  • epub.setAuthor(author)
  • epub.setAuthorUrl(authorUrl)
  • epub.setModified(modified)
  • epub.setPublished(published)
  • epub.setSource(source)
  • epub.setLanguage(language)
  • epub.setDescription(description)
  • epub.setPublisher(publisher)
  • epub.setSubject(subject)
  • epub.setIncludeTOC(includeTOC)
  • epub.setNumberTOC(numberTOC)
  • epub.setCalibre(calibre)
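
Setters are handy when metadata only becomes known after the stream has been created. The sketch below shows one possible way to apply a plain options object through them. applyOptions is a hypothetical helper, not part of streampub, and it assumes each option name maps to a setter by capitalising its first letter. A stand-in object is used so the sketch runs without the library installed.

```javascript
// Hypothetical helper, not part of streampub: applies a plain options object
// through setters, e.g. {title: 'x'} calls epub.setTitle('x').
function applyOptions (epub, opts) {
  Object.keys(opts).forEach(function (key) {
    var setter = 'set' + key.charAt(0).toUpperCase() + key.slice(1)
    // Skip keys with no matching setter rather than throwing.
    if (typeof epub[setter] === 'function') epub[setter](opts[key])
  })
  return epub
}

// Stand-in object that records calls; with the real library you would
// pass a Streampub instance instead.
var calls = []
var fakeEpub = {
  setTitle: function (title) { calls.push(['setTitle', title]) },
  setAuthorUrl: function (url) { calls.push(['setAuthorUrl', url]) }
}
applyOptions(fakeEpub, {title: 'My Example', authorUrl: 'https://example.com', unknown: 1})
```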

The Streampub Object

The Streampub object is a transform stream that takes chapter information as input and outputs binary chunks of an epub file. It's an ordinary stream so you can pipe into it or write to it and call .end() when you're done.

epub.write(obj, callback)

This is the usual stream write function. The object can either be constructed with:

Streampub.newChapter(chapterName, content, index, fileName, mime)
Streampub.newCoverImage(content, mime)
Streampub.newFile(fileName, content, mime)

Or by hand by creating an object with the following keys:

  • id String - Optional. Internal ID of the object; if omitted, streampub will generate one.
  • chapterName String - Required for chapters. The name of the chapter in the index.
  • content String or stream.Readable - Required. The content of the item being added. If this is HTML, it will be run through parse5 and xmlserializer to make it valid XHTML.
  • index Number - Optional. Where the chapter should show up in the index. These numbers can have gaps and are used for ordering ONLY. Duplicate index values will result in the earlier chapter being excluded from the index. Chapters without an index are added after those that have one, in the order written.
  • fileName String - Optional. The filename to use inside the epub. For chapters this is only needed if you want inter-chapter linking. Uses are more obvious for CSS and images. If content is an fs stream, this defaults to a value inferred from the original filename.
  • mime String - Optional. Mimetype of the content; if not supplied, streampub will try to determine the type.

If you include indexes then you can add chapters in any order.
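
To illustrate, here is a minimal sketch that builds chapter objects by hand with explicit indexes and then reverses them to mimic chapters arriving out of order. toChapterObjects and the input shape ({title, html}) are hypothetical names chosen for this example and are not part of streampub.

```javascript
// Hypothetical helper, not part of streampub: builds chapter objects by hand
// using the keys documented above.
function toChapterObjects (chapters) {
  return chapters.map(function (chapter, position) {
    return {
      chapterName: chapter.title,
      content: chapter.html,
      index: position, // used for ordering only; gaps are allowed
      fileName: 'chapter-' + (position + 1) + '.xhtml'
    }
  })
}

var objects = toChapterObjects([
  {title: 'Chapter 1', html: '<p>First</p>'},
  {title: 'Chapter 2', html: '<p>Second</p>'}
])

// Because every object carries an index, the write order is free;
// here the list is reversed to mimic chapters arriving out of order.
var writeOrder = objects.slice().reverse()
```

Each entry in writeOrder could then be handed to epub.write() in turn, and the index values should still place Chapter 1 ahead of Chapter 2 in the table of contents.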

Example

var Streampub = require('streampub')
var fs = require('fs')
 
var epub = new Streampub({title: 'My Example'})
epub.setAuthor('Example author')
epub.pipe(fs.createWriteStream('example.epub'))
epub.write(Streampub.newFile('author.jpg', fs.createReadStream('author.jpg')))
epub.write(Streampub.newFile('stylesheet.css', fs.createReadStream('styles.css')))
epub.write(Streampub.newChapter('Chapter 1', '<h1>Chapter 1</h1><b>doc content</b>'))
epub.write(Streampub.newChapter('Chapter 2', '<h1>Chapter 2</h1><b>doc content</b>'))
epub.end()

or equivalently

var epub = new Streampub({title: 'My Example'})
epub.setAuthor('Example author')
epub.pipe(fs.createWriteStream('example.epub'))
epub.write({content: fs.createReadStream('author.jpg')})
epub.write({fileName: 'stylesheet.css', content: fs.createReadStream('styles.css')})
epub.write({chapterName: 'Chapter 1', content: '<h1>Chapter 1</h1><b>doc content</b>'})
epub.write({chapterName: 'Chapter 2', content: '<h1>Chapter 2</h1><b>doc content</b>'})
epub.end()

Cover image

The epub specification does not contain a standardized way to include book covers. There is, however, a "best practice" that will work in most reader applications. streampub has some magic under the hood to correctly add a cover image. The only requirements are that the file be in JPEG format and be at most 1000x1000 pixels.

Example

var Streampub = require('streampub')
var fs = require('fs')
 
var epub = new Streampub({title: 'My Example'})
epub.setAuthor('Example author')
epub.pipe(fs.createWriteStream('example.epub'))
// newCoverImage marks this file as the cover, which causes cover magic to kick in
epub.write(Streampub.newCoverImage(fs.createReadStream('cover.jpg')))
epub.write(Streampub.newChapter('Chapter 1', '<h1>Chapter 1</h1><b>doc content</b>'))
epub.write(Streampub.newChapter('Chapter 2', '<h1>Chapter 2</h1><b>doc content</b>'))
epub.end()

or equivalently

var Streampub = require('streampub')
var fs = require('fs')
 
var epub = new Streampub({title: 'My Example'})
epub.setAuthor('Example author')
epub.pipe(fs.createWriteStream('example.epub'))
// Using this specific ID causes cover magic to kick in
epub.write({id: 'cover-image', content: fs.createReadStream('cover.jpg')})
epub.write({chapterName: 'Chapter 1', content: '<h1>Chapter 1</h1><b>doc content</b>'})
epub.write({chapterName: 'Chapter 2', content: '<h1>Chapter 2</h1><b>doc content</b>'})
epub.end()
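
Since the cover has to be a JPEG, it may be worth checking the file before writing it. The sketch below is one possible pre-flight check based on the JPEG signature bytes. looksLikeJpeg is a hypothetical helper, not part of streampub, and it does not check the pixel dimensions.

```javascript
// Hypothetical pre-flight check, not part of streampub: JPEG data starts
// with the bytes FF D8 (start of image) followed by FF (next marker prefix).
function looksLikeJpeg (buffer) {
  return buffer.length >= 3 &&
    buffer[0] === 0xff && buffer[1] === 0xd8 && buffer[2] === 0xff
}

// Sample headers for illustration; in practice you would read the first
// few bytes of the cover file.
var jpegHeader = Buffer.from([0xff, 0xd8, 0xff, 0xe0])
var pngHeader = Buffer.from([0x89, 0x50, 0x4e, 0x47])
```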

VALIDATION

Streampub takes care to generate only valid XML, using programmatic generators rather than templates.

Epubs produced by this have been validated with epubcheck. No warnings outside of content warnings should be present.

Content warnings ordinarily only happen if your content contains broken links, usually relative links to resources that don't exist in the epub.
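
One way to catch such links before running epubcheck is to scan each chapter for relative targets that match no file name you plan to write. The sketch below is a rough illustration. findMissingLinks is a hypothetical helper, not part of streampub, and its regular expression is a simplification that a real check would replace with an HTML parser.

```javascript
// Hypothetical pre-flight check, not part of streampub: lists relative
// href/src targets in a chapter that match no file name in the epub.
function findMissingLinks (html, fileNames) {
  var known = new Set(fileNames)
  var missing = []
  // Rough attribute scan for illustration; fragments (#...) are ignored.
  var pattern = /(?:href|src)\s*=\s*["']([^"'#]+)/g
  var match
  while ((match = pattern.exec(html)) !== null) {
    var target = match[1]
    // Anything with a scheme (http:, mailto:, ...) is not a relative link.
    var isAbsolute = /^[a-z][a-z0-9+.-]*:/i.test(target)
    if (!isAbsolute && !known.has(target) && missing.indexOf(target) === -1) {
      missing.push(target)
    }
  }
  return missing
}

var missing = findMissingLinks(
  '<a href="chapter-2.xhtml">next</a> <img src="map.png"> <a href="https://example.com">site</a>',
  ['chapter-1.xhtml', 'chapter-2.xhtml']
)
// missing is ['map.png']
```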

PRIOR ART

There are a bunch of epub generators already available. Many are pre-EPUB3. Most work off of files on disk rather than in-memory constructs. The only other one I was able to find that provides a stream is epub-generator, and it only provides a read stream. I wanted to be able to build a full pipeline, for example for backpressure reasons. I also very much wanted to be able to set epub metadata after object construction time.