
@stencila/encoda

0.104.5 • Public • Published


Codecs for structured, semantic, composable, and executable documents



"A codec is a device or computer program for encoding or decoding a digital data stream or signal. Codec is a portmanteau of coder-decoder." - Wikipedia

Encoda provides a collection of codecs for converting between, and composing together, documents in various formats. The aim is not to achieve perfect lossless conversion between alternative document formats; there are already several tools for that. Instead the focus of Encoda is to use existing tools to encode and compose semantic documents in alternative formats.


Format                        Codec        Approach  Status  Issues  Coverage
Plain text                    txt          None      β
Markdown                      md           Extens.   α
LaTeX                         latex        -         α
Microsoft Word                docx         rPNG      α
Google Docs                   gdoc         rPNG      α
Open Document Text            odt          rPNG      α
HTML                          html         Extens.   α
JATS XML                      jats         Extens.   α
JATS XML (Pandoc-based)       jats-pandoc  Extens.   α
Portable Document Format      pdf          rPNG      α
Jupyter                       ipynb        Native    α
RMarkdown                     xmd          Native    α
Microsoft Powerpoint          pptx         rPNG
Demo Magic                    dmagic       Native    β
Microsoft Excel               xlsx         Formula   β
Google Sheets                 gsheet       Formula
Open Document Spreadsheet     ods          Formula   β
Tabular data
CSV                           csv          None      β
CSVY                          csvy         None
Tabular Data Package          tdp          None      β
Document Archive              dar          Extens.   ω
Filesystem Directory          dir          Extens.   ω
Data interchange, other
JSON                          json         Native
JSON-LD                       json-ld      Native
JSON5                         json5        Native
YAML                          yaml         Native
Pandoc                        pandoc       Native    β
Reproducible PNG              rpng         Native    β
XML                           xml          Native
HTTP                          http


Approach... How executable nodes (e.g. `CodeChunk` and `CodeExpr` nodes) are represented
  • Native: the format natively supports executable nodes
  • Extens.: executable nodes are supported via extensions to the format
  • rPNG: executable nodes are supported via reproducible PNG images
  • Formula: executable CodeExpr nodes are represented using formulae
Status... Development status of the codec
  • ✗: Not yet implemented
  • ω: Work in progress
  • α: Alpha, initial implementation
  • β: Beta, ready for user testing
  • : Ready for production use
Issues... Link to open issues and PRs for the format (please check there before submitting a new issue 🙏)

If you'd like to see a converter for your favorite format, look at the listed issues and comment under the relevant one. If there is no issue regarding the converter you need, create one.


Several of the codecs in Encoda deal with fetching content from a particular publisher. For example, to get an eLife article and read it in Markdown:

stencila convert https://elifesciences.org/articles/45187v2 ye-et-al-2019.md

Some of these publisher codecs deal with metadata, e.g.

stencila convert "Watson and Crick 1953" - --from crossref --to yaml
type: Article
title: Genetical Implications of the Structure of Deoxyribonucleic Acid
authors:
  - familyNames:
      - WATSON
    givenNames:
      - J. D.
    type: Person
  - familyNames:
      - CRICK
    givenNames:
      - F. H. C.
    type: Person
datePublished: '1953,5'
isPartOf:
  issueNumber: '4361'
  isPartOf:
    volumeNumber: '171'
    isPartOf:
      title: Nature
      type: Periodical
    type: PublicationVolume
  type: PublicationIssue

Source    Codec     Base codec/s                        Status  Issues  Coverage
HTTP      http      Based on Content-Type or extension  β
ORCID     orcid     jsonld                              β
Article metadata
DOI       doi       csl                                 β
Crossref  crossref  jsonld                              β
Article content
eLife     elife     jats                                β
PLoS      plos      jats                                β


The easiest way to use Encoda is to install the stencila command line tool. Encoda powers stencila convert, and other commands, in that CLI. However, the version of Encoda in stencila can lag behind the version in this repo. So if you want the latest functionality, install Encoda as a Node.js package:

npm install @stencila/encoda --global


Encoda is intended to be used primarily as a library for other applications. However, it comes with a simple command line script which allows you to use the convert function directly.

Converting files

encoda convert notebook.ipynb notebook.docx

Encoda will determine the input and output formats based on the file extensions. You can override these using the --from and --to options, e.g.

encoda convert notebook.ipynb notebook.xml --to jats

You can also convert to more than one file / format (in this case the --to argument only applies to the first output file) e.g.

encoda convert report.docx report.Rmd report.html report.jats
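
As a rough illustration of the rules above (not Encoda's actual implementation), the following sketch infers a format from each file extension and applies a --to override to the first output only. The names formatFromPath and resolveOutputFormats are hypothetical, and the real codec matching may map extensions to codec names differently.

```typescript
// Hypothetical helpers for illustration; Encoda's real format matching is more involved.

// Infer a format name from a file extension, e.g. "notebook.ipynb" -> "ipynb".
// Only POSIX-style "/" separators are handled in this sketch.
function formatFromPath(filePath: string): string | undefined {
  const base = filePath.split('/').pop() ?? ''
  const dot = base.lastIndexOf('.')
  // No extension (or a dotfile such as ".env") means the format is unknown
  return dot > 0 ? base.slice(dot + 1).toLowerCase() : undefined
}

// Resolve the format of each output; an explicit `to` only overrides the first one.
function resolveOutputFormats(
  outputs: string[],
  to?: string
): (string | undefined)[] {
  return outputs.map((output, index) =>
    index === 0 && to !== undefined ? to : formatFromPath(output)
  )
}

console.log(resolveOutputFormats(['notebook.xml'], 'jats'))
console.log(resolveOutputFormats(['report.Rmd', 'report.html', 'report.jats']))
```

In this sketch an output without a recognisable extension resolves to undefined, which a caller would need to treat as an error or fall back to a default.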

Converting folders

You can decode an entire directory into a Collection. Encoda will traverse the directory, including subdirectories, decoding each file matching your glob pattern. You can then encode the Collection using the dir codec into a tree of HTML files e.g.

encoda convert myproject myproject-published --to dir --pattern '**/*.{rmd,csv}'
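
The traversal can be pictured roughly as follows. This is a simplified sketch using only Node.js built-ins: it matches files by extension instead of evaluating a full glob pattern, and listMatchingFiles is a hypothetical name that is not part of Encoda.

```typescript
import * as fs from 'fs'
import * as path from 'path'

// Recursively list files under `dir` whose extension (case-insensitive)
// is one of `extensions`. A stand-in for real glob matching.
function listMatchingFiles(dir: string, extensions: string[]): string[] {
  const wanted = new Set(extensions.map((ext) => ext.toLowerCase()))
  const matches: string[] = []
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name)
    if (entry.isDirectory()) {
      // Descend into subdirectories, as the directory decoding described above does
      matches.push(...listMatchingFiles(fullPath, extensions))
    } else if (wanted.has(path.extname(entry.name).slice(1).toLowerCase())) {
      matches.push(fullPath)
    }
  }
  return matches.sort()
}
```

Calling listMatchingFiles('myproject', ['rmd', 'csv']) would return the sorted paths of all matching files, each of which could then be decoded and added to a collection.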

Converting command line input

You can also read content from the first argument. In that case, you'll need to specify the --from format e.g.

encoda convert "{type: 'Paragraph', content: ['Hello world!']}" --from json5 paragraph.md

You can send output to the console by using - as the second argument and specifying the --to format e.g.

encoda convert paragraph.md - --to yaml

Creating zip archives

Use the --zip option to create a Zip archive with the outputs of conversion. With --zip=yes a zip archive will always be created. With --zip=maybe, a zip archive will be created if there are more than two output files. This can be useful for formats such as HTML and Markdown, for which images and other media are stored in a sibling folder.
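
The decision rule described above can be sketched as a small function. This is an illustration only: shouldCreateZip is a hypothetical name, and treating any value other than yes or maybe as "never zip" is an assumption that the text above does not confirm.

```typescript
// Hypothetical sketch of the --zip decision; the 'no' mode is an assumption.
type ZipMode = 'yes' | 'maybe' | 'no'

function shouldCreateZip(mode: ZipMode, outputFileCount: number): boolean {
  switch (mode) {
    case 'yes':
      // Always create an archive
      return true
    case 'maybe':
      // Threshold as documented above: more than two output files
      return outputFileCount > 2
    default:
      return false
  }
}

console.log(shouldCreateZip('maybe', 3))
```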

Option        Description
--from        The format of the input content e.g. --from md
--to          The format for the output content e.g. --to html
--theme       The theme for the output (only applies to HTML, PDF and RPNG output) e.g. --theme eLife. Either a Thema theme name or a path/URL to a directory containing a styles.css and an index.js file.
--standalone  Generate a standalone document, not a fragment (default true)
--bundle      Bundle all assets (e.g. images, CSS and JS) into the document (default false)
--debug       Print debugging information

Using with Executa

Encoda exposes the decode and encode methods of the Executa API. Register Encoda so that it can be discovered by other executors on your machine:

npm run register

You can then use Encoda as a plugin for Executa that provides additional format conversion capabilities. For example, you can use the query REPL on a Markdown document:

npx executa query CHANGELOG.md --repl

You can then use the REPL to explore the structure of the document and do things like create summary documents from it. For example, let's say that for some reason we wanted to create a short JATS XML file with the five most recent releases of this package:

jmp > %format jats
jmp > %dest latest-releases.jats.xml
jmp > {type: 'Article', content: content[? type==`Heading` && depth==`1`] | [1:5]}

This creates the latest-releases.jats.xml file:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.1 20151215//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
                <ext-link ext-link-type="uri" xlink:href="https://github.com/stencila/encoda/compare/v0.79.0...v0.80.0">0.80.0</ext-link> (2019-09-30)

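For readers unfamiliar with JMESPath, the REPL query above filters the document's content down to level-1 headings and then applies the slice [1:5], which keeps the matches at indices 1 to 4. A rough TypeScript equivalent is sketched below, using a simplified, hypothetical DocNode type and made-up sample data in place of the real Stencila schema types and the actual changelog.

```typescript
// Simplified stand-in for Stencila nodes; the real schema types are richer.
interface DocNode {
  type: string
  depth?: number
  content?: unknown[]
}

// Equivalent of: filter on type and depth, then slice [1:5]
function selectHeadings(content: DocNode[]): DocNode[] {
  return content
    .filter((node) => node.type === 'Heading' && node.depth === 1)
    .slice(1, 5)
}

// Made-up sample: six level-1 headings, each followed by a paragraph
const sampleContent: DocNode[] = []
for (let i = 0; i < 6; i++) {
  sampleContent.push({ type: 'Heading', depth: 1, content: [`Release ${i}`] })
  sampleContent.push({ type: 'Paragraph', content: ['Notes'] })
}

const summaryArticle = { type: 'Article', content: selectHeadings(sampleContent) }
console.log(summaryArticle.content.length) // 4
```
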
You can query a document in any format supported by Encoda. As another example, let's fetch a CSV file from GitHub and get the names of its columns:

npx executa query https://gist.githubusercontent.com/jncraton/68beb88e6027d9321373/raw/381dcf8c0d4534d420d2488b9c60b1204c9f4363/starwars.csv --repl
🛈 INFO  encoda:http Fetching "https://gist.githubusercontent.com/jncraton/68beb88e6027d9321373/raw/381dcf8c0d4534d420d2488b9c60b1204c9f4363/starwars.csv"
jmp > columns[].name
jmp >

See the %help REPL command for more examples.

Note: If you have executa installed globally, then the npx prefix above is not necessary.


Self-hosted documentation (converted from various formats to HTML) and API documentation (generated from source code) are available at: https://stencila.github.io/encoda.


Check how to contribute back to the project. All PRs are most welcome! Thank you!

Clone the repository and install a development environment:

git clone https://github.com/stencila/encoda.git
cd encoda
npm install

You can manually test conversion using ts-node and the cli.ts script:

npm run cli -- convert simple.md simple.html

There is a bash script to make that a little shorter and more like real life usage:

./encoda convert simple.md simple.html

If that is a bit slow, compile the TypeScript to JavaScript first and use node directly:

npm run build
node dist/cli convert simple.md simple.html

If you are using VSCode, you can use the Auto Attach feature to attach to the CLI when running the cli:debug NPM script:

npm run cli:debug -- convert simple.gdoc simple.ipynb


Running tests locally

Run the test suite using:

npm test

Or, run a single test file e.g.

npx jest tests/xlsx.test.ts --watch

To display debug logs during testing set the environment variable DEBUG=1, e.g.

DEBUG=1 npm test

To get coverage statistics:

npm run cover

There's also a Makefile if you prefer to run tasks that way e.g.

make lint cover

Running tests in Docker

You can also test this package using a Docker container:

npm run test:docker

Writing tests

Recording and using network fixtures

As far as possible, tests should be able to run with no network access. We use Nock Back to record and play back network requests and responses. Use the nockRecord helper function for this with the convention of starting the fixture file with nock-record- e.g.

const stopRecording = await nockRecord('nock-record-<name-of-test>.json')
// Do some things that connect to the interwebs
// Then stop recording once the network calls are done
stopRecording()

Note that the util/http module has caching, so you may need to remove the cache for the recording of fixtures to work e.g. rm -rf ~/.config/stencila/encoda/cache/

As part of continuous integration, tests are run with both NOCK_MODE=wild (which ignores recording) to test that tests still work when using real network connections, and NOCK_MODE=dryrun (for speed and consistency with the default).


We 💕 contributions! All contributions: ideas 🤔, examples 💡, bug reports 🐛, documentation 📖, code 💻, questions 💬. See CONTRIBUTING.md for more on where to start. You can also provide your feedback on the Community Forum and Gitter channel.


Aleksandra Pawlik

💻 📖 🐛

Nokome Bentley

💻 📖 🐛


📖 🎨

Hamish Mackenzie

💻 📖

Alex Ketch

💻 📖 🎨

Ben Shaw

💻 🐛

Phil Neff


Raniere Silva


Lorenzo Cangiano



🐛 🎨

Giorgio Sironi

Add a contributor...

To add yourself, or someone else, to the above list, either:

  1. Ask the @all-contributors bot to do it for you by commenting on an issue or PR like this:

    @all-contributors please add @octocat for bugs, tests and code

  2. Use the all-contributors CLI to do it yourself:

    npx all-contributors add octocat bugs, tests, code

See the list of contribution types.


Encoda relies on many awesome open source tools (see package.json for the complete list). We are grateful to their developers and contributors for all their time and energy. In particular, these tools do a lot of the heavy lifting 💪 under the hood.

Tool Use
Ajv Ajv is "the fastest JSON Schema validator for Node.js and browser". Ajv is not only fast, it also has an impressive breadth of functionality. We use Ajv for the validate() and coerce() functions to ensure that ingested data is valid against the Stencila schema.
Citation.js Citation.js converts bibliographic formats like BibTeX, BibJSON, DOI, and Wikidata to CSL-JSON. We use it to power the codecs for those formats and APIs.
Frictionless Data datapackage-js from the team at Frictionless Data is a Javascript library for working with Data Packages. It does a lot of the work in converting between Tabular Data Packages and Stencila Datatables.
Glitch Digital Glitch Digital's structured-data-testing-tool is a library and command line tool to help inspect and test for Structured Data. We use it to check that the HTML generated by Encoda can be read by bots 🤖
Pa11y Pa11y provides a range of free and open source tools to help designers and developers make their web pages more accessible. We use pa11y to test that HTML produced by Encoda meets the Web Content Accessibility Guidelines (WCAG) and Axe rule set.
Pandoc Pandoc is a "universal document converter". It's able to convert between an impressive number of formats for textual documents. Our TypeScript definitions for Pandoc's AST allow us to leverage this functionality from within Node.js while maintaining type safety. Pandoc powers our converters for Word, JATS and LaTeX. We have contributed to Pandoc, including developing its JATS reader.
Puppeteer Puppeteer is a Node library which provides a high-level API to control Chrome. We use it to take screenshots of HTML snippets as part of generating rPNGs and we plan to use it for generating PDFs.
Remark Remark is an ecosystem of plugins for processing Markdown. It's part of the unified framework for processing text with syntax trees - a similar approach to Pandoc but in JavaScript. We use Remark as our Markdown parser because of its extensibility.
SheetJs SheetJs is a Javascript library for parsing and writing various spreadsheet formats. We use their community edition to power converters for CSV, Excel, and Open Document Spreadsheet formats. They also have a pro version if you need extra support and functionality.

Many thanks to the Alfred P. Sloan Foundation and eLife for funding development of this tool.



