pg-rdf-to-json

Transforms the RDF files in rdf-files.tar.bz2, provided by Project Gutenberg, into JS/JSON objects.

Usage

There are two main ways to use this package: as a CLI or as a library.

CLI

The CLI accepts three options:

--input

  • Short: -i
  • Type: string
  • Default: -

This option defines the location of the rdf-files.tar.bz2 file to read. If it is omitted or set to -, the XML files are read from stdin instead.

--output

  • Short: -o
  • Type: string

If passed, this value should be a path that defines where the output JSON files are written. If omitted, the output is written to stdout as record-separator-delimited JSON, with the JSON objects delimited by the ASCII record separator character (RS, 0x1E).

--validate

  • Short: -v
  • Type: boolean
  • Default: false

If set to true, this option ensures that every JSON object the CLI outputs conforms to the JSON Type Definition in types.ts.
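
If you write your own script around the same flags, the option list above can be expressed with Node's built-in util.parseArgs (the default field needs Node 18.11 or later). This is a sketch that mirrors the documented options, not the package's actual argument parser; the helper name parseCliArgs is hypothetical.

```typescript
import { parseArgs } from 'node:util';

// The three documented options, declared for Node's util.parseArgs.
// Mirrors the README; not taken from the package's source.
const options = {
  input: { type: 'string', short: 'i', default: '-' },
  output: { type: 'string', short: 'o' },
  validate: { type: 'boolean', short: 'v', default: false },
} as const;

// Hypothetical helper; parseArgs throws on unknown options (strict mode).
function parseCliArgs(args: string[]) {
  const { values } = parseArgs({ args, options });
  return {
    input: values.input,
    // '-' (the default) means the XML text is read from stdin.
    readFromStdin: values.input === '-',
    // undefined means the JSON goes to stdout instead of to files.
    output: values.output,
    validate: values.validate === true,
  };
}

console.log(parseCliArgs(['-i', 'rdf-files.tar.bz2', '--validate']));
```

Declaring the options as one object keeps the short aliases and defaults in a single place, which makes it easy to check them against the documented table.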

Example CLI Usage

Here's an example of reading from rdf-files.tar.bz2, converting the contained files to JSON, and using jq to output the title of each book as it is converted:

tar -lxOf input-files/rdf-files.tar.bz2 | npx pg-rdf-to-json | jq --seq -r .title
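
The example passes --seq to jq because the CLI's stdout is record-separator-delimited JSON. To consume that output from TypeScript instead of jq, a minimal approach is to split on the RS character (0x1E). Splitting is safe because JSON forbids unescaped control characters inside strings, so RS can only appear between records. The helper name parseRsDelimitedJson is hypothetical, and the sketch assumes each record holds exactly one JSON text, optionally followed by whitespace.

```typescript
// ASCII record separator (RS, 0x1E) that delimits the JSON texts.
const RS = '\x1e';

// Hypothetical helper: decode a fully buffered string of RS-delimited JSON.
// Assumes each record is one JSON text, possibly followed by whitespace
// such as a newline; empty records (e.g. before the first RS) are skipped.
function parseRsDelimitedJson(text: string): unknown[] {
  return text
    .split(RS)
    .map((record) => record.trim())
    .filter((record) => record.length > 0)
    .map((record): unknown => JSON.parse(record));
}

// Example with hard-coded input rather than real CLI output.
const records = parseRsDelimitedJson(`${RS}{"title":"A"}\n${RS}{"title":"B"}\n`);
for (const record of records) {
  console.log((record as { title: string }).title); // prints "A", then "B"
}
```

This buffers the whole output in memory; for the full catalog you would likely want to split incrementally as stdout chunks arrive.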

Library

The library exposes two async generator functions for converting RDF files to JSON.

booksFromStream

Accepts a Readable stream as its only parameter.

This function expects the passed stream to yield the text of at least one XML file.
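
When the stream comes from tar -O, as in the example below, the XML files arrive concatenated and chunk boundaries fall at arbitrary points. One way to recover individual documents from such a stream is sketched here. It is a hypothetical helper for illustration, not how booksFromStream is implemented, and it assumes every file's root element is rdf:RDF, so each document ends with </rdf:RDF>.

```typescript
import { StringDecoder } from 'node:string_decoder';

// Hypothetical helper (not part of pg-rdf-to-json): splits the text of
// concatenated RDF/XML files into one string per file.
// Assumption: every file's root element is rdf:RDF.
const CLOSING_TAG = '</rdf:RDF>';

async function* splitXmlDocuments(
  chunks: AsyncIterable<Buffer | string>,
): AsyncGenerator<string> {
  // Keeps multi-byte UTF-8 characters intact when they straddle two chunks.
  const decoder = new StringDecoder('utf8');
  let buffer = '';

  for await (const chunk of chunks) {
    buffer += typeof chunk === 'string' ? chunk : decoder.write(chunk);
    // The closing tag itself may be split across chunks, so only cut at
    // complete matches and keep the remainder for the next chunk.
    let end = buffer.indexOf(CLOSING_TAG);
    while (end !== -1) {
      const cut = end + CLOSING_TAG.length;
      yield buffer.slice(0, cut).trim();
      buffer = buffer.slice(cut);
      end = buffer.indexOf(CLOSING_TAG);
    }
  }

  buffer += decoder.end();
  if (buffer.trim().length > 0) {
    throw new Error('Stream ended in the middle of an XML document');
  }
}
```

A Node Readable such as tar.stdout is an async iterable, so it can be passed directly as the chunks argument.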

This function is an async generator, so it conforms to the async iterator protocol and its results can be read with a for-await...of loop:

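import { spawn } from 'node:child_process';
import { booksFromStream } from 'pg-rdf-to-json';

// -O makes tar write the extracted XML files to stdout
const tar = spawn('tar', ['-lxOf', 'rdf-files.tar.bz2']);

for await (const book of booksFromStream(tar.stdout)) {
  console.log(book.title);
}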
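
A for-await...of loop is shorthand for repeatedly calling the generator's next() method and awaiting the result until done is true; the loop also calls return() if you exit it early. The sketch spells that out with a stand-in generator. fakeBooks and its data are hypothetical placeholders for booksFromStream.

```typescript
// Hypothetical stand-in for booksFromStream: yields two fake book objects.
async function* fakeBooks(): AsyncGenerator<{ title: string }> {
  yield { title: 'Book A' };
  yield { title: 'Book B' };
}

// Roughly what `for await (const book of fakeBooks())` does behind the scenes.
async function readTitlesByHand(): Promise<string[]> {
  const iterator = fakeBooks();
  const titles: string[] = [];
  while (true) {
    const result = await iterator.next(); // { value, done }
    if (result.done) break;
    titles.push(result.value.title);
  }
  return titles;
}

readTitlesByHand().then((titles) => console.log(titles)); // [ 'Book A', 'Book B' ]
```

Because the generator body only runs when next() is called, the consumer sets the pace: each book is produced on request rather than all at once.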

booksFromArchive

Accepts a single string parameter: the path to a .tar.bz2 file.

Important: This function spawns its own tar process and has only been tested on Linux.

Like booksFromStream, this function is an async generator, so its results can be read with a for-await...of loop:

import { booksFromArchive } from 'pg-rdf-to-json';

for await (const book of booksFromArchive('rdf-files.tar.bz2')) {
  console.log(book.title);
}
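
When you pass tar.stdout to booksFromStream yourself, as in the earlier example, nothing in that loop checks tar's exit status. If you want a tar failure surfaced as an error, one option is to wrap the spawned process in an async generator that yields its stdout and then checks how the process exited. This is a sketch under that assumption: spawnStdout is a hypothetical name, and it says nothing about how booksFromArchive handles tar errors internally.

```typescript
import { spawn } from 'node:child_process';

// Hypothetical helper, not pg-rdf-to-json's internals: runs a command, yields
// its stdout as UTF-8 text, and throws if the command fails.
async function* spawnStdout(command: string, args: string[]): AsyncGenerator<string> {
  // stdin is unused; stderr is passed through so tar's messages stay visible.
  const child = spawn(command, args, { stdio: ['ignore', 'pipe', 'inherit'] });

  // Register before reading so the 'close' event cannot be missed.
  const closed = new Promise<[number | null, NodeJS.Signals | null]>((resolve) => {
    child.once('close', (code, signal) => resolve([code, signal]));
  });
  // Spawn failures (e.g. ENOENT) arrive as 'error' events; forwarding them to
  // stdout makes the loop below throw instead of waiting for data that never comes.
  child.on('error', (err) => child.stdout.destroy(err));

  child.stdout.setEncoding('utf8');
  let exited = false;
  try {
    for await (const chunk of child.stdout) {
      yield chunk as string;
    }
    const [code, signal] = await closed;
    exited = true;
    if (code !== 0) {
      throw new Error(`${command} exited with ${signal ?? `code ${code}`}`);
    }
  } finally {
    // If the consumer stopped early (or reading failed), stop the child too.
    if (!exited) child.kill();
  }
}
```

For example, iterating spawnStdout('tar', ['-lxOf', 'rdf-files.tar.bz2']) yields the concatenated XML text and throws after the last chunk if tar exits with a non-zero code.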

Package Sidebar

Install

npm i pg-rdf-to-json

Version

0.1.0

License

MIT

Unpacked Size

213 kB

Total Files

73

Collaborators

  • chrisn