Transforms RDF files from rdf-files.tar.bz2 provided by Project Gutenberg into JS/JSON objects.
There are two main ways to use this package: as a CLI or as a library.
The CLI accepts three arguments:
- Short: -i
- Type: string
- Default: -
This option defines the location of the rdf-files.tar.bz2 file to be read. If it is not provided, or if it is set to -, then the XML files will be read from stdin.
- Short: -o
- Type: string
If passed, this value should be a path which defines where the output JSON files should be written. If not passed, the files will be written to stdout as Record separator-delimited JSON.
- Short: -v
- Type: boolean
- Default: false
If passed and set to true, this option will ensure that every JSON object output conforms to the JSON Type Definition defined in types.ts.
Here's an example of reading from rdf-files.tar.bz2, converting the contained files to JSON, and using jq to output the title of each book as it is converted:
tar -lxOf input-files/rdf-files.tar.bz2 | npx pg-rdf-to-json | jq --seq -r .title
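If jq is not available, the Record separator-delimited output can also be consumed from a script. The following is a minimal sketch, assuming the CLI's stdout is a JSON text sequence in which each record is introduced by the record separator character (U+001E), as `jq --seq` expects. The name `parseJsonSeq` is hypothetical and not part of this library.

```typescript
// Record separator (U+001E), assumed to delimit the JSON records.
const RS = '\x1e';

/**
 * Hypothetical helper: parses record separator-delimited JSON from an async
 * iterable of chunks, yielding one parsed value per record. Chunk boundaries
 * may fall in the middle of a record.
 */
async function* parseJsonSeq(
  chunks: AsyncIterable<string | Uint8Array>,
): AsyncGenerator<unknown> {
  const decoder = new TextDecoder();
  let buffer = '';
  for await (const chunk of chunks) {
    buffer += typeof chunk === 'string' ? chunk : decoder.decode(chunk, { stream: true });
    const parts = buffer.split(RS);
    // The last part may be an incomplete record; keep it for the next chunk.
    buffer = parts.pop() ?? '';
    for (const part of parts) {
      if (part.trim() !== '') yield JSON.parse(part);
    }
  }
  // Flush any bytes still held by the decoder, then parse the final record.
  buffer += decoder.decode();
  if (buffer.trim() !== '') yield JSON.parse(buffer);
}

// Possible usage, reading the CLI's output from stdin:
//   for await (const book of parseJsonSeq(process.stdin)) { ... }
```

Buffering the trailing fragment between chunks matters here because a stream may split a record at any byte.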
The library exposes two generator functions for converting RDF files to JSON.
booksFromStream accepts a Readable stream as its only parameter.
This function expects the passed stream to yield the text of at least one XML file.
This function is an async generator, which means it conforms to the async iterator protocol and its results can be read using a for-await...of loop like so:
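import { spawn } from 'node:child_process';
// Assumption: the library is importable under the same name as the CLI package.
import { booksFromStream } from 'pg-rdf-to-json';
// Extract the archive to stdout (-O) and feed that stream to the converter.
const tar = spawn('tar', ['-lxOf', 'rdf-files.tar.bz2']);
for await (const book of booksFromStream(tar.stdout)) {
  console.log(book.title);
}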
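Because the results arrive through the async iterator protocol, generic async-iterable helpers can be layered on top. The sketch below shows one such pattern, stopping after a fixed number of books. The names `take`, `fakeBooks`, and `firstTitles` are hypothetical, and `fakeBooks` is only a stand-in for the library's generator so that the example runs on its own.

```typescript
/** Yields at most `limit` items from `source`, then stops consuming it. */
async function* take<T>(source: AsyncIterable<T>, limit: number): AsyncGenerator<T> {
  if (limit <= 0) return;
  let count = 0;
  for await (const item of source) {
    yield item;
    count += 1;
    // Returning from inside for-await calls the source iterator's return()
    // method, if it has one.
    if (count >= limit) return;
  }
}

// Stand-in for the library's generator, used only to keep this sketch self-contained.
async function* fakeBooks(): AsyncGenerator<{ title: string }> {
  yield { title: 'First' };
  yield { title: 'Second' };
  yield { title: 'Third' };
}

// Collects the titles of the first `limit` books.
async function firstTitles(limit: number): Promise<string[]> {
  const titles: string[] = [];
  for await (const book of take(fakeBooks(), limit)) {
    titles.push(book.title);
  }
  return titles;
}
```

In real use, `fakeBooks()` would be replaced by the generator returned from the library. Whether the library stops the underlying stream when iteration ends early is not stated here, so that behaviour should be checked before relying on it.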
booksFromArchive accepts a path to a .tar.bz2 file, as a string, as its only parameter.
[!IMPORTANT] This function spawns an internal instance of tar and has only been tested on Linux.
Like booksFromStream, this function is an async generator, which means it conforms to the async iterator protocol and its results can be read using a for-await...of loop like so:
for await (const book of booksFromArchive('rdf-files.tar.bz2')) {
  console.log(book.title);
}