# Drupal JSON:API Extractor
This package is a Drupal json:api client library with one primary responsibility: to crawl through a Drupal-produced json:api and save the resulting data to static json files, in a directory structure that allows easy access to the files.
Why all the trouble? For Drupal sites with only hundreds or low thousands of pages (the majority), enabling the (now core) json:api module in conjunction with this library allows for fully static front ends. Having a way to export all of a site's data to static json files allows those files to be deployed, statically, along with a site's decoupled front end.
It also presents an opportunity to transform the standard json:api output to something a little more friendly for developers to work with. Ideally this library is used during the static generation process.
## Getting started

Crawling all Drupal nodes of a given content type, along with each node's associated relationships (including paragraphs), is pretty easy.

```js
// (method names below are reconstructed; consult the package docs for the exact API)
const { Spider } = require('drupal-jsonapi-extractor')

const baseURL = 'https://example.org/jsonapi/'
const spider = new Spider({ baseURL })
spider.crawl('node/blog')
// or to crawl every published node
spider.crawlNodes()
```
While the above `Spider` does crawl through an entire set of content types, it does not actually do anything with the results. This is where we introduce the `Extractor` object.
```js
const { Spider, Extractor } = require('drupal-jsonapi-extractor')

const baseURL = 'https://example.org/jsonapi/'
const spider = new Spider({ baseURL })
const extractor = new Extractor(spider, { location: './downloads' })
extractor.wipe().then(() => spider.crawl('node/blog'))
```
*Note:* The extractor has a helpful utility function `wipe()`, which returns a `Promise` and ensures the target directory is completely empty before resolving.
The above code will output a new `downloads` directory with the structure:

```
downloads/
  _resources/
    node/
      blog/
        0ef56bbd-b2d6-475e-8b83-e1fa9bc1e7fb.json
    paragraph/
      hero/
        425a6dc1-5158-4f12-8d54-eb8a7af369f0.json
    taxonomy_term/
      tags/
        2d850e4b-9d2f-4b8f-b1e7-ad959de8b393.json
  _slugs/
    node/
      1.json
    blogs/
      my-first-blog-post.json
```
This structure is intended to serve static sites well by allowing lookup by the json:api globally unique id, as well as the more traditional Drupal path (`node/1`) and a node's alias "slug" (`/blogs/my-first-blog-post`).
By default, the extractor saves the exact output of the json:api. However, when developing your decoupled front end you may prefer a slightly less verbose json schema. This package includes a transformer that allows easy "cleaning" of the output:
```js
const extractor = new Extractor(spider, {
  location: './downloads',
  clean: true
})
```
Sometimes it is nice to see the progress of the download process. This package includes a console logger as well.
```js
const { Spider, Extractor, Logger } = require('drupal-jsonapi-extractor')

const baseURL = 'https://example.org/jsonapi/'
const spider = new Spider({ baseURL })
const extractor = new Extractor(spider, { location: './downloads' })
const logger = new Logger(spider, extractor)
spider.crawl('node/blog')
```
The logger in our example would print to the command line:
```
✔️ node: 1
✔️ taxonomy_term: 1
✔️ paragraph: 1
----------------------------
🎉 Crawl complete!
Errors.................0
node...................1
paragraph..............1
taxonomy_term..........1
```
## Configuration options

Each of the provided classes has a number of configuration options.
### Spider

You pass options as the first argument when instantiating a new `Spider`:

```js
const spider = new Spider({
  // (required) Should include the /jsonapi/ segment
  baseURL: 'https://example.org/jsonapi/',
  // (optional) Instance of axios with baseURL already applied
  api: axios,
  // Quit the program on a crawl error
  terminateOnError: false,
  // The maximum number of concurrent api requests the spider can open.
  // If you get timeout errors from the api, reduce this number.
  maxConcurrent: 5,
  // (optional) Resource class configuration options
  resourceConfig: {
    // (optional) Array of regex used to determine which relationships should
    // be crawled. By default, only relationships that start with field_ are crawled.
    relationships: [/^field_/]
  }
})
```
### Extractor

You pass options as the second argument when instantiating a new `Extractor`:

```js
const extractor = new Extractor(spider, options)
extractor.wipe()
// To limit the depth of a crawl, pass a max depth (rarely needed since the
// package handles recursive references)
```
*Note:* Above we use the helpful utility method `wipe()`, which returns a `Promise` and ensures the target directory is completely empty before resolving.
```js
{
  // The location to save files (will create directories automatically)
  location: './',
  // Should the data be transformed or "cleaned" before being saved to disk?
  clean: false,
  // Sometimes it's helpful to see pretty-printed json, just flip this to true.
  pretty: false,
  // The function to pass each Resource through before saving it if clean is true.
  // By default we use our own transform function. This function takes a number
  // of options itself, or you can choose to use your own callback altogether.
  transformer: transformer()
}
```
Internally this library represents every crawled response with a `Resource` object. If you choose to override the `transformer` callback, it will be given a `Resource` as an argument. You can read the source code for details on its functionality. If you want to change the configuration options of our transformer, you can customize it:
```js
const { Spider, Extractor, transformer } = require('drupal-jsonapi-extractor')

const baseURL = 'https://example.org/jsonapi/'
const spider = new Spider({ baseURL })
const extractor = new Extractor(spider, {
  location: './downloads',
  clean: true,
  transformer: transformer({ /* transformer options */ })
})
spider.crawl('node/blog')
```
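If you would rather replace the transformer entirely, a custom callback might look like the sketch below. The `Resource` fields accessed here (`id`, `type`, `attributes`) are assumptions made for illustration; check the `Resource` class source for the real shape.

```js
// Sketch of a fully custom transformer callback. The resource fields read
// here (id, type, attributes) are illustrative assumptions, not a documented shape.
function myTransformer (resource) {
  // Return only the fields the front end actually needs
  return {
    id: resource.id,
    type: resource.type,
    ...resource.attributes
  }
}
```

It would then be passed as `transformer: myTransformer` alongside `clean: true`.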
### Logger

The logger, at the moment, is pretty simple with just one configuration option:

```js
const logger = new Logger(...emitters, {
  // Set the verbosity of the logger:
  // 0 - Log nothing
  // 1 - (default) Show a simple tally of number of downloads and number of errors
  // 2 - Log each entity and error as it downloads
  // 3 - Log every event being listened to by the logger
  verbosity: 1
})
```
## To do

Currently there is effectively no test coverage, although test files for the classes have been written with an instantiation check in each.