@ndcode/build_cache

0.1.0 • Public • Published

Build Cache system

An NDCODE project.

Overview

The build_cache package exports a single constructor BuildCache(diag) which must be called with the new operator. The resulting cache object is intended to store objects of arbitrary JavaScript type, which are generated from source files of some kind. The cache tracks the source files of each object, and makes sure the objects are rebuilt as required if the source files change on disk.

Calling API

Suppose one has a BuildCache instance named bc. It behaves somewhat like an ES6 Map object, except that it only has the bc.get() function, because new objects are added to the cache by attempting to get them.

The interface for the BuildCache-provided instance function bc.get() is:

await bc.get(key, build_func): retrieves the object stored under key. The key is a somewhat arbitrary string that usually corresponds to the on-disk path of the main source file of the wanted object. If key already exists in the cache, the corresponding object is returned after an up-to-date check, and is rebuilt first if its source files have changed. Otherwise, the user-provided build_func is called to build it from sources.

The interface for the user-provided callback function build_func() is:

await build_func(result): the user must set result.value to the object to be stored in the cache, and may set result.deps to a list of pathnames of the build dependencies, including the main file, which is normally the same as the key argument to bc.get(). The result.deps field is set to [key] before the callback is invoked, so it doesn't need to be changed in the common case where exactly one source file compiles to one object in the cache.
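The callback contract can be tried out without the package. The sketch below uses a hypothetical helper, run_build(key, build_func), which is not part of build_cache; it only mimics the preparation of result described above (deps preset to [key]) and then awaits the callback. The names build_upper and build_with_include, and the path /site/_header.txt, are made up for illustration.

```javascript
// Hypothetical stand-in for the build step of bc.get(); not the package's code.
// It prepares result the way the text describes, then awaits the callback.
let run_build = async (key, build_func) => {
  let result = {deps: [key], value: undefined}
  await build_func(result)
  return result
}

// A build_func for the common one-source, one-object case: it only sets
// result.value and leaves result.deps as the preset [key].
let build_upper = text => async result => {
  result.value = text.toUpperCase()
}

// A build_func that also reports one extra (made-up) dependency.
let build_with_include = text => async result => {
  result.value = text
  result.deps = result.deps.concat(['/site/_header.txt'])
}
```

With these definitions, await run_build('/site/a.txt', build_upper('hi')) yields a result whose value is 'HI' and whose deps is still ['/site/a.txt'].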

About dependencies

Usually the dependencies will be tracked during the building of an object, for instance the main source file may contain include directives that bring in further source files. Most compilers should support this feature, for instance the programmatic API to the Less CSS compiler returns CSS code and a list of dependencies. They are placed in result.value and result.deps respectively.

At the moment we do not support optional source files. For instance, there might be a picture image.jpg and an optional metadata file image.json that goes with it. To handle this case, we could in future add a "not exists" dependency type, so that a rebuild occurs if the metadata file appears later on.

Usage example

Here is a fairly self-contained example of how we can define a function called get_less(dirname, pathname) which will load a Less stylesheet from the given pathname, and compile it to CSS via the cache. The dirname is given so that if the Less stylesheet includes further stylesheets they will be taken from the correct path, usually the path containing the main Less stylesheet.

let BuildCache = require('@ndcode/build_cache')
let fs = require('fs')
let less = require('less/lib/less-node')
let util = require('util')

let fs_readFile = util.promisify(fs.readFile)

let build_cache_less = new BuildCache()
let get_less = (dirname, pathname) => {
  pathname = dirname + pathname
  return /*await*/ build_cache_less.get(
    pathname,
    async result => {
      let text = await fs_readFile(pathname, {encoding: 'utf-8'})
      console.log('getting', pathname, 'as less')
      let render = await less.render(
        text,
        {
          filename: pathname,
          paths: [dirname]
        }
      )
      // concat() returns a new array, so the result must be assigned back
      result.deps = result.deps.concat(render.imports)
      result.value = Buffer.from(render.css)
    }
  )
}

The statement pathname = dirname + pathname is simplified for the example. It should properly be something like pathname = path.resolve(dirname, pathname), which is more likely to create a unique key for the stylesheet, since it resolves out components like . or .. to give a canonical pathname.
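As a small illustration of why a canonical pathname makes a better key, the sketch below resolves two spellings of the same file to one string. The helper name canonical_key and the example paths are assumptions made for this illustration; path.posix is used only so that the results are the same on every platform.

```javascript
let path = require('path')

// path.posix is used so the result is the same on every platform;
// plain path.resolve() behaves the same way on POSIX systems.
let canonical_key = (dirname, pathname) => path.posix.resolve(dirname, pathname)

// Both spellings name the same file, so they give the same cache key.
let key_a = canonical_key('/site/css', 'main.less')
let key_b = canonical_key('/site/css', './theme/../main.less')
console.log(key_a) // /site/css/main.less
console.log(key_a === key_b) // true

// Caution: an absolute second argument replaces dirname entirely.
let key_c = canonical_key('/site', '/css/main.less')
console.log(key_c) // /css/main.less
```

The last case matters if pathname begins with a slash, as the string concatenation in the example suggests it may. In that situation the leading slash would need to be stripped, or path.join() used instead, before relying on the resolved key.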

We are relying on fs_readFile() or less.render() to throw exceptions if the original stylesheet or any included stylesheet is not found or contains errors.

Also, note how much simpler the handling of asynchronicity is when using the ES2017 async/await syntax. We recommend doing this for all new code, and doing it consistently, even where using Promise directly might give shorter code.

Code which is not already promisified can be promisified as shown, or specific conversions can be added in place with code like await new Promise(...). Note the comment /*await*/ where a Promise is passed through from a lower-level routine: an explicit await would be consistent here but less efficient.
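For completeness, here is a minimal sketch of the hand-written conversion, wrapping the callback-style fs.stat() in a new Promise. The names stat_async and is_directory are illustrative only and do not come from the package.

```javascript
let fs = require('fs')

// Convert one callback-style call by hand, as an alternative to
// util.promisify(): resolve on success, reject on error.
let stat_async = pathname => new Promise(
  (resolve, reject) => fs.stat(
    pathname,
    (err, stats) => {
      if (err)
        reject(err)
      else
        resolve(stats)
    }
  )
)

// The converted function can then be awaited like any other async call.
let is_directory = async pathname => {
  let stats = await stat_async(pathname)
  return stats.isDirectory()
}
```

An error such as a missing file surfaces as a rejected Promise, so the caller's await throws in the usual way.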

About repeatable builds

Note that the user has to provide a consistent build_func every time the cache is accessed. The build_func may access variables from the surrounding environment, most notably the key value, since this is not passed into the build_func from the bc.get() call. This should only be done in a safe way, such as accessing rarely-changed configuration information: if the building process is impacted by the environment, there is no point caching the result.

If different types of objects need to be cached, use several instances of BuildCache so that differently-built objects do not conflict with each other. We might in future change the API so that the build_func is provided at construction time rather than access time. Although this might be inconvenient from the viewpoint of the callback not being able to access variables from the surrounding context of the bc.get() call, it would ensure repeatable builds.

About asynchronicity

To avoid the overhead of directory watching, the current implementation just does an fs.stat() on each source file prior to returning an object from the cache. This means that bc.get() is fundamentally an asynchronous operation and therefore returns a Promise, which we showed as await bc.get() above.
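To make the stat-based check concrete, here is one way such a test could look. This is a sketch under assumptions, not the package's implementation: the helper name is_up_to_date, the idea of comparing each file's modification time against a recorded build time (built_ms), and the treatment of a missing file as stale are all choices made for this illustration.

```javascript
let fs = require('fs')
let util = require('util')

let fs_stat = util.promisify(fs.stat)

// Hypothetical helper, not taken from the package: returns true if every
// dependency still exists and was last modified no later than built_ms.
let is_up_to_date = async (deps, built_ms) => {
  for (let pathname of deps) {
    let stats
    try {
      stats = await fs_stat(pathname)
    }
    catch (err) {
      if (err.code === 'ENOENT')
        return false // a missing source file is treated as changed
      throw err
    }
    if (stats.mtimeMs > built_ms)
      return false
  }
  return true
}
```

Because each check awaits fs.stat(), any function built on it is necessarily asynchronous, which is the point the paragraph above makes about bc.get().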

Also, the building process may be asynchronous, and so build_func() is also expected to return a Promise. Obviously, bc.get() must wait for the build_func() promise to resolve, indicating that the wanted object is safely stored in the cache, so that it can resolve the bc.get() promise with the result.value that is now associated with the key and wanted by the caller.

There are some rather tricky corner cases associated with this, such as what happens when the same object is requested again while its up-to-dateness is being checked or while it is being built. BuildCache correctly handles these cases. Whilst in general the up-to-date check happens every time an object is retrieved, it won't be overlapped with another up-to-date check or a build.
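One common way to avoid overlapping work for the same key is to remember the Promise of the operation in progress and hand it to every caller that arrives before it settles. The sketch below shows only that idea; the names in_flight and get_shared are hypothetical, and the real BuildCache may be organised differently.

```javascript
// Hypothetical sketch of sharing one in-flight operation per key; the real
// BuildCache may be organised differently.
let in_flight = new Map()

let get_shared = (key, work) => {
  let promise = in_flight.get(key)
  if (promise === undefined) {
    // The first caller starts the work; the entry is dropped once it settles.
    promise = work().finally(() => in_flight.delete(key))
    in_flight.set(key, promise)
  }
  return /*await*/ promise
}
```

Here work is assumed to be an async function, standing in for an up-to-date check or a build. Two calls to get_shared() for the same key made before the first settles run work only once, and a later call starts it afresh.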

About exceptions

Exceptions during the build process are handled by reflecting them through both Promises, and also invalidating the associated key on the way through, so that the object is no longer cached, and a fresh rebuild will be attempted should it be accessed again in the future. A build failure is the only way that an object can be removed from the cache (we may add an explicit removal API).

Note that if several callers are requesting the same key simultaneously and an exception occurs during the build or up-to-date check, each caller receives a reference to the same shared exception object. Thus, when the bc.get() Promise rejects, the rejection value (exception object) should be treated as read-only.
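The behaviour described in this section can be modelled in a few lines. The following is a hypothetical sketch, not the package's code: entries and get_or_build are made-up names, and the up-to-date check is left out so that only the failure path is shown.

```javascript
// Hypothetical sketch, not the package's code. Each key maps to the Promise
// of its built value; a failed build removes the key again.
let entries = new Map()

let get_or_build = (key, build_func) => {
  let promise = entries.get(key)
  if (promise === undefined) {
    let result = {deps: [key], value: undefined}
    promise = Promise.resolve().then(() => build_func(result)).then(
      () => result.value,
      err => {
        entries.delete(key) // invalidate, so a later access rebuilds
        throw err // reflect the same error object to every waiting caller
      }
    )
    entries.set(key, promise)
  }
  return /*await*/ promise
}
```

Since all simultaneous callers share one Promise, they all receive the identical error object. A caller that wants to add context is better off wrapping it in a new Error than modifying it.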

About deletions

Another corner case arises if source files have been deleted since building. We handle this the same way as an updated source file, and attempt a rebuild.

Note that deleting the source files does not remove an object from the cache, since the deleted source files will only be noticed when the object is accessed (however, the resulting rebuild will remove the cached object if it fails).

About diagnostics

The diag argument to the constructor is a bool, which if true causes messages to be printed via console.log() for all activities except for the common case of retrieval when the object is already up-to-date. A diag value of undefined is treated as false, thus it can be omitted in the usual case.

The diag output is handy for development, and can also be handy in production, e.g. our production server is started by systemd which automatically routes stdout output to the system log, and the cache access diagnostic acts somewhat like an HTTP server's access.log, albeit up-to-date accesses are not logged.

We have not attempted to provide comprehensive logging facilities or log-routing, because the simple expedient is to turn off the built-in diagnostics in complex cases and just do your own. In our server we have the built-in diagnostics enabled in some simple cases and disabled in favour of caller-provided logging in others (we use quite a few BuildCache instances, since there are various preprocessors, including Less as mentioned above).

To be implemented

It is intended that we will shortly add a timer function (or possibly just a function that the user should call periodically) to flush built objects from the cache after a stale time, on the assumption that the object might not be accessible or wanted anymore. For example, if the objects are HTML pages, the link structure of the site may have changed to make some pages inaccessible.
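Since this feature does not exist yet, the following is only a guess at one possible shape for the "function that the user should call periodically" variant. Everything here is hypothetical: the name flush_stale, the entry layout {value, last_access_ms}, and the choice to measure staleness from the last access.

```javascript
// Hypothetical sketch of a periodic flush; nothing like this exists in the
// package yet. entries maps key -> {value, last_access_ms}.
let flush_stale = (entries, stale_ms, now_ms) => {
  let flushed = []
  for (let [key, entry] of entries)
    if (now_ms - entry.last_access_ms > stale_ms) {
      entries.delete(key) // deleting during Map iteration is safe
      flushed.push(key)
    }
  return flushed
}
```

A caller could invoke this from setInterval(), passing Date.now() as now_ms; taking the current time as a parameter keeps the function easy to test.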

GIT repository

The development version can be cloned, downloaded, or browsed with gitweb at: https://git.ndcode.org/public/build_cache.git

License

All of our NPM packages are MIT licensed; please see LICENSE in the repository.

Contributions

We would greatly welcome your feedback and contributions. The build_cache package is under active development (and is part of a larger project that is also under development), and thus the API is considered tentative and subject to change. If this is undesirable, you can pin the version in your package.json.

Contact: Nick Downing nick@ndcode.org

Install

npm i @ndcode/build_cache
