Build Cache system
An NDCODE project.
Overview
The build_cache package exports a single constructor, BuildCache(diag), which
must be called with the new operator. The resulting cache object is intended
to store objects of arbitrary JavaScript type that are generated from source
files of some kind. The cache tracks the source files of each object and makes
sure the objects are rebuilt as required if the source files change on disk.
Calling API
Suppose one has a BuildCache instance named bc. It behaves somewhat like an
ES6 Map object, except that it only has the bc.get() function, because new
objects are added to the cache by attempting to get them.
The interface for the BuildCache-provided instance function bc.get() is:
await bc.get(key, build_func): retrieves the object stored under key. The key
is a somewhat arbitrary string that usually corresponds to the on-disk path to
the main source file of the wanted object. If key already exists in the cache,
then the corresponding object is returned after an up-to-date check (and is
rebuilt first if its source files have changed). Otherwise, the user-provided
build_func is called to build it from sources.
The interface for the user-provided callback function build_func() is:
await build_func(result): the user must set result.value to the object to be
stored in the cache, and may set result.deps to a list of pathnames
corresponding to the build dependencies, including the main file, which is
normally the same as the key argument to bc.get(). The result.deps field is
set to [key] before the callback and doesn't need to be changed in the common
case that exactly one source file compiles to one object in the cache.
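The callback contract above can be sketched for the one-source-file case. This is only an illustration: make_build_json() and run_build_func() are hypothetical names that are not part of build_cache, and run_build_func() merely imitates the documented calling convention so that the sketch runs without the package.

```javascript
const fs = require('fs')

// Hypothetical helper: returns a build_func that parses the JSON file at key.
// The key is captured from the enclosing scope, because build_func itself
// only receives the result record.
const make_build_json = key => async result => {
  const text = await fs.promises.readFile(key, {encoding: 'utf-8'})
  result.value = JSON.parse(text)
  // result.deps is left alone, so it keeps its preset value of [key]
}

// Hypothetical stand-in for the calling convention only, so that the example
// runs without the build_cache package: no caching, no up-to-date check.
const run_build_func = async (key, build_func) => {
  const result = {deps: [key]} // deps is preset to [key], as documented above
  await build_func(result)
  return result // returned whole for inspection; bc.get() gives result.value
}
```

With a real cache the same callback would presumably be passed as await bc.get(key, make_build_json(key)).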
About dependencies
Usually the dependencies will be tracked during the building of an object. For
instance, the main source file may contain include directives that bring in
further source files. Most compilers should support this feature. For
instance, the programmatic API to the Less CSS compiler returns CSS code and a
list of dependencies, which are placed in result.value and result.deps
respectively.
At the moment we do not support optional source files. For instance, there
might be a picture image.jpg and an optional metadata file image.json that
goes with it. To handle this case, in the future we could add a "not exists"
dependency type so that a rebuild occurs if the metadata file appears later on.
Usage example
Here is a fairly self-contained example of how we can define a function called
get_less(dirname, pathname) which will load a Less stylesheet from the given
pathname, and compile it to CSS via the cache. The dirname is given so that if
the Less stylesheet includes further stylesheets they will be taken from the
correct path, usually the path containing the main Less stylesheet.
    let BuildCache = require('build_cache')
    let fs = require('fs')
    let less = require('less/lib/less-node')
    let util = require('util')

    let fs_readFile = util.promisify(fs.readFile)
    let build_cache_less = new BuildCache()

    let get_less = (dirname, pathname) => {
      pathname = dirname + pathname
      return /* await */ build_cache_less.get(
        pathname,
        async result => {
          let text = await fs_readFile(pathname, {encoding: 'utf-8'})
          console.log('getting', pathname, 'as less')
          let render = await less.render(
            text,
            {
              filename: pathname,
              paths: [dirname] // where Less looks for imported stylesheets
            }
          )
          // concat() returns a new array, so assign it back to result.deps
          result.deps = result.deps.concat(render.imports)
          result.value = Buffer.from(render.css)
        }
      )
    }
The statement pathname = dirname + pathname is simplified for the example. It
should properly be something like pathname = path.resolve(dirname, pathname),
which would be more likely to create a unique key for the stylesheet, since it
would resolve out components like . or .. to give a canonical pathname.
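A small sketch of why a canonical pathname makes a better key. The helper name make_key() and the example paths are made up for illustration, and path.posix is used only so that the results are the same on every platform.

```javascript
const path = require('path')

// Hypothetical helper: canonicalize the pathname before using it as a key.
// path.posix.resolve() behaves like path.resolve() does on POSIX systems.
const make_key = (dirname, pathname) => path.posix.resolve(dirname, pathname)

// Three spellings of the same file all give '/site/css/main.less', whereas
// plain string concatenation would give three different cache keys.
const key_a = make_key('/site/css', 'main.less')
const key_b = make_key('/site/css', './main.less')
const key_c = make_key('/site/css', '../css/main.less')
```

One difference from concatenation is worth keeping in mind: if pathname itself begins with /, path.resolve() treats it as absolute and ignores dirname.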
We are relying on fs_readFile() or less.render() to throw exceptions if the
original stylesheet or any included stylesheet is not found or contains errors.
Also, note how much simpler the handling of asynchronicity is when using the
ES2017 async/await syntax. We recommend doing this for all new code, and doing
it consistently, even if using Promise directly might give shorter code.
Code that is not already promisified can be promisified as shown, or else we
can add specific conversions in places with code like: await new Promise(...).
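As an illustration of the await new Promise(...) form, here is a minimal sketch that wraps the callback-style fs.stat() by hand. The function name stat_mtime() is hypothetical and chosen only for this example.

```javascript
const fs = require('fs')

// Hypothetical helper: returns the modification time of a file in
// milliseconds, converting the Node-style callback into a Promise in place.
const stat_mtime = async pathname => {
  const stats = await new Promise(
    (resolve, reject) => fs.stat(
      pathname,
      (err, stats) => err ? reject(err) : resolve(stats)
    )
  )
  return stats.mtimeMs
}
```

If the file does not exist, the Promise rejects and the error surfaces as an ordinary exception at the await.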
Note the comment /* await */ where a Promise is passed through from a
lower-level routine. An explicit await would be consistent here but less
efficient.
About repeatable builds
Note that the user has to provide a consistent build_func every time the cache
is accessed. The build_func may access variables from the surrounding
environment, most notably the key value, since this is not passed into the
build_func from the bc.get() call. However, this should only be done in a safe
way, such as accessing rarely-changed configuration information, since if the
building process is affected by the environment, there is no point caching it.
If different types of objects need to be cached, use several instances of
BuildCache so that differently-built objects do not conflict with each other.
We might in future change the API so that the build_func is provided at
construction time rather than access time. Although this might be inconvenient
from the viewpoint of the callback not being able to access variables from the
surrounding context of the bc.get() call, it would ensure repeatable builds.
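Until such an API exists, a similar effect can be approximated in user code by fixing the builder once. The sketch below is an assumption about how one might do that, not part of build_cache: make_getter() and the build(key, result) signature are hypothetical, and bc is any object with the bc.get(key, build_func) interface described earlier.

```javascript
// Hypothetical wrapper: binds one builder to one cache, so that every access
// through the returned function uses the same build logic. The builder gets
// the key explicitly instead of reading it from the surrounding scope.
const make_getter = (bc, build) =>
  key => bc.get(key, result => build(key, result))
```

A caller would create the getter once, for example get_page = make_getter(bc, build_page), and then only ever call await get_page(key).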
About asynchronicity
To avoid the overhead of directory watching, the current implementation just
does an fs.stat() on each source file prior to returning an object from the
cache. This means that bc.get() is fundamentally an asynchronous operation and
therefore returns a Promise, which we showed as await bc.get() above.
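To make the idea concrete, here is a minimal sketch of a stat-based freshness check. It is written under assumptions: the text only says that fs.stat() is called on each source file, so comparing modification times against a recorded build time is a guess at the criterion, and is_up_to_date() is a hypothetical name. A missing file is treated as changed, in line with the section on deletions.

```javascript
const fs = require('fs')

// Returns true if every dependency exists and was last modified no later
// than build_time (milliseconds since the epoch), false otherwise.
const is_up_to_date = async (deps, build_time) => {
  for (const pathname of deps) {
    let stats
    try {
      stats = await fs.promises.stat(pathname)
    }
    catch (err) {
      if (err.code === 'ENOENT')
        return false // a deleted source file counts as a changed one
      throw err // anything else (e.g. permissions) is a real error
    }
    if (stats.mtimeMs > build_time)
      return false
  }
  return true
}
```

Because every dependency has to be checked with an asynchronous stat call, the check itself has to be awaited, which is why retrieval cannot be synchronous.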
Also, the building process may be asynchronous, and so build_func() is also
expected to return a Promise. Obviously, bc.get() must wait for the
build_func() promise to resolve, indicating that the wanted object is safely
stored in the cache, so that it can resolve the bc.get() promise with the
result.value that is now associated with the key and wanted by the caller.
There are some rather tricky corner cases associated with this, such as what
happens when the same object is requested again while its up-to-dateness is
being checked or while it is being built. BuildCache correctly handles these
cases. Whilst in general the up-to-date check happens every time an object is
retrieved, it is never overlapped with another up-to-date check or a build of
the same object.
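One common way to avoid overlapping work is to let simultaneous requests for a key share a single in-flight Promise. The sketch below shows that technique only; it is not the build_cache source, the class name SharedBuilds is hypothetical, and it deliberately omits caching and up-to-date checks.

```javascript
// Hypothetical stand-in: overlapping get() calls for one key share one build.
class SharedBuilds {
  constructor() {
    this.pending = new Map() // key -> Promise of the build in progress
  }

  async build(key, build_func) {
    const result = {deps: [key]}
    await build_func(result)
    return result.value
  }

  get(key, build_func) {
    let promise = this.pending.get(key)
    if (promise === undefined) {
      promise = this.build(key, build_func)
      this.pending.set(key, promise)
      // Forget the entry once the build settles, whether it succeeded or
      // failed, so that a later request starts a fresh build.
      const forget = () => { this.pending.delete(key) }
      promise.then(forget, forget)
    }
    return promise
  }
}
```

Since all waiting callers hold the same Promise, they also all see the same resolved value or the same rejection.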
About exceptions
Exceptions during the build process are handled by reflecting them through both
Promises, and also invalidating the associated key on the way through, so that
the object is no longer cached, and a fresh rebuild will be attempted should it
be accessed again in the future. A build failure is the only way that an object
can be removed from the cache (we may add an explicit removal API).
Note that if several callers are requesting the same key simultaneously and an
exception occurs during the build or up-to-date check, each caller receives a
reference to the same shared exception object. Thus, when the bc.get() Promise
rejects, the rejection value (exception object) should be treated as read-only.
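If a caller wants to attach its own context to a failure, one way to respect the read-only rule is to wrap the shared exception rather than modify it. This is a sketch with hypothetical names: get_with_context() is not part of build_cache, and get stands for any function returning a Promise, such as () => bc.get(key, build_func).

```javascript
// Hypothetical helper: adds caller-specific context to a failure without
// writing to the original exception object.
const get_with_context = async (get, context) => {
  try {
    return await get()
  }
  catch (err) {
    // err may be shared with other callers, so leave it untouched and
    // throw a new Error that refers back to it instead
    const wrapped = new Error(`${context}: ${err.message}`)
    wrapped.cause = err
    throw wrapped
  }
}
```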
About deletions
Another corner case happens if source files have been deleted since building.
We handle this the same as an updated source file and attempt to rebuild the
object.
Note that deleting the source files does not by itself remove an object from
the cache, since the deleted source files will only be noticed when the object
is next accessed (however, the resulting rebuild will remove the cached object
if it fails).
About diagnostics
The diag argument to the constructor is a boolean which, if true, causes
messages to be printed via console.log() for all activities except for the
common case of retrieval when the object is already up-to-date. A diag value
of undefined is treated as false, thus it can be omitted in the usual case.
The diag output is handy for development, and can also be handy in production,
e.g. our production server is started by systemd, which automatically routes
stdout output to the system log, and the cache access diagnostic acts somewhat
like an HTTP server's access.log, albeit up-to-date accesses are not logged.
We have not attempted to provide comprehensive logging facilities or
log-routing, because the simple expedient is to turn off the built-in
diagnostics in complex cases and just do your own. In our server we have the
built-in diagnostics enabled in some simple cases and disabled in favour of
caller-provided logging in others (we use quite a few BuildCache instances,
since there are various preprocessors, including Less as mentioned above).
To be implemented
It is intended that we will shortly add a timer function (or possibly just a function that the user should call periodically) to flush built objects from the cache after a stale time, on the assumption that the object might not be accessible or wanted anymore. For example, if the objects are HTML pages, the link structure of the site may have changed to make some pages inaccessible.
Git repository
The development version can be cloned, downloaded, or browsed with gitweb at:
https://git.ndcode.org/public/build_cache.git
License
All of our NPM packages are MIT licensed; please see LICENSE in the repository.
Contributions
We would greatly welcome your feedback and contributions. The build_cache
package is under active development (and is part of a larger project that is
also under development) and thus the API is considered tentative and subject
to change. If this is undesirable, you could possibly pin the version in your
package.json.
Contact: Nick Downing nick@ndcode.org