Hale - structured health-checks
Hale collects health-check information from the different parts of your apps.
Registering the hale plugin
Options
- path: optional, the path to use for the health-check route, defaults to '/healthcheck'.
- routeConfig: optional, an object that will be used as the config for the health-check route.
- exposeOn: optional, register the health-check route only on servers with the specified labels. By default it is registered on all servers in the pack.
- exposePublicOn: optional, register a simple health-check route on servers with the specified labels. No public health-check route is registered by default.
- publicPath: optional, the path to use for the public health-check route, defaults to '/healthcheck'.
- metadata: optional, information that will be merged into the health-check result object. It will not overwrite existing attributes of the result object.
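The option defaults and the metadata rule above can be sketched as follows. This is a minimal illustration of the documented behavior, not Hale's source: resolveOptions and applyMetadata are hypothetical helpers, the merge is shallow, and the labels and metadata values are made up.

```javascript
// Hypothetical sketch of the documented option semantics; resolveOptions and
// applyMetadata are illustrative names, not part of Hale.
const DOCUMENTED_DEFAULTS = {
  path: "/healthcheck",
  publicPath: "/healthcheck"
};

// Fill in the documented defaults; any option the caller passes wins.
function resolveOptions(options) {
  return Object.assign({}, DOCUMENTED_DEFAULTS, options || {});
}

// Copy metadata onto a result object (shallowly), skipping keys the result
// already has, so metadata never overwrites existing attributes.
function applyMetadata(result, metadata) {
  const merged = Object.assign({}, result);
  Object.keys(metadata || {}).forEach(function (key) {
    if (!Object.prototype.hasOwnProperty.call(merged, key)) {
      merged[key] = metadata[key];
    }
  });
  return merged;
}

// Usage with made-up labels and metadata.
const options = resolveOptions({
  exposeOn: ["admin"],
  metadata: { service: "orders", time: 0 }
});
const result = applyMetadata({ time: 1400000000, checks: [] }, options.metadata);
```

Here result.time keeps its original value because the result already has a time attribute, while service is added from the metadata.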
Registering a health-check
The hale plugin exposes an addCheck function that is used to register health-checks. A health-check is an object with the following options:
- name: the name of the health-check.
- description: a description of the health-check.
- tags: optional, tags to set for the health-check.
- timeout: optional, timeout in milliseconds for the health-check, defaults to 2000.
- handler: function(collector, done), the function that performs the check.
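A health-check definition in the shape described above might look like the sketch below. normalizeCheck is a hypothetical helper that only checks for a name and a handler and applies the documented 2000 ms timeout default; it is not part of Hale. Calling done() with no arguments to signal completion is also an assumption, since the README only names the parameter.

```javascript
// Hypothetical helper: validates a health-check definition and applies the
// documented 2000 ms timeout default. normalizeCheck is not part of Hale's API.
function normalizeCheck(check) {
  if (typeof check.name !== "string" || check.name === "") {
    throw new Error("a health-check needs a name");
  }
  if (typeof check.handler !== "function") {
    throw new Error("a health-check needs a handler(collector, done)");
  }
  return {
    name: check.name,
    description: check.description,
    tags: check.tags,
    timeout: check.timeout === undefined ? 2000 : check.timeout,
    handler: check.handler
  };
}

// A definition in the documented shape. Calling done() with no arguments to
// signal completion is an assumption; the README only names the parameter.
const dbCheck = normalizeCheck({
  name: "database",
  description: "Checks that the database answers a ping",
  tags: ["db"],
  handler: function (collector, done) {
    collector.info("ping ok");
    done();
  }
});
```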
The collector object
The collector object exposes functions for logging events, timing operations, and capturing context data.
- collector.info(message, [data]): Log an info event.
- collector.notice(message, [data]): Log a notice event.
- collector.warning(message, [data]): Log a warning event.
- collector.failure(message, [data]): Log a failure event.
- collector.mark(name): Start a timer that can be used to mark checkpoints. Returns a function([label]) that can be used to add a mark with a label.
- collector.context(name, data): Add context data to the check.
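To show how a handler might use these functions, the sketch below pairs an example handler with a hypothetical in-memory collector that mirrors only the documented method names. createFakeCollector is not Hale's collector: it records events in plain arrays and measures marks in milliseconds for simplicity, whereas Hale reports microseconds. The cache size and threshold in the handler are made up.

```javascript
// Hypothetical in-memory stand-in for the collector, mirroring only the method
// names documented above. It is meant for exercising a handler outside Hale.
function createFakeCollector() {
  const events = [];
  const contextData = {};
  const timers = {};

  // Each logging method appends an event tagged with its level.
  function logger(level) {
    return function (message, data) {
      events.push({ level: level, message: message, data: data });
    };
  }

  return {
    events: events,
    contextData: contextData,
    timers: timers,
    info: logger("info"),
    notice: logger("notice"),
    warning: logger("warning"),
    failure: logger("failure"),
    // mark(name) starts a timer and returns a function([label]) that records a
    // checkpoint. Elapsed values here are milliseconds, unlike Hale's microseconds.
    mark: function (name) {
      const started = Date.now();
      const marks = (timers[name] = []);
      return function (label) {
        marks.push({ label: label, elapsed: Date.now() - started });
      };
    },
    context: function (name, data) {
      contextData[name] = data;
    }
  };
}

// Example handler; the cache size and the 1000-entry threshold are made up.
function cacheCheck(collector, done) {
  const mark = collector.mark("cache");
  const entries = 1200; // stand-in for a real lookup
  mark("lookup");
  collector.context("cache", { entries: entries });
  if (entries > 1000) {
    collector.warning("cache is larger than expected", { entries: entries });
  } else {
    collector.info("cache size ok");
  }
  done();
}
```

Keeping the handler free of any Hale-specific calls beyond the collector methods makes it easy to run against a stand-in like this one.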
Result
The status of each individual check will be that of the "worst" logged event. Log item statuses map to overall health status like this:
- info: OK
- notice: OK
- warning: WARN
- failure: FAIL
Likewise, the status of the overall health-check will be that of the worst individual check.
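The "worst status wins" rule can be sketched as a small reduction. The level-to-status mapping is the one listed above; the event objects with a level field, the helper names, and treating a check with no events (or a health-check with no checks) as OK are assumptions made for illustration.

```javascript
// Severity order of the documented statuses, lowest to highest.
const STATUS_RANK = { OK: 0, WARN: 1, FAIL: 2 };

// Documented mapping from logged event level to status.
const LEVEL_STATUS = { info: "OK", notice: "OK", warning: "WARN", failure: "FAIL" };

// Return the most severe status in the list; an empty list is assumed to be OK.
function worst(statuses) {
  return statuses.reduce(function (acc, status) {
    return STATUS_RANK[status] > STATUS_RANK[acc] ? status : acc;
  }, "OK");
}

// Status of one check: the worst status among its logged events.
function checkStatus(events) {
  return worst(events.map(function (event) { return LEVEL_STATUS[event.level]; }));
}

// Status of the overall health-check: the worst status among its checks.
function overallStatus(checks) {
  return worst(checks.map(function (check) { return check.status; }));
}
```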
The top-level time attribute is a Unix timestamp representing the time the health-check was performed. checks[*].time is the elapsed time for the individual health-check in microseconds. In checks[*].context.times[*], the start attribute is the elapsed time since the check started, and .marks[*].elapsed is the number of microseconds since the mark timer started.
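The relationship between these timing fields can be sketched with Node's process.hrtime.bigint(), which returns nanoseconds as a BigInt. This is an illustrative model of the documented result shape, not Hale's implementation: createTimer, busyWork, and the check name are hypothetical, the units of start are assumed to be microseconds like the other fields, and the top-level timestamp is assumed to be in seconds since the README only says "Unix timestamp".

```javascript
// Convert a BigInt nanosecond span from process.hrtime.bigint() to whole microseconds.
function toMicros(ns) {
  return Number(ns / 1000n);
}

// Hypothetical mark timer: `start` is microseconds from check start to timer start,
// and each mark's `elapsed` is microseconds since the timer itself started.
function createTimer(checkStartNs) {
  const timerStartNs = process.hrtime.bigint();
  const timer = { start: toMicros(timerStartNs - checkStartNs), marks: [] };
  const mark = function (label) {
    timer.marks.push({ label: label, elapsed: toMicros(process.hrtime.bigint() - timerStartNs) });
  };
  return { timer: timer, mark: mark };
}

// Simulated work so the marks have something to measure.
function busyWork() {
  let total = 0;
  for (let i = 0; i < 100000; i++) total += i;
  return total;
}

const checkStartNs = process.hrtime.bigint();
busyWork();
const { timer, mark } = createTimer(checkStartNs);
busyWork();
mark("parsed");
busyWork();
mark("done");

const result = {
  // Assumption: seconds since the epoch; the README says only "Unix timestamp".
  time: Math.floor(Date.now() / 1000),
  checks: [
    {
      name: "example",
      time: toMicros(process.hrtime.bigint() - checkStartNs),
      context: { times: [timer] }
    }
  ]
};
```

Because a mark timer can start after the check itself, a mark's elapsed value plus its timer's start approximates the time since the check began, and the check's overall time is never smaller than that sum.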