hale

1.2.0 • Public • Published

Hale - structured health-checks

Hale collects health-check information from the different parts of your apps.

Registering the hale plugin

pack.register({
  plugin: require('hale'),
  options: {
    path: '/healthcheck',
    routeConfig: {},
    exposeOn: ['admin'],
    metadata: {
      name: 'my service name',
      version: '1.0.2',
    },
  },
});

Options

  • path: optional, the path to use for the health-check route, defaults to '/healthcheck'.
  • routeConfig: optional, object that will be used as config for the health-check route.
  • exposeOn: optional, only register the health-check route on servers with the specified labels. Registers on all servers in the pack by default.
  • exposePublicOn: optional, register a simple health-check route on the servers with the specified labels. No public health-check is registered by default.
  • publicPath: optional, the path to use for the public health-check route, defaults to '/healthcheck'.
  • metadata: optional, information that will be merged with the health-check result object. Will not overwrite existing attributes of the result object.
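The metadata option is described as being merged into the result without overwriting existing attributes. A minimal sketch of that behaviour, assuming a plain shallow merge (the mergeMetadata helper is hypothetical and is not hale's own code):

```javascript
// Hypothetical helper: copies metadata keys onto the result object,
// skipping any key the result already defines.
function mergeMetadata(result, metadata) {
  Object.keys(metadata || {}).forEach(function (key) {
    if (!Object.prototype.hasOwnProperty.call(result, key)) {
      result[key] = metadata[key];
    }
  });
  return result;
}

var result = { time: 1409751843556, status: 'OK', checks: [] };
var merged = mergeMetadata(result, {
  name: 'my service name',
  version: '1.0.2',
  status: 'ignored', // already present on the result, so it is not overwritten
});

console.log(merged.name);   // my service name
console.log(merged.status); // OK
```

Under this assumption, a metadata key such as status can never mask the computed health status.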

Registering a health-check

The hale plugin exposes an addCheck function that is used to register health-checks. A health-check is an object with the following options:

  • name: the name of the healthcheck.
  • description: a description of the healthcheck.
  • tags: optional, tags to set for the healthcheck.
  • timeout: optional, timeout in milliseconds for the healthcheck, defaults to 2000.
  • handler: function(collector, done) the function that performs the check.

The collector object

The collector object exposes functions for logging events, timing operations, and capturing context data.

  • collector.info(message, [data]): Log an info event.
  • collector.notice(message, [data]): Log a notice event.
  • collector.warning(message, [data]): Log a warning event.
  • collector.failure(message, [data]): Log a failure event.
  • collector.mark(name): Start a timer that can be used to mark checkpoints. Returns a function([label]) that can be used to add a mark with a label.
  • collector.context(name, data): Add context data to the check.

Example of a plugin that registers three checks once hale is available:

plugin.dependency('hale', function registerHealthcheck(plugin, next) {
  // Logs an event of a random severity on each run.
  plugin.plugins.hale.addCheck({
    name: 'randomiser',
    description: 'Random behaviour',
    tags: ['amore', 'erratic'],
    handler: function erraticGuy(collector, done) {
      var rationale = Math.random();

      collector.info('Random musings');

      if (rationale < 0.25) {
        collector.info('All is good', {mood: 'great'});
      }
      else if (rationale < 0.50) {
        collector.notice('Sooo, what is this?', {mood: 'ambivalent'});
      }
      else if (rationale < 0.75) {
        collector.warning('I don\'t want to!', {mood: 'angry'});
      }
      else {
        collector.failure('Gaaaah!', {mood: 'berserk'});
      }
      done();
    },
  });

  // Never calls done, so the check fails once the 100 ms timeout expires.
  plugin.plugins.hale.addCheck({
    name: 'timeouter',
    description: 'Fail with timeout',
    tags: ['amore', 'failure'],
    timeout: 100,
    handler: function (collector) {
      var mark = collector.mark('almost');
      setTimeout(function partial() {
        mark('halfway');
      }, 50);

      setTimeout(function partial() {
        mark('Sooo close!');
      }, 75);
    },
  });

  // Throws from the handler; the error is reported as a failure event.
  plugin.plugins.hale.addCheck({
    name: 'exceptional',
    description: 'Fail with a bang',
    tags: ['amore', 'failure'],
    handler: function () {
      throw new Error('I\'m with stupid!');
    },
  });

  next();
});
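A handler only depends on its collector and done arguments, so it can be tried out without a running server. The sketch below assumes a hand-written stub: createStubCollector and the disk check are hypothetical, and the stub only records calls. It does not reproduce hale's timing or result format.

```javascript
// Hypothetical stub that records what a handler logs.
function createStubCollector() {
  var events = [];
  function logger(status) {
    return function (message, data) {
      events.push({ status: status, message: message, data: data });
    };
  }
  return {
    events: events,
    info: logger('info'),
    notice: logger('notice'),
    warning: logger('warning'),
    failure: logger('failure'),
    // mark() returns a function that records a labelled checkpoint.
    mark: function (name) {
      return function (label) {
        events.push({ status: 'mark', message: name, data: label });
      };
    },
    context: function (name, data) {
      events.push({ status: 'context', message: name, data: data });
    },
  };
}

// A check in the same shape as the ones registered above.
var diskCheck = {
  name: 'disk',
  description: 'Example check used with the stub',
  handler: function (collector, done) {
    collector.context('volume', { path: '/tmp' });
    collector.warning('Disk almost full', { free: '5%' });
    done();
  },
};

var stub = createStubCollector();
var finished = false;
diskCheck.handler(stub, function () { finished = true; });

console.log(finished);              // true
console.log(stub.events[1].status); // warning
```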

Result

The status of each individual check will be that of the "worst" logged event. Log item statuses map to overall health status like this:

  • info: OK
  • notice: OK
  • warning: WARN
  • failure: FAIL

Likewise the status of the overall health-check will be that of the worst individual check.
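The "worst status wins" rule can be sketched as follows. This is an illustration of the rule as described, not hale's implementation; the names ORDER, EVENT_STATUS, worst and checkStatus are made up for the example, and treating an empty log as OK is an assumption based on the "http" check in the sample result.

```javascript
// Severity order, lowest to highest.
var ORDER = ['OK', 'WARN', 'FAIL'];

// Mapping from log event level to status, as listed above.
var EVENT_STATUS = { info: 'OK', notice: 'OK', warning: 'WARN', failure: 'FAIL' };

// Returns the most severe status in a list; an empty list counts as OK.
function worst(statuses) {
  return statuses.reduce(function (acc, status) {
    return ORDER.indexOf(status) > ORDER.indexOf(acc) ? status : acc;
  }, 'OK');
}

// Status of one check, derived from its logged events.
function checkStatus(logEvents) {
  return worst(logEvents.map(function (event) {
    return EVENT_STATUS[event.status];
  }));
}

var randomiser = checkStatus([{ status: 'info' }, { status: 'warning' }]);
var exceptional = checkStatus([{ status: 'failure' }]);
var overall = worst([randomiser, exceptional, checkStatus([])]);

console.log(randomiser, exceptional, overall); // WARN FAIL FAIL
```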

The top-level time attribute is a Unix timestamp, in milliseconds, representing the time the health-check was performed. checks[*].time is the elapsed time for the individual health-check in microseconds. In checks[*].context.times[*], the start attribute is the elapsed time since the check started, and marks[*].elapsed is the number of microseconds since the mark timer started.

{
  "time": 1409751843556,
  "status": "FAIL",
  "checks": [
    {
      "name": "http",
      "description": "Response metrics",
      "status": "OK",
      "tags": [
        "amore"
      ],
      "time": 159,
      "context": {
        "log": []
      }
    },
    {
      "name": "randomiser",
      "description": "Random behaviour",
      "status": "WARN",
      "tags": [
        "amore",
        "erratic"
      ],
      "time": 137,
      "context": {
        "log": [
          {
            "status": "info",
            "message": "Random musings"
          },
          {
            "status": "warning",
            "message": "I don't want to!",
            "data": {
              "mood": "angry"
            }
          }
        ]
      }
    },
    {
      "name": "timeouter",
      "description": "Fail with timeout",
      "status": "FAIL",
      "tags": [
        "amore",
        "failure"
      ],
      "time": 104662,
      "context": {
        "times": [
          {
            "name": "almost",
            "start": 4,
            "marks": [
              {
                "label": "halfway",
                "elapsed": 50178
              },
              {
                "label": "Sooo close!",
                "elapsed": 78729
              }
            ]
          }
        ],
        "log": [
          {
            "status": "failure",
            "message": "Healthcheck timed out",
            "data": {
              "name": "Error",
              "stack": "Error: Healthcheck timed out\n    at abort [as _onTimeout] (./hale/lib/Hale.js:57:18)\n    at Timer.listOnTimeout [as ontimeout] (timers.js:112:15)",
              "message": "Healthcheck timed out"
            }
          }
        ]
      }
    },
    {
      "name": "exceptional",
      "description": "Fail with a bang",
      "status": "FAIL",
      "tags": [
        "amore",
        "failure"
      ],
      "time": 226,
      "context": {
        "log": [
          {
            "status": "failure",
            "message": "I'm with stupid!",
            "data": {
              "name": "Error",
              "stack": "Error: I'm with stupid!\n    at Object.plugin.plugins.hale.addCheck.handler (./healthcheck.js:86:15)\n...",
              "message": "I'm with stupid!"
            }
          }
        ]
      }
    }
  ],
  "name": "core",
  "version": "1.0.0",
  "hostname": "max-normal.local"
}
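A consumer of this result, such as a monitoring script, might only care about the checks that are not OK. The sketch below works on a trimmed-down copy of the result above; summariseProblems is a hypothetical helper, and the conversion assumes checks[*].time is in microseconds as described.

```javascript
// Trimmed-down copy of the result above.
var result = {
  status: 'FAIL',
  checks: [
    { name: 'http', status: 'OK', time: 159 },
    { name: 'randomiser', status: 'WARN', time: 137 },
    { name: 'timeouter', status: 'FAIL', time: 104662 },
    { name: 'exceptional', status: 'FAIL', time: 226 },
  ],
};

// Hypothetical helper: lists checks that are not OK,
// with the elapsed time converted from microseconds to milliseconds.
function summariseProblems(res) {
  return res.checks
    .filter(function (check) { return check.status !== 'OK'; })
    .map(function (check) {
      var ms = (check.time / 1000).toFixed(1);
      return check.name + ': ' + check.status + ' (' + ms + ' ms)';
    });
}

var problems = summariseProblems(result);
console.log(problems.join('\n'));
```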

Install

npm i hale

License

MIT

Collaborators

  • hugowetterberg