keen-query

3.2.7 • Public • Published

Concise JavaScript API and CLI for querying and performing advanced analysis on Keen data.

Usage

Running queries

CLI

npm install -g Financial-Times/keen-query

Make sure the KEEN_READ_KEY and KEEN_PROJECT_ID environment variables are set.

Note: if you work in the next FT team you need n-keen-query (identical syntax to this component, but wrapped in some sensible default settings)

  • kq 'page:dwell->count()->filter(!user.isStaff)' will output an ASCII table of data
  • kq convert 'https://... some long Keen url' can be used to convert existing queries to the format below
  • kq print 'https://... some long Keen url' can be used to output ASCII tables given a Keen query url

Writing queries

Queries can be built in two main ways

  • As a string such as page:dwell->count()->filter(!user.isStaff)->group(page.location.type)->relTime(this_12_days)->interval(d)
  • Equivalently, the JavaScript API can be used directly
    new KeenQuery('page:dwell')
        .count()
        .filter('!user.isStaff')
        .group('page.location.type')
        .relTime('this_12_days')
        .interval('d')

Under the hood, keen-query converts string queries into JS API calls using KeenQuery.build(). You can also call KeenQuery.build() yourself to write queries as strings but consume the results within a JS application.
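
As a rough illustration of that conversion, the sketch below splits a query string on -> and turns each step into a method name plus its arguments. parseQueryString is a hypothetical helper written for this example, not keen-query's actual parser; it ignores quoting, aggregators and filter values that themselves contain commas.

```javascript
// Hypothetical helper: NOT keen-query's real parser, only an illustration
// of how a string query could map onto chained JS API calls.
function parseQueryString(str) {
  const [head, ...steps] = str.split('->');
  const calls = steps.map((step) => {
    const match = /^(\w+)\((.*)\)$/.exec(step.trim());
    if (!match) throw new Error(`Cannot parse step: ${step}`);
    const [, method, rawArgs] = match;
    // Naive argument split: does not handle quoted values containing commas.
    const args = rawArgs === '' ? [] : rawArgs.split(',').map((a) => a.trim());
    return { method, args };
  });
  return { event: head.trim(), calls };
}

const parsed = parseQueryString(
  'page:dwell->count()->filter(!user.isStaff)->group(page.location.type)->relTime(this_12_days)->interval(d)'
);
console.log(parsed.event);
console.log(parsed.calls.map((c) => `${c.method}(${c.args.join(', ')})`));
```

Each entry in calls corresponds to one chained method in the JS API form of the same query.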

Notes on syntax

  • Values in query strings are heavily type coerced, so quote marks are generally not necessary. The (largely untested) intention is, however, to be agnostic about quote marks, so if you have a value you don't want coerced, or one that contains awkward characters that may break the query parsing (such as a parenthesis), try adding quotes
  • If any bit of syntax seems unintuitive please raise it ASAP - would be good to get most things reasonably settled by the v2 release

Setting up the data extraction

First choose your event type

String | JS API
Begin string with page:dwell | const kq = new KeenQuery('page:dwell')

...then one of the following

Function | String | JS API
Count all events | ->count() | kq.count()
Count unique values | ->count(user.uuid) | kq.count('user.uuid')
Minimum value of prop | ->min(session.length) | kq.min('session.length')
Maximum value of prop | ->max(session.length) | kq.max('session.length')
Sum values of prop | ->sum(session.length) | kq.sum('session.length')
Average value of prop | ->avg(session.length) | kq.avg('session.length')
Median value of prop | ->med(session.length) or ->median(session.length) | kq.med('session.length') or kq.median('session.length')
n-th (e.g. 90th) percentile value of prop | ->pct(session.length,90) or ->percentile(session.length,90) | kq.pct('session.length', 90) or kq.percentile('session.length', 90)
Select unique values for prop | ->select(user.uuid) | kq.select('user.uuid')

Then any of the below can be applied in any order (though it's advisable to put any time functions last as these will be the ones you'll most likely want to tweak later)

Filtering data

Filter can be called as many times as you like

Note: the intention is to replicate all Keen's available filters. Raise an issue if I missed any

Function | String | JS API
prop is equal to value | ->filter(prop=val) | kq.filter('prop=val')
prop is not equal to value | ->filter(prop!=val) | kq.filter('prop!=val')
prop is greater than value | ->filter(prop>val) | kq.filter('prop>val')
prop is less than value | ->filter(prop<val) | kq.filter('prop<val')
prop is greater than or equal to value | ->filter(prop>=val) | kq.filter('prop>=val')
prop is less than or equal to value | ->filter(prop<=val) | kq.filter('prop<=val')
prop contains value | ->filter(prop~val) | kq.filter('prop~val')
prop does not contain value | ->filter(prop!~val) | kq.filter('prop!~val')
prop is equal to val1, val2 ... | ->filter(prop?val1,val2,val3) | kq.filter('prop?val1,val2,val3')
prop is not equal to val1, val2 ... | ->filter(prop!?val1,val2,val3) | kq.filter('prop!?val1,val2,val3')
prop exists | ->filter(prop) | kq.filter('prop')
prop doesn't exist | ->filter(!prop) | kq.filter('!prop')
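
To make the filter shorthand more concrete, here is a minimal sketch of how such strings could be turned into Keen-style filter objects (property_name, operator, property_value). parseFilterSketch and coerce are hypothetical names for this example; the operator mapping and the coercion rules are assumptions rather than keen-query's actual behaviour, and the !? form is deliberately left out.

```javascript
// Assumed mapping from the shorthand operators to Keen filter operators.
const OPERATORS = {
  '!=': 'ne', '!~': 'not_contains', '>=': 'gte', '<=': 'lte',
  '=': 'eq', '>': 'gt', '<': 'lt', '~': 'contains', '?': 'in',
};

// Assumed coercion rules: booleans, null and numbers are converted,
// everything else stays a string.
function coerce(value) {
  if (value === 'true') return true;
  if (value === 'false') return false;
  if (value === 'null') return null;
  if (value.trim() !== '' && !Number.isNaN(Number(value))) return Number(value);
  return value;
}

// Hypothetical stand-in for illustration; not KeenQuery.parseFilter itself.
function parseFilterSketch(str) {
  const match = /^(.+?)(!\?|!=|!~|>=|<=|=|>|<|~|\?)(.*)$/.exec(str);
  if (!match) {
    // No operator: "prop" means exists, "!prop" means does not exist.
    const negated = str.startsWith('!');
    return {
      property_name: negated ? str.slice(1) : str,
      operator: 'exists',
      property_value: !negated,
    };
  }
  const [, prop, op, rawValue] = match;
  if (op === '!?') throw new Error('"!?" is not covered by this sketch');
  const value = op === '?' ? rawValue.split(',').map(coerce) : coerce(rawValue);
  return { property_name: prop, operator: OPERATORS[op], property_value: value };
}

console.log(parseFilterSketch('session.length>=30'));
console.log(parseFilterSketch('!user.isStaff'));
console.log(parseFilterSketch('page.type?article,video'));
```

Two-character operators are listed before their one-character prefixes in the pattern so that >= is not read as > followed by a value starting with =.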

Grouping data by property or time period

Data can be grouped by as many properties as required, as well as by time intervals. For the purposes of outputting as tables/graphs, grouping by no more than two things is advisable.

Function | String | JS API
Group data per minute | ->interval(m) | kq.interval('m')
Group data per hour | ->interval(h) | kq.interval('h')
Group data per day | ->interval(d) | kq.interval('d')
Group data per week | ->interval(w) | kq.interval('w')
Group data per month | ->interval(mo) | kq.interval('mo')
Group data per year | ->interval(y) | kq.interval('y')
Group fortnightly | ->interval(2_w) | kq.interval('2_w')
Group data by value of page.type | ->group(page.type) | kq.group('page.type')
Group data by multiple properties | ->group(page.type,user.isStaff) | kq.group('page.type', 'user.isStaff')
Exclude null values | ->tidy() | kq.tidy()

The full names minute, hour, day, week, month or year can also be used instead of the shorthands.
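
For illustration, the sketch below shows one way the interval shorthands could be expanded into the interval names Keen's API accepts (minutely, hourly, daily, weekly, monthly, yearly, and every_n_units for multiples). toKeenInterval is a hypothetical helper; whether keen-query performs the mapping exactly like this is an assumption.

```javascript
// Shorthand -> full unit name, as listed in the table above.
const UNITS = { m: 'minute', h: 'hour', d: 'day', w: 'week', mo: 'month', y: 'year' };

// Full unit name -> Keen interval name.
const KEEN_NAMES = {
  minute: 'minutely', hour: 'hourly', day: 'daily',
  week: 'weekly', month: 'monthly', year: 'yearly',
};

// Hypothetical helper, assumed for this example only.
function toKeenInterval(shorthand) {
  const match = /^(?:(\d+)_)?([a-z]+)$/.exec(shorthand);
  if (!match) throw new Error(`Unrecognised interval: ${shorthand}`);
  const count = match[1] ? Number(match[1]) : 1;
  const unit = UNITS[match[2]] || match[2];
  if (!KEEN_NAMES[unit]) throw new Error(`Unknown unit: ${match[2]}`);
  // A multiple such as 2_w becomes a custom interval.
  return count === 1 ? KEEN_NAMES[unit] : `every_${count}_${unit}s`;
}

console.log(toKeenInterval('d'));
console.log(toKeenInterval('2_w'));
console.log(toKeenInterval('month'));
```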

Selecting time range

By default data is returned for this_14_days

Relative time

Function | String | JS API
This 6 days | ->relTime(6) or ->relTime('this_6_days') | kq.relTime(6) or kq.relTime('this_6_days')
This 8 weeks | ->relTime(8_weeks) or ->relTime('this_8_weeks') | kq.relTime('8_weeks') or kq.relTime('this_8_weeks')
Previous 3 hours | ->relTime(previous_3_hours) | kq.relTime('previous_3_hours')
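
The equivalences in the table suggest a simple normalisation rule, sketched below: a bare number means days, and a missing this_ prefix is filled in. toRelativeTimeframe is a hypothetical helper and the rule is inferred from the rows above, not taken from keen-query's source.

```javascript
// Hypothetical helper: normalises relTime arguments to Keen's
// this_n_units / previous_n_units timeframe strings.
function toRelativeTimeframe(value) {
  const str = String(value);
  if (/^(this|previous)_/.test(str)) return str; // already fully specified
  if (/^\d+$/.test(str)) return `this_${str}_days`; // bare number means days
  return `this_${str}`; // e.g. 8_weeks
}

console.log(toRelativeTimeframe(6));
console.log(toRelativeTimeframe('8_weeks'));
console.log(toRelativeTimeframe('previous_3_hours'));
```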

Absolute time

start and end should be ISO time strings (Date objects are also OK in the JS API). Support for other time formats is on the backlog!

Function | String | JS API
From start to end times | ->absTime(start,end) | kq.absTime(start, end)
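
As a sketch of what accepting both ISO strings and Date objects might involve, the hypothetical helper below normalises either form into a { start, end } pair of ISO strings, which is the shape Keen uses for absolute timeframes. toAbsoluteTimeframe is an assumed name, not part of keen-query.

```javascript
// Hypothetical helper: accepts ISO strings or Date objects and returns
// an absolute timeframe with both ends as ISO strings.
function toAbsoluteTimeframe(start, end) {
  const toIso = (t) => {
    const date = t instanceof Date ? t : new Date(t);
    if (Number.isNaN(date.getTime())) throw new Error(`Invalid time: ${t}`);
    return date.toISOString();
  };
  return { start: toIso(start), end: toIso(end) };
}

console.log(
  toAbsoluteTimeframe('2016-01-01T00:00:00.000Z', new Date(Date.UTC(2016, 0, 8)))
);
```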

Post processing data

A number of additional methods can be used to aggregate, reduce, or otherwise manipulate the results of a Keen query. They can be combined in all sorts of weird and wonderful ways (e.g. calculate a ratio of two tables, reduce to a single column, then concatenate with the original values) - be careful you're not generating nonsense data!

Note on specifying dimensions: Some methods expect a dimension to be specified, e.g. to choose between taking an average across rows or columns. The value of dimension can be:

  • timeframe, or the name of a property that has been grouped by
  • a non-negative integer (0-indexed) referring directly to a given dimension, e.g. in a table plotting count against time, a value of 1 would pick out the time dimension

Dimensions are added in the same order the methods creating them are called, so e.g. ->interval(d)->group(uuid) would have timeframe as the 0th dimension and uuid as the 1st.
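
The lookup described in this note can be sketched as follows. resolveDimension and the axisNames array are hypothetical, introduced only to show how a name or a 0-based index could resolve to the same dimension.

```javascript
// Hypothetical helper: resolves a dimension given by name or by
// 0-based index to its index in the table.
function resolveDimension(axisNames, dimension) {
  if (Number.isInteger(dimension)) {
    if (dimension < 0 || dimension >= axisNames.length) {
      throw new RangeError(`No dimension at index ${dimension}`);
    }
    return dimension;
  }
  const index = axisNames.indexOf(dimension);
  if (index === -1) throw new Error(`Unknown dimension: ${dimension}`);
  return index;
}

// Axes in the order created by ->interval(d)->group(uuid)
const axisNames = ['timeframe', 'uuid'];
console.log(resolveDimension(axisNames, 'timeframe'));
console.log(resolveDimension(axisNames, 1));
```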

Aggregators

These combine multiple keen-queries using a predefined rule. They follow the syntax @aggregatorName(comma separated list of queries). So far they are not available in the JS API, and include:

  • @ratio - Given two queries returning results with similar structure, it returns a new table where the values are the result of dividing the value in the first table by its corresponding value in the second
  • @pct - as above but expressed as a percentage
  • @sum - Given two or more queries returning results with identical structure, it returns a new table where the values are the result of adding the values for multiple tables
  • @subtract - Given two queries returning results with identical structure, it returns a new table where the values are the result of subtracting the value in the second table from the first
  • @concat - Given n queries returning results with similar structure, it combines them into a single table by concatenating the columns of each table
  • @funnel - Track whether e.g. a user completes several steps of a funnel. Must be used with the special extraction ->with() to choose the property to use to identify the user. TODO: make funnels work with grouped data or intervals

Aggregations must be created using KeenQuery.build(), which returns an object with the same interface as a KeenQuery instance, so reductions can be performed on it.
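
To show what combining tables cell by cell means in practice, here is a small sketch operating on plain nested arrays. combineTables, ratio, pct and subtract are hypothetical helpers for this example; keen-query's own aggregators work on query results and are not implemented this way as far as this README states.

```javascript
// Hypothetical helper: applies fn to corresponding cells of two tables
// (nested arrays) that share the same structure.
function combineTables(a, b, fn) {
  if (Array.isArray(a) !== Array.isArray(b)) {
    throw new Error('Tables have different structure');
  }
  if (!Array.isArray(a)) return fn(a, b);
  if (a.length !== b.length) throw new Error('Tables have different sizes');
  return a.map((item, i) => combineTables(item, b[i], fn));
}

const ratio = (a, b) => combineTables(a, b, (x, y) => x / y);
const pct = (a, b) => combineTables(a, b, (x, y) => (x / y) * 100);
const subtract = (a, b) => combineTables(a, b, (x, y) => x - y);

const first = [[10, 20], [30, 40]];
const second = [[5, 10], [10, 20]];
console.log(ratio(first, second));
console.log(pct([1, 3], [4, 4]));
console.log(subtract(first, second));
```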

Reductions

These allow values to be combined according to well known mathematical functions

Function | String | JS API
Average of values | ->reduce(timeframe,avg) | kq.reduce('timeframe', 'avg')
Sum of values | ->reduce(timeframe,sum) | kq.reduce('timeframe', 'sum')
Minimum value | ->reduce(timeframe,min) | kq.reduce('timeframe', 'min')
Maximum value | ->reduce(timeframe,max) | kq.reduce('timeframe', 'max')
Median value | ->reduce(timeframe,median) | kq.reduce('timeframe', 'median')
Trend (linear regression gradient) | ->reduce(timeframe,trend) | kq.reduce('timeframe', 'trend')
Percent change - % up/down in last 2 values | ->reduce(timeframe,%change) | kq.reduce('timeframe', '%change')

If a third parameter is set to true, a table is returned with the reduction concatenated as an additional column.
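
The reductions can be illustrated on a plain array of numbers. The sketch below uses hypothetical reducers; in particular, trend is computed as the least-squares slope against positions 0, 1, 2, ... and %change as the relative change between the last two values, which are assumptions about the exact definitions keen-query uses.

```javascript
// Hypothetical reducers over an array of numbers.
const reducers = {
  sum: (v) => v.reduce((a, b) => a + b, 0),
  avg: (v) => reducers.sum(v) / v.length,
  min: (v) => Math.min(...v),
  max: (v) => Math.max(...v),
  median: (v) => {
    const sorted = [...v].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  },
  // Least-squares gradient, taking x as the position 0..n-1 of each value.
  trend: (v) => {
    const meanX = (v.length - 1) / 2;
    const meanY = reducers.avg(v);
    let num = 0;
    let den = 0;
    v.forEach((y, x) => {
      num += (x - meanX) * (y - meanY);
      den += (x - meanX) ** 2;
    });
    return num / den;
  },
  // Percentage change between the last two values.
  '%change': (v) => {
    const [prev, last] = v.slice(-2);
    return ((last - prev) / prev) * 100;
  },
};

const values = [10, 20, 30, 60];
for (const name of Object.keys(reducers)) {
  console.log(name, reducers[name](values));
}
```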

Other

  • ->round(n) Rounds values to n decimal places. If n is negative, rounds to the nearest 10, 100, 1000 etc.
  • sort(dimension, value) TODO (please request)
  • multiply(n) Multiplies each value by n
  • divide(n) Divides each value by n
  • sortAsc() (1 dimensional tables only)
  • sortDesc() (1 dimensional tables only)
  • reorder(property,value1,value2,...) Sorts rows in the result according to values in the property axis, in the order given
  • plotThreshold(value, name) For graphs over time, draws an additional line fixed at the given value
  • relabel(property,value1,value2,...) Relabels the data labels in the property axis (unwise to use this unless e.g. using @concat on a predictable set of values, or if using ->reorder() first)
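
The behaviour of round(n) for negative n is easy to misread, so here is a minimal sketch of the rule as described above. roundTo is a hypothetical helper, not keen-query's implementation.

```javascript
// Hypothetical helper: rounds to n decimal places; a negative n rounds
// to the nearest 10, 100, 1000 and so on.
function roundTo(value, n) {
  if (n >= 0) {
    const factor = 10 ** n;
    return Math.round(value * factor) / factor;
  }
  const factor = 10 ** -n;
  return Math.round(value / factor) * factor;
}

console.log(roundTo(3.14159, 2));
console.log(roundTo(1234, -2));
```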

Experimental

  • top(n) / bottom(n) shows the top/bottom n (or n percent if the last character is '%') of results
  • cutoff(n) ignore all values smaller than n (or n percent if the last character is '%')
  • sortAsc(prop,[reduction,dimension])
  • sortDesc(prop,[reduction,dimension])

Outputting data

There are a few built in methods for outputting data

Output | String | JS API
Returns the url(s) used to query Keen | ->print(url) | kq.print('url')
JSON representation of the query | ->print(qo) | kq.print('qo')
Stringified JSON representation of the query | ->print(qs) | kq.print('qs')
The raw JSON response(s) from Keen | ->print(raw) | kq.print('raw')
Flattened matrix representation of the response | ->print(matrix) | kq.print('matrix')
Prints out an ASCII table of the results | ->print(ascii) | kq.print('ascii')

KeenQuery.definePrinter(name, func) can be used to define your own printers (e.g. to output a graph to the DOM). Within func, this will point at the current KeenQuery instance, and this.getTable() will give access to an object with the following properties and methods:

Note: the intention is for these objects to be immutable. If you find an instance of a method that mutates the original table it's a bug - don't rely on the behaviour and please report

  • data - property holding all the data retrieved in an n-dimensional matrix constructed of arrays nested n deep
  • axes - names and values for axes of the table
  • dimension - property holding the number of dimensions of the table (i.e. how many things the data is grouped by)
  • size - property holding an array representing the size of the table e.g. if grouped by eye.colour and hair.colour and there are 4 possible values for eye colour and 6 for hair colour it will return [4, 6]
  • getAxis (name) - returns the dimension in which a given grouping is held, e.g. in the above example getAxis('eye') would return 0, getAxis('hair') would return 1
  • convertTime (format) - converts all timeframe objects to the given format. Accepted values are
    • ISO - ISO strings
    • shortISO - ISO strings with unnecessary fine-grainedness removed
    • human - human readable strings representing the timeframe
    • shortest - shortest possible human readable strings containing enough information to identify the time range
  • humanize (timeFormat) - converts the table (where possible) to an object of the following format
    {
        headings: ['array', 'of', 'column', 'headings'],
        rows: [[], [], []] // rows of data, including row headings in the first position of each sub array
    }
  • cellIterator (func, endDepth) - Iterates a function over each cell in the table. TODO: known bug, needs changing to be immutable
  • switchDimensions (a, b, method) - switches two dimensions e.g. swaps rows for columns
    • a - index/name of the first dimension to move (default 0)
    • b - index/name of the second dimension to move (default: the deepest dimension of the table)
    • method - when a or b are their default values, setting method to shuffle will move the dimension to be the first/last, and shuffle all other dimensions along to make room, as opposed to swapping the a/bth dimension with the first/last

The table holds the Keen data with all aggregations, reductions etc. already applied.
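
To make size, dimension and swapping two dimensions more tangible, the sketch below works on a plain nested array rather than a real table object. tableSize and swapRowsAndColumns are hypothetical helpers; they only mimic what the properties and switchDimensions are described as doing for the two-dimensional case.

```javascript
// Hypothetical helper: size of an n-dimensional matrix built from nested
// arrays, assuming every sub array at a given depth has the same length.
function tableSize(data) {
  const size = [];
  let level = data;
  while (Array.isArray(level)) {
    size.push(level.length);
    level = level[0];
  }
  return size;
}

// Hypothetical helper: swaps the two dimensions of a 2-dimensional table
// and returns a new array, leaving the original untouched.
function swapRowsAndColumns(data) {
  return data[0].map((_, col) => data.map((row) => row[col]));
}

// 4 eye colours by 6 hair colours, as in the size example above
const counts = Array.from({ length: 4 }, (_, eye) =>
  Array.from({ length: 6 }, (_, hair) => eye * 6 + hair)
);

console.log(tableSize(counts)); // size
console.log(tableSize(counts).length); // dimension
console.log(tableSize(swapRowsAndColumns(counts)));
```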

Utilities

  • KeenQuery.parseFilter(str) - converts a string compatible with the above syntax into a Keen filter object
  • KeenQuery.defineQuery(name, func) - defines a method name which can be used as part of a keen-query string or in the JS API.

License

ISC

Collaborators

  • robertboulton
  • seraph2000
  • hamza.samih
  • notlee
  • emmalewis
  • aendra
  • the-ft
  • rowanmanning
  • chee
  • alexwilson