openaq-quality-checks

1.1.4 • Public • Published

OpenAQ Quality Checks

OpenAQ Quality Checks is a command line interface for flagging potentially invalid air quality measurements.

Have an OpenAQ data quality concern or experience you would like to share? Please add it to the OpenAQ Community: What is your OpenAQ data quality experience? issue!

Use

Prerequisites

Install

nvm use 8.9.4
npm install openaq-quality-checks -g

Develop

git clone https://github.com/openaq/openaq-quality-checks
cd openaq-quality-checks
nvm use
yarn install
yarn test

Configuration

openaq-quality-checks expects a list of items, either in json or csv.

There are 2 modes of configuration: config file and command line arguments.

1. Config File

The config file configures the flags. It defines which checks should be run, what values should be flagged, and what string to use for each flag.

A set of default flags are configured in config.yml. The default flags are:

  • E flags the value -999
  • N flags negative values
  • R flags repeating values, grouped by coordinates and ordered by date.

This default config file can be overriden using the --config <file.yml> argument, which should point to a yml file which has the following structure:

keyOne: # Arbitrary identifier for the flag, e.g. 'errors'. Useful for merging with the default configuration.
  flag: Any string, e.g. E
  type: One of exact|set|range|repeats
  # Depending on the type, other values may be included. See lib/flagger.js for what can be configured.
keyTwo:
  # ...

This configuration is merged with the default configuration, overriding fields that exist and adding fields that do not exist.

2. Commmand Line Arguments

Command line arguments configure

  • data input (defaults to STDIN)
  • input and output data format (defaults to json)
  • flags to skip (defaults to none), and,
  • which flags should be used to remove data from the output (default none).
$ quality-check --help
Usage: index.js [options]

Options:
  --version         Show version number                                [boolean]
  --infile          Input file. Should be the same format as input-format (which
                    default to json).
  --input-format    Input format, can be csv or json. Defaults to json.
  --ouptput-format  Output format, can be csv or json. Defaults to json.
  --skip            Comma-separated list of flags to skip.
  --remove          Comma-separated list of flags to use in removing data from
                    output.
  --remove-all      Removes all flagged data.
  --config          Config file to override default config.
  -h, --help        Show help                                          [boolean]

Examples:
  index.js --infile foo.json  Flags the contents of a file and writes to stdout.
  cat foo.json | index.js     Flags the contents of stdin and writes to stdout.

copyright 2018

Example Commands

Read and output JSON

Note: Commands below require jq, but jq is just for pretty printing json. If you don't have jq installed, remove the trailing | jq .

cat examples/addis-ababa-20180202.json | quality-check | jq .
# or
quality-check --infile examples/addis-ababa-20180202.json | jq .

Read and output CSV

cat examples/addis-ababa-20180202.csv | quality-check --input-format csv --output-format csv
# or
quality-check --infile examples/addis-ababa-20180202.csv --input-format csv --output-format csv

Override the default configuration

quality-check --infile examples/addis-ababa-20180202.json --config tests/test-config.yml | jq .

Skip the 'N' and 'R' flags

quality-check --infile examples/addis-ababa-20180202.json --skip N,R | jq .

Remove all errors

quality-check --infile examples/addis-ababa-20180202.json --remove E | jq .

Remove all flagged items

quality-check --infile examples/addis-ababa-20180202.json --remove-all | jq .

Using the API call

curl 'https://api.openaq.org/v1/measurements?location=US%20Diplomatic%20Post:%20Addis%20Ababa%20School&date_from=2018-02-02&date_to=2018-02-06&limit=10' | jq '.results' | quality-check | jq .

Using a different data source

The tool was built with OpenAQ in mind but also to be flexible to other data sources. For example, if you want to analyze aggregated world news using reddit's worldnews subreddit, you might want to flag posts from unknown news organizations.

Using a config like the one in examples/worldnews-config.yml, e.g.:

# examples/worldnews-config.yml
unknown_sources:
  flag: UKNOWN_SOURCE
  type: set
  values: ["theguardian.com", "bbc.co.uk", "bloomberg.com", "bbc.com", "reuters.com", "npr.org", "independent.co.uk", "cnn.com"]
  includes: 'false'
  valueField: 'data.domain'

We can flag all unknown news organizations:

echo $(curl -H "User-Agent: laptopterminal" https://www.reddit.com/r/worldnews.json?limit=50) | \
  jq '.data.children' | \
  quality-check --config examples/worldnews-config.yml | \
  jq '.'

Readme

Keywords

none

Package Sidebar

Install

npm i openaq-quality-checks

Weekly Downloads

0

Version

1.1.4

License

MIT

Unpacked Size

301 kB

Total Files

22

Last publish

Collaborators

  • abarciauskas-bgse