de-identify-sql

1.0.4 • Public • Published

De-identify SQL

De-identify data in SQL statements.

De-identify data in SQL example

About

De-identify SQL removes protected data from a .sql file or piped input (e.g. mysqldump) and sends results to stdout or a new .sql file. De-identify SQL replaces protected data / PII / PHI using the following techniques:

  • Replacing data with a fabricated value.
  • Redacting data with a constant value.
  • Generating new data (customizable JavaScript).

De-identify SQL supports:

  • Reading from piped input or a passed-in file.
  • Writing to piped output or creating a file.

Installation

Install globally using npm i -g de-identify-sql

Usage

> de-identify-sql -h
Options:
  -v, --version            Show version number                         [boolean]
  -h, --help               Show help                                   [boolean]
  -o, --outputFile         Output to file instead of stdout             [string]
  -i, --inputFile          Read from file instead of stdin              [string]
  -s, --strategyDirectory  Location of SQL strategy files
                                   [string] [default: {installed-dir}/strategy}]
  -f, --functionsFile      Location of file that contains all custom functions
                           (optional, overwrites .js per table)         [string]

Examples

# Read from piped input, write to piped output ...
> mysqldump my-database | de-identify-sql | gzip > de-id-mysqldump.sql.gz
# Read from piped input, write to .sql ...
> mysqldump my-database | de-identify-sql -o de-id-mysqldump.sql
# Read from .sql, write to piped output ...
> de-identify-sql -i mysqldump.sql | gzip > de-id-mysqldump.sql.gz
# Read from .sql, write to .sql ...
> de-identify-sql -i mysqldump.sql -o de-id-mysqldump.sql
# Read from .sql, write to screen (test de-identification) ...
> de-identify-sql -i mysqldump.sql
# or
> cat mysqldump.sql | de-identify-sql

De-identifying data within SQL

De-identify SQL uses strategy files which are mapped to tables to modify SQL statements. Strategy files should be placed in the /strategy folder (configurable) and named after the table they act on. If the incoming SQL contains USING the format is DATABASE_NAME.TABLE_NAME.json otherwise the format is TABLE_NAME.json. A CREATE TABLE statement must be part of the input file.

Example: /strategy/user.json

{
  "columns": [
    {
      "redactWith": "internet.email",
      "columnKey": "email",
      "tracked": false
    },
    {
      "redactWith": "NAME REMOVED",
      "columnKey": "name",
      "tracked": false
    },
    {
      "redactWith": "generateDatetime",
      "columnKey": "last_visit",
      "tracked": false
    },
    {
      "redactWith": "{{datatype.number({\"min\":18,\"max\":90})}}",
      "columnKey": "age",
      "tracked": false
    }
  ]   
}

The columns array describes how each SQL column should be modified. Omitted columns pass-through without modification.

  • columnKey - (string) The column to be modified by de-identify-sql.

  • redactWith - (string) There are four ways to de-identify data:

    • A faker function - Possible functions : e.g. name.lastName or phone.phoneNumber
    • A faker template - A mustache template of faker methods: e.g. {{name.firstName}} {{name.lastName}}, {{name.jobTitle}} or {{address.streetAddress}} {{address.city}}
    • Custom JavaScript - You can call a function to create a value, these are defined in a .js file that matches the name of the .json file
    • A constant value - Replace with a constant, e.g. VALUE REMOVED
  • tracked - (boolean) Tracking preserves data relationships while de-identifying SQL.

Tracking: Data Relationships

De-identify SQL can preserve relationships within data. If original data repeats, de-identify SQL recognizes and replaces it with the same value it previously used.

This allows the generated SQL to retain its structure while removing protected data. For example, if there are multiple INSERTs which contain the same email address these would become different email addresses. However, if the tracked parameter is true, then every instance will be replaced with the same made-up value.

Package Sidebar

Install

npm i de-identify-sql

Weekly Downloads

1

Version

1.0.4

License

MIT

Unpacked Size

34.6 kB

Total Files

16

Last publish

Collaborators

  • frizzled