sql-fingerprint

    1.0.2 • Public • Published

    sql-fingerprint

    Converts a SQL into a fingerprint, the abstracted form of a query, which makes it possible to classify similar queries.

    Actually, this module is a JavaScript port of pt-fingerprint. Therefore is primarily intended to use for MySQL queries.

    Description

    The fingerprint is the abstracted form of a query. What "abstracting query" is string conversion such that, for instance, replace values to ?, collapse whitespaces and so on.

    Let's say that there are such queries:

    SELECT * FROM t WHERE i = 1 ORDER BY a, b ASC, d DESC, e ASC;
    
    select *         from t       where i = 1
     order by a, b ASC, d DESC, e asc;

    A fingerprint of both of these will be below:

    select * from t where i = ? order by a, b, d desc, e;

    Usage

    CLI

    npm install -g sql-fingerprint
    
    fingerprint --query="your query"

    Module

    npm install sql-fingerprint
    import fingerprint from 'sql-fingerprint';
    
    console.log(fingerprint('SELECT * FROM users WHERE id = 1', false, false));

    BigQuery UDF

    The initial motivation I wrote this was that classifies similar queries stored in my dataset of BigQuery. BigQuery supports user-defined functions written in JavaScript. Therefore the approach of pt-fingerprint, which is generate fingerprint by a set of RegExps, is more fit than other approaches such as parsing SQL, building AST and then interpreting it. RegExps approach seems a little bit harder to maintain but easier to use as UDF because of the relatively simple source code.

    CREATE TEMP FUNCTION fingerprint(sql STRING, matchMD5Checksum BOOL, matchEmbeddedNumbers BOOL)
    RETURNS STRING
    LANGUAGE js AS r"""
    
      // copy and paste fingerprint function here from fingerprint.js
      // ...
    
      return query;
    """;
    
    SELECT
      fingerprint(textPayload, true, true) fp,
      count(*) as num,
      max(query) as raw_query_sample
    FROM
      `your_table`
    WHERE
      DATE(timestamp, "Asia/Tokyo") = "2022-05-22"
    group by
      fp
    order by
      num desc

    See also: Standard SQL user-defined functions  |  BigQuery  |  Google Cloud

    Options

    • matchMD5Checksum
      • Replace md5 string to single ?
    -- original
    SELECT * FROM db.fbc5e685a5d3d45aa1d0347fdb7c4d35_temp where id=1
    -- fingerprint with matchMD5Checksum=true
    select * from db.?_temp where id=?
    -- fingerprint with matchMD5Checksum=false (default)
    select * from db.fbc?_temp where id=?
    • matchEmbeddedNumbers
      • Preserve numbers within words. Useful for the case like that table name contains a number
    -- original
    SELECT * FROM prices.rt_5min WHERE id = 1
    -- fingerprint with matchEmbeddedNumbers=true
    select * from prices.rt_5min where id = ?
    -- fingerprint with matchEmbeddedNumbers=false (default)
    select * from prices.rt_?min where id = ?

    Keywords

    Install

    npm i sql-fingerprint

    DownloadsWeekly Downloads

    25

    Version

    1.0.2

    License

    ISC

    Unpacked Size

    18.4 kB

    Total Files

    7

    Last publish

    Collaborators

    • cou929