sql-fingerprint

1.0.2 • Public • Published

sql-fingerprint

Converts a SQL into a fingerprint, the abstracted form of a query, which makes it possible to classify similar queries.

Actually, this module is a JavaScript port of pt-fingerprint. Therefore is primarily intended to use for MySQL queries.

Description

The fingerprint is the abstracted form of a query. What "abstracting query" is string conversion such that, for instance, replace values to ?, collapse whitespaces and so on.

Let's say that there are such queries:

SELECT * FROM t WHERE i = 1 ORDER BY a, b ASC, d DESC, e ASC;

select *         from t       where i = 1
 order by a, b ASC, d DESC, e asc;

A fingerprint of both of these will be below:

select * from t where i = ? order by a, b, d desc, e;

Usage

CLI

npm install -g sql-fingerprint

fingerprint --query="your query"

Module

npm install sql-fingerprint
import fingerprint from 'sql-fingerprint';

console.log(fingerprint('SELECT * FROM users WHERE id = 1', false, false));

BigQuery UDF

The initial motivation I wrote this was that classifies similar queries stored in my dataset of BigQuery. BigQuery supports user-defined functions written in JavaScript. Therefore the approach of pt-fingerprint, which is generate fingerprint by a set of RegExps, is more fit than other approaches such as parsing SQL, building AST and then interpreting it. RegExps approach seems a little bit harder to maintain but easier to use as UDF because of the relatively simple source code.

CREATE TEMP FUNCTION fingerprint(sql STRING, matchMD5Checksum BOOL, matchEmbeddedNumbers BOOL)
RETURNS STRING
LANGUAGE js AS r"""

  // copy and paste fingerprint function here from fingerprint.js
  // ...

  return query;
""";

SELECT
  fingerprint(textPayload, true, true) fp,
  count(*) as num,
  max(query) as raw_query_sample
FROM
  `your_table`
WHERE
  DATE(timestamp, "Asia/Tokyo") = "2022-05-22"
group by
  fp
order by
  num desc

See also: Standard SQL user-defined functions  |  BigQuery  |  Google Cloud

Options

  • matchMD5Checksum
    • Replace md5 string to single ?
-- original
SELECT * FROM db.fbc5e685a5d3d45aa1d0347fdb7c4d35_temp where id=1
-- fingerprint with matchMD5Checksum=true
select * from db.?_temp where id=?
-- fingerprint with matchMD5Checksum=false (default)
select * from db.fbc?_temp where id=?
  • matchEmbeddedNumbers
    • Preserve numbers within words. Useful for the case like that table name contains a number
-- original
SELECT * FROM prices.rt_5min WHERE id = 1
-- fingerprint with matchEmbeddedNumbers=true
select * from prices.rt_5min where id = ?
-- fingerprint with matchEmbeddedNumbers=false (default)
select * from prices.rt_?min where id = ?

Readme

Keywords

Package Sidebar

Install

npm i sql-fingerprint

Weekly Downloads

0

Version

1.0.2

License

ISC

Unpacked Size

18.4 kB

Total Files

7

Last publish

Collaborators

  • cou929