Percentile is a frequently used metric in scenarios like analyzing service response time, where we are particularly interested in the edge cases that take a significant time to respond. In the real world there is often far too much data (logs, files) to load into memory and compute a percentile the [traditional way](https://en.wikipedia.org/wiki/Percentile). Instead, we need an algorithm that calculates percentiles over streaming data with limited CPU and RAM consumption, trading exactness for an approximate result with an acceptable deviation. Many algorithms already exist for percentile estimation on data streams, e.g. HdrHistogram, GK, CKMS, and T-Digest.
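To make the contrast concrete, here is a minimal sketch of the "traditional" exact calculation: sort the entire dataset and index into it (nearest-rank method). It requires holding every sample in memory, which is exactly what streaming algorithms avoid. The function name is illustrative, not part of this library.

```javascript
// Naive exact percentile: keep ALL samples, sort, then index.
// Memory grows with the number of samples — unusable for unbounded streams.
function exactPercentile(samples, q) {
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest-rank method: position of the q-th quantile in the sorted array.
  const idx = Math.min(sorted.length - 1, Math.ceil(q * sorted.length) - 1);
  return sorted[idx];
}

const latencies = Array.from({ length: 1000 }, (_, i) => i + 1); // 1..1000 ms
console.log(exactPercentile(latencies, 0.99)); // 990
```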
Inspired by the [Prometheus node client](https://github.com/siimon/prom-client), I implemented this percentile library based on the T-Digest algorithm, with support for window buckets, since in most cases we are interested not only in the percentile of the overall data but especially in the most recent data. In the worst case, T-Digest has a time complexity of O(n log n), while memory usage depends on the compression parameter (100 by default): the number of nodes in a T-Digest is bounded by 20 * compression, and given that one node occupies approximately 32 bytes, the worst case in theory needs about 64 KB.
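The memory bound above is simple arithmetic; this snippet just restates it as code so the relationship between compression and worst-case memory is explicit (the constants come from the text, the function name is illustrative):

```javascript
// Worst-case T-Digest memory, per the bound stated above:
// at most 20 * compression nodes, each ~32 bytes.
function tdigestWorstCaseBytes(compression = 100, bytesPerNode = 32) {
  const maxNodes = 20 * compression; // 2000 nodes at the default compression
  return maxNodes * bytesPerNode;
}

console.log(tdigestWorstCaseBytes()); // 64000 bytes, i.e. ~64 KB
```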
References

- Quantiles on Streams
- Quantiles over Data Streams: Experimental Comparisons, New Analyses, and Further Improvements
- https://caorong.github.io/2020/08/03/quartile-algorithm/
How to use
```js
// Note: the original snippet lost its identifiers; the constructor and method
// names below are a best-effort reconstruction — check the library's exports
// for the exact API. The require target was elided in the original.
const Percentile = require('...');

// Init single bucket instance
const p = new Percentile();

// Init multi bucket instance with parameters:
//   windowLiveTimeMS - the TTL of data in a window bucket, in milliseconds
//   bucketsNum       - number of window buckets
const wp = new Percentile(1000, 10);

// Push data
for (let i = 0; i < 10000; i++) {
  p.push(i);
  wp.push(i);
}

console.log(p.percentile(0.95));
console.log(wp.percentile(0.95));
```
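For intuition about what the window buckets do internally, here is a minimal sketch of the idea: keep up to `bucketsNum` sub-sketches, rotate to a fresh one every `windowLiveTimeMS`, drop the oldest, and answer queries from the buckets still live. All names are illustrative and a plain sorted array stands in for a real T-Digest; this is not the library's actual implementation.

```javascript
// Illustrative window-bucket rotation, NOT the library's real internals.
class WindowedSketch {
  constructor(windowLiveTimeMS, bucketsNum) {
    this.windowLiveTimeMS = windowLiveTimeMS;
    this.bucketsNum = bucketsNum;
    this.buckets = [{ start: Date.now(), values: [] }];
  }

  push(v) {
    const now = Date.now();
    let current = this.buckets[this.buckets.length - 1];
    // Rotate to a fresh bucket once the current one has lived out its TTL.
    if (now - current.start >= this.windowLiveTimeMS) {
      this.buckets.push({ start: now, values: [] });
      if (this.buckets.length > this.bucketsNum) this.buckets.shift(); // evict oldest
      current = this.buckets[this.buckets.length - 1];
    }
    current.values.push(v);
  }

  percentile(q) {
    // Merge live buckets and compute an exact quantile; a real implementation
    // would merge T-Digest centroids instead of raw values.
    const all = this.buckets.flatMap((b) => b.values).sort((a, b) => a - b);
    if (all.length === 0) return undefined;
    return all[Math.min(all.length - 1, Math.ceil(q * all.length) - 1)];
  }
}

const w = new WindowedSketch(1000, 10);
for (let i = 1; i <= 100; i++) w.push(i);
console.log(w.percentile(0.5)); // 50 (all pushes land within the first window)
```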
How to test
```sh
yarn
yarn test
```