Single-character super-linear RegExps
what a name...
A library to find JS RegExp with super-linear worst-case time complexity for attack strings that repeat a single character.
The static analysis method implemented by this library focuses on finding attack string tuples where a single character is repeated. This major limitation allows the library to be fast while also offering decent support for backreferences and assertions.
This library is not intended as a full static analysis to guard against super-linear worst-case time complexity. It is meant to be as a supplementary analysis on top of existing general analysis methods that don't (or don't fully) support advanced regex features.
This library exports only a single function,
analyse, which takes a RegExp literal and returns a list of reports that show the quantifiers causing super-linear worst-case time complexity.
For more information on the exact inputs and outputs of each function, see the full API documentation.
This library is implemented using a very limited static analysis method that can only find attack strings where a single character is repeated. Attack strings are generated from a tuple (x,y,z) such that every string s = xynz (or
x + y.repeat(n) + z for JS folks) takes O(np) or O(2n) many steps to reject, p>1. This analysis method can only find tuples where y is a single character. E.g. the polynomial backtracking in
/^(ab)*(ab)*$/ for (x,y,z) = ("", "ab", "c") cannot be detected by this library because y is not a single character.
However, this limitation allows the static analysis method to be quick and to provide good (but not perfect) support for backreferences and assertions (e.g.
The analysis method primarily searches for polynomial backtracking. Finds of exponential backtracking are only a byproduct. Because of this, not all causes of super-linear worst-case time complexity are found.
This library doesn't actually search for the whole tuple (x,y,z); it only searches for y and assumes that adequate values for x and z can be found. A single-character approximation of the suffix z will be computed and accounted for but false positives are still possible.
There are 3 different types of reports that each indicate a different type of cause for the super-linear worst-case time complexity. All are explained in the documentation of their types.
While most reports show polynomial backtracking, some report exponential backtracking. Exponential backtracking is a lot more dangerous and can easily be exploited for ReDoS attacks.
While other reports may be dismissed, all reports of exponential backtracking must be fixed.
All reports with
exponential: true report exponential backtracking.