Most of the regular expression engines use backtracking
to try all possible execution paths of the regular expression when evaluating
an input, in some cases it can cause performance issues, called catastrophic backtracking
situations. In the worst case, the complexity
of the regular expression is exponential in the size of the input, this means that a small carefully-crafted input (like 20 chars) can trigger
catastrophic backtracking
and cause a denial of service of the application. Super-linear regex complexity can lead to the same impact too
with, in this case, a large carefully-crafted input (thousands chars).
This rule detects regular expression patterns known to have potential performance issues:
Nested quantifiers
which are quantifiers inside a group that is itself repeated by a quantifier (eg: /(a+)+/
). There is a risk if you answered yes to any of those questions.
To avoid catastrophic backtracking
situations, when possible:
{1,5}
instead of
+
for instance. nested quantifiers
to limit the number of way the inner group can be matched by the outer quantifier, for instance this
nested quantifier situation (ba+)+
doesn't cause performance issues, indeed, the inner group can be matched only if it exists exactly
one b
char per repetition of the group. possessive quantifiers
and atomic grouping
. The first regex evaluation will never end in OpenJDK
<= 9 and the second regex evaluation will never end in any versions of
OpenJDK
:
java.util.regex.Pattern.compile("(a+)+").matcher( "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"+ "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"+ "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"+ aaaaaaaaaaaaaaa!").matches(); // Sensitive java.util.regex.Pattern.compile("(h|h|ih(((i|a|c|c|a|i|i|j|b|a|i|b|a|a|j))+h)ahbfhba|c|i)*").matcher( "hchcchicihcchciiicichhcichcihcchiihichiciiiihhcchi"+ "cchhcihchcihiihciichhccciccichcichiihcchcihhicchcciicchcccihiiihhihihihi"+ "chicihhcciccchihhhcchichchciihiicihciihcccciciccicciiiiiiiiicihhhiiiihchccch"+ "chhhhiiihchihcccchhhiiiiiiiicicichicihcciciihichhhhchihciiihhiccccccciciihh"+ "ichiccchhicchicihihccichicciihcichccihhiciccccccccichhhhihihhcchchihih"+ "iihhihihihicichihiiiihhhhihhhchhichiicihhiiiiihchccccchichci").matches(); // Sensitive
Possessive quantifiers do not keep backtracking positions, thus can be used, if possible, to avoid performance issues:
java.util.regex.Pattern.compile("(a+)++").matcher( "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"+ "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"+ "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"+ aaaaaaaaaaaaaaa!").matches(); // Compliant java.util.regex.Pattern.compile("(h|h|ih(((i|a|c|c|a|i|i|j|b|a|i|b|a|a|j))+h)ahbfhba|c|i)*+").matcher( "hchcchicihcchciiicichhcichcihcchiihichiciiiihhcchi"+ "cchhcihchcihiihciichhccciccichcichiihcchcihhicchcciicchcccihiiihhihihihi"+ "chicihhcciccchihhhcchichchciihiicihciihcccciciccicciiiiiiiiicihhhiiiihchccch"+ "chhhhiiihchihcccchhhiiiiiiiicicichicihcciciihichhhhchihciiihhiccccccciciihh"+ "ichiccchhicchicihihccichicciihcichccihhiciccccccccichhhhihihhcchchihih"+ "iihhihihihicichihiiiihhhhihhhchhichiicihhiiiiihchccccchichci").matches(); // Compliant