Most of the regular expression engines use backtracking
to try all possible execution paths of the regular expression when evaluating
an input, in some cases it can cause performance issues, called catastrophic backtracking
situations. In the worst case, the complexity
of the regular expression is exponential in the size of the input, this means that a small carefully-crafted input (like 20 chars) can trigger
catastrophic backtracking
and cause a denial of service of the application. Super-linear regex complexity can lead to the same impact too
with, in this case, a large carefully-crafted input (thousands chars).
This rule determines the runtime complexity of a regular expression and informs you of the complexity if it is not linear.
Note that, due to improvements to the matching algorithm, some cases of exponential runtime complexity have become impossible when run using Java 9 or later. In such cases, an issue will only be reported if the project's target Java version is 8 or earlier.
There is a risk if you answered yes to any of those questions.
To avoid catastrophic backtracking
situations, when possible:
{1,5}
instead of
+
for instance. nested quantifiers
to limit the number of way the inner group can be matched by the outer quantifier, for instance this
nested quantifier situation (ba+)+
doesn't cause performance issues, indeed, the inner group can be matched only if it exists exactly
one b
char per repetition of the group. a*b*
is not a problem because a*
and b*
match different things and
a*_a*
is not a problem because the repetitions are separated by a '_' and can't match that '_'. However, a*a*
and
.*_.*
have quadratic runtime. possessive quantifiers
and atomic grouping
. .
to exclude separators where applicable. For example the quadratic regex
.*_.*
can be made linear by changing it to [^_]*_.*
The first regex evaluation will never end in OpenJDK
<= 9 and the second regex evaluation will never end in any versions of
OpenJDK
:
java.util.regex.Pattern.compile("(a+)+").matcher( "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"+ "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"+ "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"+ "aaaaaaaaaaaaaaa!").matches(); // Sensitive java.util.regex.Pattern.compile("(h|h|ih(((i|a|c|c|a|i|i|j|b|a|i|b|a|a|j))+h)ahbfhba|c|i)*").matcher( "hchcchicihcchciiicichhcichcihcchiihichiciiiihhcchi"+ "cchhcihchcihiihciichhccciccichcichiihcchcihhicchcciicchcccihiiihhihihihi"+ "chicihhcciccchihhhcchichchciihiicihciihcccciciccicciiiiiiiiicihhhiiiihchccch"+ "chhhhiiihchihcccchhhiiiiiiiicicichicihcciciihichhhhchihciiihhiccccccciciihh"+ "ichiccchhicchicihihccichicciihcichccihhiciccccccccichhhhihihhcchchihih"+ "iihhihihihicichihiiiihhhhihhhchhichiicihhiiiiihchccccchichci").matches(); // Sensitive
Possessive quantifiers do not keep backtracking positions, thus can be used, if possible, to avoid performance issues:
java.util.regex.Pattern.compile("(a+)++").matcher( "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"+ "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"+ "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"+ "aaaaaaaaaaaaaaa!").matches(); // Compliant java.util.regex.Pattern.compile("(h|h|ih(((i|a|c|c|a|i|i|j|b|a|i|b|a|a|j))+h)ahbfhba|c|i)*+").matcher( "hchcchicihcchciiicichhcichcihcchiihichiciiiihhcchi"+ "cchhcihchcihiihciichhccciccichcichiihcchcihhicchcciicchcccihiiihhihihihi"+ "chicihhcciccchihhhcchichchciihiicihciihcccciciccicciiiiiiiiicihhhiiiihchccch"+ "chhhhiiihchihcccchhhiiiiiiiicicichicihcciciihichhhhchihciiihhiccccccciciihh"+ "ichiccchhicchicihihccichicciihcichccihhiciccccccccichhhhihihhcchchihih"+ "iihhihihihicichihiiiihhhhihhhchhichiicihhiiiiihchccccchichci").matches(); // Compliant