FPs Security Hotspot S5852 (polynomial runtime regex)

I’m getting hundreds of false positive hotspots, which take a while to make as safe ( have added a comment around the hotspot UI here: Bulk change assignee on security hotspots review - #2 by ganncamp )

Specifically, the “Make sure the regex used here, which is vulnerable to polynomial runtime due to backtracking, cannot lead to denial of service.” warning seems to appear on every regular expression.

Examples of false positives:

  • } else if (!Pattern.matches(".* - .*", client)) {
  • return Arrays.asList(tmp.trim().split("\\s*,\\s*"));
  • Pattern schemaPattern = Pattern.compile("jdbc:mysql://.*/(.*)\\?.*");

As I understand it, polynomial runtime can only occur if it has

  • Grouping with repetition
  • Inside the repeated group:
    • Repetition
    • Alternation with overlapping

( from Regular expression Denial of Service - ReDoS | OWASP Foundation)

So the examples above should be safe; the first two don’t contain any grouping, and in the third, the grouped expression is not repeating.

Could possibly ratchet back the check so that it only fires when one of the ‘evil regex’ conditions are met.

( Community Edition, v 9.6.1 )

Hi Greg,

Thanks for sharing with the community!

I believe your examples are correctly detected as polynomial time regexes by SonarQube. There are more details in the description of rule S5852:

If you have multiple non-possessive repetitions that can match the same contents and are consecutive or are only separated by an optional separator or a separator that can be matched by both of the repetitions, the worst case matching time can be polynomial (O(n^c) where c is the number of problematic repetitions). For example a*b* is not a problem because a* and b* match different things and a*_a* is not a problem because the repetitions are separated by a '_' and can’t match that '_'. However, a*a* and .*_.* have quadratic runtime.

This applies to the first example (quadratic running time in the size of the input) and the third example (cubic in the size of the input). You can probably prevent this issue by using a possessive quantifier.

The case of your second example is also described:

If you’re performing a partial match (such as by using Matcher.find, String.split, String.replaceAll etc.) and the regex is not anchored to the beginning of the string, quadratic runtime is especially hard to avoid because whenever a match fails, the regex engine will try again starting at the next index. This means that any unbounded repetition (even a possessive one), if it’s followed by a pattern that can fail, can cause quadratic runtime on some inputs. For example str.split("\\s*,") will run in quadratic time on strings that consist entirely of spaces (or at least contain large sequences of spaces, not followed by a comma).

Therefore, your second example is quadratic in the size of the input. I believe first replacing sequences of multiple whitespaces with replaceAll("\\s+", " ") (which should run in linear time, because there is no character after the repetition), and then using split("\\s,\\s") instead would fix the problem.

When conditions you cited from OWASP are fullfilled, the time complexity is exponential, not polynomial. This is of course, much worse.

In your case, you can decide to ignore the warnings if the input is not user-controlled or its size is reasonably bounded, as polynomial time regexes need to be run on large inputs before they become problematic.

I hope this helps!

1 Like

In addition to @sylvain.kuchen’s on-point explanations, let me also mention that both StackOverflow and CloudFlare had outages in the past that were caused by regular expressions with polynomial, not exponential, backtracking behavior.

So the backtracking doesn’t have to be exponential to pose a threat of denial of service attacks (or even accidental outages) and that’s why Sonar tries to detect any regex with quadratic or worse backtracking behavior.

Thanks for the quick response and the info above, didn’t realise there were both polynomial and exponential performance characteristics in the regex engine.

Will go back and recheck these regexes and incorporate your suggestions. There’s a linear time matcher called RE2J which looks like it can provide better performance than the built-in Java regex engine, so might look at moving over to that where possible as well.

Cheers,
Greg

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.