Write efficient, error-free and safe regular expressions (regex) in Java

Hello Java developers,

We just released a set of rules to help you write efficient, error-free and safe regular expressions (regex) in Java.

Regular expressions (regex) are sequences of characters and symbols that describe a string or pattern to search for within a longer piece of text. Regex is an incredibly powerful way to express conditions that would otherwise often take many lines of code to check.
While using regex is quite common for developers nowadays, that does not make it easy to handle. Developers regularly describe it as “hard”: writing regex is error-prone and time-consuming, and once a regex is written, finding the errors in it can be extremely difficult.
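
To illustrate the point about lines of code, here is a small sketch of one hypothetical check (a "#" followed by exactly six hex digits) written once with java.util.regex and once by hand. The class and method names are made up for this example.

```java
import java.util.regex.Pattern;

public class RegexVsLoop {
    // Hypothetical rule: accept strings such as "#1a2B3c" ('#' followed by exactly six hex digits).
    private static final Pattern HEX_COLOR = Pattern.compile("#[0-9a-fA-F]{6}");

    // Regex version: the whole condition fits in one pattern.
    static boolean isHexColorRegex(String s) {
        return HEX_COLOR.matcher(s).matches();
    }

    // Hand-written version of the same condition.
    static boolean isHexColorManual(String s) {
        if (s.length() != 7 || s.charAt(0) != '#') {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean hex = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
            if (!hex) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"#1a2B3c", "#12345", "1a2b3c4", "#12345g"}) {
            System.out.println(s + " -> " + isHexColorRegex(s) + " / " + isHexColorManual(s));
        }
    }
}
```

Both methods should agree on every input; the regex is shorter, but it is also the version where a small typo is harder to spot, which is exactly what the rules below aim to catch.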

Here is the full list of the new rules:

Bug Rules:

  • S5840 : Regex patterns and their sub-patterns should not always fail (Critical)
  • S5856 : Regular expressions should be syntactically valid (Critical)
  • S5850 : Alternatives in regular expressions should be grouped when used with anchors (Major)
  • S5866 : Case insensitive Unicode regular expressions should enable the “UNICODE_CASE” flag (Major)
  • S5868 : Unicode Grapheme Clusters should be avoided inside regex character classes (Major)
  • S5842 : Regex repetition pattern’s body should not match the empty String (Minor)
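
To make a few of these bug rules more concrete, here is a minimal sketch of the pitfalls behind S5856, S5850, S5866 and S5840, using only java.util.regex. The class and method names are hypothetical; the sketch only shows the runtime behavior each rule warns about, not how the analyzer detects it.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexBugExamples {

    // S5856: an unclosed character class is not valid; Pattern.compile throws at runtime.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // S5850: "^yes|no$" parses as "(^yes)|(no$)", so each anchor applies to only one alternative.
    static boolean ungroupedAnchors(String input) {
        return Pattern.compile("^yes|no$").matcher(input).find();
    }

    // Grouping the alternatives makes both anchors apply to the whole alternation.
    static boolean groupedAnchors(String input) {
        return Pattern.compile("^(?:yes|no)$").matcher(input).find();
    }

    // S5866: without UNICODE_CASE, CASE_INSENSITIVE only folds US-ASCII letters.
    static boolean asciiOnlyCaseInsensitive(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    static boolean unicodeCaseInsensitive(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                .matcher(input).matches();
    }

    // S5840: without MULTILINE, "$" only matches at the end of input (or before a final
    // line terminator), so no letter can ever follow it and this pattern never matches.
    static boolean alwaysFails(String input) {
        return Pattern.compile("$[a-z]").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(compiles("[a-z"));                              // false
        System.out.println(ungroupedAnchors("yes, of course"));            // true (probably unintended)
        System.out.println(groupedAnchors("yes, of course"));              // false
        System.out.println(asciiOnlyCaseInsensitive("\u00e9", "\u00c9"));  // false
        System.out.println(unicodeCaseInsensitive("\u00e9", "\u00c9"));    // true
        System.out.println(alwaysFails("abc\ndef"));                       // false
    }
}
```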

Code Smell Rules:

  • S5846 : Empty lines should not be tested with regex MULTILINE flag (Critical)
  • S5843 : Regular expressions should not be too complicated (Major)
  • S5860 : Names of regular expressions named groups should be used (Major)
  • S5869 : Character classes in regular expressions should not contain the same character twice (Major)
  • S5854 : Regexes containing characters subject to normalization should use the CANON_EQ flag (Major)
  • S5867 : Unicode-aware versions of character classes should be preferred (Minor)
  • S5857 : Regular expressions character classes should be preferred over non-greedy quantifiers (Minor)
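
Similarly, here is a minimal sketch of the behavior behind three of the code smell rules (S5867, S5857 and S5854). Again, the class and method names are hypothetical, and the HTML-like input is only an illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexCodeSmellExamples {

    // S5867: by default \w is [a-zA-Z_0-9]; UNICODE_CHARACTER_CLASS makes it Unicode-aware.
    static boolean asciiWord(String input) {
        return Pattern.compile("\\w+").matcher(input).matches();
    }

    static boolean unicodeWord(String input) {
        return Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS).matcher(input).matches();
    }

    // S5857: collects every match of the regex in the input.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // S5854: "a" + COMBINING RING ABOVE and the precomposed "\u00E5" are canonically
    // equivalent, but only CANON_EQ makes the pattern treat them as the same text.
    static boolean withoutCanonEq(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    static boolean withCanonEq(String regex, String input) {
        return Pattern.compile(regex, Pattern.CANON_EQ).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(asciiWord("caf\u00e9"));                  // false
        System.out.println(unicodeWord("caf\u00e9"));                // true
        // Reluctant quantifier vs. negated character class: same tags found here.
        System.out.println(findAll("<.+?>", "<b>bold</b>"));         // [<b>, </b>]
        System.out.println(findAll("<[^>]+>", "<b>bold</b>"));       // [<b>, </b>]
        System.out.println(withoutCanonEq("a\u030A", "\u00E5"));     // false
        System.out.println(withCanonEq("a\u030A", "\u00E5"));        // true
    }
}
```

For S5857, the two patterns find the same tags in this input, but "[^>]+" states the intent directly: it cannot run past a '>', and the engine does not have to test for the closing '>' after every character it consumes.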

Security Hotspot Rule:

The Security Hotspot S4784 (Using regular expressions is security-sensitive), which was very noisy and not really helpful to developers, has been deprecated and replaced by:

  • S5852: Using slow regular expressions is security-sensitive

With this new Security Hotspot, you can now detect slow regular expressions that can lead to performance problems and denial-of-service attacks.
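
As an illustration of what a slow regular expression can look like, here is a classic textbook case of nested quantifiers and a flat rewrite. This is only an assumed example of the kind of pattern such a check targets, not a description of exactly what S5852 reports; the class and field names are hypothetical.

```java
import java.util.regex.Pattern;

public class SlowRegexExample {

    // Nested quantifiers: on a long run of 'a' followed by a character that makes the match
    // fail, the engine may try exponentially many ways to split the run between the two "+".
    static final Pattern NESTED = Pattern.compile("(a+)+$");

    // Same matches without nesting: each 'a' can only be consumed in one way.
    static final Pattern FLAT = Pattern.compile("a+$");

    public static void main(String[] args) {
        // Inputs are kept short on purpose so that even the nested pattern finishes quickly.
        String[] inputs = {"aaaa", "baaa", "aaab", "aaaaaaaaaa!"};
        for (String s : inputs) {
            boolean nested = NESTED.matcher(s).find();
            boolean flat = FLAT.matcher(s).find();
            System.out.println(s + " -> nested=" + nested + ", flat=" + flat);
        }
        // Avoid running NESTED on a few dozen 'a' characters followed by "!": it may take a very long time.
    }
}
```

Both patterns accept the same inputs; the difference only shows up in how much work the engine does on inputs that fail to match, which is what makes the nested version usable for a denial-of-service attack when the input is user-controlled.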

For more information, you can check the changelog.

These features are already available on SonarCloud, and will be included in SonarQube 8.5 and SonarLint.

Alex

That is awesome and should be mentioned in the appropriate OWASP pages. I know that most people don’t care about this, but rest assured that I do care a lot and think this is a really awesome feature.

Is the source code available somewhere? If not, would it be possible to do the same for Kotlin, and if so, what would the expected release date be?

Hello Richard and welcome to the community! The source code is indeed available (at https://github.com/SonarSource/sonar-java).