Write efficient, error-free and safe regular expressions (regex) in Python

Hello Python developers,

After Java, PHP, JavaScript, TypeScript, and most recently Kotlin, it’s your turn: Python now has its own set of rules to help you write efficient, error-free, and safe regular expressions (regex).

Here is the list of the new rules:

Bug Detection Rules:

  • S6002: Regex lookahead assertions should not be contradictory
  • S5856: Regular expressions should be syntactically valid
  • S5996: Regex boundaries should not be used in a way that can never be matched
  • S5868: Unicode Grapheme Clusters should be avoided inside regex character classes
  • S5855: Regex alternatives should not be redundant
  • S5850: Alternatives in regular expressions should be grouped when used with anchors
  • S5842: Regex repetition pattern’s body should not match the empty string
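
To give a feel for what some of these bug rules are about, here is a minimal sketch using Python’s standard re module. The patterns and variable names are illustrative assumptions chosen to fit the rule titles above; they are not taken from the rule descriptions themselves.

```python
import re

# S5850: without a group, "^" applies only to "a" and "$" only to "c",
# so "b" is allowed to match anywhere in the input.
ungrouped = re.compile(r"^a|b|c$")
grouped = re.compile(r"^(?:a|b|c)$")

# S6002: the lookahead requires "a" at the very position where the
# pattern then requires "b", so no input can satisfy both.
contradictory = re.compile(r"(?=a)b")

# S5996: without re.MULTILINE, "$" only matches at the end of the input
# (or just before a final newline), so "abc" can never follow it.
unmatchable_boundary = re.compile(r"$abc")

# S5868: "e" followed by U+0301 (combining acute accent) renders as one
# character but is two code points. Inside a character class, each code
# point becomes a separate member of the class.
cluster_class = re.compile("[e\u0301]")

print(ungrouped.search("xbx"))           # matches the stray "b"
print(grouped.search("xbx"))             # None: the whole input must be a, b, or c
print(contradictory.search("ab"))        # None
print(unmatchable_boundary.search("abc"))  # None
print(cluster_class.fullmatch("e\u0301"))  # None: the class matches one code point only
```

In each case the pattern compiles without complaint, which is why this kind of mistake tends to go unnoticed until a static check points it out.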

Code Smell Detection Rules:

  • S5361: “str.replace” should be preferred to “re.sub”
  • S5869: Character classes in regular expressions should not contain the same character twice
  • S5843: Regular expressions should not be too complicated
  • S6035: Single-character alternations in regular expressions should be replaced with character classes
  • S6019: Reluctant quantifiers in regular expressions should be followed by an expression that can’t match the empty string
  • S5857: Character classes should be preferred over reluctant quantifiers in regular expressions
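
The code smell rules are about patterns that work but could be simpler or less surprising. The sketch below pairs a few such patterns with the alternative the rule title suggests; the sample text and variable names are assumptions made up for illustration.

```python
import re

text = "foo <b>bar</b> foo"

# S5361: the pattern is a plain literal, so str.replace does the same job
# without involving the regex engine.
with_sub = re.sub("foo", "baz", text)
with_replace = text.replace("foo", "baz")

# S6035: a single-character alternation versus a character class.
alternation = re.findall(r"a|b|c", "abcd")
char_class = re.findall(r"[abc]", "abcd")

# S5857: a reluctant quantifier versus a negated character class.
reluctant = re.findall(r"<.+?>", text)
negated = re.findall(r"<[^>]+>", text)

# S6019: "x?" can match the empty string, so the reluctant ".*?" is never
# forced to consume anything and the overall match is empty.
empty_match = re.match(r".*?x?", "abcx").group()

print(with_sub == with_replace)
print(alternation == char_class)
print(reluctant == negated)
print(repr(empty_match))
```

The paired expressions return the same results on this input, so the preferred form costs nothing in behavior while being easier to read.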

These features are available on SonarCloud and will be included in SonarQube 9.2 and SonarLint.