[Webinar] Finding Bad Apple in your Regular Expressions

Hi all,

We are hosting a webinar on January 11th: How to find Bad Apple in your Regular Expressions!

Join our Software Engineer, Johann Beleites, to discover some common mistakes made when writing regular expressions, how Sonar can detect problematic expressions, some limitations of static analysis in this context, and some techniques to improve regular expression performance and defend against malicious inputs.

What: How to find Bad Apple in your Regular Expressions

When: January 11th 10am CST/5pm CET

Who is it intended for: Anyone with basic knowledge & interest in regular expressions

Register now!

Can’t make it to the live session but still interested? You can register here to receive the recording.

Hi Camille, I tried to join this webinar using the links but this message was shown:
“You cannot view this recording.
No permission.”

There wasn’t any prompt to enter the passcode I received - is there any way to watch the recording now?

Thanks,
Drew

Hi Drew,

Thank you for your message. You will soon receive a follow-up email at the email address you provided upon registration, in which you will find the webinar recording and all related information.

Thank you,
Camille

Hi all,

Thanks to everyone who registered and attended our most recent webinar. Below you’ll find answers to the questions we received during the presentation.

Q - Can you please give an example of Look-arounds?
A - (?=.*a)(?=.*b)(?=.*c)\w+ would be an example that matches a word containing at least one a, b and c in any order.
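As a minimal sketch of the look-around answer above, here is the same pattern in Python's `re` module. The helper name `contains_abc` and the sample words are illustrative assumptions, and `fullmatch` is used so that the whole input must be a single word.

```python
import re

# Each lookahead asserts that a, b, or c occurs somewhere ahead without
# consuming any input; \w+ then consumes the word itself.
PATTERN = re.compile(r"(?=.*a)(?=.*b)(?=.*c)\w+")


def contains_abc(word: str) -> bool:
    """Return True if the input is one word containing a, b, and c."""
    return PATTERN.fullmatch(word) is not None


print(contains_abc("cab"))     # all three letters, in a different order
print(contains_abc("backup"))  # all three letters plus others
print(contains_abc("abd"))     # no c, so the third lookahead fails
```

Because the lookaheads consume nothing, the order in which a, b, and c appear in the word does not matter.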

Q - So the regex engine does not follow an order for comparisons based on the order we write the regex?
A - Backtracking regex engines will try the alternatives in the order they are written in the regex until they find one that matches. If the match later fails, the engine will backtrack and try the next alternative. If all alternatives fail, it will backtrack further and might start over at the first alternative again if this part of the regex is reached again.
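The ordering described above can be observed with Python's `re` module, which is also a backtracking engine. This is only an illustrative sketch; the patterns and inputs are made up for the example.

```python
import re

# The alternatives are tried left to right, so "a" wins even though
# "ab" would also match at this position.
first = re.match(r"a|ab", "ab")
print(first.group())

# Here "a" is tried first, but the following "c" fails against "b".
# The engine backtracks, tries the second alternative "ab", and then
# "c" matches.
second = re.match(r"(?:a|ab)c", "abc")
print(second.group())
```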

Q - Is it possible to fail the analysis of a project because of regex issues, or only to search for them?
A - If you mean “can you fail the Sonar scan of a project if a regex issue is found?”, then not by default. The Sonar scanner will find the issue and present it in the SonarQube UI for you to confirm whether it is a security issue or not.
However, there is a way to fail the analysis. You can implement a Quality Gate that fails your Sonar scanner analysis if an excess number of issues is detected. Please see https://docs.sonarqube.org/latest/user-guide/quality-gates/ for more info.

Q - Why isn’t there a quantifier which is both lazy and without backtracking in java regex engine?
A - Lazy quantifiers start by matching the minimum number of repetitions allowed by the quantifier. For example, x*? starts by matching zero xs and x+? starts by matching only a single x. They will only try to match more than that when the subsequent match fails and backtracking happens. A version of these quantifiers without backtracking would therefore be useless, as it could only ever match the minimum: a non-backtracking version of +? would match only a single instance of the pattern, and a non-backtracking version of *? would match nothing.
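The behaviour of lazy quantifiers can be checked quickly. The sketch below uses Python's `re` module rather than Java, on the assumption that lazy quantifiers behave the same way in both engines for these simple patterns; the variable names are illustrative.

```python
import re

# With nothing after the quantifier, a lazy quantifier stops at its minimum.
zero = re.match(r"x*?", "xxx").group()   # matches the empty string
one = re.match(r"x+?", "xxx").group()    # matches a single x

# With a following "y", each failed attempt to match "y" makes the engine
# backtrack and let x+? take one more x, until "y" finally matches.
grown = re.match(r"x+?y", "xxxy").group()

print(repr(zero), repr(one), repr(grown))
```

The third match only reaches the full string because backtracking lets the lazy quantifier grow, which is why a lazy quantifier without backtracking would have no use.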

Q - How can we access the engine?
A - Sonar’s regex analysis engine is part of all of our products: SonarQube, SonarCloud, and SonarLint. It is available for Java, Kotlin, Python, and PHP code.

Q - Does Sonar have an online analysis site?
A - Do you mean a site where you can upload code to scan? No, there is no such tool, but you can use SonarCloud, which is the SaaS version of SonarQube. All public projects are free to scan, with no limit.

Q - For that issue of the regex using grep, I would recommend setting ulimit on the box that contains that engine. Do you think that the regex from XPath can overcome such issues? What can we do about its limitations?
A - Yes, it is true that you can use ulimit to prevent system resource overload and thus performance issues. I’m still a bit unclear about what you mean by the relationship between regex and XPath. Can you please create a thread on the Sonar community to help us clarify your question?

Q - Any good resources for learning regex? Pluralsight or books?
A - I recommend learning regex by practicing it directly. I suggest following this Reddit comment: https://www.reddit.com/r/learnprogramming/comments/cduxuu/regex_wizards_how_did_you_learn_regex_and_how_did/etwj6hj/.

  1. Use https://regexr.com/. You can paste an expression at the top, mouse over each highlighted section, and it will explain what that part does. It’s a great way to decipher hieroglyphics into something you can start to understand.
  2. Complete https://regexone.com/. It’s a great interactive introduction to regular expressions that should fill any gaps in your knowledge. You have to fill in the regex that satisfies the named matches.
  3. Complete https://regexcrossword.com/. It’s a “reverse regex” crossword web game where you have to type the string that satisfies the expressions in all of the row and column headers. (See https://regexcrossword.com/howtoplay.)
  4. Look at common regex patterns and try to understand them. Paste some from https://projects.lukehaas.me/regexhub/ into regexr.com and hover over each group in the regex to help figure out anything you’re not familiar with. Start with the shortest examples that you don’t yet understand.

Q - What severity does the Sonar analysis apply to these issues once detected? Are there defined severities that you can share?
A - The default severity for the S5852 catastrophic backtracking rule is “Critical”. See https://rules.sonarsource.com/java/RSPEC-5852 (for Java) and https://rules.sonarsource.com/typescript/RSPEC-5852 (for TypeScript), etc. For more about the severity scale, please see the Issues pages in the SonarQube docs and the SonarCloud docs.
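For readers unfamiliar with what this kind of rule targets, the sketch below shows a textbook catastrophic-backtracking shape (a quantifier nested inside another quantifier) next to a rewrite that accepts the same strings. Both patterns and the sample inputs are illustrative assumptions, not taken from the webinar or from the rule description, and the inputs are kept short so the risky pattern stays fast.

```python
import re

# Nested quantifiers: on a failing input the engine can try exponentially
# many ways to split a run of a's between the inner and outer "+".
RISKY = re.compile(r"(a+)+b")

# Accepts the same strings with a single quantifier, so there is only
# one way to split the a's.
SAFER = re.compile(r"a+b")


def same_result(text: str) -> bool:
    """Check that both patterns agree on whether text matches as a whole."""
    risky = RISKY.fullmatch(text) is not None
    safer = SAFER.fullmatch(text) is not None
    return risky == safer


samples = ["ab", "aaab", "b", "aaa", "aaac", ""]
print(all(same_result(s) for s in samples))
```

On longer failing inputs such as a long run of a's with no b, the first pattern slows down sharply while the second does not, which is the behaviour such a rule is meant to flag.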

Q - Do you have a preferred valid email recognition expression? You showed one horrible example, but many safe variations are quite poor.
A - In my opinion, there is no perfect one, so it depends on how strict you want the validation to be in your context. This article gives a nice summary for Java: https://www.baeldung.com/java-email-validation-regex. I suggest using validation from 3rd-party libraries instead.


References

  1. Outage Postmortem - July 20, 2016. Stack Exchange Network Status. https://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016. Accessed 2022-09-29 via https://web.archive.org/web/20220608082538/https://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016
  2. J. Graham-Cumming: Details of the Cloudflare outage on July 2, 2019. 2019-07-12, The Cloudflare Blog. https://blog.cloudflare.com/details-of-the-cloudflare-outage-on-july-2-2019/. Accessed 2022-09-29
  3. J. Atwood: Regex Performance. 2006-01-12, Coding Horror. Accessed 2022-09-29
  4. O. Yakovskind: Catastrophic Backtracking — The Dark Side of Regular Expressions. 2021-10-16, Medium (BigPanda Engineering). Accessed 2022-09-29
  5. CVE-2021-33503. Mitre CVE List. Accessed 2022-09-29
  6. CVE-2022-34749. Mitre CVE List. Accessed 2022-09-29
  7. CVE-2022-21680. Mitre CVE List. Accessed 2022-09-29
  8. J. Kirrage, A. Rathnayake, H. Thielecke (2013). Static Analysis for Regular Expression Denial-of-Service Attacks. In: Lopez, J., Huang, X., Sandhu, R. (eds) Network and System Security. NSS 2013. Lecture Notes in Computer Science, vol 7873. Springer, Berlin, Heidelberg.
  9. N. Weideman, B. van der Merwe, M. Berglund, B. Watson (2016). Analyzing Matching Time Behavior of Backtracking Regular Expression Matchers by Using Ambiguity of NFA. In: Han, YS., Salomaa, K. (eds) Implementation and Application of Automata. CIAA 2016. Lecture Notes in Computer Science, vol 9705. Springer, Cham.
  10. P. Warren: Mail::RFC822::Address: regexp-based address validation. 2012-09-17. Accessed 2022-09-29

Hi Camille, I’m still getting the same message when following the link in the email I received.

I see the web form, fill it out and press Register, but still get:
“You cannot view this recording.
No permission.”

I think either the webinar registration has been set to Manual approval and I haven’t been approved, or Automatic approval and there is a problem with Zoom?

See https://support.zoom.us/hc/en-us/articles/204619915

Hi Drew,

Apologies for that issue with Zoom. We have set up this page for you to access the recording of our webinar.

Best,
Camille

Excellent, that is great - thank you!
