Hi all,
Thanks to everyone who registered and attended our most recent webinar. Below you’ll find answers to the questions we received during the presentation.
Q - Can you please give an example of Look-arounds?
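A - `(?=.*a)(?=.*b)(?=.*c)\w+` is an example that matches a word containing at least one a, b and c in any order (strictly, each `.*` lookahead checks the rest of the input from the current position, so this holds when the input is that single word).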
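A minimal Java sketch of the lookaround example using `java.util.regex`. The class name, the `firstMatch` helper and the `WITHIN_WORD` variant are our own illustrative assumptions, not something shown in the webinar. The lookaheads consume nothing, so all three checks run from the same position before `\w+` matches. Because `.*` can look past the current word, `find()` on a longer string may return a word that lacks the letters. Swapping `.*` for `\w*` keeps each check inside the word.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundDemo {
    // Pattern from the answer: each lookahead scans the rest of the input, not just the word.
    static final Pattern WHOLE_INPUT = Pattern.compile("(?=.*a)(?=.*b)(?=.*c)\\w+");
    // Illustrative variant (assumption): \w* keeps each lookahead inside the current word.
    static final Pattern WITHIN_WORD = Pattern.compile("(?=\\w*a)(?=\\w*b)(?=\\w*c)\\w+");

    // Hypothetical helper: returns the first match found in the text, or null.
    static String firstMatch(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // matches() requires the whole input to match, so it behaves like a single-word check.
        System.out.println(WHOLE_INPUT.matcher("cab").matches()); // true
        System.out.println(WHOLE_INPUT.matcher("abx").matches()); // false: no c
        // With find(), ".*" sees "cab" later in the line, so "dog" is accepted.
        System.out.println(firstMatch(WHOLE_INPUT, "dog cab"));   // dog
        System.out.println(firstMatch(WITHIN_WORD, "dog cab"));   // cab
    }
}
```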
Q - So the regex engine does not follow an order for comparisons based on the order we write the regex?
A - Backtracking regex engines will try the alternatives in the order they are written in the regex until they find one that matches. If the match later fails, the engine will backtrack and try the next alternative. If all alternatives fail, it will backtrack further and might start over at the first alternative again if this part of the regex is reached again.
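A small Java sketch of that ordering behaviour with `java.util.regex`, which is a backtracking engine. The class name and the `firstMatch` helper are illustrative only. The first alternative that leads to an overall match wins, even if a later one would match more. When the rest of the pattern fails, the engine backtracks into the group and tries the next alternative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationOrderDemo {
    // Hypothetical helper: compiles the regex and returns the first match, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // "a" is tried first and succeeds, so the longer "ab" is never considered.
        System.out.println(firstMatch("a|ab", "ab"));     // a
        // Swapping the order of the alternatives changes the result.
        System.out.println(firstMatch("ab|a", "ab"));     // ab
        // "a" is tried first, then "c" fails against "b", so the engine
        // backtracks into the group and tries "ab", after which "c" matches.
        System.out.println(firstMatch("(a|ab)c", "abc")); // abc
    }
}
```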
Q - Is it possible to fail the project's analysis on regex issues, or only to search for them?
A - If you mean “can you fail the Sonar scan of a project if a regex issue is found?”, then no: the Sonar scanner will find the issue and present it in the SonarQube UI, where you can review it and decide whether it is a real security issue.
However, there is a way to fail the analysis. You can set up a Quality Gate that fails your Sonar scanner analysis when too many issues are detected. Please see https://docs.sonarqube.org/latest/user-guide/quality-gates/ for more info.
Q - Why isn’t there a quantifier in the Java regex engine that is both lazy and non-backtracking?
A - Lazy quantifiers start by matching the minimum number of repetitions allowed by the quantifier. For example, `x*?` starts by matching zero `x`s and `x+?` starts by matching a single `x`. They only try to match more when the subsequent match fails and backtracking happens. A lazy quantifier that never backtracked would therefore be useless, because it could only ever match the minimum: a non-backtracking lazy `+` would match exactly one instance of the pattern, and a non-backtracking lazy `*` would match nothing.
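A short Java sketch of this behaviour with `java.util.regex`. The class name and the `firstMatch` helper are illustrative. On their own, lazy quantifiers stop at the minimum, and they grow only when what follows fails. For contrast, the sketch also uses Java's possessive quantifiers (`x++`), which are the greedy counterpart that never gives back what it matched. This is our addition and was not covered in the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyQuantifierDemo {
    // Hypothetical helper: compiles the regex and returns the first match, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // With nothing after them, lazy quantifiers stop at their minimum.
        System.out.println("[" + firstMatch("x+?", "xxx") + "]"); // [x]
        System.out.println("[" + firstMatch("x*?", "xxx") + "]"); // []
        // They grow one x at a time only because "y" keeps failing and the engine backtracks.
        System.out.println(firstMatch("x+?y", "xxxy"));           // xxxy
        // Greedy x+ gives back one x so the final x can match...
        System.out.println(firstMatch("x+x", "xxx"));             // xxx
        // ...but possessive x++ never gives any back, so the final x can never match.
        System.out.println(firstMatch("x++x", "xxx"));            // null
    }
}
```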
Q - How can we access the engine?
A - Sonar’s regex analysis engine is included in all of our products: SonarQube, SonarCloud, and SonarLint, for Java, Kotlin, Python, and PHP code.
Q - Does Sonar have an online analysis site?
A - If you mean a site where you can upload code to scan, no, there is no such tool. However, you can use SonarCloud, the SaaS version of SonarQube: scanning public projects is free and unlimited.
Q - For that issue of the regex using grep, I would recommend setting ulimit on the box that runs that engine. Do you think the regex from XPath can overcome such issues? What can we do about its limitations?
A - Yes, you can use ulimit to prevent system resource overload and the resulting performance issues. I’m still a bit unclear about what you mean by the relationship between regex and XPath. Can you please create a thread on the Sonar community to help us clarify your question?
Q - Any good resources for learning regex? Pluralsight or books?
A - I recommend learning regex by practicing it directly, following the steps in this Reddit comment: https://www.reddit.com/r/learnprogramming/comments/cduxuu/regex_wizards_how_did_you_learn_regex_and_how_did/etwj6hj/
1. Use https://regexr.com/. You can paste an expression at the top, mouse over each highlighted section, and it will explain what that part does. It’s a great way to decipher hieroglyphics into something you can start to understand.
2. Complete https://regexone.com/. It’s a great interactive introduction to regular expressions that should fill any gaps in your knowledge. You have to fill in the regex that satisfies the named matches.
3. Complete https://regexcrossword.com/. It’s a “reverse regex” crossword web game where you have to type the string that satisfies the expressions in all of the row and column headers (see https://regexcrossword.com/howtoplay).
4. Look at common regex patterns and try to understand them. Paste some from https://projects.lukehaas.me/regexhub/ into regexr.com and hover over each group in the regex to help figure out anything you’re not familiar with. Start with the shortest examples that you don’t yet understand.
Q - What severity scale does Sonar analysis apply to these issues once detected? Are the severities defined somewhere you can share?
A - The default severity for the S5852 catastrophic backtracking rule is “Critical”. See https://rules.sonarsource.com/java/RSPEC-5852 (for Java) and https://rules.sonarsource.com/typescript/RSPEC-5852 (for TypeScript), etc. For more about the severity scale, please see the Issues page of the SonarQube docs or of the SonarCloud docs.
Q - Do you have a preferred valid email recognition expression? You showed one horrible example, but many safe variations are quite poor.
A - In my opinion, there is no perfect one; it depends on how strict your validation needs to be. This article gives a nice summary for Java: https://www.baeldung.com/java-email-validation-regex. I suggest using validation from third-party libraries instead.
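If a lightweight regex check is all your context needs, here is a hedged Java sketch of a deliberately loose pattern. The pattern, class and method names are our own hypothetical example, not a Sonar recommendation, and it is not an RFC 5322 validator. It requires a non-empty local part without `@` or whitespace, then dot-separated domain labels with at least one dot and no empty labels. Each piece has only one way to match: the local part cannot contain `@`, and the labels cannot contain `.`. That should keep failing inputs from triggering catastrophic backtracking. The trade-off is that it rejects some valid addresses, such as `user@localhost`.

```java
import java.util.regex.Pattern;

public class LooseEmailCheck {
    // Hypothetical, deliberately loose pattern (NOT RFC 5322):
    //   local part: one or more chars other than '@' and whitespace
    //   domain:     labels without '.', '@' or whitespace, joined by single dots (at least one dot)
    static final Pattern LOOSE_EMAIL =
            Pattern.compile("[^@\\s]+@[^@\\s.]+(?:\\.[^@\\s.]+)+");

    static boolean looksLikeEmail(String s) {
        // matches() requires the entire string to match, so no ^...$ anchors are needed.
        return LOOSE_EMAIL.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "user@example.com", "first.last@mail.example.co.uk",
            "no-at-sign.example.com", "user@localhost", "user@example..com"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + looksLikeEmail(s));
        }
    }
}
```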
References
- Outage Postmortem - July 20, 2016. Stack Exchange Network Status. https://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016. Accessed 2022-09-29 via https://web.archive.org/web/20220608082538/https://stackstatus.net/post/147710624694/outage-postmortem-july-20-2016
- J. Graham-Cumming: Details of the Cloudflare outage on July 2, 2019. 2019-07-12, The Cloudflare Blog. https://blog.cloudflare.com/details-of-the-cloudflare-outage-on-july-2-2019/. Accessed 2022-09-29
- J. Atwood: Regex Performance. 2006-01-12, Coding Horror. Accessed 2022-09-29
- O. Yakovskind: Catastrophic Backtracking — The Dark Side of Regular Expressions. 2021-10-16, Medium (BigPanda Engineering). Accessed 2022-09-29
- CVE-2021-33503. Mitre CVE List. Accessed 2022-09-29
- CVE-2022-34749. Mitre CVE List. Accessed 2022-09-29
- CVE-2022-21680. Mitre CVE List. Accessed 2022-09-29
- J. Kirrage, A. Rathnayake, H. Thielecke (2013). Static Analysis for Regular Expression Denial-of-Service Attacks. In: Lopez, J., Huang, X., Sandhu, R. (eds) Network and System Security. NSS 2013. Lecture Notes in Computer Science, vol 7873. Springer, Berlin, Heidelberg.
- N. Weideman, B. van der Merwe, M. Berglund, B. Watson (2016). Analyzing Matching Time Behavior of Backtracking Regular Expression Matchers by Using Ambiguity of NFA. In: Han, YS., Salomaa, K. (eds) Implementation and Application of Automata. CIAA 2016. Lecture Notes in Computer Science, vol 9705. Springer, Cham.
- P. Warren: Mail::RFC822::Address: regexp-based address validation. 2012-09-17. Accessed 2022-09-29