We have noticed that SonarQube flags a regular expression we use to validate email addresses as a bug. From what I can tell, the control characters it complains about are non-printable, so it makes some sense that they should not be present. What I have not been able to find out is why a top-ranking website advocates this regex: https://emailregex.com/
I do know that email addresses can now contain special characters from different languages, but my experience with this is severely limited. Hoping someone can help shed some light on this. Thank you!
Entries in the ASCII table below code 32 are known as control characters or non-printing characters. As they are not common in JavaScript strings, using these invisible characters in regular expressions is most likely a mistake.
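To make the rule concrete, here is a minimal sketch. It assumes the flagged characters are the ones that RFC 5322 allows inside a quoted local part through its obsolete syntax (`obs-NO-WS-CTL`: `%d1-8 / %d11 / %d12 / %d14-31 / %d127`), which is where control-character ranges like `\x01-\x08` come from. `withCtl` and `withoutCtl` are hypothetical, simplified fragments for illustration, not the full emailregex.com expression.

```javascript
// Hypothetical, simplified patterns: a quoted local part ("...") followed by @example.com.
// Neither handles whitespace or backslash-escaped quoted pairs; they only illustrate the rule.

// withCtl accepts RFC 5322 qtext plus the obsolete control characters
// (0x01-0x08, 0x0B, 0x0C, 0x0E-0x1F, 0x7F). These escapes are what the rule flags.
const withCtl = /^"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f])*"@example\.com$/;

// withoutCtl keeps only printable qtext (0x21, 0x23-0x5B, 0x5D-0x7E),
// so there is nothing below code 32 for the rule to complain about.
const withoutCtl = /^"(?:[\x21\x23-\x5b\x5d-\x7e])*"@example\.com$/;

// A local part containing the invisible character U+0001.
const sample = '"a\x01b"@example.com';

console.log(withCtl.test(sample));                      // true
console.log(withoutCtl.test(sample));                   // false
console.log(withoutCtl.test('"john.doe"@example.com')); // true
```

The only practical difference between the two is whether addresses containing invisible control characters are accepted.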
Is your problem really that you think SonarQube should have a special case to detect when a regex is meant for email validation? If so, say so.
While I agree that it would be nice, I think the added value is small (how many people have had the same question as you, out of all the people who use this lint rule?), and this kind of special-casing doesn't scale well. How would one decide what to special-case and what not to? It could open a big can of worms, when the same problem can be addressed on your end simply by using the linter's existing escape hatches.
I'm trying to figure out what we're supposed to do with the conflicting information from emailregex.com and SonarQube. I'm looking for an answer from someone who has dealt with this specific issue. Thank you.
Colin, does SonarQube have a way to ignore the rule? We'd be OK with doing that at this point.
You can use NOSONAR comments to ignore warnings on a case-by-case basis. I've also updated my answer on stackoverflow.com accordingly, with an edit pointing out the part of the RFC that shows addresses can contain these characters. Please mark it as accepted if it answers your question.
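A NOSONAR suppression could look like the sketch below, assuming a JavaScript project with the regex kept on a single line. SonarQube ignores every issue raised on a line that contains a `NOSONAR` comment, so the comment has to sit on the same line as the flagged expression, and it silences all rules on that line, not only the control-character one. `EMAIL_QUOTED_LOCAL` and `isQuotedLocalEmail` are hypothetical names, and the pattern is a simplified fragment rather than the full emailregex.com expression.

```javascript
// Hypothetical, simplified fragment of an RFC 5322 quoted local part that intentionally
// allows the obsolete control characters. The trailing NOSONAR comment tells SonarQube to
// ignore every issue reported on this line (not only the control-character rule).
const EMAIL_QUOTED_LOCAL = /^"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f])*"@example\.com$/; // NOSONAR: RFC 5322 obs-qtext

// Hypothetical helper: returns true if the address matches the quoted-local-part pattern.
function isQuotedLocalEmail(address) {
  return EMAIL_QUOTED_LOCAL.test(address);
}

console.log(isQuotedLocalEmail('"john.doe"@example.com')); // true
console.log(isQuotedLocalEmail("john.doe@example.com"));   // false (local part is not quoted)
```

Adding a short reason after `NOSONAR` keeps the suppression understandable for whoever reads the code next.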
Yes, you can mark individual issues as "False positive" or "Won't fix" in the SonarQube UI, and/or you can remove the rule from your quality profile.
As for your original question: this is a very complicated regex, and perhaps the use of control characters is justified there. (To be honest, I am not really an expert, and I wonder why control characters would ever be allowed in an email address. Are you able to configure such an address and actually send and receive email with it?) However, I think such complex regexes are going to be the exception, and the rule will work fine and provide value in 99% of other cases.
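If control characters turn out not to be needed in practice, another option is a simpler pattern that never mentions them, leaving the rule nothing to flag. The sketch below swaps in the "valid email address" regex from the WHATWG HTML Living Standard (the definition browsers use for `<input type="email">`). That definition is a deliberate departure from RFC 5322: it rejects quoted local parts entirely, so it is stricter than the emailregex.com pattern, not equivalent to it. `isValidEmail` is a hypothetical name.

```javascript
// "Valid email address" pattern from the WHATWG HTML Living Standard.
// It contains no control-character escapes and does not accept quoted local parts.
const HTML_EMAIL = /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

// Hypothetical helper wrapping the pattern.
function isValidEmail(address) {
  return HTML_EMAIL.test(address);
}

console.log(isValidEmail("john.doe@example.com"));   // true
console.log(isValidEmail('"john.doe"@example.com')); // false (quoted local part)
console.log(isValidEmail("a\x01b@example.com"));     // false (control character)
```

Whether rejecting quoted or control-character addresses is acceptable depends on your users; this only shows that a commonly used pattern gets by without them.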