There’s been discussion about this in the past (e.g. Java duplicated string literals), but the rule description for the Java version of this rule (I didn’t check others) says there’s a 5-character threshold to avoid lots of FPs with common short strings.
The number of duplicates can be adjusted as a parameter. It’d be nice if the string length could also be parameterized.
In our case, while catching things like repeated error messages is very useful, we also generate a lot of SQL strings for use in PreparedStatement objects, so we frequently have things like:
cmd = "SELECT " + colName…
While letting us specify a whitelist of strings like SQL keywords would be nice, I’d settle for a length threshold, as you already HAVE a length threshold, and I’m just asking to make it parameterizable.
Yes, because we’re getting a lot of hits for SQL keywords forming JDBC commands; these keywords aren’t likely to need changing ever, unless we were to do something radical like change to another SQL dialect.
But it seems this would be useful all around (other languages). Out of curiosity, I took a look at the JavaScript version (SonarQube) and found that rule not only has a different length threshold (10), but also provides a parameter ignoreStrings, which is what I’d really prefer for Java, especially if regexes are supported. (It’s not clear from the description whether or not they are, but one of the built-in exclusions is a regex.)
Thanks for flagging this; it’s a great point. I’ll take this back to the team for discussion and see if we can bring that parameterization to the Java rule.
It seems to me that for the benefit of users, particularly those whose projects span multiple languages, there should be a goal to make similar rules in different repos (abc:Sxyz and def:Sxyz) as consistent as the differences in the language will allow.
And these requests are handled by language developers. I have no way to route this to a developer of “all languages”. The Java developer who eventually picks this up will presumably (hopefully) notice that the rule exists for other languages as well and reach out to those squads for a parallel change on their side.
OK, I was just assuming that there was some sort of “common” parser, sort of a base/super class that each language then extends. (Of course, that would introduce its own complexity…)
To clarify: I’m suggesting ways for users to filter out string literals they WANT to duplicate. First I suggested parameterizing the existing (hardcoded) length limit. Then I noticed JavaScript’s version has a parameterizable exclusion list, so porting that to Java would be fine too. Both would be even better.
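To make the two suggestions concrete, here is a rough sketch of the filtering I have in mind. This is not how the Sonar analyzer is implemented; the class name DuplicateLiteralFilter and the parameters minLength, threshold, and ignoreRegex are all hypothetical names I made up for illustration. It only shows a configurable length limit and a regex exclusion working together before duplicates are counted.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch, not the actual rule implementation.
public class DuplicateLiteralFilter {
    private final int minLength;         // assumed parameter: literals shorter than this are skipped
    private final int threshold;         // number of occurrences needed to report a literal
    private final Pattern ignorePattern; // assumed parameter: literals fully matching this are skipped

    public DuplicateLiteralFilter(int minLength, int threshold, String ignoreRegex) {
        this.minLength = minLength;
        this.threshold = threshold;
        this.ignorePattern = (ignoreRegex == null || ignoreRegex.isEmpty())
                ? null
                : Pattern.compile(ignoreRegex);
    }

    // Returns each reportable literal mapped to its occurrence count.
    public Map<String, Integer> findDuplicates(List<String> literals) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String literal : literals) {
            if (literal.length() < minLength) {
                continue; // too short to be worth reporting
            }
            if (ignorePattern != null && ignorePattern.matcher(literal).matches()) {
                continue; // explicitly excluded, e.g. SQL keywords
            }
            counts.merge(literal, 1, Integer::sum);
        }
        // Keep only literals that reach the duplication threshold.
        counts.values().removeIf(n -> n < threshold);
        return counts;
    }
}
```

Note that "SELECT " (with the trailing space) is seven characters, so it clears a 5-character limit. With something like this, we could either raise minLength or pass an ignoreRegex such as \s*(SELECT|FROM|WHERE)\s* and still get hits on repeated error messages.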