Don't replace another useful rule with a far less useful one

Rule java:S1850 has been deprecated and the description says to use java:S2589 instead.

This is another example of what I mentioned before (Don’t replace a useful rule with a useless one): highly specific, pattern-based rules being replaced with vague general ones.

In this case, S1850 points to cases where the result of an instanceof check is statically known, i.e., the operator always evaluates to true (or always to false). The rule is of type bug, I assume on the grounds that code that looks stupid probably wasn’t intended that way, so is probably a bug. (That seems to be the rationale for a lot of the rules of type bug.)
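For concreteness, here’s a minimal sketch of the kind of code I mean (the names are hypothetical, and whether S1850 catches this exact shape depends on its dataflow analysis):

    class InstanceofExample {
        void handle(Object obj) {
            if (!(obj instanceof Number)) {
                return;
            }
            // By this point obj must be a Number, so the next check can
            // only ever be true -- the kind of thing S1850 points out:
            if (obj instanceof Number) {
                process((Number) obj);
            }
        }

        void process(Number n) { /* consume the value */ }
    }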

The suggested replacement rule, S2589, points to all boolean expressions whose values are statically predictable. It’s a smell. In our code, a lot of the flagged code is of the form:

if (foo == null || (foo != null && …))

and SQ sees that the second comparison is guaranteed to be true, and the line could be simplified to

if (foo == null || …)

Obviously the dev either didn’t understand boolean short-circuiting, or maybe was forced to follow some policy about null checks.

So while, yeah, the second form is preferable, it’s pretty clear what the intent was, and the code is merely “smelly.” Whereas instanceof is probably less well understood across programmers, so a statically predictable result is more likely to rest on a mistaken assumption (i.e., is more likely a bug).

Hello Mister Pi,

I am not sure I am following you here. The example you provide seems like a perfectly valid case where a report is desired. Which rule is the one you think is problematic? Why do you think so?

The link I included discussed another case of essentially the same thing. What I see is rules that cover very specific patterns being deprecated in favor of more general replacements. While I can see the benefit (fewer rules to list), there is also a benefit to these more specialized rules.

In the other case (see the link), there was a rule that checked for certain patterns indicating that the wrong short-circuit logical operator was being used in some kind of null check. Something like “if (foo != null || foo.bar()…”, which is clearly wrong: if foo is null, the first test is false, so || goes on to evaluate the second test, which attempts to dereference foo. Clearly, && should have been used instead of ||. But this rule has been deprecated in favor of a more generic rule about possible null pointer dereferencing.
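Spelled out as a sketch (Foo and bar() are hypothetical):

    interface Foo {
        boolean bar();
    }

    class ShortCircuitExample {
        // Wrong: if foo is null, the first test is false, so || goes on
        // to evaluate the second test, which dereferences null -> NPE.
        boolean broken(Foo foo) {
            return foo != null || foo.bar();
        }

        // What was almost certainly intended: && short-circuits, so
        // foo.bar() is only evaluated when foo is non-null.
        boolean fixed(Foo foo) {
            return foo != null && foo.bar();
        }
    }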

The problem is, the old rule is close to 100% accurate, whereas the general null-pointer rule is tackling a more difficult (actually, impossible) task and therefore has a high rate of false positives. (I’m not dissing SQ for not solving a problem which is intrinsically impossible to solve! :slight_smile: ) So it’s a lot more productive to hand devs a list of 10 bugs that are mostly bona fide than to hand them several hundred that are mostly FP, and have them tell me to pound sand after they determine the first 10 or so are FP.

So the rule I brought up here is similar. The deprecated rule determines statically that an instanceof always returns true (or false). The premise behind the rule (which I think is a good one) is that this probably indicates the coder INTENDED the check to be true sometimes and false other times (else why would it be there in the first place), but implemented it wrong. (That’s the reasoning behind a lot of the rules of type bug: this is valid code but WHY DID YOU DO THIS? You must have meant something else.)

The favored replacement rule, first of all, is only a smell, so it is more likely to be deprioritized. It also catches a lot of cases that merely contain redundant checks. And yes, redundant code is clutter and qualifies as a smell, but it’s (mostly) harmless. The deprecated rule java:S1850, on the other hand, seems to be pointing out genuine problems.


Thanks for the clarification; this helped me understand your argument much better! :slight_smile: I see your point, especially with the effective reclassification of this issue from ‘bug’ to ‘code smell’. I can also understand the concern about merging an accurate and very specific rule into a more general but potentially less accurate one.

Let me trigger some discussion on this topic, I’ll come back to it here as soon as I can report more.


Glad I was able to explain my thoughts before someone there got the idea to combine all rules into One Rule to Rule Them All: “Code Shouldn’t Suck”! :slight_smile:

But on a more serious note: the docs explain that Security Hotspots are separated from Vulnerabilities essentially according to an estimated probability of accuracy: the hotspots are more likely to be FP because they’re intrinsically harder to check. Is this based on any actual empirical data, or is it just an educated guess? If you have empirical data about FP rates for each rule then you could use that as a guide: don’t merge rule X into rule Y if p(X is FP) << p(Y is FP).

Of course, even if Y is actually a superset of X (e.g., S2589 is a superset of S1850), one might still want both. But having the same code flagged twice would be annoying. So for the benefit of the user, the scanner should have a policy like “if the code is tagged for X then don’t tag it for Y” – but you can still tag it for Y if X is turned off in the quality profile, for instance.

Hi,

Ehm… that’s not what we meant. If the docs say exactly that, then could you please point me to the page?

A Security Hotspot is a security-sensitive piece of code where there might be a vulnerability depending on context.

I think they actually dropped the Security Hotspot rule about cookie setting, but it’s a great example so I’m going to use it here anyway.

Let’s say I’m setting an unencrypted cookie in your browser. If that cookie contains your current sock color, then it’s no big deal. If it contains PII, or your credit card information… that’s a big deal! Algorithmically, we can’t tell. Human review is required. And that’s what characterizes a Security Hotspot.
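In code, that might look something like this (a sketch assuming the javax.servlet API; the payloads are made up):

    import javax.servlet.http.Cookie;
    import javax.servlet.http.HttpServletResponse;

    class CookieExample {
        void addCookies(HttpServletResponse response, String creditCardNumber) {
            // Neither cookie sets the Secure flag, so both can travel over
            // unencrypted HTTP. Whether that matters depends entirely on
            // the payload -- something only a human can judge:
            Cookie sockColor = new Cookie("sockColor", "argyle");  // no big deal
            Cookie card = new Cookie("cc", creditCardNumber);      // a big deal!
            response.addCookie(sockColor);
            response.addCookie(card);
        }
    }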

Does that make sense?

 
Ann

Well, on https://docs.sonarqube.org/9.4/user-guide/rules/ it says:

For Vulnerabilities, the target is to have more than 80% of issues be true-positives. … It is expected that more than 80% of the [security hotspots] will be quickly resolved as “Reviewed” after review by a developer.

But from what you’re saying, it sounds like the decision to classify a rule as a vulnerability or a hotspot is based on a high-level analysis of the rule rather than on any empirical data. Thanks for clarifying.

If you want a good example to cite here, use S2068 about hard-coded credentials. It basically just looks for strings like “PASSWORD” but those can be as harmless as “this is the string to use as the prompt.”
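For instance (both snippets are made up):

    class CredentialExamples {
        // Likely flagged because the identifier and text mention "password",
        // yet this is only UI text, not a credential:
        static final String PASSWORD_PROMPT = "Please enter your password: ";

        // The genuine problem the rule is after -- an actual
        // hard-coded credential:
        static final String PASSWORD = "hunter2";
    }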


BTW, the other thread I linked to (about another rule being deprecated) pointed to a Jira ticket ([SONARJAVA-4017] Reintroduce S1697 to detect specific cases of null pointer dereferences - SonarSource) that was started in response to my comments. The general comments in that ticket on rule deprecation probably apply to this rule as well.

I looked at the Jira record but don’t seem to be able to add comments there, so I’ll add my comments here. They really apply more to the other rule (the one I linked to here), but that thread is closed.

The two things I would add to that Jira thread are:

  1. The thread says “S2295 [sic; meant S2259] has FP although S1697 has none” and “a rule with a smaller scope fully included in the broad rule exists and has no FP.” Our experience has been that S2259 doesn’t just “have” FPs; a large percentage of its issues, if not a majority, are FPs. And I don’t know that S1697 in fact has NO FPs at all, only that it has at most a small fraction. So I would prefer a criterion like “not many FPs” instead. I wouldn’t want to lose S1697 even if it did occasionally produce an FP, because the alternative is a rule that, as far as I can tell, is mostly FPs.

  2. It’s not clear to me whether the Jira ticket covers only Java or all languages, but the general principle raised there should apply to all languages. In particular, for S2259 it should apply to C# and to any other language where the same kind of construct is possible.

Here’s a small update from our side. Generally speaking, we share your opinion completely: more specific and accurate rules should not be replaced by more generic ones that report less accurately. On the contrary!

I am going to add the rules you have mentioned to a larger effort to review our Java rules that we plan to start later this year. Then we can spend some time investigating why these rules were originally merged and whether we want to reconsider that decision. Thank you for the discussion and the input you provided in this thread!


My guess (for why the rules were merged) was that someone complained about the same code being reported twice, or maybe one of your devs thought people might complain.

Indeed, I reported a case of one rule being a strict subset of another rule (Difference between java:S1158 and java:S2131?), where a violation of one rule was always flagged under the other rule as well (but not necessarily vice versa). But in that case both rules appeared to be picking out something straightforward with few (if any) FPs, so it was a simple case for us to turn off the narrower rule.

One solution would be some kind of accounting, like “don’t flag this under rule X (e.g. S2259) if it’s also flagged under rule Y (S1697)”, which would depend on factors like whether both rules are on (so, if S1697 is deactivated, should a violation of that rule still be counted as S2259?). This would probably need to be done in the post-scanning analysis phase on the SQ server. And then maybe you’d need something in the rule description to mention these cases.
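To sketch the accounting I mean (this is not SQ’s actual internals; the Issue type and the rule pairings here are hypothetical):

    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.stream.Collectors;

    // One reported finding at a source location.
    record Issue(String ruleKey, String file, int line) {}

    class SubsumptionFilter {
        // Broad rule -> the narrower rule that subsumes it.
        static final Map<String, String> NARROW_RULE_FOR = Map.of(
            "java:S2259", "java:S1697",
            "java:S2589", "java:S1850");

        static List<Issue> dedupe(List<Issue> issues, Set<String> activeRules) {
            // Every flagged location, keyed by "rule@file:line".
            Set<String> hits = issues.stream()
                .map(i -> i.ruleKey() + "@" + i.file() + ":" + i.line())
                .collect(Collectors.toSet());
            return issues.stream()
                .filter(i -> {
                    String narrow = NARROW_RULE_FOR.get(i.ruleKey());
                    // Drop a broad-rule issue only when the narrow rule is
                    // active AND already flagged the same location; if the
                    // narrow rule is deactivated, the broad issue survives.
                    return narrow == null
                        || !activeRules.contains(narrow)
                        || !hits.contains(narrow + "@" + i.file() + ":" + i.line());
                })
                .collect(Collectors.toList());
        }
    }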