The recent application of Quality Gate metrics to new code on short-lived branches is excellent and is working perfectly for my team when violations fall in the bug, vulnerability, or code smell categories (i.e. ones that create explicit issues). However, even though coverage on new code is being calculated correctly, it is not failing the Quality Gate I’ve set up with a condition on coverage percent for new code.
We’re using SonarQube Enterprise Edition 7.7 and I’m hitting this issue on Git-based Python projects with coverage reporting generated from pytest and security reporting generated from bandit (both reports ingest fine). I’d expect this issue to present itself with other languages.
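For context, the coverage and Bandit reports are wired in roughly like this (a sketch of my setup; the exact `reportPaths` property names depend on your SonarPython analyzer version, so treat them as assumptions to verify):

```properties
# Generate reports before analysis (run from the project root):
#   pytest --cov=my_package --cov-report=xml:coverage.xml
#   bandit -r my_package -f json -o bandit-report.json

# sonar-project.properties (sketch; "my_package" is a placeholder)
sonar.projectKey=sandbox-project
sonar.sources=my_package
sonar.python.coverage.reportPaths=coverage.xml
sonar.python.bandit.reportPaths=bandit-report.json
```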
Some branch information from a sandbox project where I’ve reproduced this:
- master: long-lived, 90% coverage, no security vulnerabilities
- high-code-coverage-with-sec-vuln: short-lived, 90% coverage (100% on new code), one security vulnerability
- lower-code-coverage-no-sec-vuln: short-lived, 78.6% coverage (50% on new code), no security vulnerability
I’ve set up my Quality Gate with the following conditions:
- Coverage < 60%
- Coverage on New Code < 80%
- Duplicated Lines (%) > 20%
- Duplicated Lines (%) on New Code > 3%
- Maintainability Rating worse than B
- Maintainability Rating on New Code worse than A
- Reliability Rating worse than B
- Reliability Rating on New Code worse than A
- Security Rating worse than B
- Security Rating on New Code worse than A
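For anyone scripting gate setup, the conditions above can be expressed against the 7.x web API’s `api/qualitygates/create_condition` endpoint. This is only a sketch: the server URL and gate id are placeholders, authentication is omitted, and the metric keys (`sqale_rating` is the Maintainability Rating; rating values are 1=A, 2=B, …) should be double-checked against your instance.

```python
# Sketch: recreating the Quality Gate conditions above via the web API.
# SERVER and GATE_ID are placeholders; authentication (e.g. a user token)
# is intentionally omitted from this sketch.
import urllib.parse
import urllib.request

SERVER = "https://sonarqube.example.com"  # placeholder
GATE_ID = 42                              # placeholder gate id

# Metric keys as exposed by the 7.x API; ratings are numeric (1=A, 2=B, ...)
CONDITIONS = [
    {"metric": "coverage",                     "op": "LT", "error": "60"},
    {"metric": "new_coverage",                 "op": "LT", "error": "80"},
    {"metric": "duplicated_lines_density",     "op": "GT", "error": "20"},
    {"metric": "new_duplicated_lines_density", "op": "GT", "error": "3"},
    {"metric": "sqale_rating",                 "op": "GT", "error": "2"},
    {"metric": "new_maintainability_rating",   "op": "GT", "error": "1"},
    {"metric": "reliability_rating",           "op": "GT", "error": "2"},
    {"metric": "new_reliability_rating",       "op": "GT", "error": "1"},
    {"metric": "security_rating",              "op": "GT", "error": "2"},
    {"metric": "new_security_rating",          "op": "GT", "error": "1"},
]

def create_condition(cond):
    """POST one condition to the (placeholder) server; a real network call."""
    data = urllib.parse.urlencode({"gateId": GATE_ID, **cond}).encode()
    req = urllib.request.Request(
        f"{SERVER}/api/qualitygates/create_condition", data=data)
    return urllib.request.urlopen(req)

for c in CONDITIONS:  # dry run: just show what would be sent
    print(c["metric"], c["op"], c["error"])
```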
My understanding is that these metrics should allow us to have more lenient standards on our legacy code (so we can improve it over time), but to hold developers to higher standards for any new code they check in since the “… on New Code” metrics will only be exercised on the short-lived branches.
Unfortunately, only the ‘high-code-coverage-with-sec-vuln’ branch fails the Quality Gate (due to the security vulnerability that drops its rating to a B for new code). The ‘lower-code-coverage-no-sec-vuln’ branch should also fail the Quality Gate because the coverage for new code is only 50%, which is lower than the 80% metric.
This seems to be related to an existing SonarSource Jira ticket and to the ‘Code coverage on branches does not break quality gate’ topic, but in this case the coverage is being reported accurately; the Quality Gate just isn’t picking up on the failure.
Am I doing something wrong or is SonarQube not correctly reviewing coverage results for short-lived branches? I imagine the same may be true for duplications, but I haven’t tested that yet. If what I’ve outlined here is kosher, my guess is that the Quality Gate for short-lived branches is still only reviewing open issues, and the lack of explicit issues for coverage (and potentially duplications) masks the issue.
In fact, conditions on coverage and duplication are ignored if fewer than 20 lines were updated. This has been the case on long-lived branches for a while (see SONAR-9352), and it also applies to short-lived branches and PRs.
There should be a message that pops up to mention this, but alas it was forgotten in implementation. A ticket has been created to remedy this.
Hi Colin, thanks so much for the reply. Your explanation was spot on; I wasn’t aware of the 20-line minimum. I’m glad to hear that the note will be displayed on short-lived branches soon.
After updating my ‘lower-code-coverage-no-sec-vuln’ branch to have more than 20 lines of new code, it tripped the coverage condition in the Quality Gate and showed as failed, as expected.
Happy to help! Thanks for calling it out.
Out of curiosity, and not to say that this point is currently up for reevaluation: does this make sense to you? Putting aside the presence of a warning in the UI, do you feel it’s appropriate that your QG still passed?
This threshold has always made sense to us for long-lived branches to avoid failing on bug fixes / the beginning of sprints, but since real QGs for PRs were just introduced in v7.7, I wonder if you have any feedback on this.
Definitely happy to share some feedback.
In general I would be surprised if a feature branch ever amounted to less than 20 lines of changed code, so I think the threshold would work fine there most of the time. There are a couple of corner cases that may present themselves regularly, though:
I could imagine a feature branch where a developer is implementing what was previously a stubbed function, and the implemented function could result in less than 20 lines of code. In that case a quality gate measuring coverage on new code would not verify whether the developer implemented corresponding test coverage, so the untested function could be missed and get pushed upstream.
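As a hypothetical illustration of that first case: a stub can be fully implemented in far fewer than 20 new lines, so if no test exercises it, the “Coverage on New Code” condition is never evaluated.

```python
# Hypothetical example: a function that was `raise NotImplementedError` on
# master, implemented on a feature branch in well under 20 new lines. With
# no corresponding test, the new-code coverage condition is skipped entirely
# because the change falls below the 20-line minimum.

def normalize_score(raw: float, max_raw: float) -> float:
    """Scale a raw score into [0, 1], clamping out-of-range inputs."""
    if max_raw <= 0:
        raise ValueError("max_raw must be positive")
    return min(max(raw / max_raw, 0.0), 1.0)

print(normalize_score(45.0, 50.0))  # 0.9
```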
Similarly, I could imagine a bug fix branch where a developer adds an else or an else-if to some existing logic after realizing they weren’t covering some condition. In that case the new conditional could easily amount to less than 20 new lines of code, and again if they didn’t implement corresponding test coverage it would be missed by the quality gate for new code and only flagged once it got pushed upstream (if the upstream long-lived branch dipped below its threshold for coverage on non-new-code, i.e. whole project coverage).
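And a hypothetical sketch of that second case: the fix adds only two new lines, so an uncovered new branch would slip past the new-code coverage condition.

```python
# Hypothetical bug fix: the two "new" lines below add a previously missing
# condition. If no test exercises the heavy-parcel branch, the quality gate
# skips the new-code coverage check because fewer than 20 lines changed.

def shipping_cost(weight_kg: float, express: bool) -> float:
    if express:
        return 10.0 + 2.5 * weight_kg
    elif weight_kg > 20:                   # new line: missing condition
        return 5.0 + 1.5 * weight_kg       # new line: heavy-parcel surcharge
    else:
        return 5.0 + 0.5 * weight_kg

print(shipping_cost(25.0, express=False))  # 42.5
```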
So, I’m not sure that the 20-line minimum for reviewing coverage metrics on new code is actually a good thing. I can’t come up with any examples where I’d feel that SonarQube was doing the wrong thing if it flagged 15 lines of new code as not covered by tests. It’s just reporting the truth; I didn’t cover some percentage of my new code with tests. I would be in favor of eliminating the 20-line minimum for coverage analysis on new code, but I could definitely be overlooking something.
On the duplications side, the docs at https://docs.sonarqube.org/display/SONARQUBE45/Duplications say that by default a duplication is 10 lines of code appearing more than once. That would require at least 20 lines of new code to trip a duplication check. But assuming the minimumLines parameter can still be tweaked, a team may want to lower that threshold to, say, 5 repeated lines, in which case only 10 new lines are needed to produce a duplication, and that would fly under the radar of the current 20-line minimum. I think it would make more sense to tie the number of changed lines required for assessing duplications in a quality gate to the minimumLines parameter. At that point, though, you could probably just remove the 20-line minimum here as well, because a duplication will never be flagged unless there are enough new lines of code to allow for one (i.e. look for duplications based only on the minimumLines parameter, not on the parameter plus the 20-line minimum).
So, talking through possible scenarios for both coverage and duplication analysis on new code, I think it probably makes sense to eliminate the minimum line count requirement. With regard to the 20-line minimum being useful on long-lived branches for things like bug fixes, wouldn’t you still want to know when your metric for code coverage isn’t being met? I’d be interested to hear if there are particular situations where removing it would result in quality gates doing something wrong, or even just generally being a burden.
Chiming in to ask a question:
If https://jira.sonarsource.com/browse/SONAR-10485 was implemented for all types of branching, would that not solve the question of whether 20 lines is appropriate for a short lived branch by allowing users to set whatever threshold is appropriate to them?
I’m 100% in favor of allowing customization via parameters. Allowing projects to set something like sonar.qualityGate.coverageMinimumNewLines and sonar.qualityGate.duplicationMinimumNewLines would entirely resolve my concerns outlined above (we would just set them to zero). Something like sonar.qualityGate.minimumNewLines could also be used to conditionally disable the quality gate entirely when a branch has less than the specified minimum number of lines of new code, but I can’t think of an example where I would actually want to do that.
It should be configurable and documented better in the coverage and duplication part of the documentation.