'New Code' on feature branch being polluted with details from mainline

Must-share information (formatted with Markdown):

  • which versions are you using (SonarQube, Scanner, Plugin, and any relevant extension)
    SonarQube Developer Edition Version 8.4.1 (build 35646)

  • what are you trying to achieve
    Apply Quality Gate strictly to code committed on the branch

  • what have you tried so far to achieve this
    Various configurations within SonarQube

In more depth: our project has a quality gate that requires 80% coverage. Let’s hypothetically say that on Monday, we cut a new version. On Tuesday, we commit 100 lines of code to mainline, with 90 of them covered - at this point, the gateway on the mainline build is happy (90/100 > 80%). On Wednesday, Alice, a developer, creates a feature branch off of mainline and commits 20 lines of code to the branch, 18 of which are covered - the branch build is also happy (18/20 > 80%).

On Thursday, Bruce, another developer, adds 50 lines of code to mainline, 30 of which are covered. The mainline build is still happy ((90+30)/(100+50) = 120/150 >= 80%).

On Friday, Alice decides to merge mainline into her branch. She expects that the branch will still be happy since, from her perspective, her branch just has “20 lines of new code, 18 of which are covered”. What appears to be happening, though, is that SonarQube is aggregating both Alice’s and Bruce’s code as “New Code”, yielding a failed gateway result ((30+18)/(50+20) = 48/70 < 80%). Alice does not want to be adding coverage for Bruce’s code on her branch. Bruce feels he’s done, as his code made it to mainline OK. The ‘New Code’ coverage on the branch feels wrong.
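For anyone who wants to reproduce the shape of that history, here is a minimal git sketch of the week (repository, file and branch names are hypothetical; the coverage figures only live in the commit messages):

# Illustrative only: reconstruct the shape of the history described above.
git init coverage-demo && cd coverage-demo
git checkout -b mainline
echo "monday" > app.txt && git add app.txt && git commit -m "Monday: cut a new version" && git tag v1.0
echo "tuesday" >> app.txt && git commit -am "Tuesday: 100 lines on mainline, 90 covered"
git checkout -b feature/alice        # Wednesday: Alice branches off mainline
echo "alice" > alice.txt && git add alice.txt && git commit -m "Wednesday: 20 lines on the branch, 18 covered"
git checkout mainline
echo "bruce" >> app.txt && git commit -am "Thursday: 50 lines on mainline, 30 covered"
git checkout feature/alice && git merge mainline   # Friday: Alice merges mainline into her branch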

Originally, we had ‘New Code’ period on the branch configured as ‘Previous Version’, but after reading more in depth, the above calculation makes sense with that setting.

When we changed the New Code period on the branch to ‘Reference Branch’ (mainline), we inferred that the calculation ought to now be based on the code difference between mainline and branch (i.e. 18/20), but we are not getting that result.

No matter what we do, the branch calculation is polluted by subsequent work from mainline (once merged in). Alice just wants to merge her work and feels her coverage is adequate.


Hi Jeff,

What you expect from the analysis of the branch with ‘Reference branch’ set to mainline is how it should be working.
If I understood correctly, the second commit in mainline (the 50-line change) is showing up in the branch as new code after mainline was merged into the branch?

Are you using a CI service to analyze the code?
SonarQube detects the “new code” relative to the ‘reference branch’ by using the SCM (git in most cases) when the code is analyzed. If the clone that the scanner is using doesn’t have an up-to-date reference for ‘mainline’, it could be detecting those 50 lines as new in the branch, since from its point of view they aren’t in ‘mainline’ yet.
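As a quick sanity check (assuming the remote is called origin and the reference branch is mainline), you can compare what the scanner’s clone thinks ‘mainline’ points to with what the server actually has before running the scan:

# what the scanner's clone thinks 'mainline' points to
git rev-parse origin/mainline
# what the server actually has
git ls-remote origin refs/heads/mainline
# if the two differ, refresh the local reference before scanning
git fetch origin mainline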

Hopefully I’m not hijacking this thread, but we’re seeing a similar problem.

We just upgraded from Sonar Developer 7.9 to 8.3.1.

On 7.9, when the CI system would do a PR build, the only changes flagged by Sonar would be the specific changes included in the PR. e.g. If I only changed file A, Sonar would never report pre-existing problems in file B as part of my PR build.

Since the upgrade to 8.3.1, the CI-invoked PR builds are randomly flagging issues from files not touched in that PR. e.g. If I only changed file A, Sonar is now also randomly reporting pre-existing problems in file B (or C, or whatever) as part of my PR build. Due to our use of merge gating, this has had a significant impact on our workflow.

I need to know whether the problem we’re facing is coming from:

  1. A required configuration change moving from 7.9 to 8.3. If so, what is the required update?
  2. Database corruption caused by the migration
  3. 8.3 handles PR branch builds differently and in a way that effectively breaks our workflow.

Thanks

Hello,
Is the code highlighted as “new” in the pull request correct, or are files and lines of code not modified in the pull request showing up as modified in the P/R?
I suggest you enable debug logs in the scanner and check the logs for the builds that result in unexpected issues. Assuming you’re using git, the logs will indicate which commit was used as the merge base between the checked-out commit and the target of the pull request.
Most problems with P/Rs come down to the scanner failing to detect the correct ‘diff’ between the code being analyzed and the target of the pull request. If you post the logs or send them to me privately I can also have a look at it.
Pull request analysis hasn’t changed significantly since 7.9. The only difference I can think of is related to complex cases where you have a multi-level hierarchy of branches based on other branches.

We use Bitbucket (git) as our SCM. When I view the PR in Bitbucket UI, it correctly identifies new code on the branch, showing me the expected diff (no pollution).

In SonarQube, the pollution is present in ‘new code’. I’m not able to view a ‘PR’ view of things in SonarQube, so other than it showing up in the metrics, I’m unsure what’s going on.

We are not using ‘rebase’ (git-rebase) for our branches. A branch is created. Work is done, committed and pushed to the branch. At some later point, the mainline is merged (git-merge) into the branch, generally creating a new commit on the branch.

I have experimented with rebase (git-rebase) as a possible solution to this. However, I was reluctant to do a force-push for it. Consequently, following the rebase, I had to do another pull, which resulted in duplicated commits. In that rebase-with-duplicate-commits world, SonarQube still sees the mainline work as ‘new code’ on the branch.

Jeff,

Got it. There is no need to do a rebase. As I mentioned, the scanner relies on git to identify changed code compared to the target branch.

It first tries to find the merge base commit, which is the common parent between the branch and ‘mainline’. The same operation as doing git merge-base HEAD mainline.
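To reproduce that locally from the checked-out branch (assuming the remote is named origin), something like this works:

# make sure the local view of mainline is current
git fetch origin mainline
# the commit the scanner should use as the fork point
git merge-base HEAD origin/mainline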

With the debug logs enabled, it should print the commit that was found. For example:

Computing New Code since fork with ‘master’
[…]
Merge base sha1: 111ef31292d24acdfdb5b18c4af6ec83279d5629

Could you check if the commit found is correct by comparing with what is in Bitbucket UI?

There have been many problems caused by local references to branches in git clones being incorrect or out of date, leading to the wrong diff being found by the scanner.
Are you using a CI service to run the scanner? In any case, make sure all references are up to date. Most CIs will perform a fresh clone, but if that’s not the case for you, do git fetch --all before running the scanner.
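For example, a pre-scan step along these lines (remote name assumed to be origin) keeps the references fresh even when the CI reuses an existing workspace:

# refresh all remote-tracking branches before the analysis
git fetch --all
# if the clone is shallow, also fetch the full history (fails harmlessly otherwise)
git fetch --unshallow || true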

I’m actually wondering now what the git log looks like after you do the merge into the branch. Perhaps finding the merge base fails when there are two ancestors.

We run the scanner locally as part of our build. We have a Jenkinsfile (declarative pipeline). In one of the stages, we perform a “withSonarQubeEnv” that executes sh "mvn -T 4 -Dsonar.projectVersion=${placeholderVersion} -Dsonar.branch.name=${branch} org.sonarsource.scanner.maven:sonar-maven-plugin:sonar".

Based on reading other posts, other similarly-affected people have traced their issue to “shallow-clone” issues. I don’t think that is affecting us. As part of an earlier step in the above process, we perform a “bbs_checkout” (the command Bitbucket recommends for retrieving the source code). From inspection, this appears to have a full copy of the source (not shallow), although it is a ‘detached HEAD’.
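For anyone wanting to verify the same two points in their own workspace, a couple of quick checks (not part of our build, just a sanity check) would be:

# prints 'true' for a shallow clone, 'false' for a full one
git rev-parse --is-shallow-repository
# exits non-zero (and prints nothing) when HEAD is detached
git symbolic-ref -q HEAD || echo "detached HEAD"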

I am unsure about the results of the above merge-base command. We’ve subsequently disabled the gateway check in order to merge the PR, so my results are somewhat questionable. I don’t currently have a broken branch to look at.

  • When I go to the (old) build environment for the broken branch and run the command, it yields the common ancestor as a 7-day-old commit. The expected ancestor is 4 days old. I’ll have to dig to see why that 7-day-old ancestor is coming into play.
  • When I look at other (seemingly OK) branches, the above command yields the expected commit as the fork point.

I’m going to try and recreate the issue and then use the above command to see if it helps me understand it better.

I will also research why that 7-day commit (vs 4-day commit) is being returned.

OK, figured out why the (old) build environment is giving me squishy results. I performed a rebase in it, so I’m seeing commits with old author dates and new commit dates. That rebase has messed up the data, but we did the rebase in an attempt to work around this issue, so I cannot say the rebase caused it. What I can say is that I don’t have an environment that I can do research in. Give it a week or so and I’ll add more details as they crop up.

I think I’ve discovered the work-around(?)/fix(?). I’ll document it in case it helps someone later.

It appears we get the correct results if we use the “PR-style” scanning, as opposed to the “branch-style” scan previously indicated. That is, there is a difference between:

Originally, we were doing this (and pollution can apparently occur):
mvn -Dsonar.projectVersion=${placeholderVersion} -Dsonar.branch.name=${branch} org.sonarsource.scanner.maven:sonar-maven-plugin:sonar

We’ve switched to this with seemingly better results:
mvn -Dsonar.projectVersion=${placeholderVersion} -Dsonar.pullrequest.key=${pullRequestId} -Dsonar.pullrequest.branch=${branch} -Dsonar.pullrequest.base=mainline org.sonarsource.scanner.maven:sonar-maven-plugin:sonar

This latter approach sends the results for PR decoration (there is some extra configuration required to make decoration work, detailed in the docs). Additionally, we had to jump through a hoop or two to get the pullRequestId for the branch as part of our build process. But the payoff seems to be worth it. When Sonar is tied to the PR, it seems to have greater clarity on what represents new code.
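The hoop-jumping for the pullRequestId will vary by setup. As one possible sketch (assuming Bitbucket Server/Data Center and its REST API; BB_URL, BB_USER, BB_TOKEN, PROJECT, REPO and BRANCH are placeholders supplied by the environment, and jq is available):

# ask Bitbucket for open pull requests originating from the branch,
# then take the id of the first match from the JSON response
curl -s -u "$BB_USER:$BB_TOKEN" \
  "$BB_URL/rest/api/1.0/projects/$PROJECT/repos/$REPO/pull-requests?state=OPEN&at=refs/heads/$BRANCH&direction=OUTGOING" \
  | jq -r '.values[0].id'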

Thanks for the update. I’d still like to know what went wrong with the “reference branch”. The steps taken to find the changed code are very similar to what is done with P/Rs.

Looks like I claimed success too soon. Issue has popped back up, even when using a “PR style” scan. But… a chance to get more info about what’s going on.

The blue line represents work on the branch (going back a few days, aka ‘feature/OG-7439’), whereas the yellow line is mainline (aka ‘develop’ in our environment). As you can see, I merged mainline LATEST (7e11a90) into the branch, creating merge-commit 4e8a0e3 (with some subsequent work on the branch as well). The branch built at db8c (the tip) and failed at the sonar gate due to pollution from mainline.

On the branch, if I run ‘git merge-base HEAD origin/develop’ it returns 7e11a905 (the tip of develop). If I understood you correctly, it ought to be determining a much older commit as the original fork point for the branch. Logically, 7e11 is the ‘closest’ commit that has history from the branch, although I do not know if that is relevant.

As previously noted, looking at the PR in bitbucket correctly shows files on the branch that changed on the branch.

Ran an additional experiment, and may have found a partial work-around.

With the broken branch (above), I performed a rebase onto mainline (i.e. rebased on top of origin/develop). As expected, this rewrote history in the usual way, and the ‘git merge-base’ command indicated 7e11 (on my development machine) as the fork point, as expected. The build (on the build box) from the rebased branch initially failed with gateway pollution - which should now have been very unlikely given the rebase. Digging deeper, I went to the build box, reran the ‘git merge-base’ command, got a totally different (older) commit as the ancestor, performed a ‘git fetch’ (on the build box) manually, and reran the build. This time, it worked - no pollution. Due to the rebase, I’ve now lost my environment for further testing.

I’m not sure whether it was the rebase that resolved it or the cleansing of the Jenkins workspace - the combination of the two worked for me. I’m increasingly suspicious that the Jenkins workspace is somehow contributing (Jenkins is a bit loose about the git state, especially when you do repeated builds). The solution could be as easy as ‘when pollution occurs, delete the Jenkins workspace and rerun the build’. I’ll try this next time.

Could you post the scanner logs with debug enabled when the ‘pollution’ of the branch happens?

Apologies for the size. Hopefully it has something useful. I am trying to get a smaller example.

scanner.txt (2.4 MB)

Thanks. Looks like for that P/R the scanner found the target branch and was able to compute the files/lines that were changed compared to the target branch. Are the results for that P/R not what you expected? Unfortunately debug logging wasn’t enabled so it’s not showing the commit sha1 of the merge base between the P/R’s HEAD and the target branch.

What is the mechanism for enabling debug when running via maven?

It looks like ‘sonar-scanner’ wants -X or --debug, but those would get picked up by maven proper. The ‘SonarScanner for Maven’ page in the SonarQube docs is silent on debug.

Yes, -X is the way to do it with maven, even though you get debug logs from all other goals as well.
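For example (a sketch; the exact wording of the log line can vary between scanner versions):

# run the analysis with debug logging and keep the output
mvn -X org.sonarsource.scanner.maven:sonar-maven-plugin:sonar | tee sonar-debug.log
# the line of interest names the merge base used for the diff
grep -i "merge base" sonar-debug.log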

The problem popped up again, allowing me to refine the work-around. Good news: I think the problem is resolved by forcing a ‘git fetch’ on the branch when Jenkins builds it.

What worked this time:

  • Problem occurred: a build died at the SonarQube quality gate.
  • I manually logged into build box, went to workspace for the affected build.
  • ran the ‘git merge-base …’ command - confirmed it indicated an old commitid
  • ran a ‘git fetch’ - saw a bunch of stuff come over
  • ran the ‘git merge-base …’ command - confirmed the commitid changed
  • restarted the build in Jenkins - this time it passed.
  • Bottom line: the ‘git fetch’ resolves the issue (and I’ll just bake it into our build process)

The longer story, now that I’ve looked at it and my forehead is properly flattened…

  • We use Jenkins for our builds (specifically, multi-branch declarative pipeline). When the branch is first created (e.g. in Bitbucket), Jenkins spots it and clones it into a Jenkins workspace and does the first build. No real magic there.
  • As additional commits occur on the branch, webhooks (from Bitbucket to Jenkins) notify Jenkins that a new build needs to occur. I believe as part of this, Jenkins identifies the commitid of the new commit.
  • Jenkins attempts to be clever. It knows it already has a workspace (original clone), and it knows the commitid of the new commit that it needs to build. Apparently, it ‘cheats’ and simply drops the new code into the existing workspace but does not cause the “.git” information to change (i.e. it’s not fetching or pulling, it’s more of a raw update). The build is happy because all the files are precisely what they should be - they match the new commitId.
  • But sonar gets handed this lobotomized world. Sonar looks at the ‘.git’ info (which is not updated), calculates the ‘git merge-base …’ based on stale information, and goes sideways. Adding the ‘git fetch’ to the build (prior to the sonar step) ensures the ‘.git’ info is properly up to date.

I may not have this perfect, but should be close-enough. The ‘fetch’ is definitely the simple workaround.
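Concretely, the pre-scan step amounts to something like this (a sketch; remote name assumed to be origin, and ‘develop’ is our mainline):

# bring the stale .git metadata in the reused Jenkins workspace up to date
git fetch origin
# sanity check: the fork point should now match what Bitbucket shows for the PR
git merge-base HEAD origin/develop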


Here are the full details - the affected branch is the OG-7522 one.
Branch State:

OG-7522: The branch was originally created from 205cfcc. Here’s the relevant piece of “Build #1” where it clones. This build passed and was working OK (no changes on the branch yet).

Later (Build #2), we built at commit 2577869. This build failed at the quality gate check. Here’s what the git-retrieval looks like:

Following the failure, I went onto the build box and did the ‘git fetch’ (above). Prior to the fetch, the ‘git merge-base’ command showed 205cf. After the fetch, it showed 25778. I then launched a build on Jenkins (Build #3), also at commit 2577869. The git-retrieval is now:

All of this ‘git retrieval’ (entirety of the log) is coming from the ‘bbs_checkout’ plugin (for pipelines). We are not doing the git commands directly ourselves. I think the ‘bbs_checkout’ is largely identical to the Jenkins ‘git’ plugin (for pipelines)

All that said, I’m not sure what SonarQube (the company/tool) can do about it. It seems like the problem is some shortcuts(?) that Jenkins and/or Bitbucket have taken. The fix is easy - do a fetch before running SonarQube (which, at best, is something that could be mentioned in the SonarQube documentation).

Feel free to resolve this ticket as closed :slight_smile:


Thanks for the update.
I agree, unfortunately there isn’t much SonarQube can do, in its current architecture, if a reference to the target branch is not available in the clone for the scanner to read.