SonarQube analysis accuracy master vs. branch vs. PR with sonar.inclusions

Hello there,

We’ve debated for a while whether to post about the issues we’ve been facing with SonarQube Developer Edition ever since we acquired the license half a year ago. So far we have tried our best to find a setup that suits our needs, but we are currently facing another challenge for which we would highly appreciate your input.

As I said, we are currently using SonarQube Developer Edition v. Until mid-2020 we were using the Community Edition, and we decided to upgrade so we could also run analyses on branches, to avoid merging issues into the master branch. Before upgrading we ran a short evaluation with a trial license and, unfortunately, concluded that branch analysis is incremental and would run “fast enough” on our 3M-line codebase. Once we acquired the license we realized that was not the case and have been trying to find a solution ever since. The average time for a single analysis of our monolith application is 1.5 hours.

That said, we are currently running two types of analyses:

  • full analysis on master and feature branches (using the parameter). Each analysis takes around 1.5 hours (as stated above).
  • “fast” analysis for PRs on feature and bugfix branches (using the sonar.pullrequest.branch, sonar.pullrequest.base and sonar.pullrequest.key parameters). By “fast” we mean we only analyze the files modified in the PR (via the sonar.inclusions parameter). We found in several other topics on this forum that you do not recommend this because it may produce inaccurate data, but for us it felt like a good compromise: there are situations in which we cannot wait for a full analysis on bugfix branches that need to be delivered as quickly as possible. Each fast analysis takes between 5 and 10 minutes, depending on the number of modified files.
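Concretely, our two invocations look roughly like this. The project key, branch names, PR key and file list below are illustrative placeholders, and we are assuming the elided branch parameter above is sonar.branch.name:

```shell
# Full analysis of a long-lived branch (~1.5 h on our codebase)
sonar-scanner \
  -Dsonar.projectKey=my-monolith \
  -Dsonar.branch.name=feature/my-feature

# "Fast" PR analysis restricted to the files touched by the PR
sonar-scanner \
  -Dsonar.projectKey=my-monolith \
  -Dsonar.pullrequest.key=1234 \
  -Dsonar.pullrequest.branch=bugfix/my-fix \
  -Dsonar.pullrequest.base=master \
  -Dsonar.inclusions="src/foo/Bar.java,src/foo/Baz.java"
```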

However, we are getting quite inaccurate results for full analyses and we have no idea how to deal with it. The most reliable analysis seems to be the PR one (using sonar.inclusions), probably because it runs on a smaller number of lines. Let me give you some examples:

  1. A line of code containing a legitimate issue is not flagged in the full master analysis. However, it is flagged in the “fast” PR analysis, so it appears as a “new” issue even though it is actually legacy.

  2. Issues found on a feature branch relative to master differ from those found on the PR for the same branch, which is due to be merged into master. The discrepancies look random in both cases: some issues are legacy ones that should have been found on master but weren’t, while the rest are new and legitimate, yet not always the same in both places.

  3. Even though the branch’s New Code setting is “Relative to branch”, we noticed that the branch analysis is sometimes “reset”: right under the New Code tab it says “Compared to master, Started x hours ago”, but we know for sure that this branch and its analyses were created long before those x hours. Because of this, fewer “New Code” issues are reported on the feature branch than in the PR analysis. Those issues do appear in the branch analysis as well, but because they are older than those x hours, they are counted as Overall Code instead. Do you have any idea what could be resetting this New Code period for branch analysis?

We appreciate any input from your side to help improve the accuracy of our results. We can imagine that on such a big codebase the results can’t always be precise, but right now they are very confusing for developers and not as helpful as we imagined they would be.

Thank you.

Hi Alexandra, welcome to the SonarSource Community!

One of the reasons we wouldn’t suggest using different inclusion/exclusion settings between branches of the same project is that it can lead to exactly the kind of confusion you’re suffering from. There isn’t a consistent sense of what’s “in” the project in the first place, so on top of that, separating out what is new vs. legacy is quite a challenge.

The best way I can summarize our overall guidance, from the beginning, would be like this:

  • Scan the master/main branch first, with no parameter, narrowing the focus as much as possible to get scans to an acceptable level of performance
  • Always scan other branches with the same sonar.sources and inclusion/exclusion properties set
  • Always scan parent branches before scanning any child branches (meaning branches that are intended to be merged into the parent)
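To make the second point concrete, a minimal sketch of keeping the scope consistent is a single sonar-project.properties checked in and shared by every branch and PR scan. The paths and patterns below are placeholders, not a recommendation for your specific project:

```properties
# sonar-project.properties — identical on every branch, so the
# project scope never shifts between master, branch, and PR scans
sonar.projectKey=my-monolith
sonar.sources=src/main
sonar.tests=src/test
sonar.exclusions=**/generated/**,**/*.min.js
# Note: no sonar.inclusions here — per-scan inclusions are what
# make branches disagree about what is "in" the project
```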

Regarding the reference-branch confusion, you’ll want to make sure the branch being analyzed is what is truly checked out. Shallow or incorrect checkouts can sometimes prevent our analyzer from determining the state between the branches.
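For example, in CI systems that clone shallowly by default, fetching the full history and the reference branch before running the scanner can help. This is a CI config fragment with an illustrative branch name, not something specific to your pipeline:

```shell
# Convert a shallow clone into a full one so the scanner can diff
# the analyzed branch against its reference branch.
git fetch --unshallow || true   # errors harmlessly if the clone is already complete

# Make sure the reference branch is available locally as well.
git fetch origin master:refs/remotes/origin/master
```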

I hope this helps clarify!