Empty short-lived branch analysis


We are trying to get branch analysis working for our Python projects, but so far we have only managed to do that for the main branch. Short-lived branch analysis appears to succeed, but the resulting analysis is essentially empty, even if we change the analyzed code or deliberately introduce violations to test it. We use an almost identical command and environment for the branch analysis (with the addition of sonar.branch.target, of course).

We want to see the analysis of the branch code: any potential new violations and code coverage regressions.

We are using:

Hi @zpapierski_wmf ,

Welcome to the community!

It looks like no changes have been detected, so no files were analyzed (that is why the code tab is empty).

Can you add this property: -Dsonar.verbose=true to get more detailed logs, and share them afterwards?


The change behind this build is https://gerrit.wikimedia.org/r/c/wikimedia/discovery/analytics/+/672730/1. It does introduce a change, but probably no new violations. My expectation is that in this case at least the code part of the sonar analysis should show the new code (https://sonarcloud.io/code?branch=672730-1&id=discovery-analytics), or is code only shown if new violations are found?

I’ll add the verbose option and report back.

The build log of the branch and its sonar analysis is available in Jenkins: wikimedia-discovery-analytics-patch-tox-docker-with-sonar #22 (console output)

A few other things that might be relevant:

  • We do a shallow git clone with --depth 2
  • Our git backend is Gerrit (https://www.gerritcodereview.com/) which has a few quirks in how it manages changes. I’m not expecting anything special here (in the end this is still git), but who knows.
  • We use Zuul (https://zuul-ci.org/) to manage our CI, which means that changes are merged on top of master before being built. Again, I don’t expect this to matter: this is still git and should just make sense from sonar-scanner’s point of view.

Hi @gehel,

My expectation is that in this case at least the code part of the sonar analysis should show the new code or is code only shown if new violations are found?

The new code will be shown on the code tab, even though there are no violations.

In your case no new code was detected:

21:11:36.889 DEBUG: SCM reported changed lines for 0 files in the branch

Can you tell me if my understanding of your process is correct?

  • You checkout master
  • You merge the change to master
  • You execute the sonar-scanner

If this is correct, then when you execute the sonar-scanner, the target branch that you set (master) is the same branch you are running the analysis on, so no changes are detected.


Can you tell me if my understanding of your process is correct?

  • You checkout master
  • You merge the change to master
  • You execute the sonar-scanner

More or less, and yes, that’s probably the issue. I was confused because our PHP projects are analyzed without issue, but it seems they are checked out differently.

I’ll dig a bit more into this on our side, but I think we have what we need to resolve the issue.

I’m assuming that the sonar analyzer gets the diff from something similar to git diff HEAD..HEAD@{u}. Could you confirm? Or point me to the right place in the code?

First, the merge base SHA is detected based on the target branch. In your case:

21:11:36.888 DEBUG: Merge base sha1: d3cec2812ed500fab6f27d18599b5e83db60424a

This is the commit that your branch will be compared against. From this merge base SHA we get the old tree, and from the last commit of the branch we get the new tree. We then compare the two using something similar to git diff.
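The old-tree/new-tree comparison can be pictured with a minimal sketch. This is not the scanner's implementation: the function name changed_paths, the file paths and the blob ids are all hypothetical, and each tree is modelled as a plain mapping from path to content id.

```python
# Hypothetical snapshots: path -> content id at the merge base (old tree)
# and at the last commit of the analyzed branch (new tree).
old_tree = {"dags/export.py": "a1", "tox.ini": "b1", "README.md": "c1"}
new_tree = {"dags/export.py": "a2", "tox.ini": "b1", "tests/test_export.py": "d1"}


def changed_paths(old_tree, new_tree):
    """Return the paths that are new or modified in new_tree, sorted."""
    return sorted(
        path for path, blob in new_tree.items()
        if old_tree.get(path) != blob
    )


# A modified file and an added file are both reported.
print(changed_paths(old_tree, new_tree))

# Comparing a tree with itself reports nothing, which is the situation
# behind "SCM reported changed lines for 0 files in the branch".
print(changed_paths(new_tree, new_tree))
```

If the merge base and the last commit are the same commit, both trees are identical and the second call shows the result: an empty list of changed files.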

For example (target branch is master):

commit ac78d6503c1dfd0671cb3e7b0c787cf167a12aa6 (HEAD -> short-branch)
Date: Thu Mar 18 15:04:41 2021 +0100

Yet another commit

commit 19e8f72063d4c71bccaa88391c8efab0af220fb8
Date: Thu Mar 18 15:00:08 2021 +0100


commit 0435644e85ae976a81698d8a77564644d24ffa55 (origin/master, origin/HEAD, master)
Date: Fri Nov 20 14:38:42 2020 +0100

Another commit

In this scenario while analysing the short branch, my merge base sha1 would be 0435644e85ae976a81698d8a77564644d24ffa55.
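As a rough illustration of how that merge base falls out of the history, here is a sketch that assumes a linear, single-parent history (a real repository with merge commits needs a full graph walk, as git merge-base does). The helper names history and merge_base are hypothetical, SHAs are abbreviated, and the parent of 0435644 is not shown in the example, so it is left as None.

```python
# Single-parent history from the example above (abbreviated SHAs).
parents = {
    "ac78d65": "19e8f72",  # HEAD -> short-branch
    "19e8f72": "0435644",
    "0435644": None,       # origin/master, master (parent not shown)
}


def history(commit, parents):
    """List `commit` and its ancestors, newest first."""
    out = []
    while commit is not None:
        out.append(commit)
        commit = parents[commit]
    return out


def merge_base(head, target, parents):
    """Return the newest commit of `head` that is also in `target`'s history."""
    target_history = set(history(target, parents))
    for commit in history(head, parents):
        if commit in target_history:
            return commit
    return None


base = merge_base("ac78d65", "0435644", parents)

# Commits that exist only on the short branch, newest first.
branch_history = history("ac78d65", parents)
new_commits = branch_history[: branch_history.index(base)]

print(base)         # 0435644
print(new_commits)  # ['ac78d65', '19e8f72']
```

If the target ref pointed at ac78d65 itself, merge_base would return ac78d65 and new_commits would be empty.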

I think you could solve this problem by creating a branch from master, committing the change to that branch, running the analysis on that branch, and then merging to master in your pipeline (if you want a separate branch on SonarCloud for every change). I do not see a way to achieve this while staying on the same branch (although I do not understand all aspects of your system, so I may be mistaken here).


How does Sonar identify the commits from which to compute the merge base? We are probably confusing its heuristic with our build strategy. We can tune what we provide in our git clone, but without knowing exactly what is needed, this is not trivial.

As an example, our current setup produces this (git log --decorate --graph --oneline -n 30), and I would expect the analysis to be done between d3cec28 and 4300929 (which is origin/master):

* d3cec28 (HEAD -> master, refs/changes/30/672730/1) Hive sensor naming fix in export queries to relforge
* 4300929 (origin/master) convert_to_esbulk: Accept partial hour timestamps
* 82e0654 prepare_rev_score: Rename scores_export to bulk_ingest
* 05e42b0 airflow tox: Require sqlalchemy < 1.4.0
* 3bc7024 Add code coverage to Sonarqube report
* 73e9ded Typo fix for sonar properties
* f65e5c9 Deploy the index setting file
*   64f24a5 Merge "Add sonar scanner to discolytics"
| * a3be91d Add sonar scanner to discolytics
* | 9a408b2 Add elastic-template handling
* | 3810277 Set correct start date for export_queries_to_relforge
* |   cc478d4 Merge "Add export_queries_to_relforge dag"
|\ \  
| * | e36cd6a Add export_queries_to_relforge dag
| |/  
* | 4cc913e Correct refinery-drop-older-than checksum
* |   e47f735 Merge "Fix search satisfaction loading into druid"
|\ \  
| * | 200f5d6 Fix search satisfaction loading into druid
* | | 7f37d40 Replace refinery-drop-hive-partitions with refinery-drop-older-then
* | |   869a29b Merge "ores_bulk_ingest: Increase drafttopic error_threshold to 1 per 500"
|\ \ \  
| |/ /  
| * | dbcf737 ores_bulk_ingest: Increase drafttopic error_threshold to 1 per 500
* | |   61e7533 Merge "ores_bulk_ingest: Handle unexpected api response"
|\ \ \  
| |/ /  
| * | ed36793 ores_bulk_ingest: Handle unexpected api response
* | | ca2c5b5 fix import commons ttl dag
|/ /  
* |   44fba51 Merge "airflow dag for commons dump"
|\ \  
| |/  
| * 4034934 airflow dag for commons dump
* | 25549e7 ores_bulk_ingest: Add backoff to retries
* | 4ee50e3 Add tests for ores_bulk_ingest.py
* | 1344853 Spark env_vars should be applied to the executors too
* | 46a8ae1 ores_bulk_ingest: namespace argument is not plural
* | 3969cae Add manually triggered dag for ores bulk exports
* | c2190da Add environment for ores_bulk_ingest

Looking at the code, it seems that the magic is happening around GitScmProvider.java in the SonarSource/sonarqube repository on GitHub (master branch).

The target was set to master, so it found the latest commit that is present in both the target branch and the branch you are analyzing, which is d3cec28 (not the commit on origin/master). You effectively compared master to master, which is why no change was detected.

As you can see in the code, multiple refs are tried. The local ref has the highest priority, then the origin ref is tried, and so on. That is why it first tries the local master, not the origin one.

In other words, if I have a situation like this:

0435644e85ae976a81698d8a77564644d24ffa55 refs/heads/master
0435644e85ae976a81698d8a77564644d24ffa55 refs/remotes/origin/master
14e82417acc4eb435d1d713142fb699c317e539b refs/remotes/upstream/master

and I set the target to master, it first looks for refs/heads/master; if that does not exist, it looks for refs/remotes/origin/master, and so on. The merge base sha1 is then determined from this reference.
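The lookup order can be sketched as follows. This is an illustration only, not the code in GitScmProvider.java: the function name resolve_target_ref is hypothetical, and while the thread states that the local ref is tried before the origin ref, placing the upstream ref last is an assumption.

```python
def resolve_target_ref(refs, target):
    """Return (ref_name, sha) for the first candidate ref that exists, else None."""
    # Local first, then origin (as described in the thread).
    # Trying upstream last is an assumption made for this sketch.
    candidates = [
        f"refs/heads/{target}",
        f"refs/remotes/origin/{target}",
        f"refs/remotes/upstream/{target}",
    ]
    for name in candidates:
        if name in refs:
            return name, refs[name]
    return None


# Refs from the listing above.
example_refs = {
    "refs/heads/master": "0435644e85ae976a81698d8a77564644d24ffa55",
    "refs/remotes/origin/master": "0435644e85ae976a81698d8a77564644d24ffa55",
    "refs/remotes/upstream/master": "14e82417acc4eb435d1d713142fb699c317e539b",
}

# Refs as shown in the CI checkout earlier in the thread (abbreviated SHAs):
# the change was merged onto the local master, so it points at HEAD.
ci_refs = {
    "refs/heads/master": "d3cec28",
    "refs/remotes/origin/master": "4300929",
}

# The same checkout without a local master ref.
ci_refs_without_local = {"refs/remotes/origin/master": "4300929"}

print(resolve_target_ref(example_refs, "master"))
print(resolve_target_ref(ci_refs, "master"))
print(resolve_target_ref(ci_refs_without_local, "master"))
```

In the CI checkout the local master wins and resolves to d3cec28, the same commit as HEAD, so the merge base is HEAD itself. Under this model, a checkout with no local master ref would resolve the target to origin/master (4300929) instead.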