Setup: pull request scan on a self-hosted GitHub runner.
Workflow snippet:

      - name: SonarQube Scan
        if: env.RUN_SonarQube == 'true'
        uses: sonarsource/sonarqube-scan-action@master
        env:
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
          SONAR_HOST_URL: ${{ vars.SONAR_HOST }}
          SONAR_SCANNER_OPTS: -Xmx6500m
        with:
          projectBaseDir: core
          args: >

Branch setup:

  • develop (main/default)
  • master
  • release_1
  • release_2

Server-side caching is enabled. The Java analyzer was able to leverage cached data from previous analyses for 13459 out of 17735 files. These files will not be parsed.

=> The PR has only a few changed files, so this looks incorrect to me. Under low load it seems to be fine. Under high load the output looks like the line above, which leads to longer scan times. Is there a way to clear the cache or make sure the correct one is used?


I believe this is about the web of other classes that use/are used by the changed file. Do you have a high degree of coupling?


Hi Ann,

I'm not a developer, but I get what you are hinting at. In the case you are referring to, I can understand and agree.
But I also see plenty of cases like this where a single XML file was changed with the same result. In this case there are no classes or coupling.


Are we having the same discussion in two separate threads?


Hi Ann,
I created 2 topics to avoid this, but it seems there is some overlap between the issues I'm investigating.
For the topic you mentioned I want to know how ignored files are handled by the scanner. I have an answer for that question in that topic.

For this topic I would like to keep the focus on the diffs detected.

1st question:
Could you explain how this mechanism works in the sonar scanner?

Server-side caching is enabled. The Java analyzer was able to leverage cached data from previous analyses for 13459 out of 17735 files. These files will not be parsed.

The way I understand it:

  • Cache is downloaded
  • Changed files are checked and referenced files/classes are included in the scope
  • The delta between files in cache and files changed + referenced files/classes are excluded from the analyzers

Do I understand this correctly?

2nd question:
How can a single .xml file change in a PR trigger a scope of roughly 4,000 files to be scanned by the analyzers?
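
The figure of roughly 4,000 files follows from the log line quoted earlier in this thread. As a small sketch (the function name `cache_stats` is made up for illustration), the arithmetic can be reproduced like this:

```python
import re

# Log line as quoted in this thread.
LOG_LINE = (
    "Server-side caching is enabled. The Java analyzer was able to leverage "
    "cached data from previous analyses for 13459 out of 17735 files. "
    "These files will not be parsed."
)


def cache_stats(line):
    """Return (cached, total, reanalyzed, hit_ratio) parsed from the summary line."""
    match = re.search(r"for (\d+) out of (\d+) files", line)
    if match is None:
        raise ValueError("line does not look like the cache summary message")
    cached, total = int(match.group(1)), int(match.group(2))
    return cached, total, total - cached, cached / total


cached, total, reanalyzed, ratio = cache_stats(LOG_LINE)
# 17735 - 13459 = 4276 files are parsed again despite the cache.
print(f"{reanalyzed} of {total} files analyzed from scratch "
      f"({ratio:.1%} served from cache)")
```

So about three quarters of the files are served from the cache, and 4276 are not.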

Based on my understanding, something is wrong. My suspicion is an incorrect cache.
3rd question:
How can I recreate the cache? How can I delete the cache? Where is the cache coming from? Is it downloaded from the SonarQube server? Is there a cache per branch?


What is the file? If it’s a POM, I’m not surprised the analyzer saw a lot changed.

For the rest, I’m going to flag this for more expert eyes.


Hey @Dennis_DECA,
Thanks for your post. That number of files that need to be re-analyzed from scratch looks pretty high.

I think Ann hinted at the right point to start the investigation: if a file that is depended upon by a lot of other files changes, it might have a cascading effect.

1st question:
Could you explain how this mechanism works in the sonar scanner?

Sure, as we analyze the base branch of the project, we build a cache that can then be leveraged by PRs that branch off from it.
The number of files for which we manage to leverage the cache depends on the detection of changes:

  • sources (will change in your versioning and potential CI build actions)
  • class files (will change with your build system and compiler)
  • dependencies between files (will change based on your code’s logic)

We tend to err on the side of safety, meaning that whenever there is a doubt about whether a file has changed, we analyze it from scratch and do not leverage its entry in the cache.
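
To make the "when in doubt, re-analyze" idea concrete, here is a toy model in Python. It is only a sketch of the reasoning above, not the analyzer's actual implementation: the `FileState` type, the hash fields, and the assumption that invalidation propagates transitively through dependencies are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FileState:
    """Hypothetical per-file fingerprint; not the real cache format."""
    source_hash: str                      # changes with the source file
    class_hash: str                       # changes with the compiled class file
    depends_on: set = field(default_factory=set)  # paths this file depends on


def files_to_reanalyze(base, pr):
    """Return the paths in `pr` that cannot reuse the cache built from `base`."""
    dirty = set()
    # Step 1: any difference in source, class file, or dependencies is a miss.
    for path, state in pr.items():
        cached = base.get(path)
        if (cached is None
                or cached.source_hash != state.source_hash
                or cached.class_hash != state.class_hash
                or cached.depends_on != state.depends_on):
            dirty.add(path)
    # Step 2 (assumption): a file depending on a dirty file is dirty as well.
    changed = True
    while changed:
        changed = False
        for path, state in pr.items():
            if path not in dirty and state.depends_on & dirty:
                dirty.add(path)
                changed = True
    return dirty


base = {
    "A.java": FileState("a1", "ca1"),
    "B.java": FileState("b1", "cb1", {"A.java"}),
    "C.java": FileState("c1", "cc1", {"B.java"}),
    "D.java": FileState("d1", "cd1"),
}
pr = dict(base)
pr["A.java"] = FileState("a2", "ca2")  # only A.java was edited in the PR

print(sorted(files_to_reanalyze(base, pr)))
```

In this model a one-file edit invalidates A.java, B.java, and C.java, while D.java is still served from the cache. A build step that rewrites every source or class file would invalidate everything in step 1 already.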

2nd question:
How can a single .xml file change in a PR trigger a scope of roughly 4,000 files to be scanned by the analyzers?

Consider the impact that changing a plugin within your pom.xml, or even explicitly setting a property, might have on the sources (e.g. enforced formatting) and binaries (e.g. obfuscation) at the end of the build.
Again, we try to stay on the safe side to serve reports that are as fresh as possible.

3rd question:
How can I recreate the cache? How can I delete the cache? Where is the cache coming from? Is it downloaded from the SonarQube server? Is there a cache per branch?

The cache comes from your SonarQube instance. You should be able to recreate it by re-analyzing your base branch under build conditions that are similar enough (i.e. a similar pom).
As for dropping the cache from your SonarQube instance entirely, I am not sure that this is possible without playing around with the database (and that is where we are getting out of my comfort zone :wink: ).

Give re-analyzing the base branch a try and let us know if that works for you.



Hi @Dorian_Burihabwa ,
Thank you for the extra insights. I agree that messing directly with the DB is a no-no.

We do a full scan of the main branch every night, so it should be up to date. Would it be better to perform a scan on every push to the main branch?

Our branching model:

  • develop (main/default)
  • master
  • release_1

As for the XML: it was not a pom.xml but a datasource XML file. Each such file contains a standalone SQL statement that is not connected to, and does not reference, anything else in the code.
Would it be possible to list all impacted files (not used from cache) when running in debug mode? This would make it easier to investigate.

Lastly, the new code page has no impact on this issue?

Hi @Dennis_DECA,

Sorry for the late reply.

Would it be better to perform a scan on every push to the main branch?

Running a regular analysis of your main branch would be good in general. Ideally, it would be re-analyzed every time something is merged, but a periodic cron job would already be a good start.

Lastly, the new code page has no impact on this issue?

It should not. You should be good to go if you have analyzed your reference branch at least once with this SonarQube version.

Would it be possible to list all impacted files (not used from cache) when running in debug mode?

There is a way you can do this, but you will have to log at a lower level than DEBUG; you will have to log at TRACE level. Beware, this is going to get really verbose, especially if your project is as large as described above.

For a Maven project, you will need to append both these options

  1. -Dsonar.log.level=TRACE
  2. -Dorg.slf4j.simpleLogger.defaultLogLevel=TRACE

For Java files, we should be able to see cache misses by looking for lines like the following:

Could not find key <file key> in the cache
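
Since the TRACE output will be very large, a small script can pull out just the cache misses. The sketch below assumes only that each miss appears on one line containing the message quoted above. The sample lines, the key format (`core:src/...`), and the file name `scanner.log` are made up for demonstration, and the real log prefix may differ.

```python
import re

# Based on the message quoted above; everything around it is ignored.
MISS_PATTERN = re.compile(r"Could not find key (.+?) in the cache")


def cache_misses(log_lines):
    """Return the file keys reported as cache misses, in order of appearance."""
    keys = []
    for line in log_lines:
        match = MISS_PATTERN.search(line)
        if match:
            keys.append(match.group(1).strip())
    return keys


# Invented sample lines; the key format here is hypothetical.
sample = [
    "TRACE: Could not find key core:src/Foo.java in the cache",
    "INFO: some unrelated scanner output",
    "TRACE: Could not find key core:src/Bar.java in the cache",
]

misses = cache_misses(sample)
print(len(misses), "cache misses")
for key in misses:
    print(" ", key)

# With a saved scanner log (hypothetical file name):
# with open("scanner.log", encoding="utf-8") as handle:
#     misses = cache_misses(handle)
```

Comparing the resulting list against the files actually changed in the PR should show whether the misses cluster around a few dependencies or are spread across the module.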

To clarify your project type, does it contain one module with over 17,000 files? That seems like a lot, and I am not sure we explicitly tested the cache against such a large module.

Let us know what you find in your exploration!


Hi @Dorian_Burihabwa ,
Thanks for your reply. I checked some things, but I couldn’t get it to work. I set the properties in the sonar scanner that we run via a GitHub Action. However, I could never find any TRACE information in the scanner's log file.

      - name: SonarQube Scan
        if: env.RUN_SonarQube == 'true'
        uses: sonarsource/sonarqube-scan-action@master
        env:
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN_XXX }}
          SONAR_HOST_URL: ${{ vars.SONAR_HOST_URL_XXX }}
          SONAR_SCANNER_OPTS: -Xmx6500m
        with:
          projectBaseDir: core
          args: >

I also checked with our developers: 17,000 files in a module seems normal. Currently there are no plans to split this into smaller modules.