SonarQube Client-Side Cache: Persistent "FILE_CHANGED" on GitLab K8s

Hello SonarQube Community,

We are facing a persistent and significant performance issue with our SonarScanner CLI analyses in our GitLab CI/CD pipelines, where the client-side analysis cache is consistently not being utilized. We’ve performed extensive troubleshooting and are left with a paradox that we hope the community can help resolve.


1. Versions Being Used

  • SonarQube Server: Enterprise Edition 2025.1.2 (build 108896)
  • SonarScanner CLI: 7.1.0.4889
  • Scanner’s Java Runtime: OpenJDK 17.0.13 (Eclipse Adoptium 64-bit)
  • Relevant SonarQube Plugins:
    • JavaScript/TypeScript/CSS Code Quality and Security: 10.21.1.30825
    • Java Code Quality and Security: 8.9.2.39294
    • JaCoCo: 1.3.0.1538
    • Dataflow Bug Detection: 1.36.1.13250
    • Vulnerability Analysis: 10.11.1.35426
    • Clean as You Code: 2.4.0.2018
    • IaC Code Quality and Security: 1.41.1.14587

2. How SonarQube is Deployed

  • SonarQube Server: Deployed on AWS EKS as a StatefulSet, using an RDS PostgreSQL 13.20 database. Deployment is managed via Helm and ArgoCD.
  • SonarScanner CLI: Executed from a custom Docker image in which we download and set up sonar-scanner-cli-7.1.0.4889-linux-x64.zip; the scanner runs on GitLab CI Kubernetes executors.
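
For context, here is a simplified sketch of how the image obtains the scanner. This is illustrative only, not our literal Dockerfile: the download URL follows the public binaries.sonarsource.com pattern and the install path is a placeholder.

    # Illustrative sketch; verify the URL and the extracted directory name for your version.
    SCANNER_VERSION="7.1.0.4889"
    curl -fsSL -o /tmp/sonar-scanner.zip \
      "https://binaries.sonarsource.com/Distribution/sonar-scanner-cli/sonar-scanner-cli-${SCANNER_VERSION}-linux-x64.zip"
    unzip -q /tmp/sonar-scanner.zip -d /opt
    export PATH="/opt/sonar-scanner-${SCANNER_VERSION}-linux-x64/bin:${PATH}"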

3. What We Are Trying to Achieve

Our primary objective is to speed up SonarQube analysis for Merge Request (MR) pipelines by making the client-side analysis cache effective. Currently, analyses are consistently taking ~40-45 minutes.


4. What We Have Tried So Far to Achieve This

We have performed extensive debugging and implemented several optimizations:

  • Initial Setup & Performance Baseline:
    • Observed ~40-45 min analysis time on new SonarQube (2025.1 LTA) for PRs.
    • Compared to the old SonarQube (9.9.1), where analysis took 18 minutes but produced widespread Java parsing errors for modern syntax (e.g., switch expressions). The new setup parses this code successfully, indicating a more thorough analysis.
  • Scanner/Java Environment Optimization:
    • Upgraded sonar-scanner-cli to the latest 7.1.0.4889 version.
    • Ensured scanner’s Java runtime is explicitly set to Java 17.0.13.
    • Increased scanner JVM memory to -Xmx12G.
    • Confirmed these changes in the scanner logs:
  INFO SonarScanner CLI 7.1.0.4889
  INFO Java 17.0.13 Eclipse Adoptium (64-bit)
  DEBUG Scanner max available memory: 12 GB
  • GitLab CI Cache Configuration:
    • Configured GitLab CI cache for the SonarScanner job with a project-level key:
  cache:
    key: "${CI_PROJECT_PATH}-${CI_JOB_NAME}"
    paths:
      - .sonar/cache
    policy: pull-push
    • Confirmed GitLab CI is restoring the cache artifact before the job runs.
    • Confirmed the scanner reports User cache: <project-path>.sonar/cache, which is the expected cache path.
  • SonarQube Analysis Configuration:
    • Enabled client-side analysis cache: -Dsonar.analysisCache.enabled=true (a consolidated invocation sketch is included at the end of this section).
    • Explicitly excluded scanner’s internal directories from analysis scope: sonar.exclusions='...,**/.sonar/**,**/.scannerwork/**,cache/**'
    • Confirmed server-side cache is returning 404 Not Found for PR analyses, which is understood as expected behavior for transient branches.
  • The Core Paradox: “FILE_CHANGED” Despite Content Identity:
    • The sonar-scanner -X logs consistently show Miss the cache for X out of Y: FILE_CHANGED [X/Y]; for example, for TS/JS files:
  INFO Miss the cache for 342 out of 342: FILE_CHANGED [342/342]
  DEBUG Cache strategy set to 'WRITE_ONLY' for file '...' as the current file is changed
  (the DEBUG line is repeated for every file)
    • Evidence of Byte-Identical Content:
      • We generated sha256sum for all non-Git files (find . -type f -not -path "*/.git/*") on two consecutive runs of the exact same commit/branch.
      • The diff -u of these sha256sum lists showed zero differences, proving the source file content is byte-identical between runs (a reproducible sketch of this check follows this list).
    • Evidence of Varying Metadata:
      • stat output for sample source files captured within the CI runner showed Modify (mtime) and Change (ctime) timestamps that differed between runs, reflecting the new git clone time. This is normal for a fresh git clone, but to rule it out as the cause we normalized the timestamps.
    • Attempted Fix: mtime Normalization:
      • Implemented git ls-files -z | xargs -0 -I {} touch -m -t 197001010000.00 "${CI_PROJECT_DIR}/{}" in the CI script (after clone, before the scanner) to normalize mtime to the Unix epoch.
      • Result: Analysis time is still ~40-45 minutes. The sonar-scanner -X logs still report 0 out of X files leveraged and the FILE_CHANGED reason. ctime naturally continued to update with each touch operation.
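
For reproducibility, here is a sketch of the byte-identity check described above, assuming it runs from ${CI_PROJECT_DIR}; how the first run's checksum list is carried over to the second run (e.g. as a job artifact) is omitted, and the file names are placeholders.

    # Hash every non-Git file and sort by path so two runs are directly comparable.
    find . -type f -not -path "*/.git/*" -print0 \
      | xargs -0 sha256sum \
      | sort -k 2 > checksums-run2.txt
    # checksums-run1.txt is the list captured on the previous run of the same commit.
    diff -u checksums-run1.txt checksums-run2.txt \
      && echo "Source tree is byte-identical between runs"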

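For completeness, a simplified sketch of the scanner invocation in the CI job that produces the behaviour above; project- and MR-specific properties are omitted and the values shown are placeholders rather than our exact command.

    # Simplified; the real job passes additional project- and MR-specific properties.
    sonar-scanner -X \
      -Dsonar.host.url="${SONAR_HOST_URL}" \
      -Dsonar.analysisCache.enabled=true \
      -Dsonar.exclusions='...,**/.sonar/**,**/.scannerwork/**,cache/**'
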
Request for Help / Unresolved Questions

Despite proving our source code is byte-identical and normalizing mtime, SonarScanner continues to report FILE_CHANGED for all files. We have exhausted direct troubleshooting via logs and common solutions.

This points to SonarScanner’s file fingerprinting being sensitive to one of the following:

  • ctime (which is always dynamic due to touch’s effect on metadata).
  • Other subtle, undocumented filesystem metadata (e.g., inode changes, extended attributes) specific to our Kubernetes executor environment.
  • A specific edge-case bug in SonarScanner CLI 7.1.0’s client-side cache validation logic under these conditions.

Are there any deeper internal logging flags, undocumented properties, or known workarounds for file fingerprinting sensitivity in client-side caching for sonar-scanner-cli 7.x, especially in ephemeral CI environments like Kubernetes executors?

Thank you for your time and any insights you can provide.

Hey there.

From the docs:

  1. Before an analysis, the SonarScanner downloads from the server the corresponding cache:
  • For a branch analysis: the cache of the branch being analyzed.
  • For a pull request analysis: the cache of the target branch.
  • Or, as a fallback, the cache of the main branch.

Do you see any successful cache download in the scanner logs? Of the target branch or the main branch?

Hi Colin,
Thanks for the response!
I do not; let me try to home in on that. It was my understanding that the server-side cache is not used for PR analysis, but I see the documentation states it should use the target or main branch cache, and perhaps this can save the day on performance.
Should the client-side cache still be useful, or is it recommended not to use it? That part never seems to work, no matter what I’ve tried.


The client-side cache isn’t caching what you think it is: it’s used so that analyzers aren’t re-downloaded on every run. The analysis cache is only stored server-side.

Managed to fix the server-side caching 404 issue:
13:16:30.619 INFO Server-side caching is enabled. The Java analyzer was able to leverage cached data from previous analyses for 0 out of 36379 files. These files will not be parsed.
It’s still analyzing all the files, but now as an optimized analysis:
13:46:57.613 INFO Optimized analysis for 36378 of 36379 files.
Which sadly means the total analysis still takes just over 30 minutes:
13:46:58.351 INFO Sensor JavaSensor [java] (done) | time=1839604ms

Hey @Pieter

I wonder if this could be linked to a known issue previously discussed here (and still open on our side): Debugging server-side caching for quality gate - #13 by Marco_Kaufmann

In any case I find the logs weird.

Is caching working or isn’t it? :thinking:

I’ll toss this over to the devs to see if they have an idea.

It seems it’s working, but only for some files. Having rerun it a couple of times now, the best I’m getting is a few files skipped, which happened after disabling security analysis (sonar.security.enable=false).
05:52:31.725 INFO Server-side caching is enabled. The Java analyzer was able to leverage cached data from previous analyses for 352 out of 36379 files. These files will not be parsed.
PR analysis is still super slow; it’s hard to proceed with the upgrade while the performance hit is this big. I’ll keep an eye on the thread you provided.

Also curious whether it’s normal to see this for branch analysis? It never seems to be the right context to use the server-side cache:

09:57:03.664 INFO  Server-side caching is enabled. The Java analyzer will not try to leverage data from a previous analysis.
09:57:03.680 INFO  Using ECJ batch to parse 36656 Main java source files with batch size 429 KB.
09:57:03.779 INFO  Starting batch processing.
09:57:04.500 INFO  The Java analyzer cannot skip unchanged files in this context. A full analysis is performed for all files.

Only PR analysis uses the cache. Not branch analysis.

From the docs:

The analysis cache mechanism is supported for the following languages:

  • To shorten a branch analysis: C, C++, Objective-C, and COBOL.
  • To shorten a pull request analysis: C, C++, Objective-C, Java, JavaScript, C#, VB.NET, TypeScript, Kotlin, PHP, and Python.

Hey Colin,
Any word from the devs about this, or an ETA on that bug, in case it’s what’s preventing our sensor from skipping more unchanged files?
As a workaround for PRs we only need coverage checks enforced: can I run the scanner with parameters that skip deep inspection of the Java files, and rely on the scheduled branch analysis to raise issues until unchanged files can be skipped successfully?

Nope! Sometimes things just take time.

It’s not possible to disable rules just for PR analysis.

Hey @Pieter,

Thanks for sharing all these details. To get a better idea of what is going on, you may want to run the analysis with debug/verbose logs enabled. The Java Sensor should log (a lot) more information about which files have been skipped and why. Feel free to share that with us so that we can narrow down where the issue is.

Otherwise, to get the best performance from PR analysis, there are a couple of things you can check.

  1. Is the PR branch in sync with the base branch? The analysis cache used by the PR is the one generated during the base branch analysis. A mismatch here will make it difficult to get good performance.
  2. For the sources that have not changed, is the bytecode fed to the analysis identical? Our Java analysis may choose not to re-use the cached results in case of a mismatch as it might indicate some changes in semantics.
  3. (Optionally) Are you using the most appropriate scanner for your code? If you are using Gradle or Maven as your build system, consider switching your scanner to their dedicated plugins (Maven or Gradle). They may not help much with your issues but they should make configuration a little easier.
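
For illustration, a minimal sketch of what the Gradle-based invocation could look like, assuming a recent org.sonarqube plugin is already applied to the build (where the analysis task is named sonar); all values shown are placeholders:

    # Assumes the org.sonarqube Gradle plugin is applied; values are placeholders.
    ./gradlew sonar \
      -Dsonar.host.url="${SONAR_HOST_URL}" \
      -Dsonar.token="${SONAR_TOKEN}" \
      -Dsonar.analysisCache.enabled=true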

Please let us know how your investigation progresses.


Dorian

Hey Dorian,
Thanks for looking into this.
I would love to share the full debug log, but it’s huge (over 300 MB), and since it contains a lot of repo info for our product I’m not sure I’m comfortable adding it here as an attachment. Is there any way to share it directly with you / SonarQube? I don’t see specific reasons for why files are skipped, though; my debug log is collected with -X on the sonar-scanner CLI, is this sufficient?

  1. Yes, it is a freshly created branch, and in my testing it doesn’t even contain code changes.
  2. Indeed it’s byte-identical. At one point, to debug this, I collected a recursive checksum of all files in the build directory in the GitLab job, and comparing multiple runs I concluded there are no changes to the files apart from the timestamps, which are set to the git clone time.
  3. Arguably no; I’m using the SonarScanner CLI 7.1.0.4889. I’ve been contemplating switching to the Gradle plugin, and will do so if you think it will make an impact. This is how it was set up on the previous version, so during the upgrade testing I didn’t want to change too many things.

Look forward to your response.

Hey Pieter,
Thanks for answering these questions; we should now look into what the logs can tell us.

I don’t see specific reasons for why files are skipped though, my debug log is collected with -X on sonar-scanner cli, is this sufficient?

Before trying to rerun the analysis with new parameters, can you spot lines like the one below in your analysis?

Scan without parsing of file <a filename here> failed for scanner <a list of checks here>.

If you don’t see such lines, try running the analysis again with the verbosity parameter I linked above.

PS: It may seem like a redundant question but I am going to ask to make sure we both don’t miss the obvious: The base branch has been analyzed before, correct?

Hi Dorian,
No such log entries, I’m afraid. I’ve got

    -Dsonar.log.level=TRACE 
    -Dsonar.verbose=true 

both passed to the scanner, and I can see DEBUG log entries along with

12:32:29.549 WARN  Property 'sonar.verbose' with value 'true' is overridden with value 'true'

in my logs but no entries matching those words.

I’m seeing many of these Incomplete Semantic errors as well, but I’m unclear why, because I do pass -Dsonar.java.libraries="**/build/libs/*.jar":

12:34:13.692 INFO  Server-side caching is enabled. The Java analyzer was able to leverage cached data from previous analyses for 352 out of 36379 files. These files will not be parsed.
12:34:13.710 INFO  Using ECJ batch to parse 36027 Main java source files with batch size 429 KB.
12:34:13.816 INFO  Starting batch processing.
12:34:15.546 DEBUG [SE] Loaded 255 hardcoded method behaviors.
12:34:15.621 DEBUG Incomplete Semantic, unknown parameter type 'DaysInYearMethodProvider' line 20 col 9
...
12:47:12.073 INFO  100% analyzed
12:47:12.073 INFO  Batch processing: Done.
12:47:12.073 INFO  Optimized analysis for 36026 of 36027 files.

Worth mentioning that the above run was with code smell rules removed from the quality profile, as I noticed that’s what increased the scan times enormously; I’m trying to figure out whether it’s just the bulk of code smell rules or particular rules causing the slowdown.

Oh and yes, there’s been a fresh analysis on the default branch. I’ve executed the branch analysis many times during the testing.