CFamily Analysis Cache Questions

Hi there,

I have a few questions that I was hoping you could clarify around how the CFamily analysis cache works in certain scenarios. These are largely driven by a desire to understand the specifics of why we do not get cache hits all the time.

For reference we are running SonarQube 9.6 Enterprise edition.

Consider the scenario where we have a C++ project that is currently analysed in SonarQube. If I create a new branch and modify a header file, upon raising a (GitHub) PR the analysis of that project receives no cache hits:

14:13:48   14:13:48.514 INFO: Loading cache from: server
...
14:13:49   14:13:49.193 DEBUG: Cache miss for Z:\buildAgent\work\50d8454fc8ee3b4\base\src\stdafx.cpp: dependencies changed; the first detected difference is file hash change: Z:\buildAgent\work\50d8454fc8ee3b4\base\src\Main\ORHAddress.h
14:13:49   14:13:49.193 DEBUG: Cache miss for Z:\buildAgent\work\50d8454fc8ee3b4\base\src\stdafx.cpp: dependencies changed; the first detected difference is file hash change: Z:\buildAgent\work\50d8454fc8ee3b4\base\src\Main\ORHAddress.h
14:13:49   14:13:49.196 INFO: [pool-3-thread-1] Z:/buildAgent/work/50d8454fc8ee3b4/base/src/stdafx.cpp
14:13:49   14:13:49.225 DEBUG: Cache miss for Z:\buildAgent\work\50d8454fc8ee3b4\base\src\IOSPlusBase.cpp: dependencies changed; the first detected difference is file hash change: Z:\buildAgent\work\50d8454fc8ee3b4\base\src\Main\ORHAddress.h
14:13:49   14:13:49.225 INFO: [pool-3-thread-2] Z:/buildAgent/work/50d8454fc8ee3b4/base/src/IOSPlusBase.cpp
14:13:49   14:13:49.247 DEBUG: Cache miss for Z:\buildAgent\work\50d8454fc8ee3b4\base\src\Main\AssertReport.cpp: dependencies changed; the first detected difference is file hash change: Z:\buildAgent\work\50d8454fc8ee3b4\base\src\Main\ORHAddress.h
14:13:49   14:13:49.247 INFO: [pool-3-thread-3] Z:/buildAgent/work/50d8454fc8ee3b4/base/src/Main/AssertReport.cpp
14:13:49   14:13:49.278 DEBUG: Cache miss for Z:\buildAgent\work\50d8454fc8ee3b4\base\src\Main\CpuUtil.cpp: dependencies changed; the first detected difference is file hash change: Z:\buildAgent\work\50d8454fc8ee3b4\base\src\Main\ORHAddress.h
14:13:49   14:13:49.278 INFO: [pool-3-thread-4] Z:/buildAgent/work/50d8454fc8ee3b4/base/src/Main/CpuUtil.cpp
14:13:49   14:13:49.307 DEBUG: Cache miss for Z:\buildAgent\work\50d8454fc8ee3b4\base\src\Main\Localisation.cpp: dependencies changed; the first detected difference is file hash change: Z:\buildAgent\work\50d8454fc8ee3b4\base\src\Main\ORHAddress.h
14:13:49   14:13:49.307 INFO: [pool-3-thread-5] Z:/buildAgent/work/50d8454fc8ee3b4/base/src/Main/Localisation.cpp
...
14:17:47   14:17:47.612 INFO: Cache: 0/63 hits, 1318068 bytes
...

In this scenario we can see the misses occur because a dependency (the header) has changed. Can I confirm whether a cache miss applies to an individual file only when the changed header is one of its dependencies, e.g. it's included either directly or via another header (possibly a PCH)? Or does a change to any one file invalidate the entire cache, i.e. for all files?

Also, we can see the analysis has loaded the cache from the server. What logic determines which cache this is? For example, is it the cache of the last analysis of the PR's base branch (main, develop, etc.)? If so, is it always the latest, or does it relate to a specific git commit, i.e. the one the head branch of the PR was branched from?

Finally, in what scenarios does a new analysis cache get uploaded to the server? For example, if we re-run the above PR analysis it continues to get 0 cache hits, implying that, for PRs at least, the analysis cache is not uploaded to the server. If so, would uploading it seem like a logical optimisation? It is quite common for a PR to need additional commits as a result of reviews etc., and each of those will trigger another analysis which could benefit from the cache of the prior analysis.

Thanks in advance!

Sam

Hello @Sam_Anthonisz,

You are right. A changed dependency is one reason for cache entry invalidation. Another reason is a change in the compilation command. You can think of it this way: if the compiler would consider that a source file needs to be recompiled, then it is a cache miss.
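
To make the rule above concrete, here is a minimal Python sketch of a per-file cache check. It is only a conceptual model and not the analyzer's actual implementation: the names `CacheEntry`, `make_entry` and `is_cache_hit`, the in-memory `files` mapping, and the use of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


def digest(content: bytes) -> str:
    """Hash file content (SHA-256 is an arbitrary choice for this sketch)."""
    return hashlib.sha256(content).hexdigest()


@dataclass
class CacheEntry:
    """Hypothetical cache entry for one source file."""
    command: str                       # compilation command used for the file
    dependency_hashes: Dict[str, str]  # path -> content hash, source file included


def make_entry(command: str, files: Dict[str, bytes], deps: List[str]) -> CacheEntry:
    """Record the command and the hash of every dependency of one source file."""
    return CacheEntry(command, {path: digest(files[path]) for path in deps})


def is_cache_hit(entry: Optional[CacheEntry], command: str,
                 files: Dict[str, bytes], deps: List[str]) -> bool:
    """Hit only if the command and every dependency hash are unchanged."""
    if entry is None:
        return False  # nothing cached for this source file
    if entry.command != command:
        return False  # compilation command changed
    current = {path: digest(files[path]) for path in deps}
    return current == entry.dependency_hashes  # any hash change is a miss


# Two source files: a.cpp includes shared.h, b.cpp includes other.h.
files = {"a.cpp": b"A", "b.cpp": b"B", "shared.h": b"S1", "other.h": b"O"}
deps_a = ["a.cpp", "shared.h"]
deps_b = ["b.cpp", "other.h"]
entry_a = make_entry("cl /c a.cpp", files, deps_a)
entry_b = make_entry("cl /c b.cpp", files, deps_b)

# Editing shared.h invalidates a.cpp's entry but leaves b.cpp's entry valid.
files["shared.h"] = b"S2"
a_hit = is_cache_hit(entry_a, "cl /c a.cpp", files, deps_a)
b_hit = is_cache_hit(entry_b, "cl /c b.cpp", files, deps_b)
```

In this model a header that is pulled into every translation unit (for example through a precompiled header) would appear in every dependency list, so changing it would produce a miss for every file, which is consistent with the `0/63 hits` in the log above.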

The main branch updates the cache, and that is the cache that is used. PRs don't update the cache. There is one entry per source file, and it comes from the latest analysis of the main branch.

No, because the cache isn't updated from a PR.

This doesn’t take care of the case where multiple PRs are running in parallel. Currently, the default cache is simple and covers the simple cases (an easy win).

For more complicated cases, you can control the CFamily cache by explicitly using the filesystem cache.
This way you can decide when to update the cache in a way that is tailored to your workflow. You can also have multiple filesystem caches (e.g. a cache per PR).
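
As a rough illustration of the "cache per PR" idea, the Python sketch below derives one cache directory per pull request and builds the matching scanner arguments. The helper names `cache_dir` and `scanner_args` and the directory layout are hypothetical. The property names `sonar.cfamily.analysisCache.mode` and `sonar.cfamily.analysisCache.path` are quoted from memory of the 9.x documentation and should be treated as an assumption: please check them against the documentation for your version.

```python
from pathlib import Path
from typing import List, Optional


def cache_dir(root: Path, pull_request: Optional[str]) -> Path:
    """Pick a cache directory: one per PR, plus one for the main branch.

    The layout (root/pr-<id> and root/main) is a hypothetical convention.
    """
    name = f"pr-{pull_request}" if pull_request else "main"
    return root / name


def scanner_args(root: Path, pull_request: Optional[str]) -> List[str]:
    """Build scanner arguments that point the analysis at that directory.

    Property names are an assumption; verify them for your SonarQube version.
    """
    path = cache_dir(root, pull_request)
    return [
        "-Dsonar.cfamily.analysisCache.mode=fs",
        f"-Dsonar.cfamily.analysisCache.path={path.as_posix()}",
    ]


# Example: arguments for PR 123, with caches kept under a shared "caches" folder.
args = scanner_args(Path("caches"), "123")
```

With such a layout, a re-run of the same PR reads and updates its own directory, so parallel PRs would not overwrite each other's entries. When to create, seed, or delete each directory is left to the CI workflow.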

I hope it is clearer now!

Thanks,

Hi @Abbas_Sabra,

Thanks for the prompt reply!

Yes that does clear things up a bit and explains the results we are seeing in terms of cache hits.

For your reference, we actually used to manage the cache on the filesystem ourselves prior to upgrading to 9.6. Given your feedback, I expect we’ll revert to our original setup, as it seems to better match our workflows.

Can I suggest that you look at updating the documentation for the analysis cache to be a bit more detailed about why people may want to use the filesystem cache? Currently it is quite brief and implies there’s only a single reason for its usage:

You should consider changing the cache storage to the filesystem only when the cache size becomes a concern and you prefer to optimize its storage location.

Thanks again,

Sam

Hi @Sam_Anthonisz,

Thanks for the valuable feedback. We will improve the documentation.

Thanks,