TL;DR: How accurate will the analysis be if I exclude (some) headers?
For historical reasons, our project consists of multiple GitHub repositories (say ‘core’, ‘main’, and a few others).
Every time there’s a change in ‘main’, we (on Travis) check out all repositories and build them together. After that, we run the sonar scanner on the code.
Since this is altogether quite a lot of work, our build jobs tend to time out whenever there’s a ‘serious’ change, either in ‘core’ (a ccache miss) or when there is, e.g., a new version of the sonar plugin (cache invalidated).
One of the solutions we are considering is to skip analysis of ‘core’ (i.e., analyze it in its own repository). Since ‘core’ contains headers included in almost all source files, what impact will this have on the static analysis?
Am I right to assume that the analysis might be less accurate, since it won’t be able to trace possible issues back to those headers?
We analyze the translation units that are compiled by the command passed to build-wrapper. Since a translation unit contains all of its included header files, analyzing the translation units means analyzing the headers as well. So excluding header files (unless you are explicitly compiling them independently of the source files) will have no impact on analysis time or accuracy.
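To make that concrete, here is a minimal sketch (all file names and contents are hypothetical): an issue whose root cause lives in a header is still reported, because the header’s code is part of the translation unit the analyzer sees.

```cpp
// core/math_utils.h -- a hypothetical "core" header.
#pragma once

inline int divide(int a, int b) {
    return a / b;  // division by zero if b == 0; the flaw lives in the header
}

// main/app.cpp -- the translation unit compiled under build-wrapper.
// Because the header is textually included here, the analyzer processes
// its code as part of this TU, whether or not core/ is in the scan scope.
#include "core/math_utils.h"

int main() {
    return divide(10, 0);  // exercises the header's division-by-zero path
}
```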
Even if you exclude these “core” headers, they are still considered dependencies of the source files that include them, and if you modify these dependencies, the previous result in the cache will no longer be valid => cache miss.
In short, it should not impact the result and it should not solve the cache invalidation problem.
In theory, the general recommendation for this problem is not to include frequently changing files in every source file. This is usually achieved by splitting such headers into smaller ones or by using the “PImpl” technique (see the sketch below).
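For illustration, a minimal PImpl sketch, assuming a hypothetical frequently changing header core/render_engine.h with a type core::RenderEngine. Only widget.cpp includes the volatile header, so editing it invalidates one translation unit instead of every file that uses Widget.

```cpp
// widget.h -- the stable public header; it pulls in no "core" headers,
// so edits to the implementation details never touch this file.
#pragma once
#include <memory>

class Widget {
public:
    Widget();
    ~Widget();                 // must be defined where Impl is complete
    void draw() const;
private:
    struct Impl;               // forward declaration only
    std::unique_ptr<Impl> pimpl_;
};

// widget.cpp -- the single translation unit that sees the volatile header.
#include "widget.h"
#include "core/render_engine.h"  // hypothetical, frequently changing header

struct Widget::Impl {
    core::RenderEngine engine;   // hypothetical type from the core header
};

Widget::Widget() : pimpl_(std::make_unique<Impl>()) {}
Widget::~Widget() = default;     // Impl is complete here, so this compiles

void Widget::draw() const {
    // pimpl_->engine.render();  // hypothetical call, shown for shape only
}
```

The price is one pointer indirection and a heap allocation per object; in exchange, the public header becomes a stable dependency that neither compilation nor analysis caches need to invalidate when the internals change.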
In one of my previous questions (Sonarscanner cache not working properly - #2 by mpaladin), I was told that if I exclude some files (headers) and then introduce them again, all dependent files need to be reanalyzed to (properly) find the header issues.
From my understanding, this contradicts what you are telling me now. In this case, however, the situation is slightly different: I don’t want to exclude them; I will simply not include them in the first place.
No. Let me clarify. Here we are discussing two independent reasons for a cache miss.
Cache miss due to a change in any dependency file:
We analyze the compiled translation units. All the files that end up in a translation unit are considered its dependencies. Modifying any of them retriggers the analysis of that translation unit, regardless of whether they are included in or excluded from your scanning scope; they don’t even need to be in the scanner directory. Modifying an STL header, for example, will invalidate every translation unit that includes it. Think of it the same way you think about compilation: if a modification retriggers the compilation of a translation unit, it will definitely retrigger the analysis of that translation unit => a cache miss.
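For example (file names and contents are hypothetical), the invalidation follows the include graph exactly as compilation does:

```cpp
// core/config.h -- a hypothetical header included by many sources.
#pragma once
constexpr int kMaxUsers = 100;   // editing this value...

// a.cpp -- one translation unit
#include "core/config.h"
int capacity() { return kMaxUsers; }  // ...stales a.cpp's cached analysis

// b.cpp -- another translation unit
#include "core/config.h"
int half_capacity() { return kMaxUsers / 2; }  // ...and b.cpp's as well,
// exactly as it would retrigger their recompilation under make or ccache
```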
Cache miss due to a change in scanner exclusion option:
This is another reason for a cache miss and it’s independent of what I explained previously.
The issues on excluded files are simply not saved in the cache. So when you reintroduce these files, we have to regenerate their analysis results, which we do by re-analyzing all the files that depend on them, i.e., all the translation units they end up in. Note that this should be rare, as we don’t expect the set of excluded files to change regularly.
In your case, I said exclusion/inclusion won’t matter because of the first scenario: modifying these files will lead to a cache miss independently of all the scanner options, because you are modifying a dependency.
Your reply didn’t make sense to me, as it will not solve the cache miss that occurs when you modify a header that is included everywhere.
I (think I) understand the reasons for the cache misses I get (note that I was originally talking about both the ccache miss [which leads to a scanner cache miss] and the sonar scanner cache miss [due to, e.g., a newer version of the plugin]). However, that was not my question.
What wasn’t clear to me was the implication of excluding (or simply not including) a specific folder of headers for analysis accuracy and performance.
Based on the information I had earlier, I expected/assumed the analysis to be faster, albeit less accurate.
This fully explains the behavior I was observing.
What I was trying to say was: ‘OK, before, I was excluding files, which led to a sonar cache miss [now I know why]. Will there be any difference if I don’t include them in the first place, instead of explicitly excluding them?’ [No difference; they will be analyzed anyway, since they are dependencies.]