Allow sonar-scanner to analyse files in parallel

We have a small-to-medium-sized repository, and we run Sonar as part of our PR pipelines to help ensure higher code quality in our product. We have greatly improved our pipeline times by running unit tests and various linters in parallel, but the Sonar stage is now the bottleneck because it runs the analysis sequentially. We monitored the CPU load on the pod running Sonar (we use k8s) and it is pretty consistently using either 1.5 or 2.5 CPUs. We've allocated 6 CPUs to the pod, but sonar-scanner is not taking advantage of the increased compute capacity.

As far as we can tell, the scanner runs the analysis entirely sequentially, whereas it could at the very least analyze each language in parallel. That would greatly reduce the time required to analyze monorepos with multiple languages.

Note that we are already excluding generated code, and we are looking for ways to tell Sonar to only look at changed files for pull requests. That will not help with analysis times on the main and release branches, though, where we need full reports for audit purposes.

Hi @sodul,

For some languages, this capability already exists. For instance, if you're analyzing C, C++, or Objective-C, the documentation describes how to run a multithreaded scan. Which languages are you analyzing in your repos?

Hi @Fabrice_Bellingard,

Our 3 main languages are Go, Python and TypeScript. The rest is negligible.

These 3 do not seem to support multi-threading, and it seems to be something that has to be implemented per plugin. It might still be possible to run multiple plugins in parallel threads and gain overall speed for monorepos with multiple languages.

Hi @sodul, our teams did some POCs in the past to see what we could gain from running things in parallel. The results were not compelling enough to move in that direction. We may give it another try in the future, but nothing is planned for now.

@Fabrice_Bellingard the trend nowadays is no longer faster CPU clock speeds but more cores, along with significantly faster disk I/O.
If you ran those tests when machines had only a few cores and classic hard drives, I can understand that parallelism would not have helped much. While I understand that parallelizing within a single language, as you allow for C, might be difficult, I really believe you should revisit the current approach of processing every language sequentially.
I strongly believe that it would be relatively uncomplicated to analyze Python code and TS code in parallel on separate cores, and the entire analysis should take significantly less time on modern hardware with many cores and SSDs.
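
To make the idea concrete, here is a minimal sketch of per-language parallelism. It is not how sonar-scanner or its plugins work internally: `analyze_language`, `analyze_all`, and the file lists are hypothetical stand-ins, and the "analysis" only counts files. It just shows one independent task per language being submitted to a worker pool and the reports being collected afterwards.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input: file paths grouped by language (not real scanner data).
FILES_BY_LANGUAGE = {
    "go": ["cmd/main.go", "pkg/api/server.go"],
    "python": ["tools/release.py"],
    "typescript": ["web/src/app.ts", "web/src/store.ts", "web/src/ui.ts"],
}


def analyze_language(language, files):
    """Hypothetical stand-in for one language plugin's analysis pass."""
    # A real analyzer would parse each file and raise issues; this only counts files.
    return {"language": language, "files_analyzed": len(files)}


def analyze_all(files_by_language, max_workers=3):
    """Submit one analysis task per language and collect the per-language reports."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            language: pool.submit(analyze_language, language, files)
            for language, files in files_by_language.items()
        }
        # result() blocks until that language's task has finished.
        return {language: future.result() for language, future in futures.items()}


reports = analyze_all(FILES_BY_LANGUAGE)
```

Threads are used here only to keep the sketch short. CPU-bound work in CPython would need separate processes to actually occupy several cores, and the wall-clock time would still be bounded by the slowest language.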
