We have a small-to-medium-sized repository and we run Sonar as part of our PR pipelines to help ensure higher code quality in our product. Unfortunately, while we have greatly improved our pipeline times by running unit tests and various linters in parallel, the Sonar stage is now the bottleneck since it runs the analysis sequentially. We monitored the CPU load on the pod running Sonar (we use k8s) and it is pretty consistently using only 1.5 to 2.5 CPUs. We have allocated 6 CPUs to the pod, but sonar-scanner is not taking advantage of the increased compute capacity.
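For reference, a minimal sketch of the kind of pod resource allocation described above (the names and manifest shape are illustrative, not our actual config; `sonarsource/sonar-scanner-cli` is the public scanner image):

```yaml
# Hypothetical pod spec fragment: the scanner container is granted 6 CPUs,
# yet observed usage stays around 1.5-2.5 cores.
apiVersion: v1
kind: Pod
metadata:
  name: sonar-scanner   # illustrative name
spec:
  containers:
    - name: scanner
      image: sonarsource/sonar-scanner-cli
      resources:
        requests:
          cpu: "6"
        limits:
          cpu: "6"
```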
As far as we can tell, the scanner runs the analysis entirely sequentially. It could at the very least analyze each language in parallel, which would greatly reduce the time required to analyze monorepos with multiple languages.
Note that we are already excluding generated code, and we are looking into ways to tell Sonar to only look at changed files for Pull Requests, but that will not help with analysis times on the main and release branches, where we need full reports for audit purposes.
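The exclusion and PR-scoping setup described above can be sketched with standard analysis properties; the path patterns and PR values below are placeholders, not our real layout:

```properties
# Hypothetical sonar-project.properties fragment.
# Exclude generated code from analysis (illustrative patterns).
sonar.exclusions=**/generated/**,**/*_pb2.py,**/node_modules/**

# Pull request analysis scopes reporting to changed code
# (Developer Edition and above); values are placeholders.
sonar.pullrequest.key=123
sonar.pullrequest.branch=feature/my-change
sonar.pullrequest.base=main
```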
For some languages, this capability already exists. For instance, if you're doing C, C++, or Objective-C, it is documented how to achieve a multithreaded scan. Which languages are you analyzing in your repos?
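For the CFamily analyzer (C, C++, Objective-C), the documented knob is a thread-count property; a minimal sketch, with the value chosen to match the pod allocation mentioned earlier:

```properties
# Hypothetical scanner invocation property: let the CFamily analyzer
# use multiple threads. The property is documented by SonarSource;
# the value 6 is just an example.
sonar.cfamily.threads=6
```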
Our 3 main languages are Go, Python and TypeScript. The rest is negligible.
These 3 do not seem to support multi-threading, and multi-threading appears to be something that must be implemented per plugin. It might still be possible to run multiple plugins in parallel threads and gain overall speed for monorepos with multiple languages.
Hi @sodul, our teams did some POCs in the past to see what we could get from running things in parallel. The outcome was not interesting enough to move in that direction. Maybe we'll give it a try again in the future, but nothing is planned for now.
@Fabrice_Bellingard the trend nowadays is no longer toward faster CPU clock speeds but toward more cores, and disk I/O is significantly faster.
If you ran those tests when the number of cores was limited to just a few, and on classic hard drives, I can understand that parallelism would not have helped much. While I understand that parallelizing within a single language, as you allow for C, might be difficult, I really believe you should revisit the current approach of processing each and every language sequentially.
I strongly believe it would be relatively uncomplicated to analyze Python code and TS code in parallel on separate cores, and the entire analysis should take significantly less time on modern hardware with many cores and SSDs.
Analysis speed is a bottleneck for my team as well. We have 8 CPUs on our build machines, but it looks like Sonar is using only one. We are considering switching to something else like error-prone, as the Sonar phase of our build already takes longer than everything else (we are already heavily utilizing parallel builds and caching in Gradle).
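For context, the Gradle parallelism and caching mentioned above are enabled with two standard flags; a minimal sketch of the relevant `gradle.properties` fragment:

```properties
# Standard Gradle build-environment flags: run decoupled projects'
# tasks in parallel and reuse outputs from the build cache.
org.gradle.parallel=true
org.gradle.caching=true
```

Even with these, the Sonar task itself still runs single-threaded, which is why it dominates the build time.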
I totally agree with you that nowadays build machines can have a lot of cores that are not used by our analyzers. We are currently exploring options to make better use of them, to avoid re-analyzing files that have not changed since the previous analysis, and to analyze only what is part of a pull request.
Some of these items are visible in SonarQube’s Portal under the Speed of Analysis section.
I expect significant changes in that domain in 2022, so I hope you will be still using SonarQube to see them.
That said, I would be very interested in more details about your particular case. Can you create a dedicated topic here and share your context (machine, OS, CPU, RAM, CI, …), the languages scanned, the number of LOCs in your project, the current analysis duration, and the analysis time you would consider acceptable?