Sonar Scanning slower than expected on multi-core environment

  • ALM used: GitHub.com
  • CI system used: Jenkins on EKS (k8s) in AWS.
  • Scanner command:

```shell
SONAR_SCANNER_OPTS='-Xms2816m -Xmx2816m -XX:+UseG1GC -XX:+CMSClassUnloadingEnabled -XX:+UseCompressedClassPointers -XshowSettings:vm'
sonar-scanner -Dsonar.branch.name=main
```
  • Languages: Go, Python, TypeScript, CloudFormation templates, Terraform.
  • Private repository.
  • Error observed: None
  • Host OS: Ubuntu for EKS
  • Host hardware: EC2 c5.4xlarge (16vCPUs, 32GB RAM)
  • Disk: EBS GP3.
  • Container OS: Ubuntu 20.04
  • Container specs: 6GB RAM, 4 vCPU.

I filed this as instructed here: Allow sonar-scanner to analyse files in parallel - #11 by Alexandre_Gigleux

We run our CI pipelines through Jenkins, which is fully containerized and autoscales on EKS. For our PRs, one of the slower stages is the Sonar scan, which routinely takes 7-8 minutes for Sonar alone (not counting spinning up the pod, cloning the repo, etc.).

As we have multiple languages in the repo, we already have logic to skip a given language if no files affecting it have changed (e.g. we skip JavaScript if only Go files changed, and so on), but we always do a full analysis for every commit to our main branch.
Another optimization we have is that we cache the Sonar plugins on S3 and ‘install’ them before every run to avoid unnecessary re-downloads from sonarcloud.io.
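The skip logic is roughly the following (a simplified sketch, not our exact pipeline code; the `changed_languages` helper and the extension-to-language mapping are illustrative):

```shell
#!/bin/sh
# Sketch of the language-skip logic: map the extensions of changed files
# to the analyzer targets they affect. The mapping below is illustrative.
changed_languages() {
  # stdin: one changed file path per line, e.g. from
  #   git diff --name-only origin/main...HEAD
  while IFS= read -r path; do
    case "$path" in
      *.go)                echo go ;;
      *.py)                echo python ;;
      *.ts|*.tsx)          echo typescript ;;
      *.tf)                echo terraform ;;
      *.yaml|*.yml|*.json) echo cloudformation ;;
    esac
  done | sort -u
}
```

On a PR we feed the diff's file list into this and skip any analyzer whose language is absent from the output; on main we run everything regardless.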

Ideally, Sonar should take advantage of the multiple cores available for analysis. For example, there is no reason the Go code and the Python code could not be analyzed in parallel. The disk I/O overhead should be minimal with modern SSDs, and all the source files fit easily in the disk cache.

Another issue is that, since we work with ephemeral containers, we have to git clone on each run of the pipeline. To keep the clones as small as possible we perform a shallow clone. Unfortunately, Sonar’s blame feature does not work well with shallow clones, even when the historical blame information is already registered. To fill in the blame information we are required to unshallow our git clones just before calling the scanner, which adds extra time to the stage that we think could be avoided.
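The extra step looks roughly like this (a sketch of the workflow; the `clone_and_unshallow` helper name and its arguments are illustrative, not our actual pipeline code):

```shell
#!/bin/sh
# Sketch of the clone-then-scan step: shallow clone for speed, then deepen
# the history just before scanning so SCM blame works.
set -e

clone_and_unshallow() {
  # $1 = repo URL, $2 = branch, $3 = target directory
  git clone --quiet --depth 1 --branch "$2" "$1" "$3"
  cd "$3"
  # Sonar's blame support needs full history, so undo the shallow clone:
  if [ "$(git rev-parse --is-shallow-repository)" = "true" ]; then
    git fetch --quiet --unshallow
  fi
}

# In the pipeline this is followed by the scan itself, e.g.:
#   clone_and_unshallow "$REPO_URL" main src
#   sonar-scanner -Dsonar.branch.name=main
```

The `git fetch --unshallow` is the part we would like to avoid, since it pulls the entire history we deliberately skipped in the initial clone.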

Hello @sodul,

Thanks for creating this dedicated thread.

Being able to run different language analyzers in parallel is an option we have in our backlog. Before doing that, we want to optimize the raw performance of each analyzer and do incremental analysis on PRs (i.e. analyze only what has changed).

For the languages you are using, an enhancement will be deployed in the coming weeks on SonarCloud to analyze only changed files for Go, CloudFormation, and Terraform on PRs. For Python and TypeScript, doing the same is part of our 2022 objectives.

  1. Can you share the number of LOCs and number of files you have for each of the 5 languages in your repo?
  2. Can you share privately the logs of a full scan of your repo?

Thanks
Alex

I’ve sent the details in private. Thank you.