Sonar Scanning slower than expected on multi-core environment

  • ALM used: GitHub.com
  • CI system used: Jenkins on EKS (k8s) in AWS.
  • Scanner command:

```shell
SONAR_SCANNER_OPTS='-Xms2816m -Xmx2816m -XX:+UseG1GC -XX:+CMSClassUnloadingEnabled -XX:+UseCompressedClassPointers -XshowSettings:vm'
sonar-scanner -Dsonar.branch.name=main
```
  • Languages: Go, Python, TypeScript, CloudFormation templates, Terraform.
  • Private repository.
  • Error observed: None
  • Host OS: Ubuntu for EKS
  • Host hardware: EC2 c5.4xlarge (16vCPUs, 32GB RAM)
  • Disk: EBS GP3.
  • Container OS: Ubuntu 20.04
  • Container specs: 6GB RAM, 4 vCPU.

I filed this as instructed here: Allow sonar-scanner to analyse files in parallel - #11 by Alexandre_Gigleux

We run our CI pipelines through Jenkins, which is fully containerized and autoscales on EKS. For our PRs, one of the slower stages is the Sonar scan, which routinely takes 7-8 minutes for Sonar alone (not counting spinning up the pod, cloning the repo, etc.).

As we have multiple languages in the repo, we already have logic to skip a given language if no files affecting it have changed (e.g. we skip JavaScript if only Go files changed, and so on), but we always do a full analysis for every commit to our main branch.
Another optimization we have is that we cache the Sonar plugins on S3 and ‘install’ them before every run to avoid unnecessary re-downloads from sonarcloud.io.
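The skip logic is roughly the following (a simplified sketch, not our exact pipeline code; the `changed_languages` helper and the extension-to-language mapping are illustrative):

```shell
#!/bin/sh
# Sketch of the language-skip logic: map the extensions of changed files
# to the analyzer targets they affect. The mapping below is illustrative.
changed_languages() {
  # stdin: one changed file path per line, e.g. from
  #   git diff --name-only origin/main...HEAD
  while IFS= read -r path; do
    case "$path" in
      *.go)                echo go ;;
      *.py)                echo python ;;
      *.ts|*.tsx)          echo typescript ;;
      *.tf)                echo terraform ;;
      *.yaml|*.yml|*.json) echo cloudformation ;;
    esac
  done | sort -u
}
```

On a PR we feed the diff's file list into this and skip any analyzer whose language is absent from the output; on main we run everything regardless.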

Ideally, Sonar should take advantage of the multiple cores available for analysis. For example, there is no reason the Go code and the Python code could not be analyzed in parallel. The disk I/O overhead should be minimal with modern SSDs, and all the source files fit easily in the disk cache.

Another issue is that, since we work with ephemeral containers, we have to git clone on each run of the pipeline. To keep the clones as small as possible we perform a shallow clone. Unfortunately, Sonar’s blame feature does not work well with shallow clones, even when the historical blame information is already registered. To fill in the blame information we are required to unshallow our git clones just before calling the scanner, which adds extra time to the stage that we think could be avoided.
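The extra step looks roughly like this (a sketch of the workflow; the `clone_and_unshallow` helper name and its arguments are illustrative, not our actual pipeline code):

```shell
#!/bin/sh
# Sketch of the clone-then-scan step: shallow clone for speed, then deepen
# the history just before scanning so SCM blame works.
set -e

clone_and_unshallow() {
  # $1 = repo URL, $2 = branch, $3 = target directory
  git clone --quiet --depth 1 --branch "$2" "$1" "$3"
  cd "$3"
  # Sonar's blame support needs full history, so undo the shallow clone:
  if [ "$(git rev-parse --is-shallow-repository)" = "true" ]; then
    git fetch --quiet --unshallow
  fi
}

# In the pipeline this is followed by the scan itself, e.g.:
#   clone_and_unshallow "$REPO_URL" main src
#   sonar-scanner -Dsonar.branch.name=main
```

The `git fetch --unshallow` is the part we would like to avoid, since it pulls the entire history we deliberately skipped in the initial clone.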

Hello @sodul,

Thanks for creating this dedicated thread.

Being able to run different language analyzers in parallel is an option we have in our backlog. Before doing that, we want to optimize the raw performance of each analyzer and do incremental analysis on PRs (i.e. analyze only what has changed).

For the languages you are using, an enhancement will be deployed in the coming weeks on SonarCloud to analyze only changed files for Go, CloudFormation, and Terraform on PRs. For Python and TypeScript, doing the same is part of our 2022 objectives.

  1. Can you share the number of LOCs and number of files you have for each of the 5 languages in your repo?
  2. Can you share privately the logs of a full scan of your repo?

Thanks
Alex

I’ve sent the details in private. Thank you.