Performance guide for large project analysis

Hey Sonar Community!

Analyzing large projects can take a very long time. I'd like to hear your experiences and tips for improving performance when doing a full analysis of an x-large Java project.

Is 20 minutes of analysis time for ~500k LoC expected? What actions can I take to reduce it?

I already made sure to exclude all generated code so that only what is really needed gets analysed, and I increased the heap size for Sonar.
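
For reference, a minimal sketch of the exclusions (assuming a standalone sonar-project.properties; the paths here are just placeholders, not my actual setup):

    # sonar-project.properties — hypothetical paths for illustration
    sonar.exclusions=**/generated/**,**/build/**
    sonar.test.exclusions=**/generated-test/**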

A performance guide in the User Guide would be great; I couldn’t find such a thing: https://docs.sonarqube.org/display/SONAR/User+Guide
As an example, I really like the Gradle guide on performance improvement: https://guides.gradle.org/performance/

Thanks and cheers,
Balázs

I can’t tell you how to make it faster, but I can share some experience.

We need a bit more than 3 hours to go through 5 million lines of code, so 20 minutes doesn’t sound so bad :wink:

It all depends on how you want to integrate Sonar into your workflow. We told the programmers that we would only analyze their code at night, so they know they will have fresh results when they come in in the morning. I think most have gotten used to that.
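
In practice the nightly run is just a scheduled job; a minimal sketch, assuming a Gradle build and cron (the path, time, and log file are hypothetical):

    # hypothetical crontab entry: run the analysis every night at 02:00
    0 2 * * * cd /srv/projects/big-app && ./gradlew sonarqube >> /var/log/sonar-nightly.log 2>&1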

We also use the SonarLint plugin in Eclipse to give them direct feedback on some of the more obvious issues.

Thanks for your input Philippe!
Good point regarding SonarLint; the continuous feedback during development is a game changer IMHO.

Hi,

This will be very hard to investigate without more details.
I had a look at some projects we analyze internally and I can see a 3 MLoC project analyzed in ~2h and a 1 MLoC one in ~40min (with all rules activated), so your numbers are a bit high but not out of the ordinary.

Upgrading to the latest versions of SonarQube and the analyzer plugins may also let you benefit from performance improvements.

Simply to provide another data point:

We have about three million lines of code; the SonarScanner (version 3.0.3.778) takes

  • ~40min for sources only
  • ~1h 35 min with external libraries
  • ~33h with external libraries and all binaries

with 382 active rules, some of them custom. Project analysis by the SonarQube (version 6.7.2) server always takes approximately 40 minutes.
I cannot eliminate the possibility that we configured the binaries incorrectly, though.
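
In case it helps: a minimal sketch of how the binaries and libraries are typically pointed at (these are documented SonarJava parameters, but the paths here are hypothetical placeholders, not our actual setup):

    # sonar-project.properties — hypothetical paths for illustration
    sonar.java.binaries=build/classes
    sonar.java.libraries=lib/**/*.jar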

Thanks a lot for all your inputs!

Is incremental analysis maybe supported/planned?
Assume I have a Gradle project and want to analyse the code in my CI on every build. Sonar already knows my source code, so I imagine it could store a hash and easily check whether certain modules have changed or not. On big projects with small changes this would make a dramatic performance difference. Could this work?
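
To illustrate the idea (purely hypothetical, not an existing SonarQube feature): a per-module fingerprint could be as simple as hashing the sorted source tree:

    # hypothetical sketch: fingerprint one module's sources to detect changes
    find module-a/src -type f -name '*.java' -print0 | sort -z | xargs -0 sha1sum | sha1sum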

Just to let you know that “incremental mode” can mean a lot of different things: we thought about this and did some experiments that brought their share of complexity.
As of today we have no clear plan for such a feature.

Nice thread. At this stage though I’d like to point out that any comparison that restricts itself to LOC versus analysis time is likely to be unreliable and not give any interesting outcome. A couple of reasons why I’m pointing that out, all around the fact that a SonarQube analysis of a project involves so many components/factors:

  • obviously the versions of SonarQube and of the analyzers, as performance is continuously improved
  • the actual server-side configuration: which rules are activated in the Quality Profile? Is duplication detection enabled (see the sketch after this list)? etc.
  • the number of extensions installed in the environment: custom plugins? coverage import? etc.
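
To make the duplication-detection point concrete, one such knob can be tuned per project (sonar.cpd.exclusions is a documented analysis parameter; the pattern is a hypothetical placeholder):

    # exclude files from duplication (copy/paste) detection only
    sonar.cpd.exclusions=**/generated/**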

All of those factors contribute to the overall analysis and can make a difference in terms of timing. So whenever looking at performance aspects, I would suggest taking a pragmatic approach.

Understand what is taking time

The analysis is made of the client-side scanner run and the server-side Background Task. How to investigate a long execution depends on which part takes the time:

  • client-side scanner run: enable debug logs (sonar.verbose=true) with timestamps, and nail down the piece which takes time (see the sketch after this list). If it’s the actual code analyzer doing its job, then that part can indeed grow with the volume of the codebase
  • server-side background task: check the state of resources (CPU/RAM/IO), see if database interactions aren’t slow for some reason, etc. Verbose logs can also help narrow down the lengthy part.
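
A minimal sketch of turning on those debug logs, assuming the standalone SonarScanner CLI (a Gradle build can pass the same -D property):

    # verbose scanner output; the debug logs include per-sensor timings
    sonar-scanner -Dsonar.verbose=true
    # equivalent for a Gradle build (task name per the org.sonarqube plugin)
    ./gradlew sonarqube -Dsonar.verbose=true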

Monitor

Whatever the context, the minute one starts to look into performance, monitoring goes hand in hand with it. There’s a good initial guide in the documentation. And ultimately those are pure monitoring/operational considerations for a Java application, i.e. first understand whether your performance feeling relates to system performance (CPU/RAM/IO) or application performance (the product itself, but also interaction with other components like the database).
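
As a starting point, a rough sketch with standard tools (assuming a Linux host with sysstat and a JDK installed on the server; picking the right process is up to you):

    # system level: is the box CPU/RAM/IO bound?
    top
    iostat -x 5
    # application level: watch GC pressure of a SonarQube JVM (pid from jps)
    jps -l
    jstat -gcutil <PID> 5000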

The risks are higher than the (potential) gains. The output of the analysis is not only a function of the code itself; it also depends on which rules are enabled in the Quality Profile, what the analysis settings are, etc. Even if the code doesn’t change, the analysis report can vary greatly if rules/settings/parameters were modified.

Also never forget that even if a file did not change, there could still be a new issue in that file: think of a function now deprecated in fileA; if fileB uses that method, it would raise a “do not use deprecated method” issue even though fileB did not change. That’s the basic example, only to illustrate that (on top of the influence of server-side project settings) focusing on changed files is also not enough for any advanced analyzer that does some detailed/cross-file inspection (which many SonarAnalyzers do).

Thanks a lot for your answers, great points!

In general the server-side background task seems to be fast enough; the majority of the time is spent on the client side. I’ll check the logging options to look into that: https://docs.sonarqube.org/display/SONAR/Analysis+Parameters#AnalysisParameters-AnalysisLogging