Incremental analysis for Python and JavaScript projects

Analyzing a Python project with SonarCloud generates the following output:

INFO: The Python analyzer was able to leverage cached data from previous analyses for 0 out of 147 files. These files were not parsed.

How can I leverage cached data to speed up the analysis?

The same project contains a single JavaScript file that takes a relatively long time to analyze but is rarely changed. The analysis generates the following output:

INFO: Hit the cache for 0 out of 1
INFO: Miss the cache for 1 out of 1: ANALYSIS_MODE_INELIGIBLE [1/1]

How can I leverage the cache for the JavaScript file?

Hi,

Welcome to the community!

Is this your first analysis? It’s possible you got this message because the cache hadn’t been built yet…

 
Ann

Thank you for your reply!

Sorry, I should have been more specific. The analysis runs inside a Docker container on a build server. I have already mounted ~/.sonar/cache into the container, which speeds up the loading/downloading of plugins. However, that folder seems to contain only plugins and no analysis results. Where are those cached?
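
For reference, this is roughly how the cache is mounted in our build (a minimal sketch; the image name, the in-container path, and the assumption that the scanner runs as root inside the container are illustrative, not our exact setup):

# Reuse the host's scanner cache inside the container so plugin downloads are not repeated.
# The scanner's user home defaults to ~/.sonar, so the cache lives in ~/.sonar/cache.
docker run --rm \
  -e SONAR_TOKEN \
  -v "$HOME/.sonar/cache:/root/.sonar/cache" \
  our-build-image \
  sonar-scanner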

Hi,

In fact, incremental analysis is enabled for JavaScript on SonarCloud, but I don't believe it's available yet for Python. Sorry, I had forgotten that and had to do some digging.

I would expect it to be available for Python “soon”.

 
Ann

Hello,

I’m jumping on this topic to clarify.

Incremental analysis is available for all languages except C++ and C#, which will follow in the coming weeks (end of March at the latest). It works only for pull request analyses; branch analyses still do a full scan each time you push to the branch. The announcement will be published later today.
So it works for JavaScript, TypeScript, and Python, and there is nothing you need to do to enable it. It relies on a cache on the server side, so there is nothing to change in your scanner-side configuration.

Some INFO logs are mainly there to help us understand what is going on when users send us their logs, and we have already identified that some of them are misleading for Python. The fix should be deployed soon.

The only thing you can check is that this is enabled in Administration > General Settings:

[screenshot of the corresponding setting in Administration > General Settings]

@Mr-Pepe
Do you see your Pull Request analyses running faster than a month ago?

Alex

Thank you for your reply!

The analysis of that particular project is running faster now compared to a month ago. However, that seems mostly due to a faster JavaScript analysis (single JavaScript file in an otherwise Python-only project). The JavaScript analysis now takes 5197ms instead of 41943ms. The Python analysis now takes 6633ms instead of 7863ms.

The newer (faster) pull request did not even change any Python files that SonarCloud cares about (not in sonar.sources). Does the Python sensor simply need that time even if all files can be retrieved from cache? Are many network calls made to determine which files have to be checked?

I am generally looking into ways to speed up our pipelines and have parallelized a lot of steps. However, the SonarCloud analysis has to run after other steps because it reads in their test results. This adds 30 to 60 seconds to each pull request pipeline run.
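
For context, the coupling comes from the analysis parameters that point at the test reports, roughly like the following (the report file names are illustrative; the property names are the standard ones for the Python coverage and xunit importers):

sonar.python.coverage.reportPaths=reports/coverage.xml
sonar.python.xunit.reportPath=reports/xunit-result.xml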

These are the timings for an example project:

INFO: Load global settings (done) | time=248ms
INFO: Load plugins index (done) | time=257ms
INFO: Load/download plugins (done) | time=469ms
INFO: Load project settings for component key: 'vorausrobotik_voraus-vtest' (done) | time=200ms
INFO: Execute project builders (done) | time=1ms
INFO: Load project branches (done) | time=201ms
INFO: Check ALM binding of project 'vorausrobotik_voraus-vtest' (done) | time=183ms
INFO: Load project pull requests (done) | time=225ms
INFO: Load branch configuration (done) | time=884ms
INFO: Load quality profiles (done) | time=385ms
INFO: Load active rules (done) | time=3128ms
INFO: Load project repositories (done) | time=304ms
INFO: SCM collecting changed files in the branch (done) | time=251ms
INFO: Load metrics repository (done) | time=189ms
INFO: Load sensor cache (404) | time=337ms
INFO: Sensor IaC CloudFormation Sensor [iac] (done) | time=17ms
INFO: Sensor IaC Kubernetes Sensor [iac] (done) | time=5ms
INFO: Sensor C# Project Type Information [csharp] (done) | time=1ms
INFO: Sensor C# Analysis Log [csharp] (done) | time=14ms
INFO: Sensor C# Properties [csharp] (done) | time=0ms
INFO: Sensor HTML [web] (done) | time=5ms
INFO: Sensor XML Sensor [xml] (done) | time=1ms
INFO: Sensor TextAndSecretsSensor [text] (done) | time=128ms
INFO: Sensor VB.NET Project Type Information [vbnet] (done) | time=1ms
INFO: Sensor VB.NET Analysis Log [vbnet] (done) | time=14ms
INFO: Sensor VB.NET Properties [vbnet] (done) | time=0ms
INFO: Sensor Python Sensor [python] (done) | time=6633ms
INFO: Sensor Cobertura Sensor for Python coverage [python] (done) | time=1971ms
INFO: Sensor PythonXUnitSensor [python] (done) | time=1771ms
INFO: Sensor JaCoCo XML Report Importer [jacoco] (done) | time=2ms
INFO: Sensor JavaScript analysis [javascript] (done) | time=5197ms
INFO: Sensor TypeScript analysis [javascript] (done) | time=1ms
INFO: Sensor CSS Rules [javascript] (done) | time=0ms
INFO: Sensor CSS Metrics [javascript] (done) | time=0ms
INFO: Sensor ThymeLeaf template sensor [securityjavafrontend] (done) | time=1ms
INFO: Sensor Python HTML templates processing [securitypythonfrontend] (done) | time=20ms
INFO: Sensor IaC Docker Sensor [iac] (done) | time=61ms
INFO: Sensor Serverless configuration file sensor [security] (done) | time=3ms
INFO: Sensor AWS SAM template file sensor [security] (done) | time=1ms
INFO: Sensor AWS SAM Inline template file sensor [security] (done) | time=1ms
INFO: Sensor javabugs [dbd] (done) | time=1ms
INFO: Sensor pythonbugs [dbd] (done) | time=430ms
INFO: Sensor JavaSecuritySensor [security] (done) | time=4ms
INFO: Sensor CSharpSecuritySensor [security] (done) | time=0ms
INFO: Sensor PhpSecuritySensor [security] (done) | time=1ms
INFO: Sensor PythonSecuritySensor [security] (done) | time=952ms
INFO: Sensor JsSecuritySensor [security] (done) | time=1249ms
INFO: Sensor Analysis Warnings import [csharp] (done) | time=1ms
INFO: Sensor Zero Coverage Sensor (done) | time=4ms
INFO: CPD Executor CPD calculation finished (done) | time=28ms
INFO: SCM writing changed lines (done) | time=7ms
INFO: Analysis report generated in 185ms, dir size=281 KB
INFO: Analysis report compressed in 144ms, zip size=131 KB
INFO: Analysis report uploaded in 353ms
INFO: Time spent writing ucfgs 14ms
INFO: Total time: 38.757s

39 seconds is not an extremely long time, but it makes up a significant part of an otherwise well-optimized pipeline. Incremental analysis already seems to be enabled, so where else could I realize performance improvements? Can sensors be executed in parallel? Can the analysis be split into parts? Loading the plugins and rules could happen in parallel with other pipeline steps, with the sensor execution deferred until later. Actually, only the sensors that read in test results (e.g., PythonXUnitSensor and the Cobertura sensor) would have to be deferred.

Last question: Why is caching only enabled for PR builds? Will it become available for branch builds in the future?

Hello,

Incremental analysis on PRs is just the beginning of a long journey, and we definitely want to enable incremental analysis on branches in the future. We did PRs first because we believe that is where developers get the biggest benefit today, while waiting a bit longer for a branch analysis is more acceptable. We had to make a choice between PR and branch, and PR won.

The good news is that we have ideas for improvements. Our goal is to be very fast (less than 10 seconds) when a PR only touches languages or files that we don't analyze; the same should hold in a "no change" scenario.
Today, we retrieve the cached data for the security analysis in this line even when there is no need to retrieve it:

Sensor Python Sensor [python] (done) | time=6633ms

This is why you see time spent in this sensor even though you changed no Python files. We have already identified this as a potential source of time gains.

We can also expect some time gain in this step:

Load active rules (done) | time=3128ms

Potentially, in the coming months, you could expect a gain of 10-15 seconds on your example.

In the meantime, there is not much you can do on your side. The main limiting factor will be the speed of your storage, because of the amount of I/O we do. If you can afford a fast SSD, that could help you reach your goal.

Alex
