Analysis of 2M lines of Python code takes 14 hours

Hi,

We’re evaluating the usage of SonarCloud in our company. We’re running an analysis on our project that has ~2M lines of Python code and it takes ~14 hours to complete. We’re using standard quality gates and quality profiles. Is it expected to take that long?

Hi @roman_p,

That’s not expected actually. Do you have access to the scanner log? Could you please share it?
It would be interesting to see the following sensor times:

Sensor Python Sensor [python] (done) | time=???ms
Sensor PythonSecuritySensor [security] (done) | time=???ms

Hi @Andrea_Guarino ,

Yes, I have the log file:

Sensor Python Sensor [python] (done) | time=50849027ms
Sensor PythonSecuritySensor [security] (done) | time=236406ms
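
To put those numbers in perspective, here is a minimal sketch that converts the reported sensor times into hours, minutes and seconds. The regular expression only assumes the two line shapes quoted above; other scanner versions may format these lines differently, and the helper name `sensor_times` is made up for illustration.

```python
import re
from datetime import timedelta

# The two lines quoted from the scanner log above.
LOG = """\
Sensor Python Sensor [python] (done) | time=50849027ms
Sensor PythonSecuritySensor [security] (done) | time=236406ms
"""

# Assumed line shape: "Sensor <name> [<key>] (done) | time=<n>ms"
SENSOR_RE = re.compile(
    r"^Sensor (?P<name>.+?) \[(?P<key>[^\]]+)\] \(done\) \| time=(?P<ms>\d+)ms$"
)


def sensor_times(log_text):
    """Return {sensor name: timedelta} for every '(done)' sensor line."""
    times = {}
    for line in log_text.splitlines():
        match = SENSOR_RE.match(line.strip())
        if match:
            times[match.group("name")] = timedelta(
                milliseconds=int(match.group("ms"))
            )
    return times


times = sensor_times(LOG)
for name, elapsed in times.items():
    print(f"{name}: {elapsed}")
```

The Python Sensor alone comes to roughly 14 hours 7 minutes, while the security sensor takes just under 4 minutes, so essentially the whole analysis time is spent in the Python Sensor.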

Thanks for the info.
Can you share the code you’re analyzing? Is it publicly available?
We’d need to be able to reproduce the issue to understand what the problem is.

Unfortunately, I can’t share the code because it’s a private project. The only information I can share is that we use Stackless Python 2.7 and the code is analysed using TeamCity. I ran the scanner with the -X option, and these are two consecutive lines from the debug output:

[15:39:36] : [Step 3/3] 15:39:36.117 DEBUG: Not enough content in '<SOME_PATH>__init__.py' to have CPD blocks, it will not be part of the duplication detection
[04:37:30] : [Step 3/3] 04:37:30.409 DEBUG: Not enough content in '<SOME_PATH>__init__.py' to have CPD blocks, it will not be part of the duplication detection

So there was no debug output from 15:39:36 to 04:37:30 the next day. I’m not sure whether the sensor was stuck or something else was going on.
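
The length of that silent period can be worked out with a short sketch like the one below. It assumes the second timestamp falls on the following day and that the gap is shorter than 24 hours; `gap_between` is a hypothetical helper, not part of any scanner tooling.

```python
from datetime import datetime, timedelta


def gap_between(start_hms, end_hms):
    """Elapsed time between two HH:MM:SS.mmm stamps.

    Assumes the gap is under 24 hours, so an end time that looks
    earlier than the start is taken to be on the next day.
    """
    fmt = "%H:%M:%S.%f"
    start = datetime.strptime(start_hms, fmt)
    end = datetime.strptime(end_hms, fmt)
    if end < start:
        end += timedelta(days=1)  # the log rolled over midnight
    return end - start


# Timestamps taken from the two debug lines above.
silence = gap_between("15:39:36.117", "04:37:30.409")
print(silence)
```

That is just under 13 hours without any log output, which accounts for most of the ~14-hour run.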

I understand, thanks for sharing that information.

That’s quite strange: in our integration tests we also analyse big projects such as Ansible (~1.4 million lines of code), and the analysis time is around 10 minutes.

I can suggest the following: we’ll soon release a new version of the Python analyzer (v2.5) that adds a progress report to the scanner logs, including information about the file currently being analyzed. Hopefully that will help narrow down the problem.

This release will be available next week in SonarCloud.

Hi @roman_p,
SonarPython 2.5 is now available on SonarCloud with ‘progress report’ enabled.

Could you please analyse your project again and check in the logs whether the analysis gets stuck on a particular file?

Thank you!

Hi @Andrea_Guarino ,

Now I can locate the file that causes this problem. It’s exactly one __init__.py file that has about 3k lines of code. If I exclude this file from the analysis, it takes about 1h30m to analyse everything else. Unfortunately I can’t share the code, so please let me know if you have any thoughts on this or if you need any more information from me.

@Andrea_Guarino ,

I might have more information regarding this issue. If you try to analyze this repo, you’ll see that it takes a lot of time to analyze this file. This is similar to the situation that we have. Hope this helps.

@roman_p ,

Thank you for the information. I’m indeed able to reproduce the problem on the docutils repository.
I’m investigating the problem and I will keep you updated about it.

I identified a potential problem with the rule S125 “Sections of code should not be commented out”.

For the time being, as a workaround, you may try disabling this rule (you can have a look here to check how to disable a ‘SonarWay’ rule) and try restarting the analysis.

Hi @roman_p,
SonarPython 2.6 is now available in SonarCloud: it should fix the performance problem you had.
Could you please confirm?

Thanks!