OutOfMemoryError since Quality Profile was updated

Hi,

We are using SonarQube Server 10.7.
Since we updated (extended) our quality profile, we get OutOfMemoryErrors when analyzing a merge request in a project with ~1.1 million lines of code. Before the update the analysis worked with -Xmx4g; now the problem sometimes occurs even with -Xmx6g.

[...]
[INFO] Taint analysis for java: Starting
[INFO] 0 / 21709 UCFGs simulated, memory usage: 3015 MB
[INFO] 867 / 21709 UCFGs simulated, memory usage: 3139 MB
[INFO] 1509 / 21709 UCFGs simulated, memory usage: 2840 MB
[INFO] 2366 / 21709 UCFGs simulated, memory usage: 3068 MB
[INFO] 3471 / 21709 UCFGs simulated, memory usage: 2685 MB
[INFO] 4524 / 21709 UCFGs simulated, memory usage: 2652 MB
[INFO] 5586 / 21709 UCFGs simulated, memory usage: 3102 MB
[INFO] Too high simulation costs for sink in C:/Runners/gitlab-builds/09/my/custom/path/ECPooledRelationElement.java:3214. This sink will not be analyzed any further.
[INFO] Too high simulation costs for sink in C:/Runners/gitlab-builds/09/my/custom/path/ECPooledRelationElement.java:3168. This sink will not be analyzed any further.
[...]
several more similar lines
[...]
[INFO] Too high simulation costs for sink in C:/Runners/gitlab-builds/09/my/custom/path/ModelInstanceReference.java:1097. This sink will not be analyzed any further.
[INFO] 9217 / 21709 UCFGs simulated, memory usage: 4935 MB
[INFO] 10489 / 21709 UCFGs simulated, memory usage: 4752 MB
[...]
[INFO] 11465 / 21709 UCFGs simulated, memory usage: 5025 MB
[INFO] Too high simulation costs for sink in C:/Runners/gitlab-builds/09/my/custom/path/ECRelationElement.java:621. This sink will not be analyzed any further.
[INFO] Too high simulation costs for sink in C:/Runners/gitlab-builds/09/my/custom/path/ECPooledModelConcept.java:1764. This sink will not be analyzed any further.
[INFO] Too high simulation costs for sink in C:/Runners/gitlab-builds/09/my/custom/path/PatternMatcherHelper.java:102. This sink will not be analyzed any further.
[INFO] 12323 / 21709 UCFGs simulated, memory usage: 5239 MB
[INFO] 13172 / 21709 UCFGs simulated, memory usage: 5657 MB
[INFO] Too high simulation costs for sink in C:/Runners/gitlab-builds/09/my/custom/path/AttributeAssistant.java:118. This sink will not be analyzed any further.
[...]
several more similar lines
[...]
[INFO] Too high simulation costs for sink in C:/Runners/gitlab-builds/09/my/custom/path/AbstractWidgetRenderer.java:74. This sink will not be analyzed any further.
[INFO] 13723 / 21709 UCFGs simulated, memory usage: 5901 MB
[INFO] Too high simulation costs for sink in C:/Runners/gitlab-builds/09/my/custom/path/TranslationBroker.java:68. This sink will not be analyzed any further.
[...]
[INFO] 14429 / 21709 UCFGs simulated, memory usage: 6090 MB
[INFO] Too high simulation costs for sink in C:/Runners/gitlab-builds/09/my/custom/path/SimpleDatabase.java:962. This sink will not be analyzed any further.
[INFO] 15309 / 21709 UCFGs simulated, memory usage: 6098 MB
[INFO] Too high simulation costs for sink in C:/Runners/gitlab-builds/09/my/custom/path/FileCreator.java:665. This sink will not be analyzed any further.
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid42660.hprof ...
[INFO] Time spent writing ucfgs 0ms
Heap dump file created [10323076039 bytes in 38.129 secs]
[...]

As you can see, the memory usage increases until the limit is reached.
There seems to be a mechanism to prevent this by aborting analyses that would be too costly, but it doesn't help here.

The rules that were added by the quality profile update are listed below; one or more of them seem to be causing these problems.

| ID | Rule |
|-------|------|
| S5147 | NoSQL operations should not be vulnerable to injection attacks |
| S5496 | Server-side templates should not be vulnerable to injection attacks |
| S5883 | OS commands should not be vulnerable to argument injection attacks |
| S6096 | Extracting archives should not lead to zip slip vulnerabilities |
| S6173 | Reflection should not be vulnerable to injection attacks |
| S6287 | Applications should not create session cookies from untrusted input |
| S6350 | Constructing arguments of system commands from user input is security-sensitive |
| S6384 | Components should not be vulnerable to intent redirection |
| S6390 | Thread suspensions should not be vulnerable to Denial of Service attacks |
| S6398 | JSON operations should not be vulnerable to injection attacks |
| S6399 | XML operations should not be vulnerable to injection attacks |
| S6547 | Environment variables should not be defined from untrusted input |
| S6549 | Accessing files should not lead to filesystem oracle attacks |
| S7044 | Server-side requests should not be vulnerable to traversing attacks |

Does anyone have an idea on how to solve this?
Are these rules expected to need much more memory?
Why is the OOM not prevented by the mentioned mechanism?

Thanks!

Regards,
Carsten

Hey Carsten.

Yes, all these rules are advanced vulnerability detection rules that use a fairly significant amount of memory compared to “normal” rules…

You can continue to increase the amount of memory given to the scanner – but we are also regularly improving our security analysis to be more efficient/performant. Some of those changes are coming in the upcoming LTA release of SonarQube Server.
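
For reference, how you raise the heap depends on which scanner flavor you run. A minimal sketch (the 8g value is just illustrative, and the commands assume either the SonarScanner CLI or the SonarScanner for Maven; adjust to your setup):

```bash
# SonarScanner CLI: the analysis JVM reads SONAR_SCANNER_OPTS
export SONAR_SCANNER_OPTS="-Xmx8g"
sonar-scanner

# SonarScanner for Maven: the analysis runs inside the Maven JVM,
# so the heap is set via MAVEN_OPTS instead
export MAVEN_OPTS="-Xmx8g"
mvn sonar:sonar
```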

You can learn more about what this mechanism does here.

It’s not something that adjusts based on how much memory is given to the analysis. It basically makes sure that the analysis will not, as @Malte says, explode exponentially. I guess infinity doesn’t matter when you’re memory-limited (the analysis will just crash) – but I’m not even sure it’s the high simulation costs that are exhausting the available memory.

I think I’ve covered this topic well, but I’ll ping our experts to see if there’s anything to investigate further. They will probably be more interested to see what memory utilization looks like once you’re using the version bundled with our next LTA.


Hey @Carsten_HB,

Thank you for reaching out! @Colin did a great job of explaining what is happening. There is indeed no mechanism to make the analysis “adjust” to the amount of memory given to it (in the spirit of “the less memory, the less extensive the analysis”). There is only a mechanism to ensure that there is no exponential explosion.

In my experience, 6 GB for a codebase of more than 1 MLOC can be a bit low. As Colin stated, the rules you added to your quality profile require an extensive and complex analysis of the codebase. For such a large codebase, this can require more memory than that.

My expectation is that with a higher amount of memory, the analysis should complete fine. I cannot give you an exact number for the memory needed depending on the size of the codebase, as there are other factors beyond the size of the codebase that will influence this (e.g., the amount of user input processed in the code).

Could you try (for testing) setting it to something extremely high, say, 24 GB? To be clear, I do not expect it to use that much memory, but if you can get a “passing run” by giving the scanner enough memory, as well as a “failing run” (with 6 GB), then you can perform a sort of “binary search” to narrow down the amount of memory actually needed for the analysis. It will also confirm that the problem is indeed that the analysis requires a lot of memory (as opposed to, say, a memory leak, where the analysis does not complete regardless of how much memory you give it).
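
In case it helps, here is a rough sketch of that binary search as a shell loop (assumptions: the Maven scanner, 6 GB is known to fail, and 24 GB is known to pass; adjust the command and the bounds to your setup):

```bash
# Narrow down the minimum heap (in GB) needed for a passing analysis.
lo=6    # known failing heap size
hi=24   # known passing heap size
while [ $((hi - lo)) -gt 1 ]; do
  mid=$(( (lo + hi) / 2 ))
  echo "Trying -Xmx${mid}g"
  if MAVEN_OPTS="-Xmx${mid}g" mvn sonar:sonar; then
    hi=$mid   # analysis passed: try with less memory
  else
    lo=$mid   # OOM again: needs more memory
  fi
done
echo "Minimum working heap is roughly ${hi} GB"
```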
