Java "Main" source files AST scan takes more than 1 hour to run for 3K Java files

We are using SonarCloud, and our code base has 300K lines of Java and 300K lines of JS code.
The Java code is written for Java 8, but the Sonar scan runs on Java 11.

  • GitHub
  • Jenkins on ECS Fargate. We have already tried increasing the memory to its limit of 30 GB.
  • Script to run the PR scanner:
mvn org.sonarsource.scanner.maven:sonar-maven-plugin:3.9.1.2184:sonar -Dsonar.projectKey=name \
  -Dsonar.pullrequest.key=$PR_ID -Dsonar.pullrequest.branch=$1 \
  -Dsonar.verbose=true \
  -Dsonar.pullrequest.base=master -Dsonar.pullrequest.provider=GitHub
  • Java and JS

Hello @prasoon124,

Happy New Year 2022!
Welcome to the SonarSource Community :wave:. I hope you’ll enjoy it.
First of all, I can say that “more than 1 hour” (I would need more accurate timing to make a final statement) for 3000 files and 600K LoC is rather slow compared with our scanner’s typical speed, but not completely outside expectations. It’s quite a lot of code…
To have a more accurate diagnosis I would need:

  • Your Jenkinsfile to understand how you set up your pipeline
  • The full logs of your Maven scanner execution, in DEBUG mode. Can you attach to this thread the Jenkins console log of your last full pipeline run (the build part AND the scan part)? I see it already ran in DEBUG mode thanks to -Dsonar.verbose=true.

Can you attach the two files above to the thread?

Thanks, Olivier

Hello @prasoon124,

Any update on this? For me to investigate, can you send me the requested information?

Olivier

Apologies for the late response. The logs are only for our Java backend, which we are currently running and which takes time…

Thanks,
Prasoon

Hello @prasoon124,

I looked at all the data you sent, and indeed the SonarCloud scan takes approximately 1 h 12 min for 3501 files. This is an average of about 1230 ms (1.2 s) per file, which is indeed quite a lot.
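For reference, the per-file average can be re-derived from the two figures quoted above. This is only a quick sketch; the variable names are illustrative and the division is integer arithmetic, so the result is truncated to whole milliseconds.

```shell
# Figures quoted in this thread: 1 h 12 min total scan time, 3501 files.
total_seconds=$(( 1 * 3600 + 12 * 60 ))
files=3501

# Integer division in milliseconds; the fractional part is truncated.
avg_ms=$(( total_seconds * 1000 / files ))
echo "Average: ${avg_ms} ms per file"
```

The result lands at roughly 1.2 s per file, in line with the estimate above.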
I could not find any obvious reason for this slow processing. My immediate thoughts are:

  • “What are the hardware specs of the agent you run the scan on?” and
  • “Is the scanner the only process running on that agent?” (I am asking because, from your Jenkinsfile, you seem to do several things, like running Postgres and Elastic.)
  • “Some source files are really big and, as a consequence, take a long time to analyze.” For instance, the file backend/src/main/java/io/avidsecure/policy/AWSRuleExecutor.java takes 16 s to analyze, but is also 290 KB. That probably translates into something like 10K LoC.
    This looks really big, and also partly explains why the scan takes so long. There are 3501 files in total, but they are big files on average. Can you check how many LoC are reported in SonarCloud? That’s probably a better measurement of your project size than the number of files.
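To get a rough idea of which source files are unusually large before checking SonarCloud, something like the sketch below could be run from the repository root. The function name `largest_java_files` is hypothetical, and it counts raw lines (including blanks and comments), so the numbers will not match the LoC that SonarCloud reports.

```shell
# Hypothetical helper: list the N largest .java files under a directory,
# ranked by raw line count (blank lines and comments included).
largest_java_files() {
  dir="${1:-.}"      # directory to search, defaults to the current one
  count="${2:-10}"   # how many files to show, defaults to 10
  find "$dir" -type f -name '*.java' -exec wc -l {} + \
    | grep -v ' total$' \
    | sort -rn \
    | head -n "$count"
}
```

For example, `largest_java_files backend/src/main/java 20` would print the 20 longest files under that directory, one per line, with the line count first.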

What can be the way forward?

  • Use an agent with a faster clock speed: currently the Java analyzer is single-threaded, so the best way to get better performance is to run on machines with a faster CPU clock speed.
  • Boost the Maven heap allocation, if not already done (I suspect this may already be the case from some logs I saw). This may improve analysis speed. I suggest allocating 4 GB to Maven as a first try. This may be oversized and you can tune it down later, but let’s see if that helps. (Increase the heap by setting MAVEN_OPTS="-Xmx4G".)
  • Be patient for a couple of months (maybe weeks): we should have important changes related to analysis speed, hopefully soon. See:
    – Faster PR analysis on Productboard
    – Faster Java analysis on Productboard
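As a minimal sketch of the heap suggestion in the list above, the variable can be exported in the shell that launches Maven. This assumes a POSIX shell and that nothing else in the pipeline already sets MAVEN_OPTS (the export below would overwrite it).

```shell
# Give the JVM that runs Maven (and the scanner inside it) a 4 GB maximum heap.
export MAVEN_OPTS="-Xmx4G"

# Print the value so it is visible in the Jenkins console log.
echo "MAVEN_OPTS=${MAVEN_OPTS}"

# Then run the same mvn ...:sonar command as before, in this same shell.
```

In a Jenkins pipeline each `sh` step starts its own shell, so the export and the mvn call need to be in the same step for the setting to take effect.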

Olivier

Hello @OlivierK,
Thanks for the quick response.

Will check with the infra team and get back to you on your questions.
Will also try the changes you mentioned…

Give me a few days.

Regards,
Prasoon