Our SonarQubeAnalyze has started failing in our Azure DevOps pipeline.
We are running SonarQube 9.9 (LTS/LTA), deployed via a Zip on an Azure VM, the backend Database is an Azure SQL Server database.
As SonarQube doesn’t have native HTTPS support, there is an IIS Proxy in front of SQ which is taking care of the HTTPS.
During the SonarQubeAnalyze@6 task, I can see the step “Load project repositories” is executing, after 1 minute, this step times out, and the task fails.
In the SonarQube sonar.properties file, I saw a setting named sonar.web.http.keepAliveTimeout which was defaulting to 60 seconds, I thought increasing this value and restarting the service would fix my problem. It did not, “Load project repositories” still times out and the SonarQubeAnalyze step fails.
Digging though documentation and other users posts I found this page which references a setting named sonar.ws.timeout, however it isn’t very explicit about where this goes or how it is set. I have tried setting it in the SonarQube sonar.properties but this made no difference. It also mentions about adding these settings in the UI, but I see no place for adding this variable as they mention. I had also tried adding extraProperties: 'sonar.ws.timeout=120' to the SonarQubeAnalyze@6 inputs, but again this made no difference.
I have already ensured the IIS Proxy has times set to much longer than 60 seconds, currently IIS timeout is set to 240 seconds.
The analysis / scanner log is what’s output from the analysis command. Hopefully, the log you provide - redacted as necessary - will include that command as well.
There is definitely a timeout occurring here, but I do not know why it is taking so long, but even if it is taking time how do I alter SonarQubeAnalyze’s timeout?
The pipeline is intermittently working, on the rare occasions it does execute I can see in the Azure logs the request (that is normally failing) is only taking 5633ms (5.6 seconds). When this request takes too long, the SonarQubeAnalyze stage fails. When I hit the URL manually myself in my browser with dev tools enabled, I can see I get a response normally around 6 seconds.
In the attached logs, you can see SonarQubeAnalyze gives up after 1 minute and times out when doing a get on /batch/project.protobuf. I can see from the IIS logs that for some reason SQ took 130794ms (131 seconds) to respond, by which point SonarQubeAnalyze in the pipeline had already timed out and failed.
SQ is running on a pretty well specced Azure VM [Standard A4m v2 (4 vcpus, 32 GiB memory)], and had a 10DTU Azure SQL Server Database, which I recently bumped up to 20DTU to see if that would resolve the problem, but the behaviour remains the same.
So… there are some analysis parameters related to timeouts that may be helpful. But the first thing I would do here is check the SonarQube access log ($SONARQUBE-HOME/logs/access.log) to see how long SonarQube is taking to serve that request. The last number on the line is the time in milliseconds.
What I suspect is that you have something “helpful” on the network slowing this down.
We are running SonarQube 9.9 LTA Server, which only has the sonar.ws.timeout=300 option listed in the Analysis parameters documentation, which I had already tried setting as an extraProperties as per my main post, so sadly it looks like the sonar.scanner.socketTimeout option isn’t available, so SonarQubeAnalyze@6 still times out and fails at 60 seconds after making the request.
Looking at logs for a recent failure (always seems to be in the same place during Analysis), in the SQ the access log, I cannot see a value for the time it took as you mentioned, but looking at other lines from the access log it appears the time value for this request specific request was recorded as blank/dash. IIS logs are in UTC, SQ logs are in UTC+12.
I’m not familiar with IIS logs. You’ll have to tell me which number is the time to serve the request. Or… you could see if this works when you bypass IIS…
I’ve just spun up a copy of 9.9 and run an analysis. You’re right. No duration is recorded in access.log for serving that /batch/project.protobuf request. So I re-ran the analysis with the debug flag & got timings on the analysis side. This is really an apples-to-oranges comparison, but my time was 105ms for that request.
Unfortunately our network team won’t help here, as this SQ instance is managed by my dev team since it’s hosted in an isolated Azure environment, and just for usage with DevOps build pipelines. Which adds to the confusion, there is no layer or enterprise NOC software running within the SQ VPC which could be causing performance issues.
The IIS log shows the time as the last column, so in the instance when the request timed out, IIS shows (in ms) that it took 114 seconds to get a response from SQ. This goes above the 60 second SQ Analysis timeout value, so fails. All other responses from SQ are happening in a timely manner, it is only this one request which is intermittently showing this behaviour.
I might try enabling a pipeline retry on the analysis stage so if the analysis does timeout, hopefully on a second parse it will succeed.
My other option might be to upgrade out SQ to 10.X instead of being on 9.9LTA, if that will give me the added options regarding timeouts in the analysis step not available in 9.9?
Setting the extraProperties sonar.ws.timeout=300 didn’t resolve my problem, and as the property sonar.scanner.socketTimeout isn’t available in SQ 9.9 LTA my pipeline was still intermittently failing on the SonarQubeAnalyze@6 task, specifically on a /batch/project.protobuf GET when the logs state Load project repositories. That request would take longer than 60 seconds and the whole SonarQubeAnalyze@6 task would fail due to timeout.
All other requests to our SQ server are responding timely and as expected, it is just this one request that SQ is intermittently taking over 2 minutes to complete (compared to the normal few seconds) which was causing our whole pipeline to fail. I’m not sure what SQ is doing internally for this one request to be so erratic, but I managed to find a (sub optimal, but acceptable) solution.
Solution
As I cannot increase the timeout before SQ gives up on a GET /batch/project.protobuf request during the SonarQubeAnalyze@6 task, I have put a retry on the whole SonarQubeAnalyze@6 task. I have now witnessed the SonarQubeAnalyze@6 task fail, and then retry and complete successfully.
Analyze with Retry YAML:
- task: SonarQubeAnalyze@6
# There is a transient failure on a step inside SonarQubeAnalyze
# where our SQ takes too long to respond causing task failure.
# SQ 9.9 doesn't accept the sonar.scanner.socketTimeout property
# so times out at 60 seconds, when the step takes about 130 seconds
# to complete (sometimes). See our SQ community post for further details:
# https://community.sonarsource.com/t/azure-sonarqubeanalyze-timeout
retryCountOnTaskFailure: 5
inputs:
jdkversion: 'JAVA_HOME_17_X64'
extraProperties: |
sonar.scanner.socketTimeout=300
sonar.ws.timeout=300
Note: I have left the unsupported sonar.scanner.socketTimeout property in the task, hoping that when the next LTA version is released that this property will start functioning as expected and the requests will stop timing out, thus retries will not be needed.