504 Gateway Time-out errors after SonarQube 9.9.4 LTS upgrade

Hello,

I was using SonarQube 8.9.6 in my environment, with Checkstyle plugin 8.40.

I decided to switch to the LTS version and upgraded to SonarQube 9.9.4 with Checkstyle plugin 10.14.1.

When my system is under load, requests to
https://sonar-url/api/plugins/download?plugin=checkstyle
frequently receive a 504 Gateway Time-out error.
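For reference, roughly how I see the failures from the client side: a small probe like the sketch below (not my exact load test; the URL is just my instance's endpoint and the request count is arbitrary) prints the HTTP status and elapsed time per request, which helps tell proxy 504s apart from slow but successful responses.

```python
# Sketch: hit the plugin download endpoint repeatedly and print the HTTP
# status plus elapsed time, so 504s from the proxy can be told apart from
# slow-but-successful responses. URL and request count are illustrative.
import time
import urllib.error
import urllib.request

URL = "https://sonar-url/api/plugins/download?plugin=checkstyle"

for i in range(20):
    start = time.monotonic()
    try:
        with urllib.request.urlopen(URL, timeout=120) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as e:
        status = e.code            # e.g. 504 returned by the proxy
    except OSError as e:
        status = f"error: {e}"     # timeouts, connection resets, ...
    print(f"{i:02d}  status={status}  elapsed={time.monotonic() - start:.1f}s")
```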

At the same time I see “Failed to query ES status” entries in my SonarQube application logs. I host the application in a Kubernetes/OpenShift environment.
The pod has enough memory.

Has anyone experienced this situation before?

2024.08.09 08:27:36 WARN  es[][o.e.t.TransportService] Received response for a request that has timed out, sent [1.8m/110852ms] ago, timed out [1.5m/95846ms] ago, action [cluster:monitor/nodes/stats[n]], node [{sonarqube}{GYa_pre3SB20M9MxP1vQbg}{QNgwICVJTAi1I395gObW_w}{127.0.0.1}{127.0.0.1:34069}{cdfhimrsw}{rack_id=sonarqube}], id [600215]
2024.08.09 08:32:04 WARN  es[][o.e.m.f.FsHealthService] health check of [/opt/sonarqube/data/es7/nodes/0] took [26811ms] which is above the warn threshold of [5s]
2024.08.09 08:36:21 WARN  es[][o.e.c.InternalClusterInfoService] failed to retrieve stats for node [GYa_pre3SB20M9MxP1vQbg]: [sonarqube][127.0.0.1:34069][cluster:monitor/nodes/stats[n]] request_id [600785] timed out after [15007ms]
2024.08.09 08:36:41 WARN  es[][o.e.t.TransportService] Received response for a request that has timed out, sent [35.2s/35216ms] ago, timed out [20.2s/20209ms] ago, action [cluster:monitor/nodes/stats[n]], node [{sonarqube}{GYa_pre3SB20M9MxP1vQbg}{QNgwICVJTAi1I395gObW_w}{127.0.0.1}{127.0.0.1:34069}{cdfhimrsw}{rack_id=sonarqube}], id [600785]
2024.08.09 08:36:41 WARN  es[][o.e.m.f.FsHealthService] health check of [/opt/sonarqube/data/es7/nodes/0] took [37416ms] which is above the warn threshold of [5s]
2024.08.09 08:39:21 WARN  es[][o.e.m.f.FsHealthService] health check of [/opt/sonarqube/data/es7/nodes/0] took [40017ms] which is above the warn threshold of [5s]
2024.08.09 08:41:36 WARN  es[][o.e.c.InternalClusterInfoService] failed to retrieve stats for node [GYa_pre3SB20M9MxP1vQbg]: [sonarqube][127.0.0.1:34069][cluster:monitor/nodes/stats[n]] request_id [601075] timed out after [15006ms]
2024.08.09 08:42:22 WARN  es[][o.e.t.TransportService] Received response for a request that has timed out, sent [1m/60826ms] ago, timed out [45.8s/45820ms] ago, action [cluster:monitor/nodes/stats[n]], node [{sonarqube}{GYa_pre3SB20M9MxP1vQbg}{QNgwICVJTAi1I395gObW_w}{127.0.0.1}{127.0.0.1:34069}{cdfhimrsw}{rack_id=sonarqube}], id [601075]
2024.08.09 08:42:22 WARN  es[][o.e.m.f.FsHealthService] health check of [/opt/sonarqube/data/es7/nodes/0] took [60626ms] which is above the warn threshold of [5s]
2024.08.09 08:43:07 WARN  es[][o.e.c.InternalClusterInfoService] failed to retrieve stats for node [GYa_pre3SB20M9MxP1vQbg]: [sonarqube][127.0.0.1:34069][cluster:monitor/nodes/stats[n]] request_id [601155] timed out after [15007ms]
2024.08.09 08:43:46 WARN  es[][o.e.t.TransportService] Received response for a request that has timed out, sent [54.4s/54424ms] ago, timed out [39.4s/39417ms] ago, action [cluster:monitor/nodes/stats[n]], node [{sonarqube}{GYa_pre3SB20M9MxP1vQbg}{QNgwICVJTAi1I395gObW_w}{127.0.0.1}{127.0.0.1:34069}{cdfhimrsw}{rack_id=sonarqube}], id [601155]
2024.08.09 08:44:50 WARN  es[][o.e.m.f.FsHealthService] health check of [/opt/sonarqube/data/es7/nodes/0] took [28612ms] which is above the warn threshold of [5s]
2024.08.09 08:46:20 ERROR web[][o.s.s.m.ElasticSearchMetricTask] Failed to query ES status
org.sonar.server.es.ElasticsearchException: Fail to execute es request
	at org.sonar.server.es.EsClient.execute(EsClient.java:313)
	at org.sonar.server.es.EsClient.execute(EsClient.java:305)
	at org.sonar.server.es.EsClient.nodesStats(EsClient.java:216)
	at org.sonar.server.monitoring.ElasticSearchMetricTask.updateFileSystemMetrics(ElasticSearchMetricTask.java:77)
	at org.sonar.server.monitoring.ElasticSearchMetricTask.run(ElasticSearchMetricTask.java:51)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.runAndReset(Unknown Source)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.net.SocketTimeoutException: 60,000 milliseconds timeout on connection http-outgoing-285 [ACTIVE]
	at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:917)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:300)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:288)
	at org.sonar.server.es.EsClient.lambda$nodesStats$29(EsClient.java:218)
	at org.sonar.server.es.EsClient.execute(EsClient.java:311)
	... 10 common frames omitted
Caused by: java.net.SocketTimeoutException: 60,000 milliseconds timeout on connection http-outgoing-285 [ACTIVE]
	at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:387)
	at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:92)
	at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:39)
	at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175)
	at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:261)
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:502)
	at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:211)
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280)
	at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
	at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
	... 1 common frames omitted
2024.08.09 08:47:24 WARN  es[][o.e.m.f.FsHealthService] health check of [/opt/sonarqube/data/es7/nodes/0] took [33615ms] which is above the warn threshold of [5s]
2024.08.09 08:49:35 WARN  es[][o.e.m.f.FsHealthService] health check of [/opt/sonarqube/data/es7/nodes/0] took [10805ms] which is above the warn threshold of [5s]
2024.08.09 08:50:22 WARN  es[][o.e.c.InternalClusterInfoService] failed to retrieve stats for node [GYa_pre3SB20M9MxP1vQbg]: [sonarqube][127.0.0.1:34069][cluster:monitor/nodes/stats[n]] request_id [601551] timed out after [15007ms]
2024.08.09 08:50:44 WARN  es[][o.e.t.TransportService] Received response for a request that has timed out, sent [37.4s/37418ms] ago, timed out [22.4s/22411ms] ago, action [cluster:monitor/nodes/stats[n]], node [{sonarqube}{GYa_pre3SB20M9MxP1vQbg}{QNgwICVJTAi1I395gObW_w}{127.0.0.1}{127.0.0.1:34069}{cdfhimrsw}{rack_id=sonarqube}], id [601551]

When I check the sonarqube_access.log, it doesn’t show any HTTP 504.
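Roughly how I tally the status codes in that log (a sketch; the file path and the position of the status field depend on your access-log pattern):

```python
# Sketch: count HTTP status codes in the SonarQube access log.
# Assumes the usual pattern: ... "GET /path HTTP/1.1" <status> <bytes> ...
import re
from collections import Counter

LOG = "access.log"  # point this at your sonarqube_access.log
status_re = re.compile(r'"\s+(\d{3})\s')

counts = Counter()
with open(LOG, encoding="utf-8", errors="replace") as f:
    for line in f:
        m = status_re.search(line)
        if m:
            counts[m.group(1)] += 1

for status, count in sorted(counts.items()):
    print(status, count)
```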

I can’t tell whether the access log simply does not record HTTP 504 responses, or whether the requests that receive HTTP 504 never reach the server at all.

Maybe I’m stuck at another layer, such as the load balancer?

There are no significant entries in the sonarqube_app log file.
There are no significant entries in the sonarqube_ce log file.
There are only WARN entries in the sonarqube_es log file.
All of the timeouts from the sonarqube_web log are shown here:

But is this timeout error for me? Is it because the request I sent fell to HTTP 504? Or is sonarqube itself unable to reach elastic?

[screenshot: timeouts from the sonarqube_web log]

Hi,

This last point: you should check your proxy / load balancer.

Also, based on the Plugin Version Matrix you should upgrade Checkstyle to 10.17.0.

 
HTH,
Ann

Hi @turkcankeskin.

Can you please clarify what “under load” means in your setup?
Are you seeing a high CPU, memory, or network overload? Do you have any numbers? 🙂
Did this start after the SonarQube upgrade? What edition were you using? What edition are you using now?

Are you using Kubernetes or OpenShift?

Hi,

I have been looking into this issue with my cloud admin for a week. Thanks for your support.

I am using an OpenShift environment, and the issue seems to be caused by the slowness of the NFS storage (mounted over TCP/IP) attached to the OpenShift environment. I am working on resolving this and will get back to you when my problem is completely resolved.
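For anyone who hits the same thing: what pointed us at the disk were the FsHealthService warnings above. A rough way to measure write-plus-fsync latency on the ES data path is a sketch like this (the path comes from my logs; adjust it to your own volume):

```python
# Rough fsync-latency probe for the ES data directory, similar in spirit to
# Elasticsearch's FsHealthService check (write a small file, fsync it).
import os
import time

# Path taken from the FsHealthService warnings above; adjust to your volume.
PATH = "/opt/sonarqube/data/es7/nodes/0/fsync-probe.tmp"

for _ in range(5):
    start = time.monotonic()
    with open(PATH, "wb") as f:
        f.write(b"x" * 4096)      # small write, like the ES health check
        f.flush()
        os.fsync(f.fileno())      # force it down to the NFS-backed disk
    print(f"write+fsync took {(time.monotonic() - start) * 1000:.0f} ms")

os.remove(PATH)
```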

@davi.vidal
No, there is no high CPU or RAM usage.

@ganncamp
Thanks for the matrix. I have this upgrade on my roadmap.
