Background task takes between 2h and 18h on pull request decoration, then fails

System

  • Developer Edition Version 8.5.1 (build 38104)
  • Connected with on premise Gitlab
  • Using pull request decoration

Related
Possibly related but not quite the same as:

Problem
We’ve seen some background tasks (not all of them) that can take up to 18h to complete. I’ve enabled debug logging to see what is happening.

2020.11.04 09:24:06 INFO  ce[][o.s.c.t.CeWorkerImpl] Execute task | project=a:gui | type=REPORT | pullRequest=1025 | id=AXWOuxPwPeb1LWvbqRoZ | submitter=gitlabanalysis
2020.11.04 09:24:06 DEBUG ce[AXWOuxPwPeb1LWvbqRoZ][o.s.c.t.p.s.ExtractReportStep] Analysis report is 941 KB uncompressed
<<<<<Lots of SUCCESS steps>>>>
2020.11.04 09:24:07 INFO  ce[AXWOuxPwPeb1LWvbqRoZ][o.s.c.t.s.ComputationStepExecutor] Trigger refresh of Portfolios and Applications | status=SUCCESS | time=0ms
2020.11.04 09:24:07 INFO  ce[AXWOuxPwPeb1LWvbqRoZ][o.s.c.t.p.a.p.PostProjectAnalysisTasksExecutor] Webhooks | globalWebhooks=0 | projectWebhooks=0 | status=SUCCESS | time=2ms
2020.11.04 09:24:07 WARN  ce[AXWOuxPwPeb1LWvbqRoZ][c.s.C.D.D.I] GitLab's project ID is now mandatory in the SonarQube project settings.
2020.11.04 09:24:07 WARN  ce[AXWOuxPwPeb1LWvbqRoZ][c.s.C.D.D.I] GitLab url is now mandatory in the SonarQube administration ALM settings.
2020.11.04 09:24:07 DEBUG ce[AXWOuxPwPeb1LWvbqRoZ][GitlabPrHttpClient] loading notes from [https://somegitlab.net/api/v4/projects/41/merge_requests/1025/notes]
2020.11.04 09:24:07 DEBUG ce[AXWOuxPwPeb1LWvbqRoZ][GitlabPrHttpClient] get notes : [https://somegitlab.net/api/v4/projects/41/merge_requests/1025/notes]
2020.11.04 09:24:07 DEBUG ce[AXWOuxPwPeb1LWvbqRoZ][o.i.http2.Http2] >> CONNECTION 505249202a20485454502f322e300d0a0d0a534d0d0a0d0a
2020.11.04 09:24:07 DEBUG ce[AXWOuxPwPeb1LWvbqRoZ][o.i.http2.Http2] >> 0x00000000     6 SETTINGS      
2020.11.04 09:24:07 DEBUG ce[AXWOuxPwPeb1LWvbqRoZ][o.i.http2.Http2] >> 0x00000000     4 WINDOW_UPDATE 
2020.11.04 09:24:07 DEBUG ce[AXWOuxPwPeb1LWvbqRoZ][o.i.http2.Http2] >> 0x00000003    93 HEADERS       END_STREAM|END_HEADERS
2020.11.04 09:24:07 DEBUG ce[][o.i.http2.Http2] << 0x00000000    18 SETTINGS      
2020.11.04 09:24:07 DEBUG ce[][o.i.http2.Http2] << 0x00000000     4 WINDOW_UPDATE 
2020.11.04 09:24:07 DEBUG ce[][o.i.http2.Http2] << 0x00000000     0 SETTINGS      ACK
2020.11.04 09:24:07 DEBUG ce[][o.i.http2.Http2] >> 0x00000000     0 SETTINGS      ACK
2020.11.04 09:24:07 DEBUG ce[][o.i.http2.Http2] << 0x00000003   644 HEADERS       END_HEADERS
2020.11.04 09:24:07 DEBUG ce[][o.i.http2.Http2] << 0x00000003  1642 DATA          END_STREAM
2020.11.04 09:24:07 DEBUG ce[AXWOuxPwPeb1LWvbqRoZ][GitlabPrHttpClient] get current user : [https://somegitlab.net/api/v4/user]
2020.11.04 09:24:07 DEBUG ce[AXWOuxPwPeb1LWvbqRoZ][o.i.http2.Http2] >> 0x00000005    17 HEADERS       END_STREAM|END_HEADERS
2020.11.04 09:24:07 DEBUG ce[][o.i.http2.Http2] << 0x00000005   309 HEADERS       END_HEADERS
2020.11.04 09:24:07 DEBUG ce[][o.i.http2.Http2] << 0x00000005   477 DATA          END_STREAM
2020.11.04 09:24:07 DEBUG ce[AXWOuxPwPeb1LWvbqRoZ][GitlabPrHttpClient] post note : [https://somegitlab.net/api/v4/projects/41/merge_requests/1025/notes?body=<<NOTEBODY>>]
2020.11.04 09:24:07 DEBUG ce[AXWOuxPwPeb1LWvbqRoZ][o.i.http2.Http2] >> 0x00000007  4179 HEADERS       END_HEADERS
2020.11.04 09:24:07 DEBUG ce[AXWOuxPwPeb1LWvbqRoZ][o.i.http2.Http2] >> 0x00000007     0 DATA          END_STREAM
2020.11.04 09:24:07 DEBUG ce[][o.i.http2.Http2] << 0x00000000     8 GOAWAY        
2020.11.04 09:24:07 DEBUG ce[][o.i.http2.Http2] >> 0x00000007     4 RST_STREAM    
2020.11.04 09:24:07 DEBUG ce[AXWOuxPwPeb1LWvbqRoZ][o.i.http2.Http2] >> CONNECTION 505249202a20485454502f322e300d0a0d0a534d0d0a0d0a
2020.11.04 09:24:07 DEBUG ce[AXWOuxPwPeb1LWvbqRoZ][o.i.http2.Http2] >> 0x00000000     6 SETTINGS      
2020.11.04 09:24:07 DEBUG ce[AXWOuxPwPeb1LWvbqRoZ][o.i.http2.Http2] >> 0x00000000     4 WINDOW_UPDATE 
2020.11.04 09:24:07 DEBUG ce[AXWOuxPwPeb1LWvbqRoZ][o.i.http2.Http2] >> 0x00000003  4232 HEADERS       END_HEADERS
2020.11.04 09:24:07 DEBUG ce[AXWOuxPwPeb1LWvbqRoZ][o.i.http2.Http2] >> 0x00000003     0 DATA          END_STREAM
2020.11.04 09:24:07 DEBUG ce[][o.i.http2.Http2] << 0x00000000    18 SETTINGS      
2020.11.04 09:24:07 DEBUG ce[][o.i.http2.Http2] << 0x00000000     4 WINDOW_UPDATE 
2020.11.04 09:24:07 DEBUG ce[][o.i.http2.Http2] << 0x00000000     0 SETTINGS      ACK
2020.11.04 09:24:07 DEBUG ce[][o.i.http2.Http2] >> 0x00000000     0 SETTINGS      ACK
2020.11.04 09:24:07 DEBUG ce[][o.i.http2.Http2] << 0x00000000     8 GOAWAY        
2020.11.04 09:24:07 DEBUG ce[][o.i.http2.Http2] >> 0x00000003     4 RST_STREAM

The last 11 lines are then repeated for 160x per second! In this case after 2h41m it completed with the following error:

2020.11.04 12:05:08 ERROR ce[AXWOuxPwPeb1LWvbqRoZ][c.s.C.D.D.B] An exception was thrown during Merge Request decoration : Connection reset by peer (connect failed)
2020.11.04 12:05:08 ERROR ce[AXWOuxPwPeb1LWvbqRoZ][o.s.c.t.p.a.p.PostProjectAnalysisTasksExecutor] Execution of task class com.sonarsource.C.D.c failed
java.lang.IllegalStateException: Connection reset by peer (connect failed)
	at com.sonarsource.C.D.D.G.A(Unknown Source)
	at com.sonarsource.C.D.D.B.A(Unknown Source)
	at com.sonarsource.C.D.c.A(Unknown Source)
	at java.base/java.util.Optional.ifPresent(Optional.java:183)
	at com.sonarsource.C.D.c.B(Unknown Source)
	at com.sonarsource.C.D.c.A(Unknown Source)
	at org.sonar.ce.async.SynchronousAsyncExecution.addToQueue(SynchronousAsyncExecution.java:27)
	at com.sonarsource.C.D.c.A(Unknown Source)
	at java.base/java.util.Optional.ifPresent(Optional.java:183)
	at com.sonarsource.C.D.c.finished(Unknown Source)
	at org.sonar.ce.task.projectanalysis.api.posttask.PostProjectAnalysisTasksExecutor.executeTask(PostProjectAnalysisTasksExecutor.java:118)
	at org.sonar.ce.task.projectanalysis.api.posttask.PostProjectAnalysisTasksExecutor.finished(PostProjectAnalysisTasksExecutor.java:109)
	at org.sonar.ce.task.step.ComputationStepExecutor.executeListener(ComputationStepExecutor.java:91)
	at org.sonar.ce.task.step.ComputationStepExecutor.execute(ComputationStepExecutor.java:63)
	at org.sonar.ce.task.projectanalysis.taskprocessor.ReportTaskProcessor.process(ReportTaskProcessor.java:81)
	at org.sonar.ce.taskprocessor.CeWorkerImpl$ExecuteTask.executeTask(CeWorkerImpl.java:235)
	at org.sonar.ce.taskprocessor.CeWorkerImpl$ExecuteTask.run(CeWorkerImpl.java:217)
	at org.sonar.ce.taskprocessor.CeWorkerImpl.findAndProcessTask(CeWorkerImpl.java:162)
	at org.sonar.ce.taskprocessor.CeWorkerImpl$TrackRunningState.get(CeWorkerImpl.java:137)
	at org.sonar.ce.taskprocessor.CeWorkerImpl.call(CeWorkerImpl.java:89)
	at org.sonar.ce.taskprocessor.CeWorkerImpl.call(CeWorkerImpl.java:53)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.net.SocketException: Connection reset by peer (connect failed)
	at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
	at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
	at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)
	at java.base/java.net.Socket.connect(Socket.java:609)
	at okhttp3.internal.platform.Platform.connectSocket(Platform.java:130)
	at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:263)
	at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:183)
	at okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.java:224)
	at okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.java:108)
	at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.java:88)
	at okhttp3.internal.connection.Transmitter.newExchange(Transmitter.java:169)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:41)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:88)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
	at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:229)
	at okhttp3.RealCall.execute(RealCall.java:81)
	... 30 common frames omitted
2020.11.04 12:05:08 INFO  ce[AXWOuxPwPeb1LWvbqRoZ][o.s.c.t.p.a.p.PostProjectAnalysisTasksExecutor] Pull Request decoration | status=FAILED | time=9660879ms
2020.11.04 12:05:08 INFO  ce[AXWOuxPwPeb1LWvbqRoZ][o.s.c.t.CeWorkerImpl] Executed task | project=a:gui | type=REPORT | pullRequest=1025 | id=AXWOuxPwPeb1LWvbqRoZ | submitter=gitlabanalysis | status=SUCCESS | time=9662212ms

And on another occasion which ended after 16h54m it ended with:

2020.11.04 09:24:06 ERROR ce[AXWOunmsPeb1LWvbqRoX][c.s.C.D.D.B] An exception was thrown during Merge Request decoration : Couldn't kickstart handshaking
2020.11.04 09:24:06 ERROR ce[AXWOunmsPeb1LWvbqRoX][o.s.c.t.p.a.p.PostProjectAnalysisTasksExecutor] Execution of task class com.sonarsource.C.D.c failed
java.lang.IllegalStateException: Couldn't kickstart handshaking
	at com.sonarsource.C.D.D.G.A(Unknown Source)
	at com.sonarsource.C.D.D.B.A(Unknown Source)
	at com.sonarsource.C.D.c.A(Unknown Source)
	at java.base/java.util.Optional.ifPresent(Optional.java:183)
	at com.sonarsource.C.D.c.B(Unknown Source)
	at com.sonarsource.C.D.c.A(Unknown Source)
	at org.sonar.ce.async.SynchronousAsyncExecution.addToQueue(SynchronousAsyncExecution.java:27)
	at com.sonarsource.C.D.c.A(Unknown Source)
	at java.base/java.util.Optional.ifPresent(Optional.java:183)
	at com.sonarsource.C.D.c.finished(Unknown Source)
	at org.sonar.ce.task.projectanalysis.api.posttask.PostProjectAnalysisTasksExecutor.executeTask(PostProjectAnalysisTasksExecutor.java:118)
	at org.sonar.ce.task.projectanalysis.api.posttask.PostProjectAnalysisTasksExecutor.finished(PostProjectAnalysisTasksExecutor.java:109)
	at org.sonar.ce.task.step.ComputationStepExecutor.executeListener(ComputationStepExecutor.java:91)
	at org.sonar.ce.task.step.ComputationStepExecutor.execute(ComputationStepExecutor.java:63)
	at org.sonar.ce.task.projectanalysis.taskprocessor.ReportTaskProcessor.process(ReportTaskProcessor.java:81)
	at org.sonar.ce.taskprocessor.CeWorkerImpl$ExecuteTask.executeTask(CeWorkerImpl.java:235)
	at org.sonar.ce.taskprocessor.CeWorkerImpl$ExecuteTask.run(CeWorkerImpl.java:217)
	at org.sonar.ce.taskprocessor.CeWorkerImpl.findAndProcessTask(CeWorkerImpl.java:162)
	at org.sonar.ce.taskprocessor.CeWorkerImpl$TrackRunningState.get(CeWorkerImpl.java:137)
	at org.sonar.ce.taskprocessor.CeWorkerImpl.call(CeWorkerImpl.java:89)
	at org.sonar.ce.taskprocessor.CeWorkerImpl.call(CeWorkerImpl.java:53)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: javax.net.ssl.SSLException: Couldn't kickstart handshaking
	at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:127)
	at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:326)
	at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:269)
	at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:450)
	at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:411)
	at okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:336)
	at okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:300)
	at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:185)
	at okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.java:224)
	at okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.java:108)
	at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.java:88)
	at okhttp3.internal.connection.Transmitter.newExchange(Transmitter.java:169)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:41)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:88)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
	at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:229)
	at okhttp3.RealCall.execute(RealCall.java:81)
	... 30 common frames omitted
	Suppressed: java.net.SocketException: Broken pipe (Write failed)
		at java.base/java.net.SocketOutputStream.socketWrite0(Native Method)
		at java.base/java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:110)
		at java.base/java.net.SocketOutputStream.write(SocketOutputStream.java:150)
		at java.base/sun.security.ssl.SSLSocketOutputRecord.encodeAlert(SSLSocketOutputRecord.java:81)
		at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:357)
		... 53 common frames omitted
Caused by: java.net.SocketException: Broken pipe (Write failed)
	at java.base/java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.base/java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:110)
	at java.base/java.net.SocketOutputStream.write(SocketOutputStream.java:150)
	at java.base/sun.security.ssl.SSLSocketOutputRecord.flush(SSLSocketOutputRecord.java:251)
	at java.base/sun.security.ssl.HandshakeOutStream.flush(HandshakeOutStream.java:89)
	at java.base/sun.security.ssl.ClientHello$ClientHelloKickstartProducer.produce(ClientHello.java:658)
	at java.base/sun.security.ssl.SSLHandshake.kickstart(SSLHandshake.java:525)
	at java.base/sun.security.ssl.ClientHandshakeContext.kickstart(ClientHandshakeContext.java:107)
	at java.base/sun.security.ssl.TransportContext.kickstart(TransportContext.java:233)
	at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:433)
	... 51 common frames omitted

When
Not too sure if it’s the cause but it seems to happen on merge request with 25+ commits.

Possible solution
Add a timeout after 1 or 2 minuten to the pull decoration http-call.

Hi,

Could you elaborate on why you think this is a different issue from SONAR-13290?

 
Thx,
Ann

Hi Ann,

Thanks for your response, two reasons why this might be different

  1. 13290 states: "background task is looping forever on the MR decoration, without ever failing". In our case the background task does complete in the end (successfully)
  2. Repeating the post request via curl from the logs does not result in a 414 Request-URI Too Large as stated in the issue. The workaround mentioned does not fix this problem. I have to add 3400 characters to the URI to get a 414 statuscode.

Kind regards,
Jeffrey

Hi Jeffrey,

Thanks for getting back to me. I’ve routed this internally.

 
Ann

Hi, is there a proxy between SonarQube and GitLab? Could you try the curl command from the SonarQube server? (and last question, which GitLab version are you using?)

Hi Pierre,

Currently using Gitlab CE 13.5.3. There is no proxy between the two. Same result from the SonarQube server (Access Denied because I don’t sent credentials along).

We are running Sonar in Kubernetes using the official docker image.

Normally the first view notes to the MR will succeed. After building up the MR with more commits this starts happening at some point.

Kind regards,
Jeffrey

The most probable issue is an unstable connection between SQ and GitLab. I’m not a K8 expert, so I can’t really help you to dig on that side.

@pierreguillot I understand, but even if it’s due to an unstable connection, shouldn’t there be some sort of timeout or error-handling on Sonar side? As there can be only 1 active job this bug blocks all pending jobs for days.

That’s a fair point, and actually, there is a timeout mechanism. And to be honest, I have no idea why it’s not active in your case! I tried to reproduce this locally both on gitlab.com and with an on-prem gitlab, no luck: the timeout is triggered as expected.

Would you be able to capture a memory heap dump of the compute engine Java process when a background task is stuck?

Yes I’ll try and do that (have to wait for the issue to arise).
Do you have any instructions for getting the memory dump? I’ve done it in the past but that’s a while ago :wink:

btw, not 100% sure yet but I’ve seen alerts on cpu pressure on the on-prem gitlab machine during the time of the initial post.

Sorry I meant a thread dump, not a memory dump. I’ll give you an example later today yes :slight_smile:

So you can run jps to identify the java process id:

➜  jps
13153 Jps
11649 CeServer
11284 WrapperSimpleApp
9799 GradleDaemon
8617 GradleDaemon
11323 Elasticsearch
11500 WebServer
366 
367 

And you want a dump from the CeServer PID, so in my case:
jstack -l 11649 > dump.txt