SonarQube Developer - intermittent issues connecting to GitHub Enterprise

Must-share information (formatted with Markdown):

  • which versions are you using (SonarQube, Scanner, Plugin, and any relevant extension)
    SonarQube Developer 8.0 (Dockerised)
  • what are you trying to achieve
    Pull Request Analysis
  • what have you tried so far to achieve this
    We have a Dockerised SonarQube Developer instance running on our cloud infrastructure. We are using it to analyse branches for Pull Requests. We followed all the documentation and successfully got it working. However, we are seeing intermittent failures when it connects to GitHub Enterprise.
    Stacktrace below. To fix the issue we need to restart SonarQube. It will work again for a while before the issue happens again. Do you have any idea why this might be happening or what I can do to fix it? It is proving a blocker for us.

Stacktrace
2019.11.25 06:32:41 ERROR ce[AW6hQsjq_1xs6e7rrbIN][c.s.C.A.B.A.A.C] Failed to request /repos/Ed-Assessment/react-client-common/installation
java.net.SocketTimeoutException: connect timed out
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)
at java.base/java.net.Socket.connect(Socket.java:591)
at okhttp3.internal.platform.Platform.connectSocket(Platform.java:130)
at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:263)
at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:183)
at okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.java:224)
at okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.java:108)
at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.java:88)
at okhttp3.internal.connection.Transmitter.newExchange(Transmitter.java:169)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:41)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:88)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:221)
at okhttp3.RealCall.execute(RealCall.java:81)
at com.sonarsource.C.A.B.A.A.D.B(Unknown Source)
at com.sonarsource.C.A.B.A.A.C.A(Unknown Source)
at com.sonarsource.C.A.B.A.A.C.A(Unknown Source)
at com.sonarsource.C.C.C.B.B(Unknown Source)
at com.sonarsource.C.C.C.B.A(Unknown Source)
at com.sonarsource.C.C.C.J.A(Unknown Source)
at com.sonarsource.C.C.Z.A(Unknown Source)
at java.base/java.util.Optional.ifPresent(Optional.java:183)
at com.sonarsource.C.C.Z.B(Unknown Source)
at com.sonarsource.C.C.Z.A(Unknown Source)
at org.sonar.ce.async.SynchronousAsyncExecution.addToQueue(SynchronousAsyncExecution.java:27)
at com.sonarsource.C.C.Z.A(Unknown Source)
at java.base/java.util.Optional.ifPresent(Optional.java:183)
at com.sonarsource.C.C.Z.finished(Unknown Source)
at org.sonar.ce.task.projectanalysis.api.posttask.PostProjectAnalysisTasksExecutor.executeTask(PostProjectAnalysisTasksExecutor.java:118)
at org.sonar.ce.task.projectanalysis.api.posttask.PostProjectAnalysisTasksExecutor.finished(PostProjectAnalysisTasksExecutor.java:109)
at org.sonar.ce.task.step.ComputationStepExecutor.executeListener(ComputationStepExecutor.java:91)
at org.sonar.ce.task.step.ComputationStepExecutor.execute(ComputationStepExecutor.java:63)
at org.sonar.ce.task.projectanalysis.taskprocessor.ReportTaskProcessor.process(ReportTaskProcessor.java:81)
at org.sonar.ce.taskprocessor.CeWorkerImpl$ExecuteTask.executeTask(CeWorkerImpl.java:209)
at org.sonar.ce.taskprocessor.CeWorkerImpl$ExecuteTask.run(CeWorkerImpl.java:191)
at org.sonar.ce.taskprocessor.CeWorkerImpl.findAndProcessTask(CeWorkerImpl.java:158)
at org.sonar.ce.taskprocessor.CeWorkerImpl$TrackRunningState.get(CeWorkerImpl.java:133)
at org.sonar.ce.taskprocessor.CeWorkerImpl.call(CeWorkerImpl.java:85)
at org.sonar.ce.taskprocessor.CeWorkerImpl.call(CeWorkerImpl.java:53)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)

Welcome to the community!

Do you have a monitoring tool that checks your GitHub Enterprise API for outages?

While the connection is broken, try to SSH into the SonarQube container and fetch the API:

curl http(s)://[hostname]/api/v3

See if it is reachable from the container and, if not, from the Docker host.

Thanks for getting back to me!

Yes, GHE is monitored and highly available; the only outages should be scheduled maintenance windows and the like.
Our SonarQube instance is deployed to Apache Mesos, so it will be difficult to SSH into the container, but I'll check with our infrastructure teams whether it is possible.
Is there a retry mechanism in SonarQube for this situation? It might be worth considering, since restarting SonarQube always resolves the issue when we see it. It's as if the connection needs to be re-established…

In the PR decoration implementation we use okhttp3, with a singleton that holds an OkHttpClient reused for all GitHub calls during the whole life of the SonarQube instance. This OkHttpClient uses an internal connection pool to recycle HTTP and HTTPS connections, keeping the underlying TCP socket open for subsequent calls. The pool holds up to 5 idle connections, which are evicted after 5 minutes of inactivity.

With such a mechanism, barring an unlikely bug in the library we use, there should be no need for a SonarQube restart, even with connections that time out. If there is an idle period of at least 5 minutes without any PR decoration, a completely new HTTP connection is created.
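
For illustration, this is roughly what such a client configuration looks like in plain okhttp3 (the class and field names below are illustrative only, not our actual code):

```java
import java.util.concurrent.TimeUnit;

import okhttp3.ConnectionPool;
import okhttp3.OkHttpClient;

// Illustrative singleton: one client for the whole instance lifetime,
// sharing a pool of at most 5 idle connections, each evicted after
// 5 minutes of inactivity.
public final class GithubHttpClient {

    private static final OkHttpClient CLIENT = new OkHttpClient.Builder()
            .connectionPool(new ConnectionPool(5, 5, TimeUnit.MINUTES))
            // connect timeout; later in this thread the timeout on GHE calls is said to be 20 s
            .connectTimeout(20, TimeUnit.SECONDS)
            .build();

    private GithubHttpClient() {
    }

    public static OkHttpClient get() {
        return CLIENT;
    }
}
```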

For further investigation, you should definitely try to reproduce the error at a lower technical level, looking directly at the lower layers of the TCP/IP stack, and check on the Docker container and the Docker host what happens with the TCP sockets. For example: does the host still have TCP ports available for new sockets? Are you encountering packet loss or drops? Is DNS resolution OK?

To answer your last question, there is no retry mechanism in place in SonarQube, and I don't feel we need one: connections are already pooled, evicted and recreated as needed.
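
For reference, eviction is plain OkHttp behaviour; with a handle on the client, one could even drop every idle connection explicitly. SonarQube does not expose such a hook, this is purely to illustrate how the pool works:

```java
import okhttp3.OkHttpClient;

// Purely illustrative helper: closes every idle connection in the pool,
// so the next GitHub call has to open a fresh TCP socket instead of
// reusing a pooled one.
final class PoolMaintenance {
    static void dropIdleConnections(OkHttpClient client) {
        client.connectionPool().evictAll();
    }
}
```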

An additional question: when you say you restart SonarQube, do you mean:

  • you restart it inside the container
  • you restart the container
  • you destroy/recreate the container

Thanks for the info Pierre. Looks like we have some debugging to do.

When I say restart, I mean we need to restart the app, not the container.

Hi Pierre, we are struggling to get to the bottom of this one. PR decoration in our SonarQube Developer instance is becoming unusable and I am unable to track down the issue. To resolve it we need to manually restart the application, which is obviously not sustainable (it's happening multiple times a day).
Could this issue be the result of a slow connection to GHE by any chance? Below you can see some logs from today. We get a warning in the SonarQube UI saying Pull Request decoration failed. I will need to turn off our PR checks until I can get to the bottom of this (PR decoration is the whole reason we paid for SonarQube Developer).
Attaching a selection of today's logs.
sonarqube_logs.txt (89.3 KB)

With TRACE logs turned on:
2019.12.09 08:53:56 TRACE ce[AW7p3SZUT9wdpmAyBsaA][sql] time=1ms | sql=insert into ce_task_message ( uuid, task_uuid, message, created_at ) values ( ?, ?, ?, ? ) | params=AW7p3XxGGGSfpB7jxKNP, AW7p3SZUT9wdpmAyBsaA, Pull request decoration did not happen. Please install SonarQube Github application on the repository’s organization or user., 1575881636934

Not sure why it would think the SonarQube GitHub application is not installed.

If I restart SonarQube from the System tab then it works again (for a while). Can someone help me out here? @pierreguillot


Your logs indeed show that PR decoration does not happen, and fails after a 20 s timeout on a call to the GitHub Enterprise API.

The timeout is 20 s, so if by slow you mean a GitHub Enterprise response time greater than 20 s, then yes, that is your issue. Did you manage to curl the GH API from a container deployed in your cluster?

Hi Pierre, so if GHE does not respond within 20 seconds, all further PR decorations fail until I restart the SonarQube server; there does not seem to be any recovery from it. If I restart the SonarQube server it will work fine again for maybe a day or two. Is this expected behaviour?

The expected behaviour is that every PR decoration tries to call GHE to do the decoration, regardless of previous failures. I guess something is going wrong after a certain uptime, and all subsequent connections fail after that point. But SonarQube is still trying to connect for every PR decoration.

If restarting SonarQube fixes the issue, that may be because somehow SQ fails to reuse an already opened connection, or fails to acquire a new one. You need to drill down to the TCP stack and analyse what is happening on the container. Monitor outgoing connections (with netstat or tcpdump). While under TCP monitoring, run a curl to the GHE API on the container, to see whether it works when SQ fails to connect.

This will help us to narrow down your issue.
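
If it helps, here is a minimal standalone Java check you could run on the container alongside the curl; it attempts a raw TCP connect to your GHE host with the same 20 s timeout. The default hostname below is just a placeholder, substitute your own:

```java
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch: attempts a raw TCP connect to the GHE host with a 20 s timeout
// and reports how long the connection attempt took.
public class GheConnectCheck {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "ghe.example.com"; // placeholder GHE hostname
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 443;
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 20_000);
            System.out.printf("connected to %s:%d in %d ms%n",
                    host, port, (System.nanoTime() - start) / 1_000_000);
        } catch (Exception e) {
            System.out.printf("connect to %s:%d failed after %d ms: %s%n",
                    host, port, (System.nanoTime() - start) / 1_000_000, e);
        }
    }
}
```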

Hi @pierreguillot, we were able to dig into this some more. We curled the app from the container with a JWT generated from the appropriate private key; we tried it in a loop for a period of time (while the PR decorations were failing in SQ) and got all successful responses, around 100-200 ms.

One other piece of information which might be useful: we are switching between repos in separate orgs in GHE; not sure if that would affect this issue.

So we are in the same situation: PR decorations are intermittently failing, yet when we curl from our running container we see all successful responses with response times under 200 ms. We need to restart the application from Administration -> System to fix it.

Thanks for coming back to us with this information. What do you mean by switching between repos in separate orgs?

Our GHE is organised by orgs, so we have org1, org2, etc., and repos within those orgs. Users are given access to repos via orgs.
e.g.
[c.s.C.A.B.A.A.C] Failed to request /repos/org1/lift/installation
[c.s.C.A.B.A.A.C] Failed to request /repos/org2/react-client-common/installation
SonarQube is decorating PRs from those different orgs/repos.

Hopefully I have explained that OK?

Hi @pierreguillot, is there any more information you need from me on this issue?

Ping @pierreguillot, or anyone who can help: this is a very painful issue for us, and any help would be much appreciated.

We are seeing the exact same behavior with SonarQube Enterprise Edition 8.0 and GitHub Enterprise 2.19.5. A simple restart of the SonarQube server always resolves the issue until it presents itself again later.

Are you running both SonarQube and GitHub Enterprise on Docker as well?

We are running SonarQube in a Docker container within Kubernetes; the GitHub Enterprise server is running on completely separate infrastructure using EC2 instances and an ELB.

Would you be able to give an approximate number of PR decorations that succeed before the timeouts start?