SonarQube Developer - intermittent issues connecting to GitHub Enterprise

Must-share information (formatted with Markdown):

  • which versions are you using (SonarQube, Scanner, Plugin, and any relevant extension)
    SonarQube Developer 8.0 (Dockerised)
  • what are you trying to achieve
    Pull Request Analysis
  • what have you tried so far to achieve this
    We have a Dockerised SonarQube Developer instance running on our cloud infrastructure. We are using it to analyse branches for Pull Requests. We followed all the documentation and successfully got it working. However, we are seeing intermittent failures when it connects to GitHub Enterprise.
    Stacktrace below. To fix the issue we need to restart SonarQube. It will work again for a while before the issue happens again. Do you have any idea why this might be happening or what I can do to fix it? It is proving a blocker for us.

Stacktrace
2019.11.25 06:32:41 ERROR ce[AW6hQsjq_1xs6e7rrbIN][c.s.C.A.B.A.A.C] Failed to request /repos/Ed-Assessment/react-client-common/installation
java.net.SocketTimeoutException: connect timed out
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)
at java.base/java.net.Socket.connect(Socket.java:591)
at okhttp3.internal.platform.Platform.connectSocket(Platform.java:130)
at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:263)
at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:183)
at okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.java:224)
at okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.java:108)
at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.java:88)
at okhttp3.internal.connection.Transmitter.newExchange(Transmitter.java:169)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:41)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:88)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:221)
at okhttp3.RealCall.execute(RealCall.java:81)
at com.sonarsource.C.A.B.A.A.D.B(Unknown Source)
at com.sonarsource.C.A.B.A.A.C.A(Unknown Source)
at com.sonarsource.C.A.B.A.A.C.A(Unknown Source)
at com.sonarsource.C.C.C.B.B(Unknown Source)
at com.sonarsource.C.C.C.B.A(Unknown Source)
at com.sonarsource.C.C.C.J.A(Unknown Source)
at com.sonarsource.C.C.Z.A(Unknown Source)
at java.base/java.util.Optional.ifPresent(Optional.java:183)
at com.sonarsource.C.C.Z.B(Unknown Source)
at com.sonarsource.C.C.Z.A(Unknown Source)
at org.sonar.ce.async.SynchronousAsyncExecution.addToQueue(SynchronousAsyncExecution.java:27)
at com.sonarsource.C.C.Z.A(Unknown Source)
at java.base/java.util.Optional.ifPresent(Optional.java:183)
at com.sonarsource.C.C.Z.finished(Unknown Source)
at org.sonar.ce.task.projectanalysis.api.posttask.PostProjectAnalysisTasksExecutor.executeTask(PostProjectAnalysisTasksExecutor.java:118)
at org.sonar.ce.task.projectanalysis.api.posttask.PostProjectAnalysisTasksExecutor.finished(PostProjectAnalysisTasksExecutor.java:109)
at org.sonar.ce.task.step.ComputationStepExecutor.executeListener(ComputationStepExecutor.java:91)
at org.sonar.ce.task.step.ComputationStepExecutor.execute(ComputationStepExecutor.java:63)
at org.sonar.ce.task.projectanalysis.taskprocessor.ReportTaskProcessor.process(ReportTaskProcessor.java:81)
at org.sonar.ce.taskprocessor.CeWorkerImpl$ExecuteTask.executeTask(CeWorkerImpl.java:209)
at org.sonar.ce.taskprocessor.CeWorkerImpl$ExecuteTask.run(CeWorkerImpl.java:191)
at org.sonar.ce.taskprocessor.CeWorkerImpl.findAndProcessTask(CeWorkerImpl.java:158)
at org.sonar.ce.taskprocessor.CeWorkerImpl$TrackRunningState.get(CeWorkerImpl.java:133)
at org.sonar.ce.taskprocessor.CeWorkerImpl.call(CeWorkerImpl.java:85)
at org.sonar.ce.taskprocessor.CeWorkerImpl.call(CeWorkerImpl.java:53)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)

Welcome to the community!

Do you have a monitoring tool that checks your GitHub Enterprise API for outages?

While the connection is broken, try to SSH into the SonarQube container and fetch the API:

curl http(s)://[hostname]/api/v3

See if it is reachable from the container and, if not, from the Docker host.

Thanks for getting back to me!

Yes, GHE is monitored and highly available; the only outages should be scheduled maintenance windows and the like.
Our SonarQube instance is deployed to Apache Mesos, so it will be difficult to SSH into the container, but I'll check with our infrastructure teams whether it is possible.
Is there a retry mechanism in SonarQube for this situation? It might be worth considering, since restarting SonarQube always resolves the issue when we see it. It's as if the connection needs to be re-established…

In the PR decoration implementation we use okhttp3, with a singleton that holds an OkHttpClient reused for all GitHub calls during the whole life of the SonarQube instance. This OkHttpClient uses an internal connection pool to recycle HTTP and HTTPS connections, keeping the underlying TCP socket open for subsequent calls. The pool holds up to 5 idle connections, which are evicted after 5 minutes of inactivity.

With such a mechanism, barring an unlikely bug in the library we use, there should be no need for a SonarQube restart, even with connections that time out. If there is an idle period of at least 5 minutes without any PR decoration, a completely new HTTP connection is created.
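
For illustration, this is roughly what such a client configuration looks like in plain okhttp3 (the class and field names below are illustrative only, not our actual code):

```java
import java.util.concurrent.TimeUnit;

import okhttp3.ConnectionPool;
import okhttp3.OkHttpClient;

// Illustrative singleton: one client for the whole instance lifetime,
// sharing a pool of at most 5 idle connections, each evicted after
// 5 minutes of inactivity.
public final class GithubHttpClient {

    private static final OkHttpClient CLIENT = new OkHttpClient.Builder()
            .connectionPool(new ConnectionPool(5, 5, TimeUnit.MINUTES))
            // connect timeout; later in this thread the timeout on GHE calls is said to be 20 s
            .connectTimeout(20, TimeUnit.SECONDS)
            .build();

    private GithubHttpClient() {
    }

    public static OkHttpClient get() {
        return CLIENT;
    }
}
```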

For further investigation, you should definitely try to reproduce the error at a lower technical level, looking directly at the lower layers of the TCP/IP stack, and check on the Docker container and the Docker host what happens with the TCP sockets. For example: does the host still have TCP ports available for new sockets? Are you encountering packet loss or drops? Is DNS resolution OK?

To answer your last question, there is no retry mechanism in place in SonarQube, and I don't feel we need one: connections are already pooled, evicted and recreated as needed.
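
For reference, eviction is plain OkHttp behaviour; with a handle on the client, one could even drop every idle connection explicitly. SonarQube does not expose such a hook, this is purely to illustrate how the pool works:

```java
import okhttp3.OkHttpClient;

// Purely illustrative helper: closes every idle connection in the pool,
// so the next GitHub call has to open a fresh TCP socket instead of
// reusing a pooled one.
final class PoolMaintenance {
    static void dropIdleConnections(OkHttpClient client) {
        client.connectionPool().evictAll();
    }
}
```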

An additional question: when you say you restart SonarQube, do you mean:

  • you restart it inside the container
  • you restart the container
  • you destroy/recreate the container

Thanks for the info Pierre. Looks like we have some debugging to do.

When I say restart, I mean we need to restart the app, not the container.

Hi Pierre, we are struggling to get to the bottom of this one. PR decoration in our SonarQube Developer instance is becoming unusable and I am unable to track down the issue. To resolve it we need to manually restart the application, which is obviously not sustainable (it's happening multiple times a day).
Could this issue be the result of a slow connection to GHE by any chance? Below you can see some logs from today. We get a warning in the SonarQube UI saying Pull Request decoration failed. I will need to turn off our PR checks until I can get to the bottom of this (PR decoration is the whole reason we paid for SonarQube Developer).
Attaching a selection of today's logs.
sonarqube_logs.txt (89.3 KB)

With TRACE logs turned on:
2019.12.09 08:53:56 TRACE ce[AW7p3SZUT9wdpmAyBsaA][sql] time=1ms | sql=insert into ce_task_message ( uuid, task_uuid, message, created_at ) values ( ?, ?, ?, ? ) | params=AW7p3XxGGGSfpB7jxKNP, AW7p3SZUT9wdpmAyBsaA, Pull request decoration did not happen. Please install SonarQube Github application on the repository’s organization or user., 1575881636934

Not sure why it would think the SonarQube GitHub application is not installed.

If I restart SonarQube from the System tab then it works again (for a while). Can someone help me out here? @pierreguillot


Your logs indeed show that PR decoration does not happen, and fails after a 20 s timeout on a call to the GitHub Enterprise API.

The timeout is 20 s, so if by slow you mean a GitHub Enterprise response time greater than 20 s, then yes, that is your issue. Did you manage to curl the GH API from a container deployed in your cluster?

Hi Pierre, so if GHE does not respond within 20 seconds, all further PR decorations fail until I restart the SonarQube server; there does not seem to be any recovery from it. If I restart the SonarQube server it will work fine again for maybe a day or two. Is this expected behaviour?

The expected behaviour is that every PR decoration tries to call GHE to do the decoration, regardless of previous failures. I guess something is going wrong after a certain uptime, and all subsequent connections fail after that point. But SonarQube is still trying to connect for every PR decoration.

If restarting SonarQube fixes the issue, that may be because somehow SQ fails to reuse an already opened connection, or fails to acquire a new one. You need to drill down to the TCP stack and analyse what is happening on the container. Monitor outgoing connections (with netstat or tcpdump). While under TCP monitoring, run a curl to the GHE API on the container, to see whether it works when SQ fails to connect.

This will help us to narrow down your issue.
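
If it helps, here is a minimal standalone Java check you could run on the container alongside the curl; it attempts a raw TCP connect to your GHE host with the same 20 s timeout. The default hostname below is just a placeholder, substitute your own:

```java
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch: attempts a raw TCP connect to the GHE host with a 20 s timeout
// and reports how long the connection attempt took.
public class GheConnectCheck {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "ghe.example.com"; // placeholder GHE hostname
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 443;
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 20_000);
            System.out.printf("connected to %s:%d in %d ms%n",
                    host, port, (System.nanoTime() - start) / 1_000_000);
        } catch (Exception e) {
            System.out.printf("connect to %s:%d failed after %d ms: %s%n",
                    host, port, (System.nanoTime() - start) / 1_000_000, e);
        }
    }
}
```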

Hi @pierreguillot, we were able to dig into this some more. We curled the app from the container with a JWT generated from the appropriate private key; we tried it in a loop for a period of time (while the PR decorations were failing in SQ) and got all successful responses, around 100-200 ms.

One other piece of information which might be useful: we are switching between repos in separate orgs in GHE; not sure if that would affect this issue.

So we are in the same situation: PR decorations are intermittently failing, yet when we curl from our running container we see all successful responses with response times under 200 ms. We need to restart the application from Administration -> System to fix it.

Thanks for coming back to us with this information. What do you mean by switching between repos in separate orgs?

Our GHE is organised by orgs, so we have org1, org2, etc., and repos within those orgs. Users are given access to repos via orgs.
e.g.
[c.s.C.A.B.A.A.C] Failed to request /repos/org1/lift/installation
[c.s.C.A.B.A.A.C] Failed to request /repos/org2/react-client-common/installation
SonarQube is decorating PRs from those different orgs/repos.

Hopefully I have explained that OK?

Hi @pierreguillot, is there any more information you need from me on this issue?

Ping @pierreguillot, or anyone who can help: this is a very painful issue for us, and any help would be much appreciated.

We are seeing the exact same behavior with SonarQube Enterprise Edition 8.0 and GitHub Enterprise 2.19.5. A simple restart of the SonarQube server always resolves the issue until it presents itself again later.

Are you running both SonarQube and GitHub Enterprise on Docker as well?

We are running SonarQube in a Docker container within Kubernetes; the GitHub Enterprise server is running on completely separate infrastructure using EC2 instances and an ELB.

Would you be able to give an approximate number of PR decorations that succeed before the timeouts start?