SonarQube version: 8.7.1.42226
OpenJDK Runtime Environment 11.0.10+9
Alpine Linux v3.12 (3.12.4)
SonarScanner version: 4.5.0.2216
Java 11.0.7 Oracle Corporation (64-bit)
Linux 4.14.200-155.322.amzn2.x86_64 amd64
SonarQube Scanner for Jenkins version 2.13
We have Jenkins integration set up through the SonarQube Scanner for Jenkins plugin. Both sides are configured correctly and it works… for a while. After an unpredictable amount of time and an unpredictable number of scans and webhook calls, the webhooks stop working. When they fail, the delivery shows: Response: Server Unreachable, Duration: 20s. Simply restarting SonarQube fixes this temporarily; the webhooks then work for an indeterminate period until the problem appears again. I believe this is an issue in SonarQube itself: while the failures are happening, I can connect to the SonarQube system and use wget to post the JSON body from a failed webhook message to the same address the webhook is configured to use. That has worked every time I have tried it, even while the webhook calls from within SonarQube continue to fail.
Jenkinsfile, start the scan:
def scannerHome = tool 'SonarScanner'
withSonarQubeEnv('SonarQube') {
    sh "${scannerHome}/bin/sonar-scanner"
}
Jenkinsfile, wait for the scan to complete (or time out once the webhooks start failing)
I don’t quite follow here which side is failing, but the part where you say “[when I do it manually] it works every time” makes me suspect something on your network. Have you checked that angle? And if you believe the problem is on the SonarQube side, are you seeing errors anywhere? If it’s the webhook calls that are failing, what errors are you seeing for those calls?
@ganncamp
The Jenkins side, where it asks SonarQube for the results of the scan, always works. On the SonarQube side, the webhook that is called after the scan completes works for an inconsistent amount of time after a restart of SonarQube. During that time every webhook succeeds. Then a first webhook call fails (I am not sure what causes that failure), and every subsequent webhook call fails until SonarQube is restarted. I can see the failures under Administration → Configuration → Webhooks by looking at the last delivery (or Show recent deliveries). They look like this:
## Last delivery of Jenkins
Response: Server Unreachable
Duration: 20s
Payload:
{
json document that was supposed to be posted to the webhook
}
The “manual” process is to open a shell on the Linux system where SonarQube is running, which uses the same network interface and path as the failed webhook calls, and post the JSON payload to the Jenkins webhook endpoint using wget. An example of the wget command that I have been using is in the message I posted on April 1st. When the payload is posted via wget, the Jenkins job that is stuck waiting with the message SonarQube task 'XXXX' status is 'IN_PROGRESS' moves forward, logs SonarQube task 'XXXX' status is 'SUCCESS', and proceeds. If I don’t post the payload with wget while the webhooks are failing, the build waits for the 1 hour timeout and then fails.
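For readers who want to reproduce that manual replay, here is a minimal sketch. It is not the command from the April 1st post: the `replay_webhook` helper, the `jenkins.example.com` host, and the `payload.json` file name are placeholders, and the `/sonarqube-webhook/` path assumes the default endpoint exposed by the SonarQube Scanner for Jenkins plugin. It sticks to wget options that should be accepted by both GNU wget and the BusyBox wget shipped with Alpine.

```shell
#!/bin/sh
# replay_webhook URL PAYLOAD_FILE
# Hypothetical helper: POST a saved webhook payload to the Jenkins endpoint
# with a 20 s timeout, matching the duration shown on the failed deliveries.
replay_webhook() {
    url=$1
    payload_file=$2
    if [ ! -r "$payload_file" ]; then
        echo "cannot read payload file: $payload_file" >&2
        return 1
    fi
    # -T sets the network timeout, -O - prints the response body to stdout.
    wget -T 20 -O - \
        --header 'Content-Type: application/json' \
        --post-data "$(cat "$payload_file")" \
        "$url"
}

# Example (placeholder host; payload.json holds the JSON copied from the
# failed delivery under Administration → Configuration → Webhooks):
#   replay_webhook 'https://jenkins.example.com/sonarqube-webhook/' payload.json
```

If this succeeds from the SonarQube host while SonarQube's own deliveries keep failing, the network path itself is probably fine and the difference lies inside the SonarQube process.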
To restart SonarQube I go to Administration → System and click the Restart Server button. After the restart is complete, the webhooks begin working again with no other changes to SonarQube, Jenkins, or any of the infrastructure. It seems extremely unlikely that a network issue would correct itself every time I restart a couple of Java processes, especially a network issue that only causes failures for those Java processes and not for other processes on the same system (like wget).
While on the topic of webhooks: from everything I have read, it sounds like the webhook should only be used if the scan isn’t complete when the quality gate check is requested via waitForQualityGate(). However, what I see is that even scans that finish well before their result is requested still trigger a webhook call. This can lead to log output that looks like this:
Checking status of SonarQube task 'XXXX' on server 'SonarQube'
SonarQube task 'XXXX' status is 'SUCCESS'
SonarQube task 'XXXX' completed. Quality gate is 'OK'
SonarQube task 'XXXX' status is 'IN_PROGRESS'
SonarQube task 'XXXX' status is 'SUCCESS'
SonarQube task 'XXXX' completed. Quality gate is 'OK'
Can you confirm what the expected behavior is in this situation?
Okay, this sounds like a resource leak. Can you check the resources held by SonarQube the next time this happens? At a guess, the number of sockets may be of interest.
Nope. Webhooks are a general-purpose tool. You’re used to using them in the context of the Jenkins job, but they can also be used to update other things. All the relevant webhooks are sent at the end of every background task. The initial request that Jenkins makes is there to avoid a race condition. So yes, your log is expected.
@ganncamp
Here is the information I believe you were asking for about the resources held when it happens:
bash-5.0$ netstat -np
netstat: showing only processes with your user ID
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:9001 127.0.0.1:60424 ESTABLISHED 25/java
tcp 0 0 10.150.111.84:57474 10.150.1.201:5432 ESTABLISHED 149/java
tcp 0 0 127.0.0.1:37050 127.0.0.1:9001 ESTABLISHED 100/java
tcp 0 0 127.0.0.1:9001 127.0.0.1:37050 ESTABLISHED 25/java
tcp 0 0 10.150.111.84:34926 10.150.1.201:5432 ESTABLISHED 100/java
tcp 0 0 10.150.111.84:46038 10.150.1.201:5432 ESTABLISHED 149/java
tcp 0 0 127.0.0.1:60424 127.0.0.1:9001 ESTABLISHED 100/java
tcp 0 0 10.150.111.84:46428 10.150.1.201:5432 ESTABLISHED 100/java
tcp 0 0 10.150.111.84:34920 10.150.1.201:5432 ESTABLISHED 100/java
tcp 0 0 10.150.111.84:34928 10.150.1.201:5432 ESTABLISHED 100/java
tcp 0 0 127.0.0.1:36150 127.0.0.1:9001 ESTABLISHED 149/java
tcp 0 0 10.150.111.84:46426 10.150.1.201:5432 ESTABLISHED 100/java
tcp 0 0 127.0.0.1:9001 127.0.0.1:36150 ESTABLISHED 25/java
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags Type State I-Node PID/Program name Path
unix 2 [ ] STREAM CONNECTED 159855467 100/java
unix 2 [ ] STREAM CONNECTED 159863398 149/java
unix 2 [ ] STREAM CONNECTED 159857058 25/java
unix 2 [ ] STREAM CONNECTED 159853412 25/java
unix 2 [ ] STREAM CONNECTED 159853344 1/java
unix 2 [ ] STREAM CONNECTED 159856571 100/java
unix 2 [ ] STREAM CONNECTED 159864898 149/java
unix 2 [ ] STREAM CONNECTED 159853831 1/java
The list doesn’t look to be of concerning length. That said, from what I am seeing it looks like SonarQube caches DNS lookups instead of doing a lookup each time, and our Jenkins server is behind a load balancer whose IPs can change dynamically. When the issue happened today, I noticed that the IPs of the load balancer had changed since the webhooks last worked.
Sonarqube fails to decorate merge requests when DNS entry to ALM changes
If you run SonarQube in an environment with a lot of DNS friction, you should define a DNS cache time to live policy as, by default, SonarQube will hold the DNS cache until it is restarted. You can set this policy to five seconds by doing the following:
`echo "networkaddress.cache.ttl=5" >> "${JAVA_HOME}/conf/security/java.security"`
Please be aware that this increases the risk of DNS spoofing attacks.
So it appears to be something that is done on purpose. I am not sure how well that fits with today’s dynamic cloud environments.
We had the issue recur and so resorted to building our own image based on the official SonarQube image. In our image we set networkaddress.cache.ttl to 0 in /opt/java/openjdk/conf/security/java.security so that we don’t drop webhooks on the floor when our IPs change.
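As a rough illustration of that image customization, the sketch below sets the property idempotently, so rebuilding on top of an already patched file does not pile up duplicate lines. The `set_dns_ttl` function name is made up for this example; the file path and the value 0 are the ones mentioned above. It assumes the script runs as a user allowed to write to java.security, for example in a build step of the derived image.

```shell
#!/bin/sh
# set_dns_ttl FILE SECONDS
# Hypothetical helper: make the active networkaddress.cache.ttl setting in
# FILE equal to SECONDS. A value of 0 tells the JVM never to cache
# successful lookups.
set_dns_ttl() {
    file=$1
    ttl=$2
    if grep -q '^networkaddress\.cache\.ttl=' "$file"; then
        # An active setting already exists: rewrite it. Writing back with
        # cat keeps the original file's owner and permissions.
        sed "s/^networkaddress\.cache\.ttl=.*/networkaddress.cache.ttl=${ttl}/" \
            "$file" > "$file.tmp" &&
            cat "$file.tmp" > "$file" &&
            rm -f "$file.tmp"
    else
        # Only a commented-out default (or nothing) is present: append.
        # The leading newline guards against a file without a final newline.
        printf '\nnetworkaddress.cache.ttl=%s\n' "$ttl" >> "$file"
    fi
}

# Example, using the path from the post above:
#   set_dns_ttl /opt/java/openjdk/conf/security/java.security 0
```

A small positive TTL, such as the 5 seconds in the quoted documentation, is a possible middle ground between never caching and caching until restart.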