Webhook to Jenkins fails without any changes

SonarQube version: 8.7.1.42226
OpenJDK Runtime Environment 11.0.10+9
Alpine Linux v3.12 (3.12.4)
SonarScanner version: 4.5.0.2216
Java 11.0.7 Oracle Corporation (64-bit)
Linux 4.14.200-155.322.amzn2.x86_64 amd64
SonarQube Scanner for Jenkins version 2.13

We have Jenkins integration set up through the SonarQube Scanner for Jenkins plugin. Both sides are configured correctly and it works… for a while. After an unpredictable amount of time and an unpredictable number of scans/webhook calls it stops working. When the webhooks fail, they report: Response: Server Unreachable, Duration: 20s. This can be temporarily fixed by doing nothing more than restarting SonarQube; the webhooks then work for an indeterminate period until the problem presents itself again. I believe this to be an issue in SonarQube itself because, while the behavior is happening, I can connect to the SonarQube system and post the JSON body from the failed webhook message via wget to the same address the webhook is configured to use, and it works every time I have tried it, even while the webhook calls from within SonarQube continue to fail.

Jenkinsfile step that starts the scan:

def scannerHome = tool 'SonarScanner'
withSonarQubeEnv('SonarQube') {
    sh "${scannerHome}/bin/sonar-scanner"
}

Jenkinsfile step that waits for the scan to complete (or times out once the webhooks start failing):

timeout(time: 1, unit: 'HOURS') {
    def qg = waitForQualityGate()
    if (qg.status != 'OK') {
        unstable("SonarQube quality gate failure: ${qg.status}")
    }
}

Output in the Jenkins build log while the issue is occurring, repeated until the timeout:

Checking status of SonarQube task 'XXXX' on server 'SonarQube'
SonarQube task 'XXXX' status is 'IN_PROGRESS'

As stated, the two “workarounds” I have found are to restart SonarQube or to complete the POST myself. Neither of these is really viable.

For reference, here is the wget command that works from the system running SonarQube even while the webhooks are failing:

wget -S --header="Accept-Encoding: gzip, deflate" --header='Accept-Charset: UTF-8' --header='Content-Type: application/json' -O response.json --post-data '{}' https://XXXX/sonarqube-webhook/

Hi,

Welcome to the community!

I don’t quite follow here which side is failing, but the part where you say “[when I do it manually] it works every time” makes me suspect something on your network. Have you checked that angle? And if you believe the problem is on the SonarQube side, are you seeing errors anywhere? If it’s the webhook calls that are failing, what errors are you seeing for those calls?

 
Ann

@ganncamp
The Jenkins side, where it asks SonarQube for the results of the scan, always works. On the Sonarqube side, after the scan is complete and the webhook is called, this will work for an inconsistent amount of time after a restart of SonarQube. During that time every webhook succeeds. After the first webhook call fails, I am not sure what causes the failure, every subsequent webhook call fails until SonarQube is restarted. I can see the failures by going to Administration → Configuration → Webhooks and see the last delivery (or show recent deliveries). They look like this:

## Last delivery of Jenkins
Response: Server Unreachable
Duration: 20s
Payload:
{
json document that was supposed to be posted to the webhook
}

The “manual” process is to connect to a shell on the Linux system where SonarQube is running, which uses the same network interface and path as the failed webhook calls, and post the JSON payload to the Jenkins webhook endpoint using wget. An example of the wget command I have been using is in my message posted on April 1st. When the payload is posted via wget, the Jenkins job that is stuck waiting with SonarQube task 'XXXX' status is 'IN_PROGRESS' moves forward with SonarQube task 'XXXX' status is 'SUCCESS' and proceeds. If I don't post via wget while the webhooks are failing, the build waits for the 1 hour timeout and then fails.

To restart SonarQube I go to Administration → System and click the Restart Server button. After the restart is complete the webhooks begin working again with no other changes to SonarQube, Jenkins, or any of the infrastructure. It seems extremely unlikely that a network issue would correct itself every time I restart a couple of Java processes, especially a network issue that only causes failures for those Java processes and not for other processes on the same system (like wget).

While on the topic of webhooks: from everything I have read, it sounds like the webhook should only be needed if the scan isn't complete when the quality gate check is requested via waitForQualityGate(). However, what I see is that even scans that finish well before their result is requested seem to trigger a webhook call. This can lead to log output like this:

Checking status of SonarQube task 'XXXX' on server 'SonarQube'
SonarQube task 'XXXX' status is 'SUCCESS'
SonarQube task 'XXXX' completed. Quality gate is 'OK'
SonarQube task 'XXXX' status is 'IN_PROGRESS'
SonarQube task 'XXXX' status is 'SUCCESS'
SonarQube task 'XXXX' completed. Quality gate is 'OK'

Can you confirm what the expected behavior is in this situation?

Hi,

Okay, this sounds like a resource leak. Can you check the resources held by SonarQube the next time this happens? At a guess, the number of sockets may be of interest.
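For example, something like this on the SonarQube host can track whether a process is leaking descriptors over time (the process-matching pattern below is a guess; adjust it to however your processes are named):

```shell
# Rough sketch for spotting a file-descriptor/socket leak on Linux.
# Every open socket is a file descriptor, so a count that only ever
# grows between webhook deliveries is a good leak indicator.
count_fds() {
  ls "/proc/$1/fd" | wc -l
}

# e.g. for the Compute Engine process, which sends the webhooks
# (pattern is a guess): count_fds "$(pgrep -f CeServer)"
count_fds $$   # demo against the current shell
```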

Nope. Webhooks are a general-purpose tool. You’re used to using them in the context of the Jenkins job, but they can also be used to update other things. All the relevant webhooks are sent at the end of every background task. The initial request that Jenkins makes is there to avoid a race condition. So yes, your log is expected.

 
Ann

@ganncamp
Here is the information I believe you were asking for about resources, captured while the issue was happening:

bash-5.0$ netstat -np
netstat: showing only processes with your user ID
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:9001          127.0.0.1:60424         ESTABLISHED 25/java
tcp        0      0 10.150.111.84:57474     10.150.1.201:5432       ESTABLISHED 149/java
tcp        0      0 127.0.0.1:37050         127.0.0.1:9001          ESTABLISHED 100/java
tcp        0      0 127.0.0.1:9001          127.0.0.1:37050         ESTABLISHED 25/java
tcp        0      0 10.150.111.84:34926     10.150.1.201:5432       ESTABLISHED 100/java
tcp        0      0 10.150.111.84:46038     10.150.1.201:5432       ESTABLISHED 149/java
tcp        0      0 127.0.0.1:60424         127.0.0.1:9001          ESTABLISHED 100/java
tcp        0      0 10.150.111.84:46428     10.150.1.201:5432       ESTABLISHED 100/java
tcp        0      0 10.150.111.84:34920     10.150.1.201:5432       ESTABLISHED 100/java
tcp        0      0 10.150.111.84:34928     10.150.1.201:5432       ESTABLISHED 100/java
tcp        0      0 127.0.0.1:36150         127.0.0.1:9001          ESTABLISHED 149/java
tcp        0      0 10.150.111.84:46426     10.150.1.201:5432       ESTABLISHED 100/java
tcp        0      0 127.0.0.1:9001          127.0.0.1:36150         ESTABLISHED 25/java
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags       Type       State         I-Node PID/Program name    Path
unix  2      [ ]         STREAM     CONNECTED     159855467 100/java
unix  2      [ ]         STREAM     CONNECTED     159863398 149/java
unix  2      [ ]         STREAM     CONNECTED     159857058 25/java
unix  2      [ ]         STREAM     CONNECTED     159853412 25/java
unix  2      [ ]         STREAM     CONNECTED     159853344 1/java
unix  2      [ ]         STREAM     CONNECTED     159856571 100/java
unix  2      [ ]         STREAM     CONNECTED     159864898 149/java
unix  2      [ ]         STREAM     CONNECTED     159853831 1/java

The list doesn’t look concerningly long. That said, given what we’re seeing, it looks like SonarQube is caching DNS lookups instead of performing a lookup each time, and our Jenkins server is behind a load balancer whose IPs can change dynamically. I noticed today, when the issue happened, that the IPs of the load balancer had changed since the webhooks last worked.
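To check for this, a quick way to snapshot what DNS currently returns from the SonarQube host, so successive runs can be diffed when the load balancer rotates IPs (localhost below is just a runnable stand-in for our redacted Jenkins hostname):

```shell
# Print the unique IPv4 addresses DNS currently returns for a host.
# Re-run and diff the output to see when a load balancer's IPs change.
resolve_ipv4() {
  getent ahostsv4 "$1" | awk '{print $1}' | sort -u
}
resolve_ipv4 localhost
```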

We found this in the SonarQube docs:

Sonarqube fails to decorate merge requests when DNS entry to ALM changes
If you run SonarQube in an environment with a lot of DNS friction, you should define a DNS cache time to live policy as, by default, SonarQube will hold the DNS cache until it is restarted. You can set this policy to five seconds by doing the following:

`echo "networkaddress.cache.ttl=5" >> "${JAVA_HOME}/conf/security/java.security"`
Please be aware that this increases the risk of DNS spoofing attacks.

So it appears this caching is being done on purpose. I am not sure how well that fits with today’s dynamic cloud environments.

Thoughts?

Hi,

Uhm… I’m glad you figured it out?

& FYI, I’ve flagged this internally for potentially further attention.

 
Ann

There is apparently already a GitHub issue for this related to the Docker image, which we use.

The workaround of setting -Dsun.net.inetaddr.ttl from the GitHub issue seems to be working for us.
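For anyone else hitting this, here is a sketch of how we pass the flag to the official Docker image. The SONAR_WEB_JAVAADDITIONALOPTS / SONAR_CE_JAVAADDITIONALOPTS environment variables map to the sonar.web.javaAdditionalOpts / sonar.ce.javaAdditionalOpts properties; verify the names against the docs for the image version you run:

```shell
# Deployment sketch, not a drop-in: set a 5-second positive DNS cache TTL
# on both JVMs. The Compute Engine (CE) is the process that actually
# posts the webhooks, so it is the one that matters most here.
docker run -d --name sonarqube \
  -e SONAR_WEB_JAVAADDITIONALOPTS="-Dsun.net.inetaddr.ttl=5" \
  -e SONAR_CE_JAVAADDITIONALOPTS="-Dsun.net.inetaddr.ttl=5" \
  -p 9000:9000 \
  sonarqube:8.7.1-community
```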


We had the issue reoccur, so we resorted to building our own image based on the official SonarQube image. We set networkaddress.cache.ttl to 0 in /opt/java/openjdk/conf/security/java.security in our image so we don’t drop webhooks on the floor when our IPs change.
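A minimal sketch of that kind of image, assuming the official 8.7 community tag; verify the Java path and user name against whatever base image you actually build from:

```dockerfile
FROM sonarqube:8.7.1-community
# java.security is root-owned in the base image, so switch users to edit it.
USER root
# Disable the JVM's "cache forever" default so webhook calls re-resolve
# the Jenkins load balancer on every delivery.
RUN echo "networkaddress.cache.ttl=0" >> /opt/java/openjdk/conf/security/java.security
USER sonarqube
```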