Intermittent waitForQualityGate timeout, webhook received & ignored?

Observed May 16, 2019

SonarCloud
SonarQube Scanner for Jenkins 2.8.1
Kubernetes Plugin for Jenkins 1.15.2

Around 50% of the time waitForQualityGate is timing out in our pipeline. The webhook is configured and we can see Status Code 200 with the correct payload in the SonarCloud Administration “recent deliveries” UI. Regardless, the logs for waitForQualityGate never continue beyond “IN_PROGRESS” and our timeout option kicks in and aborts the build.

Logs from one such event:

Jenkins Pipeline Log

Checking status of SonarQube task 'AWrCbBP***********' on server 'sq-cloud'
SonarQube task 'AWrCbBP***********' status is 'IN_PROGRESS'

Jenkins Server Logs (Abbreviated)

May 16, 2019 8:52:37 PM INFO hudson.plugins.sonar.SonarBuildWrapper setUp
Injecting SonarQube environment variables using the configuration: sq-cloud
May 16, 2019 8:52:37 PM INFO hudson.plugins.sonar.SonarBuildWrapper createLoggerDecorator
Masking SonarQube passwords
May 16, 2019 8:53:45 PM INFO org.sonarsource.scanner.jenkins.pipeline.SonarQubeWebHook doIndex
Received POST from 18.195.64.198
May 16, 2019 8:53:45 PM FINE org.sonarsource.scanner.jenkins.pipeline.SonarQubeWebHook
Full details of the POST was {"serverUrl":"https://sonarcloud.io","taskId":"AWrCbBP***********","status":"SUCCESS","analysedAt":"2019-05-16T22:53:21+0200","changedAt":"2019-05-16T22:53:21+0200"....

Pipeline Definition (relevant portion)

stage('SonarQube Analysis') {
    steps {
        withSonarQubeEnv('sq-cloud') {
            container('node-browsers') {
                script {
                    sh "npm run sonar-scanner -- " +
                            "-Dsonar.host.url=${env.SONAR_HOST_URL} " +
                            "-Dsonar.login=${env.SONAR_AUTH_TOKEN} " +
                            "-Dsonar.branch.name=${env.GIT_BRANCH}"
                }
            }
        }
    }
}
stage('SonarQube Quality Gate') {
    options { timeout(time: 30, unit: 'MINUTES') }
    steps {
        waitForQualityGate(abortPipeline: true)
    }
}

It feels like a race condition but I don’t understand how that could be possible after reading the docs. SonarQube Plugin Docs

Note that to prevent race conditions, when the step starts (or is restarted) a direct call is made to the server to check if the task is already completed.

Perhaps it’s related to the use of the Kubernetes plugin. I don’t really know how withSonarQubeEnv and waitForQualityGate interact so I couldn’t really say for sure.

2 Likes

UPDATE!
I ran two experiments to narrow this down:

Experiment 1

  1. I waited for the pipeline to get stuck on the waitForQualityGate command
  2. I found the webhook that was already sent to Jenkins by SonarCloud
  3. I resent the webhook to our Jenkins instance

Result: The pipeline continued, Success!
This tells me that Jenkins is indeed listening for the webhook (eventually) and responds correctly once it receives it.

Experiment 2

  1. I disabled the webhook in SonarCloud
  2. Started the pipeline
  3. Pipeline got stuck on the waitForQualityGate

I tried restarting the entire pipeline multiple times and it always got stuck on the waitForQualityGate.

I am lead to believe that the race condition breaking logic is not working as intended.

Note that to prevent race conditions, when the step starts (or is restarted) a direct call is made to the server to check if the task is already completed.

I think either SonarCloud is not responding with the correct information (eventual consistency bug? which API is it even using?) or waitForQualityGate is not making a successful request to retrieve the quality gate result. No idea how to verify this.

1 Like

UPDATE

Experiment 3

I built a simple Node.js application that receives the webhook and then immediately starts querying the REST API for task (api/ce/task) and quality gate project status (api/qualitygates/project_status).

In my example, the webhook, with status=“SUCCESS” and quality gate status=“OK” was received at 2019-05-20T17:18:36.146Z. The task information and analysisId wasn’t available until 2019-05-20T17:18:40.059Z. This means the webhook is ahead of the API by almost 4 seconds. The race condition logic assumes the webhook will never be ahead of the API.

Raw logs:

{"message":"Webhook received!: TaskId=AWrWQ************, Status:SUCCESS","level":"info","timestamp":"2019-05-20T17:18:36.146Z"}
{"message":"AnalysisId=undefined","level":"info","timestamp":"2019-05-20T17:18:37.065Z"}
{"message":"AnalysisId=undefined","level":"info","timestamp":"2019-05-20T17:18:37.732Z"}
{"message":"AnalysisId=undefined","level":"info","timestamp":"2019-05-20T17:18:38.591Z"}
{"message":"AnalysisId=undefined","level":"info","timestamp":"2019-05-20T17:18:39.249Z"}
{"message":"AnalysisId=AWrWQ**********","level":"info","timestamp":"2019-05-20T17:18:40.059Z"}
{"message":"0: Sending Request","level":"info","timestamp":"2019-05-20T17:18:40.565Z"}
{"message":"0: Gate Status: OK","level":"info","timestamp":"2019-05-20T17:18:41.023Z"}

Conclusion
Jenkins is not fast enough to always catch the webhook. Sonarcloud sends the webhook out before the REST API is consistent with the webhook.

I think the workaround is to ignore the webhook altogether. Place a sleep statement before waitForQualityGate and simply rely on the direct call to Sonar to check the status of the quality gate.

3 Likes

Hi Nate, Thanks for posting your experiments and conclusion! I also arrived at placing a sleep(60) in front of our waitForQualityGate step.

I wonder if waitForQualityGate will get updated to support SonarCloud or if there’s a new method that’s better suited to SonarCloud. Adding an arbitrary sleep statement to every Jenkinsfile is tedious and less than ideal.

Hey Nate,

Turns out SonarCloud’s webhooks were being blocked by our edge firewalls. Adding exceptions and setting up the webhooks here: SonarCloud worked for me.

See this thread as well, I missed it the first time I was looking for a fix for this issue:

Cheers :beers:

1 Like