I am running SonarQube Enterprise with the 10.0 Helm chart, and I have discovered a bug in the liveness check that causes it to incorrectly succeed when the system is down.
The check is written as follows:
Liveness: exec [sh -c host="$(hostname -i || echo '127.0.0.1')"
reply=$(wget --no-proxy -qO- --header="X-Sonar-Passcode: $SONAR_WEB_SYSTEMPASSCODE" http://${host}:9000/api/system/liveness 2>&1)
if [ -z "$reply" ]; then exit 0; else exit 1; fi
] delay=60s timeout=10s period=30s #success=1 #failure=6
The problem is that when the server returns a 500 error, the reply is empty: wget is run with -q, so it prints nothing on an HTTP error, and the -z test then treats the empty reply as success. We use an external PostgreSQL database on Google Cloud SQL, so I was able to simulate a database problem by breaking connectivity to it. The pod's status still shows up as Running when it is in fact in a failure mode.
When I shell into the SonarQube container and run the liveness check commands manually, I can demonstrate this in more detail:
sonarqube@sonarqube-test-sonarqube-0:/opt/sonarqube$ host="$(hostname -i || echo '127.0.0.1')"
sonarqube@sonarqube-test-sonarqube-0:/opt/sonarqube$ echo $host
10.9.43.8
sonarqube@sonarqube-test-sonarqube-0:/opt/sonarqube$ reply=$(wget --no-proxy -qO- --header="X-Sonar-Passcode: $SONAR_WEB_SYSTEMPASSCODE" http://${host}:9000/api/system/liveness 2>&1)
sonarqube@sonarqube-test-sonarqube-0:/opt/sonarqube$ echo $?
8
sonarqube@sonarqube-test-sonarqube-0:/opt/sonarqube$ echo $reply
sonarqube@sonarqube-test-sonarqube-0:/opt/sonarqube$ wget --no-proxy --header="X-Sonar-Passcode: $SONAR_WEB_SYSTEMPASSCODE" http://${host}:9000/api/system/liveness 2>&1
--2023-04-26 22:09:38-- http://10.9.43.8:9000/api/system/liveness
Connecting to 10.9.43.8:9000... connected.
HTTP request sent, awaiting response... 500
2023-04-26 22:09:46 ERROR 500: (no description).
sonarqube@sonarqube-test-sonarqube-0:/opt/sonarqube$ echo $?
8
I think the solution is probably as simple as removing the entire conditional from the check, so that the probe's result is wget's own exit status. wget returns a non-zero status (8 in the session above) for an error response such as the 500, and the /api/system/liveness API should never return a 3xx code.
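As a rough sketch of what I mean (not a tested chart patch), the check could rely on wget's exit status alone. The function name liveness_check is just something I made up for illustration; the host lookup, header, and URL are copied from the existing probe.

```shell
# Hypothetical helper name; the commands inside mirror the chart's current probe.
liveness_check() {
  host="$(hostname -i || echo '127.0.0.1')"
  # No test on the reply body: wget -q prints nothing on an HTTP error,
  # so an empty reply cannot distinguish success from a 500.
  # The function's return status is wget's exit status
  # (0 on success, 8 when the server issues an error response).
  wget --no-proxy -qO- \
    --header="X-Sonar-Passcode: $SONAR_WEB_SYSTEMPASSCODE" \
    "http://${host}:9000/api/system/liveness" >/dev/null 2>&1
}
```

In the probe itself the wget command would simply be the last line of the sh -c script, so that its exit status becomes the probe result and Kubernetes counts any non-zero status as a failure.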