K8S 1.20
Sonarqube Community Edition 8.9.6.50800
Postgres:10.6 (AWS Managed Service)
Sonar (K8S/POD) sits behind NGINX/Controller, which in turn is behind AWS/LB. This very same architecture/setup used to work perfectly fine - no timeouts. As you can see below, nginx gets the request, though it times out. Nothing has changed other than sonar[ce] being upgraded from 8.2 to 8.9. One out of 10+ attempts goes through…
I'm not familiar with Nginx logging, and so I'm not able to see any timeouts in the logs you shared. Fortunately you also shared what's timing out: submission of an analysis report. You didn't say this explicitly, but I'm going to guess that when you say "One out of 10+ attempts goes through" you mean that one out of every 10 attempts to submit an analysis report goes through (and that all other requests to SonarQube, e.g. to paint the UI, succeed).
When we see analysis report submission timing out, it’s generally something on the network blocking the submission. And generally it’s about the size of the analysis report. That would explain why 1 in 10 gets through: the successful 1 fits under the allowed size limit. The other 9 are too big.
Why would this show up only after upgrade? Because with each new version of SonarQube, we add more rules (i.e. new rules raise new/more issues), and rules get smarter (i.e. fewer False Negatives → more issues from old rules). I haven’t done any statistical analysis, but it’s a fair bet that analysis reports get slightly larger with each new SonarQube version. And with this upgrade you hit the limits set in your internal network controls.
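As an aside, if you want to rule the proxy out quickly, it should be possible to dump the effective NGINX config from the ingress controller and look at the size and timeout limits it applies. This is only a sketch; the namespace and workload names are placeholders for whatever your controller is actually called:

# Dump the rendered NGINX config and check the limits it applies (names are placeholders)
kubectl -n ingress-nginx exec deploy/ingress-nginx-controller -- nginx -T | grep -E 'client_max_body_size|client_body_timeout|proxy_read_timeout|proxy_send_timeout'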
Thanks for taking the time to not only look into it but also share your thoughts. Appreciated. This is what makes this community GREAT! I am sorry I did not mention that it was the submission of the analysis report that times out. In the NGINX logs, you can see a 408 ["status": "408" == 'request timeout'] and "request_time": "60.004", and truth be told, 60 seconds is a very long timer. Though it is a client 4XX kind of error, it is not necessarily a client problem. If the body were too large, we would have expected a 413 right away and not a 408. How can we check how much time SonarQube is taking? We have had Sonar for a few years now and it has gone through quite a few upgrades, though this is the first time we get a considerable number of 408s (the default timer is 60 seconds). We do understand what you mentioned, though it would be good to connect the dots, that is, measure exactly how much time Sonar takes and, assuming it is acceptable, perhaps increase the default 60 seconds. Any thoughts? thanks /Pedro
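PS: for reference, if it does come down to the proxy timer or body size, I assume we could raise them per Ingress with the usual ingress-nginx annotations, along these lines (values are only examples, and the namespace/Ingress names are placeholders):

# Raise the proxy read timeout and allowed body size on the SonarQube Ingress (example values)
kubectl -n <sonarqube-namespace> annotate ingress <sonarqube-ingress> nginx.ingress.kubernetes.io/proxy-read-timeout=300 nginx.ingress.kubernetes.io/proxy-body-size=64m --overwrite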
The timeout you're seeing is "I never got a response when I submitted the analysis report". That's it. The scanner submits the report and SonarQube says "Yup! Got it!" So the question is not how long SonarQube is taking, but the transit time across your network. That's what leads me to believe that the problem is not a timer in SonarQube, but somewhere else in your network. The post is being rejected because it's larger than the (somewhat) arbitrary size limit baked into your system.
But okay. If you really want to verify this, watch the SonarQube access logs at analysis completion to see when the POST arrives at SonarQube.
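The access log sits next to web.log and ce.log under <SONARQUBE_HOME>/logs (for example /opt/sonarqube/logs in the official image, assuming you haven't redirected logging). If memory serves, the scanner submits the report with a POST to api/ce/submit, so something along these lines should show when (and whether) it actually reaches SonarQube; the pod name is a placeholder:

# Watch the SonarQube access log for the report submission (pod name is a placeholder)
kubectl exec <sonarqube-pod> -- tail -f /opt/sonarqube/logs/access.log | grep 'POST /api/ce/submit'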
Thanks, though I am not really sure I follow you here. So allow me to better explain the setup and the problem. Sorry I did a bad job of that.
Jenkins (client) sends the request, which promptly makes it to NGINX (the last hop in the network path before SonarQube).
In 9 out of 10 attempts, this leads to a request timeout from NGINX's point of view (this is the "408" I shared earlier).
The NGINX default timer I mentioned is 60 seconds; NGINX forwards the request to SonarQube and, 9 out of 10 times, throws a 408 because the 60-second timer expired before SonarQube was able to send a response back.
nginx (used as a reverse proxy) is the last network hop before SonarQube. I can tell you that nginx gets every single one of these requests, though please advise where Sonar's access logs are located. There are many possibilities here and it would be good to reduce the search domain. Consider that just before the Sonar upgrade from 8.2 (the only change made), this really never occurred - which does not mean it is Sonar, and that's why I reached out to this community: to get some help and guidance. Please let me know where the access logs are and I can definitely check on them. thanks /Pedro
It did not take much to reproduce it and, as you can see in the logs, the request is received by Sonar. Please check it out (attached file with debug logs). Please let me know your thoughts. Cheers /Pedro
Here’s the part of your log that’s interesting to me:
2022.01.28 22:18:32 DEBUG ce[][o.postgresql.Driver] Connecting with URL: jdbc:postgresql://infra-postgresql.cluster-cskx4cpsa.us-west-1.rds.amazonaws.com:5432/sonarqube
2022.01.28 22:18:32 DEBUG ce[][o.p.c.v.ConnectionFactoryImpl] Trying to establish a protocol version 3 connection to infrastructure-postgresql.cluster-csuktjafy04g.us-east-1.rds.amazonaws.com:5432
2022.01.28 22:18:32 DEBUG ce[][o.p.c.v.ConnectionFactoryImpl] Receive Buffer Size is 6,291,456
2022.01.28 22:18:32 DEBUG ce[][o.p.c.v.ConnectionFactoryImpl] Send Buffer Size is 6,291,456
2022.01.28 22:18:32 DEBUG ce[][o.p.ssl.MakeSSL] converting regular socket connection to ssl <<--Pedro: Last log line before Jenkins receives:
This is SonarQube trying to talk to your DB. I’ve been assuming the block was between Jenkins & SonarQube (because 99.9% of the time that’s where it is) but it looks instead like the problem is between SonarQube and the DB. It looks like the failure is in trying to store the analysis report in the DB while the job is queued for processing.
Thank you. Appreciated. The Sonar logs do not show a failure, do they? The last line shows that it is trying to convert a regular socket to SSL, and that's it. If there were a connectivity problem or an SSL handshake problem, then I'd have expected Sonar to throw an error, but it does not. AWS/CloudWatch, at a glance, does not show any problems at all. Perhaps the analysis report is too big and sometimes Sonar takes more than 60 seconds to process it? There is a client (Jenkins) to server (Sonar) leg that is established and awaiting an answer (and the default timer there is a very conservative 60 seconds). Now, if Sonar is slow, or has problems, or takes a long time to push the analysis to the DB, then we would need help to check that. Something is fishy here because, again, the only thing that changed in the entire setup was Sonar. Nothing else got changed and/or upgraded.
That’s just it. From the log snippet, it looks like this is the initial receipt by SonarQube & storage of the analysis report in the DB for later processing. There’s no attempt to process it yet. That happens asynchronously, later.
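As a side note, if you want to see how long that later processing actually takes, the Compute Engine reports it after the fact. A sketch, with the token and host as placeholders:

# List recent background tasks with their timings (submittedAt / executedAt / executionTimeMs are in the JSON)
curl -s -u <admin-token>: 'https://<your-sonarqube>/api/ce/activity?ps=10' | jq '.tasks[] | {status, submittedAt, executedAt, executionTimeMs}'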
The file you provided earlier ends at
sonarqube-sonarqube-5fc55f4947-qkcdc sonarqube 2022.01.28 22:18:32 DEBUG ce[o.p.ssl.MakeSSL] converting regular socket connection to ssl <<--Pedro: Last log line before Jenkins receives:
From Jenkins/CI:
[okhttp3.internal.http2.StreamResetException: stream was reset: CANCEL, okhttp3.internal.http2.StreamResetException: stream was reset: CANCEL]
What happens in ce.log and web.log after Jenkins gets that message?
Yes, and I explained in my initial response why a SonarQube upgrade could result in a slightly larger analysis report.
The log shared has ce and web as well. Have you had a chance to follow through the log to check on both? It is all there and in debug mode. Something is still fishy here because the client (Jenkins) to server (Sonar) timeout of 60 seconds (quite a lot, as in most cases the response should either be given right away, or handled asynchronously so as not to keep the connection open for nothing) does not seem to be enough, most of the time, for Sonar to handle the analysis report. Even if Sonar adds more checks and so on, if someone decides to take advantage of most of them, then Sonar would not operate properly, as the client/server times out before Sonar is done. This is why I was wondering how we can check on why Sonar takes so long, to ensure we are not missing anything? thanks /Pedro
This is exactly why I asked you to check on the connection between SonarQube and Postgres. What happens when Jenkins contacts SonarQube at the end of analysis is that the analysis report is submitted to the server and SonarQube stores the report in the DB for later processing.
Not coincidentally, I believe, the last thing we see in your log is SonarQube trying to establish a connection with Postgres.
Again, can you check on what’s going on between SonarQube and Postgres?
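A quick way to check that leg, sketched with placeholder names (a throwaway pod won't have exactly the same pod identity, but it exercises the same namespace and VPC path; the image tag is just chosen to roughly match your server version):

# Ask Postgres from inside the cluster whether it is accepting connections on 5432
kubectl -n <sonarqube-namespace> run -it --rm dbcheck --image=postgres:10 --restart=Never -- pg_isready -h <rds-endpoint> -p 5432 -t 10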
That trace is recurrent and, even in the working case, it prints the very same lines. Therefore, it does not seem to indicate a problem. Moreover, there are no transient problems between Sonar and the DB as far as we can tell. We can try to allocate even more heap/memory to the client (Jenkins node) to see if the transport/cancel (originally reported) still occurs, though at a glance we have not seen OOM / heap exhaustion. Cheers /P
Good morning.
Sorry it took me a bit to get back, but we have been trying to reduce the search domain by running different tests and going through a lot of logs. The latest on this one is that we cannot reproduce it when I run it locally, though it happens intermittently - as reported herein - when exercised via Jenkins/CI (K8S Jenkins node), which never happened before this upgrade was done.
Regarding your request - do you want the whole log from during and after the Jenkins timeout? Cheers /Pedro