DB connections get stuck timing out

We are running SonarQube 9.8 Developer Edition in a Docker container, connected to a managed PostgreSQL 11 instance.

Everything operates normally for a while, but after some days of operation we start to see scan errors. The compute engine logs show the error below written repeatedly, and once the issue starts it persists seemingly indefinitely (we’ve let it sit for a couple of days to see whether it would resolve on its own).

During this time we are still able to connect to the database with other tools such as pgAdmin, and we can restore functionality by restarting the SonarQube container. This looks like it starts as a transient connection issue, but how can we ensure that the SonarQube instance recovers from it without manual intervention?

2023.01.11 20:02:55 WARN  ce[][c.z.h.pool.PoolBase] HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@33c1774e (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
2023.01.11 20:03:00 WARN  ce[][c.z.h.pool.PoolBase] HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@39b38c76 (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
2023.01.11 20:03:00 ERROR ce[][o.s.c.t.CeWorkerImpl] Failed to pop the queue of analysis reports

The error is followed by this exception:


org.apache.ibatis.exceptions.PersistenceException: 
### Error querying database.  Cause: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 10004ms.
### The error may exist in org.sonar.db.property.InternalPropertiesMapper
### The error may involve org.sonar.db.property.InternalPropertiesMapper.selectAsText
### The error occurred while executing a query
### Cause: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 10004ms.
	at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:30)
	at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:153)
	at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:145)
	at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:140)
	at org.apache.ibatis.binding.MapperMethod.executeForMany(MapperMethod.java:147)
	at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:80)
	at org.apache.ibatis.binding.MapperProxy$PlainMethodInvoker.invoke(MapperProxy.java:145)
	at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:86)
	at jdk.proxy2/jdk.proxy2.$Proxy52.selectAsText(Unknown Source)
	at org.sonar.db.property.InternalPropertiesDao.selectByKey(InternalPropertiesDao.java:179)
	at org.sonar.ce.queue.CeQueueImpl.getWorkersPauseStatus(CeQueueImpl.java:328)
	at org.sonar.ce.queue.InternalCeQueueImpl.peek(InternalCeQueueImpl.java:80)
	at org.sonar.ce.taskprocessor.CeWorkerImpl.tryAndFindTaskToExecute(CeWorkerImpl.java:170)
	at org.sonar.ce.taskprocessor.CeWorkerImpl.findAndProcessTask(CeWorkerImpl.java:153)
	at org.sonar.ce.taskprocessor.CeWorkerImpl$TrackRunningState.get(CeWorkerImpl.java:135)
	at org.sonar.ce.taskprocessor.CeWorkerImpl.call(CeWorkerImpl.java:87)
	at org.sonar.ce.taskprocessor.CeWorkerImpl.call(CeWorkerImpl.java:53)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 10004ms.
	at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:696)
	at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:181)
	at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:146)
	at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:100)
	at org.sonar.db.profiling.NullConnectionInterceptor.getConnection(NullConnectionInterceptor.java:31)
	at org.sonar.db.profiling.ProfiledDataSource.getConnection(ProfiledDataSource.java:112)
	at org.apache.ibatis.transaction.jdbc.JdbcTransaction.openConnection(JdbcTransaction.java:145)
	at org.apache.ibatis.transaction.jdbc.JdbcTransaction.getConnection(JdbcTransaction.java:67)
	at org.apache.ibatis.executor.BaseExecutor.getConnection(BaseExecutor.java:337)
	at org.apache.ibatis.executor.ReuseExecutor.prepareStatement(ReuseExecutor.java:88)
	at org.apache.ibatis.executor.ReuseExecutor.doQuery(ReuseExecutor.java:59)
	at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:325)
	at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:156)
	at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:109)
	at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:89)
	at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:151)
	... 24 common frames omitted
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
	at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:907)
	at org.postgresql.jdbc.PgConnection.setNetworkTimeout(PgConnection.java:1649)
	at com.zaxxer.hikari.pool.PoolBase.setNetworkTimeout(PoolBase.java:561)
	at com.zaxxer.hikari.pool.PoolBase.isConnectionDead(PoolBase.java:168)
	at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:170)
	... 38 common frames omitted

The same issue occurred again from 01-18 through 01-19 and only cleared after another restart.

Hi,

Is there a firewall between SonarQube and its DB? Is this ticket relevant: SONAR-17762?

As a side note, SonarQube and its DB should be as close together on the network as possible - although not on the same host - for optimal performance.

 
HTH,
Ann

We’re experiencing the issue again today.

Yes, the DB does have a firewall in between. The documentation seems to indicate a default idle timeout of 4 minutes on the firewall, but the connections from SonarQube remain viable for much longer than that. If that is the cause, I would expect an application to be able to recover from this type of situation without intervention.

It’s possible that the ticket you linked is related, but the problem it describes is very generic. It’s also marked as resolved in SonarQube 9.8, which is the version we’re running.

The DB is in the same Azure region and VNet, but on separate subnets (I believe).

Hi,

Yes. SonarQube believes the connections are still open far longer than your firewall does, and it keeps trying to use them even after the firewall has killed them.

Yep. Since that ticket is resolved in 9.8, you now have the ability to customize some values. It’s a fair observation, though, that the ticket doesn’t give any details about how to do that customization. I’ve asked for those details, and in the meantime I tracked down the relevant commit.

It looks like there are new settings for your $SONARQUBE-HOME/conf/sonar.properties file:

sonar.jdbc.idleTimeout, default 600000 (10 minutes)
sonar.jdbc.keepaliveTime, default 300000 (5 minutes)
sonar.jdbc.maxLifetime, default 1800000 (30 minutes)
sonar.jdbc.validationTimeout, default 5000 (5 seconds)

All values are in milliseconds.
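
For example, to keep pooled connections alive through a firewall that drops idle connections after 4 minutes, something like this in sonar.properties should do it - the specific numbers below are just illustrative picks, not official recommendations:

# all values in milliseconds
sonar.jdbc.keepaliveTime=180000      # ping idle connections every 3 minutes, under the 4-minute firewall timeout
sonar.jdbc.maxLifetime=1800000       # default: retire connections after 30 minutes at most
sonar.jdbc.idleTimeout=600000        # default: close connections idle for more than 10 minutes
sonar.jdbc.validationTimeout=5000    # default: give up validating a connection after 5 seconds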

 
HTH,
Ann

I couldn’t find an existing settings file at $SONARQUBE-HOME/conf/sonar.properties. I’m not sure if that is normal for a container deployment. I pulled the default one from the sonarqube repo and noticed those settings weren’t included in it (not sure whether they should have been). In any case, I added the new settings and set sonar.jdbc.keepaliveTime to a little under the firewall’s idle connection timeout, so hopefully we’ll know in a few days whether that has helped.
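
If the mounted sonar.properties turns out not to be picked up, my fallback is to pass the same values as environment variables when the container starts. This is untested on my side and assumes the image’s usual mapping of sonar.* properties to SONAR_* variables (e.g. sonar.jdbc.url -> SONAR_JDBC_URL) also covers these newer pool settings; all the bracketed values and the image tag are placeholders:

docker run -d --name sonarqube \
  -e SONAR_JDBC_URL="jdbc:postgresql://<db-host>:5432/<db>" \
  -e SONAR_JDBC_USERNAME="<user>" \
  -e SONAR_JDBC_PASSWORD="<password>" \
  -e SONAR_JDBC_KEEPALIVETIME="180000" \
  sonarqube:<tag>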

I am still a little skeptical that this will resolve the issue, and tweaking these settings feels more like a band-aid than an actual fix for this type of issue. In my opinion, SonarQube’s connection pool should be able to recover from this kind of failure state without these values needing adjustment. After all, there are many reasons a network connection might be temporarily disrupted, and the pool should recover once the disruption clears.

Hi,

Yeah… We did the work but forgot to finish up by actually documenting it. After some poking, the Jira ticket^ has been updated. Hopefully we’ll get the conf file updated soon too.

 
Ann