Compute engine fails silently on database errors

  • version: SonarQube 9.2.6, installed via the community Helm chart (helm-chart-sonarqube/charts/sonarqube at master · SonarSource/helm-chart-sonarqube · GitHub)
  • description: when the compute engine hits a database error while popping tasks from the queue, it stops processing tasks, but the OS-level process keeps running, which prevents SonarQube from restarting it
  • how to reproduce: shut down the database while SonarQube has many background tasks in the queue, then restart the database. The CE process does not resume work once the database is back.
  • workaround: none that I know of. When we notice the problem, we kill the compute engine process manually, and after a short while SonarQube starts it again. At least some pending tasks seem to be lost, though: there are dozens pending before the kill and abruptly zero afterwards.

Relevant CE log part:

2021.11.10 21:33:10 ERROR ce[][o.s.c.t.CeWorkerImpl] Failed to pop the queue of analysis reports
org.apache.ibatis.exceptions.PersistenceException: 
### Error querying database.  Cause: org.postgresql.util.PSQLException: FATAL: the database system is shutting down
### The error may exist in org.sonar.db.property.InternalPropertiesMapper
### The error may involve org.sonar.db.property.InternalPropertiesMapper.selectAsText
### The error occurred while executing a query
### Cause: org.postgresql.util.PSQLException: FATAL: the database system is shutting down
	at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:30)
	at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:149)
	at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:140)
	at org.apache.ibatis.binding.MapperMethod.executeForMany(MapperMethod.java:147)
	at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:80)
	at org.apache.ibatis.binding.MapperProxy$PlainMethodInvoker.invoke(MapperProxy.java:144)
	at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:85)
	at com.sun.proxy.$Proxy31.selectAsText(Unknown Source)
	at org.sonar.db.property.InternalPropertiesDao.selectByKey(InternalPropertiesDao.java:155)
	at org.sonar.ce.queue.CeQueueImpl.getWorkersPauseStatus(CeQueueImpl.java:331)
	at org.sonar.ce.queue.InternalCeQueueImpl.peek(InternalCeQueueImpl.java:79)
	at org.sonar.ce.taskprocessor.CeWorkerImpl.tryAndFindTaskToExecute(CeWorkerImpl.java:174)
	at org.sonar.ce.taskprocessor.CeWorkerImpl.findAndProcessTask(CeWorkerImpl.java:155)
	at org.sonar.ce.taskprocessor.CeWorkerImpl$TrackRunningState.get(CeWorkerImpl.java:137)
	at org.sonar.ce.taskprocessor.CeWorkerImpl.call(CeWorkerImpl.java:89)
	at org.sonar.ce.taskprocessor.CeWorkerImpl.call(CeWorkerImpl.java:53)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.postgresql.util.PSQLException: FATAL: the database system is shutting down
	at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:613)
	at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:161)
	at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:213)
	at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:51)
	at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:225)
	at org.postgresql.Driver.makeConnection(Driver.java:465)
	at org.postgresql.Driver.connect(Driver.java:264)
	at org.apache.commons.dbcp2.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:55)
	at org.apache.commons.dbcp2.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:355)
	at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:889)
	at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:424)
	at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:349)
	at org.apache.commons.dbcp2.PoolingDataSource.getConnection(PoolingDataSource.java:134)
	at org.apache.commons.dbcp2.BasicDataSource$PaGetConnection.run(BasicDataSource.java:73)
	at org.apache.commons.dbcp2.BasicDataSource$PaGetConnection.run(BasicDataSource.java:69)
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at org.apache.commons.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:744)
	at org.sonar.db.profiling.NullConnectionInterceptor.getConnection(NullConnectionInterceptor.java:31)
	at org.sonar.db.profiling.ProfiledDataSource.getConnection(ProfiledDataSource.java:317)
	at org.apache.ibatis.transaction.jdbc.JdbcTransaction.openConnection(JdbcTransaction.java:138)
	at org.apache.ibatis.transaction.jdbc.JdbcTransaction.getConnection(JdbcTransaction.java:60)
	at org.apache.ibatis.executor.BaseExecutor.getConnection(BaseExecutor.java:336)
	at org.apache.ibatis.executor.ReuseExecutor.prepareStatement(ReuseExecutor.java:88)
	at org.apache.ibatis.executor.ReuseExecutor.doQuery(ReuseExecutor.java:59)
	at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:324)
	at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:156)
	at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:109)
	at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:83)
	at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:147)
	... 23 common frames omitted

Nothing more is added to the CE log until the CE process is killed manually, but the OS-level Java process keeps running.

Hi,
The workaround is to manually kill the CE, as you did. I think you should look into the root cause of the problem, which is having a DB shut down with SonarQube running. That’s an extreme situation that the CE is not able to handle.

We’re running SonarQube in the cloud. The database does not actually shut down; it just closes connections forcefully, for example when an instance moves to another location. This essentially means the CE isn’t fit for running in the cloud.

It would be tough to be resilient to DB connections being forcefully closed at any moment. Would you be able to point me to any documentation about that for the cloud provider you’re using, so I can understand the problem a little better?

What happens isn’t specific to a particular provider. Making it specific to a particular provider wouldn’t be a significant improvement.

Right now, if you kill the CE process manually, SonarQube will restart it. If SonarQube checked the health of the CE process in some way, instead of just checking its presence, it could kill and restart it whenever it finds it to be broken.

This health check could have a very simple initial implementation, like checking that some named threads are still running, and be enriched over time with additional checks as further ways in which things can go wrong without the process terminating are discovered.
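
To make the idea concrete, here is a minimal sketch of such a thread-based check. It is only an illustration, not how SonarQube works today: the class name, the method names and the "demo-worker-" thread-name prefix are all made up, and I'm assuming the check would run inside the CE JVM and report to whatever monitors the process. It only detects worker threads that have died; a thread that is alive but stuck would need an additional check.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: not part of SonarQube. Thread-name prefixes are assumptions.
public class CeThreadHealthCheck {

    /** Returns the expected thread-name prefixes for which no live thread exists. */
    public static List<String> missingThreads(List<String> expectedPrefixes) {
        // Thread.getAllStackTraces() returns a map keyed by all live threads in this JVM.
        Set<String> liveNames = Thread.getAllStackTraces().keySet().stream()
            .filter(Thread::isAlive)
            .map(Thread::getName)
            .collect(Collectors.toSet());
        return expectedPrefixes.stream()
            .filter(prefix -> liveNames.stream().noneMatch(name -> name.startsWith(prefix)))
            .collect(Collectors.toList());
    }

    /** Healthy means every expected prefix is matched by at least one live thread. */
    public static boolean isHealthy(List<String> expectedPrefixes) {
        return missingThreads(expectedPrefixes).isEmpty();
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a worker thread; the name is invented for this demo.
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "demo-worker-0");
        worker.setDaemon(true);
        worker.start();

        System.out.println("healthy while worker runs: " + isHealthy(List.of("demo-worker-")));

        // Simulate the worker dying without the process terminating.
        worker.interrupt();
        worker.join();
        System.out.println("missing after worker died: " + missingThreads(List.of("demo-worker-")));
    }
}
```

The supervising process could poll something like isHealthy() periodically and kill and restart the CE when it returns false, which would automate what we currently do by hand.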

Hi @AnonymousCoward ,

We’ve started looking into it, and so far the steps to reproduce that you provided don’t lead us to the issue you are having. While the exception about the database system shutting down can be easily reproduced, the Compute Engine seems to work just fine afterwards and accepts future tasks.

Could you provide some more information about your environment? How do you restart your database? If you are using clustered Postgres, what mode do you use for shutting down the database? And when using the Kubernetes Helm chart, do you restart the pod that hosts Postgres?

Thank you in advance.