How to recover SonarQube from a DB restart/failover?

We are running SonarQube Developer Edition, version 8.9.1. Sometimes the SonarQube Compute Engine fails to recover from a DB restart/failover, and new scan tasks just pile up in the pending list. If you call the /api/system/health API, it returns the response below, which says the SonarQube server’s status is GREEN (a misleading status):

{
    "health": "GREEN",
    "causes": []
}

If you request the /api/ce/activity_status API, it shows no in-progress tasks, but pending tasks are waiting:

{
    "pending": 9,
    "failing": 0,
    "inProgress": 0,
    "pendingTime": 2287992
}
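Since the health endpoint stays GREEN, the activity_status payload on its own can serve as a stall signal. The following is a minimal sketch, not an official SonarQube check: it assumes pendingTime is reported in milliseconds, and the function name and the 10-minute threshold are hypothetical choices for illustration.

```python
import json

def ce_looks_stalled(activity, max_pending_ms=600_000):
    """Return True when tasks are queued, none is running, and the
    oldest one has waited longer than max_pending_ms (assumed to be ms)."""
    pending = activity.get("pending", 0)
    in_progress = activity.get("inProgress", 0)
    pending_time = activity.get("pendingTime", 0)
    return pending > 0 and in_progress == 0 and pending_time > max_pending_ms

# The payload shown above, parsed as it would be from the HTTP response body.
activity = json.loads(
    '{"pending": 9, "failing": 0, "inProgress": 0, "pendingTime": 2287992}'
)
print(ce_looks_stalled(activity))
```

A check like this could run from a cron job or a monitoring probe; how the payload is fetched (authentication, base URL) depends on the installation and is left out here.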

Meanwhile, the web UI is still working. I checked the SonarQube container's memory and CPU usage, and everything seems to be in a good state:

Mem: 40220396K used, 25735064K free, 4000K shrd, 1166152K buff, 26405872K cached
CPU:   1% usr   0% sys   0% nic  96% idle   0% io   0% irq   0% sirq
Load average: 0.13 0.27 0.35 2/1629 175096
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
  261     1 sonarqub S    9417m  14%   9   0% /opt/java/openjdk/bin/java -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/opt/son
  524     1 sonarqub S    10.3g  16%   9   0% /opt/java/openjdk/bin/java -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djava.io.tmpdir=/opt/son
   37     1 sonarqub S    10.0g  15%   8   0% /opt/java/openjdk/bin/java -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCM
    1     0 sonarqub S    8059m  12%   8   0% java -jar lib/sonar-application-8.9.1.44547.jar -Dsonar.log.console=true
175088     0 root     S     2424   0%  11   0% bash
175095175088 root     R     1580   0%  12   0% top	

Here is the error log (the full log is attached):

2021.12.02 06:36:28 INFO  ce[AX152537kqeJmRGwNTzl][o.s.c.t.CeWorkerImpl] Executed task | project=dshub:dshub-svc-data-catalog | type=REPORT | pullRequest=71 | id=AX152537kqeJmRGwNTzl | submitter=sterlingcicd-sterlingcicd9702 | status=SUCCESS | time=10541ms
2021.12.02 06:39:04 ERROR ce[][o.s.c.t.CeWorkerImpl] Failed to pop the queue of analysis reports
org.apache.ibatis.exceptions.PersistenceException: 
### Error querying database.  Cause: org.postgresql.util.PSQLException: FATAL: the database system is shutting down
### The error may exist in org.sonar.db.property.InternalPropertiesMapper
### The error may involve org.sonar.db.property.InternalPropertiesMapper.selectAsText
### The error occurred while executing a query
### Cause: org.postgresql.util.PSQLException: FATAL: the database system is shutting down
	at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:30)
	at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:149)
	at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:140)
	at org.apache.ibatis.binding.MapperMethod.executeForMany(MapperMethod.java:147)
	at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:80)
	at org.apache.ibatis.binding.MapperProxy$PlainMethodInvoker.invoke(MapperProxy.java:152)
	at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:85)
	at com.sun.proxy.$Proxy29.selectAsText(Unknown Source)
	at org.sonar.db.property.InternalPropertiesDao.selectByKey(InternalPropertiesDao.java:155)
	at org.sonar.ce.queue.CeQueueImpl.getWorkersPauseStatus(CeQueueImpl.java:328)
	at org.sonar.ce.queue.InternalCeQueueImpl.peek(InternalCeQueueImpl.java:78)
	at org.sonar.ce.taskprocessor.CeWorkerImpl.tryAndFindTaskToExecute(CeWorkerImpl.java:170)
	at org.sonar.ce.taskprocessor.CeWorkerImpl.findAndProcessTask(CeWorkerImpl.java:153)
	at org.sonar.ce.taskprocessor.CeWorkerImpl$TrackRunningState.get(CeWorkerImpl.java:135)
	at org.sonar.ce.taskprocessor.CeWorkerImpl.call(CeWorkerImpl.java:87)
	at org.sonar.ce.taskprocessor.CeWorkerImpl.call(CeWorkerImpl.java:53)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.postgresql.util.PSQLException: FATAL: the database system is shutting down
	at org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:613)
	at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:161)
	at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:213)
	at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:51)
	at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:223)
	at org.postgresql.Driver.makeConnection(Driver.java:465)
	at org.postgresql.Driver.connect(Driver.java:264)
	at org.apache.commons.dbcp2.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:55)
	at org.apache.commons.dbcp2.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:355)
	at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:889)
	at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:424)
	at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:349)
	at org.apache.commons.dbcp2.PoolingDataSource.getConnection(PoolingDataSource.java:134)
	at org.apache.commons.dbcp2.BasicDataSource$PaGetConnection.run(BasicDataSource.java:73)
	at org.apache.commons.dbcp2.BasicDataSource$PaGetConnection.run(BasicDataSource.java:69)
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at org.apache.commons.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:744)
	at org.sonar.db.profiling.NullConnectionInterceptor.getConnection(NullConnectionInterceptor.java:31)
	at org.sonar.db.profiling.ProfiledDataSource.getConnection(ProfiledDataSource.java:317)
	at org.apache.ibatis.transaction.jdbc.JdbcTransaction.openConnection(JdbcTransaction.java:139)
	at org.apache.ibatis.transaction.jdbc.JdbcTransaction.getConnection(JdbcTransaction.java:61)
	at org.apache.ibatis.executor.BaseExecutor.getConnection(BaseExecutor.java:337)
	at org.apache.ibatis.executor.ReuseExecutor.prepareStatement(ReuseExecutor.java:88)
	at org.apache.ibatis.executor.ReuseExecutor.doQuery(ReuseExecutor.java:59)
	at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:325)
	at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:156)
	at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:109)
	at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:89)
	at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:147)
	... 23 common frames omitted

If I restart the SonarQube deployment with kubectl rollout restart deploy, it starts processing the pending tasks again.
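As a stopgap, that manual restart could be scripted. This is only a sketch of the workaround, not a fix: the deployment name "sonarqube" and the namespace "ci" are hypothetical placeholders, and the function defaults to a dry run so that nothing is restarted by accident.

```python
import subprocess

def rollout_restart_cmd(deployment, namespace):
    """Build the kubectl command that restarts one deployment."""
    return ["kubectl", "rollout", "restart",
            f"deployment/{deployment}", "-n", namespace]

def restart_sonarqube(deployment="sonarqube", namespace="ci", dry_run=True):
    """Return the command; run it only when dry_run is False."""
    cmd = rollout_restart_cmd(deployment, namespace)
    if not dry_run:
        # check=True raises CalledProcessError if kubectl exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd

# Dry run: only shows what would be executed.
print(" ".join(restart_sonarqube()))
```

Triggering this only after the Compute Engine has been idle with pending tasks for a while would avoid restarting a server that is merely busy.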

I need help with the following two questions:

  1. Why does /api/system/health return GREEN status while the Compute Engine is not working?
  2. Is there any strategy for surviving a DB restart/failover, perhaps something as simple as increasing a timeout?

Thanks in advance for your help!

sonar-log-2021.12.02.txt (28.9 KB)
Attaching the log file.

Hi @YanFang ,

This is already on our radar as a point for improvement and is tracked with SONAR-15693. We hope to improve the situation with the next release of SQ.

@Tobias_Trabelsi thanks for the quick response! How about this one? Is it a bug? Will it be fixed in SONAR-15693?

Why /api/system/health returns GREEN status while the Compute Engine is not working?

Usually the endpoint in question will return YELLOW when the CE cannot perform its duty. Both of these observations have the same origin, yes.


Hi, from what I understand, the error message means that the database is currently shutting down and disallows new connections. That means it is in either Smart Shutdown or Fast Shutdown mode. Either way, if you keep getting this error message, I’m wondering whether the database actually restarted/failed over, or whether it’s still waiting indefinitely for existing sessions to terminate. From what I tested, the Compute Engine is able to reconnect to the DB once it’s live again.

When you are in such a situation, what open connections and sessions do you observe on the database side? Are they from SQ?
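One way to answer that on PostgreSQL is to look at the pg_stat_activity view. The sketch below is an illustration under assumptions: the database name 'sonarqube', the helper name, and the sample rows are all hypothetical, and running the query requires a PostgreSQL client or driver that is not shown here.

```python
from collections import Counter

# Query to run against the PostgreSQL server; 'sonarqube' is a
# hypothetical database name, replace it with the real one.
SESSIONS_SQL = """
SELECT usename, application_name, state
FROM pg_stat_activity
WHERE datname = 'sonarqube'
"""

def summarize_sessions(rows):
    """Count sessions per (user, state) from the query's result rows."""
    return Counter((user, state) for user, _app, state in rows)

# Made-up rows, shaped like the query result, for illustration only.
sample_rows = [
    ("sonar", "PostgreSQL JDBC Driver", "idle"),
    ("sonar", "PostgreSQL JDBC Driver", "idle"),
    ("sonar", "PostgreSQL JDBC Driver", "active"),
]
for (user, state), count in sorted(summarize_sessions(sample_rows).items()):
    print(user, state, count)
```

If no sessions from the SonarQube user show up while tasks stay pending, that would suggest the Compute Engine never re-established its connections after the failover.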

Hi Pierre, sorry for the late response! Somehow I missed your question above.

I’m wondering whether the database actually restarted/failed over

The database did restart/fail over successfully. If I restart the SonarQube deployment, it connects to the database and works again.

When you are in such a situation, what open connections and sessions do you observe on the database side? Are they from SQ?

We upgraded SonarQube to v9.2 and haven’t seen this issue since. If it occurs again, we will check the open connections and sessions on the database side. Our database is dedicated to SonarQube only, though. Thanks for your suggestion!

Latest update: after we upgraded SonarQube to v9.2 (we are now on v9.4), the Compute Engine is able to recover by itself from a DB connection failure once the DB is live again.
