SonarQube upgrade from 7.9.3 to 8.4.0 hangs on database migration

We are attempting to upgrade from Version 7.9.3 to 8.4.0.

Our initial attempt at this upgrade failed due to the server becoming unresponsive during DBMigration #3208.

We had to abort the upgrade and rollback our installation once we encountered this issue.

The next day we created a test replica of our sonar server/database to attempt the upgrade on (so as to not affect our production installation). During this upgrade attempt we observed the same behavior. The server became unresponsive and we lost our ssh session. We decided to let it be, and after 7 hours the server became responsive again once DBMigration #3208 had completed. However, the sonar process had died so we had to restart it, at which point it continued patching for another 2 hours and completely finished the Database Migration in 9 hours total.

Logging observed:

2020.07.23 15:11:38 INFO  web[][DbMigrations] #3208 'Remove old Security Review Rating measures'...
2020.07.23 15:12:18 INFO  web[][o.s.s.p.d.m.s.MassUpdate] 1521 rows processed (25 items/sec)
2020.07.23 15:13:18 INFO  web[][o.s.s.p.d.m.s.MassUpdate] 249 rows processed (4 items/sec)
2020.07.23 15:14:18 INFO  web[][o.s.s.p.d.m.s.MassUpdate] 499 rows processed (4 items/sec)
...
2020.07.23 16:36:18 INFO  web[][o.s.s.p.d.m.s.MassUpdate] 1249 rows processed (0 items/sec)
2020.07.23 16:37:18 INFO  web[][o.s.s.p.d.m.s.MassUpdate] 1249 rows processed (0 items/sec)
2020.07.23 16:38:17 WARN  web[][o.s.p.ProcessEntryPoint] Fail to start web
java.lang.IllegalArgumentException: Unable to create shared memory :
        at org.sonar.process.sharedmemoryfile.AllProcessesCommands.<init>(AllProcessesCommands.java:103)
        at org.sonar.process.sharedmemoryfile.DefaultProcessCommands.<init>(DefaultProcessCommands.java:34)
        at org.sonar.process.sharedmemoryfile.DefaultProcessCommands.secondary(DefaultProcessCommands.java:52)
        at org.sonar.server.app.WebServer.isOperational(WebServer.java:69)
        at org.sonar.server.app.WebServer.getStatus(WebServer.java:61)
        at org.sonar.process.ProcessEntryPoint.waitForStatus(ProcessEntryPoint.java:121)
        at org.sonar.process.ProcessEntryPoint.launch(ProcessEntryPoint.java:104)
        at org.sonar.process.ProcessEntryPoint.launch(ProcessEntryPoint.java:81)
        at org.sonar.server.app.WebServer.main(WebServer.java:99)
Caused by: java.io.FileNotFoundException: /opt/sonarqube-8.4.0.35506/temp/sharedmemory (Too many open files in system)
        at java.base/java.io.RandomAccessFile.open0(Native Method)
        at java.base/java.io.RandomAccessFile.open(RandomAccessFile.java:345)
        at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:259)
        at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:214)
        at org.sonar.process.sharedmemoryfile.AllProcessesCommands.<init>(AllProcessesCommands.java:100)
        ... 8 common frames omitted
2020.07.23 16:38:17 INFO  web[][o.s.p.ProcessEntryPoint] Hard stopping process
2020.07.23 16:38:18 WARN  web[][o.s.p.ProcessEntryPoint$HardStopperThread] Can not stop in 1000ms
2020.07.23 16:38:18 WARN  web[][o.s.s.a.EmbeddedTomcat] Failed to stop web server
org.apache.catalina.LifecycleException: A child container failed during stop
        at org.apache.catalina.core.ContainerBase.stopInternal(ContainerBase.java:997)
        at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:257)
        at org.apache.catalina.core.StandardService.stopInternal(StandardService.java:486)
        at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:257)
        at org.apache.catalina.core.StandardServer.stopInternal(StandardServer.java:787)
        at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:257)
        at org.apache.catalina.startup.Tomcat.stop(Tomcat.java:450)
        at org.sonar.server.app.EmbeddedTomcat.terminate(EmbeddedTomcat.java:104)
        at org.sonar.server.app.WebServer.hardStop(WebServer.java:83)
        at org.sonar.process.ProcessEntryPoint$HardStopperThread.lambda$new$0(ProcessEntryPoint.java:219)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
2020.07.23 16:38:18 INFO  web[][o.s.s.p.d.m.s.MassUpdate] 1249 rows processed (0 items/sec)
2020.07.23 16:38:27 WARN  web[][o.s.s.p.d.m.DatabaseMigrationExecutorServiceImpl] Pool DatabaseMigrationExecutorServiceImpl did not terminate
2020.07.23 16:39:18 INFO  web[][o.s.s.p.d.m.s.MassUpdate] 1249 rows processed (0 items/sec)
2020.07.23 16:40:18 INFO  web[][o.s.s.p.d.m.s.MassUpdate] 1249 rows processed (0 items/sec)
2020.07.23 16:41:18 INFO  web[][o.s.s.p.d.m.s.MassUpdate] 1249 rows processed (0 items/sec)
...
2020.07.23 21:59:18 INFO  web[][o.s.s.p.d.m.s.MassUpdate] 1521 rows processed (0 items/sec)
2020.07.23 22:00:18 INFO  web[][o.s.s.p.d.m.s.MassUpdate] 1521 rows processed (0 items/sec)
2020.07.23 22:00:31 INFO  web[][o.s.s.p.d.m.s.MassUpdate] 1521 rows processed (0 items/sec)
2020.07.23 22:00:31 INFO  web[][DbMigrations] #3208 'Remove old Security Review Rating measures': success | time=24532914ms

Information about installation:

  • SonarQube running on r5.2xlarge EC2 instance
  • SonarQube DB running Postgres 11.6 on an m5.large instance in RDS

Configurations:

  • vm.max_map_count: 1048576
  • fs.file-max: 262144
  • nofile: 262144
  • nproc: 16384
  • sonar.ce.javaOpts=-Xmx20G
  • sonar.web.javaOpts=-Xmx28G
  • sonar.search.javaOpts=-Xmx12G -Xms12G

I did some monitoring of the queries that are running during DBMigration #3208. This query seems to be the killer:

delete from live_measures where project_uuid = $1 and metric_id in ($2, $3)

Our database has 208265000 rows in that table

Is there something we can do to speed up DBMigration #3208? 9 hours of downtime on our production instance is not feasible. Perhaps an index is missing that we could add? Maybe our configuration needs some tuning? Any advice would be great!

Hi, welcome to the community forum. We indeed did major changes in the database schema between 7.9 and 8.4. The biggest one is 8.4, with 300+ migrations. Given the size of your dataset, I’m not really surprised with the time taken, which is close to what we benchmarked. Unfortunately, there is not much I can think of that may help you here. What edition are you using?

Hi Pierre. Thank you for the response. We are using the Enterprise edition. Do you know if the DBMigration task #3208 is specific to 8.4?

#3208 is for 8.2. So you still have a long way to go