Upgrade to SonarQube 8.4.2.36762 from Developer Edition Seems Broken

Dear SonarSource Team,

We performed a SonarQube upgrade last Thursday, the 3rd.

We had a Developer edition server, version 8.4.0.35506, backed by Postgres 9.x.x and hosted on AWS.

We also have around 2.5 million lines of code.

We initiated the upgrade around 20:00 local time (19:00 UTC). On the infrastructure side, it was completed in about one hour.

We moved to Enterprise edition version 8.4.2.36762, backed by Postgres 11.1.

Here are the current server details:

SonarQube ID information

Server ID: AA8E3E29-AWTbwYFca4ftx1icEQfb

Version: 8.4.2.36762

After the upgrade completed, background tasks for reloading project data started. There were about 1300 tasks in total.

Progress was insanely slow, and after some looking around, one of our team members suggested giving the database server more resources.

We upgraded it to a server with 4 CPUs and 16 GB of RAM, and suddenly things started moving. Database utilization immediately spiked to 100%.

After around six hours, task progress got stuck at about 60%, and we noticed that two project analyses had been in progress for 5 hours and 3 hours, respectively.

We decided to restart from scratch and give it an even bigger database server, so we upgraded to 8 CPUs and 32 GB of RAM.

After the upgrade, something started hanging again and it didn’t move past the 0% point.

We restarted the application server again, and sure enough it started moving faster.

It managed to finish all background tasks within 4 hours, but one important project still displays as unavailable even though all background tasks are complete.

One day after this, the database server utilization is still at 100% and the project still hasn’t completed.

During this whole process, all log files were perfectly clean, and I have attached them to this e-mail.

On the database side, we see a single query taking all the CPU time:

SELECT p.kee, p.uuid FROM components p INNER JOIN components root ON root.uuid=p.project_uuid AND root.kee=$1

I don’t know whether the behavior we are observing is normal, but it looks like a bug: the application server is idle over the weekend, yet the database is at 100%.
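One way to see why this query is expensive is to look at its execution plan. The following is a minimal sketch, not something that was run in this thread: it only prints a small SQL script that wraps the query reported above in a prepared statement so that the `$1` placeholder can be bound to a project key. The statement name `hot_query` and the key `my:project` are hypothetical. Note that the `ANALYZE` option of `EXPLAIN` actually executes the query, so the sketch leaves it off by default.

```python
# Sketch: generate a psql script that shows the plan of the slow query.
# The query text is copied from the post; everything else is illustrative.

HOT_QUERY = (
    "SELECT p.kee, p.uuid FROM components p "
    "INNER JOIN components root "
    "ON root.uuid=p.project_uuid AND root.kee=$1"
)


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling single quotes.

    Assumes standard_conforming_strings is on (the PostgreSQL default).
    """
    return "'" + value.replace("'", "''") + "'"


def explain_script(project_key: str, analyze: bool = False) -> str:
    """Return SQL that prepares the query and explains one execution."""
    # EXPLAIN (ANALYZE, BUFFERS) runs the query for real; plain EXPLAIN
    # only asks the planner for its estimated plan.
    options = "ANALYZE, BUFFERS" if analyze else "COSTS"
    return "\n".join([
        f"PREPARE hot_query AS {HOT_QUERY};",
        f"EXPLAIN ({options}) EXECUTE hot_query({sql_literal(project_key)});",
        "DEALLOCATE hot_query;",
    ])


if __name__ == "__main__":
    # "my:project" is a hypothetical project key.
    print(explain_script("my:project"))
```

The printed script can be pasted into a `psql` session connected to the SonarQube database. Comparing the plan on the old and the new Postgres version would show whether the planner picks a different join strategy for the same data.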

Given the above, and especially the project that didn’t load (one of our most important ones), we might need to roll back to the last known-good state.

That means a high chance of a new server ID, and if things don’t work out with an Enterprise edition of the same version we had, we may have to roll back to Developer edition for now.

Any suggestions or solutions are welcome. This is really critical for us.

Thank you

sonarqube_app.log (8.2 KB) sonarqube_ce.log (2.5 KB) sonarqube_es.log (15.4 KB) sonarqube_web.log (154 Bytes)

Dear Team

We decided to roll back, but not completely. What I mean is that we restored the exact same version we had, but as Enterprise edition instead of Developer.

Everything went smoothly and all projects reloaded successfully. Right after that, the first analysis request came to the system and boom! 100% CPU utilization on the database and the system is giving 504 Gateway Timeouts here and there. The background task for the analysis just seems to be running with no end in sight.

I wanted to stress a few things that have changed from our previous environment:

  • Developer edition => Enterprise edition
  • Everything was on the same drive, /dev/sda1 => data directory is on /dev/sdf (second EBS volume)
  • We are running on Ubuntu 20.04
  • Postgres server was at 9.x => now we have 11.1

I cannot see why any of this should break the system, given that all projects were reloaded successfully, but I would appreciate any insights.

Hi, what is the size of your database? Do you have metrics other than CPU usage available (read/write IOPS, Performance Insights)?
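For the size question, a minimal sketch of how the numbers could be collected with `psql` follows. It only builds and prints the commands. The database name `sonarqube` is an assumption, and host and user are expected to come from the usual `PGHOST` and `PGUSER` environment variables.

```python
# Sketch: build psql commands that report database and table sizes.
import shlex
from typing import List

# Total on-disk size of the database the session is connected to.
DB_SIZE_SQL = "SELECT pg_size_pretty(pg_database_size(current_database()));"

# Ten largest tables, counting their indexes and TOAST data.
TOP_TABLES_SQL = (
    "SELECT relname, "
    "pg_size_pretty(pg_total_relation_size(relid)) AS total_size "
    "FROM pg_catalog.pg_statio_user_tables "
    "ORDER BY pg_total_relation_size(relid) DESC LIMIT 10;"
)


def psql_command(sql: str, dbname: str = "sonarqube") -> List[str]:
    """Build a psql invocation; "sonarqube" is an assumed database name."""
    return ["psql", "--no-psqlrc", "--dbname", dbname, "--command", sql]


if __name__ == "__main__":
    for sql in (DB_SIZE_SQL, TOP_TABLES_SQL):
        # Print a copy-pasteable shell command instead of running it.
        print(" ".join(shlex.quote(arg) for arg in psql_command(sql)))
```

The IOPS and Performance Insights figures are AWS-side metrics and are not covered by this sketch.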

Hi Pierre,

I was trying to narrow the problem down to a specific change, and all indicators at the moment point to the upgrade from Postgres 9.x to Postgres 11.1.

After this observation, my colleague pointed to "[RESOLVED] Sonarqube is slow since update to 8.4.1", and now we want to try that too, specifically the VACUUM FULL command.

Write IOPS was maximum at 600 and then it averaged 300
Read IOPS was averaging 200

I ran the VACUUM FULL command on the sonarqube database then started the application server. It didn’t seem to help.
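A possible follow-up check, offered as a sketch and not as a diagnosis: `VACUUM FULL` without the `ANALYZE` option rewrites tables but does not recompute the planner statistics, so it may be worth checking whether the tables have been analyzed at all since the move to the new server. The query below reads the standard `pg_stat_user_tables` view; the helper `never_analyzed` and the sample rows are hypothetical and only show how the result could be filtered.

```python
# Sketch: find tables with no recorded ANALYZE (manual or automatic).
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Run this in psql or any client; it returns one row per user table.
STATS_SQL = """
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY relname;
"""

# (table name, last manual ANALYZE, last autovacuum-triggered ANALYZE)
Row = Tuple[str, Optional[datetime], Optional[datetime]]


def never_analyzed(rows: Iterable[Row]) -> List[str]:
    """Return names of tables where both analyze timestamps are NULL."""
    return [name for name, manual, auto in rows
            if manual is None and auto is None]


if __name__ == "__main__":
    # Hypothetical sample rows, not data from this thread.
    sample = [
        ("components", None, None),
        ("projects", datetime(2020, 9, 4, 1, 30), None),
    ]
    print(never_analyzed(sample))
```

If tables show up in that list, running `ANALYZE;` (or `VACUUM ANALYZE;`) on the database refreshes the statistics. Whether that would have resolved the slowdown described here is not established.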

Next test: leave the database server unchanged and try the Enterprise version combined with the database snapshot.

Hello again,

So it is 100% the switch from Postgres 9.6.11 to 11.1 that is causing the issue.
At this stage I need your guidance on what actions can be taken or which logs/metrics need to be collected.

One final remark: we restored from an RDS snapshot rather than using Postgres-native tooling.

Hi, it could be useful to try a fresh PostgreSQL 11 instance into which you restore a dump file. Once SonarQube is started against it, you will want to monitor database read/write IOPS, database CPU usage, and the top queries with AWS Performance Insights.
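The dump-and-restore route could look roughly like the sketch below. It only builds and prints the commands. The host names, user, database name and dump file name are placeholders, and the target database is assumed to already exist on the new instance. The final `vacuumdb --analyze-only` step follows the general PostgreSQL advice to run ANALYZE after restoring a dump, since a dump does not carry planner statistics.

```python
# Sketch: commands for moving a database to a fresh PostgreSQL 11 instance.
import shlex
from typing import List


def dump_command(host: str, user: str, dbname: str, dump_file: str) -> List[str]:
    """pg_dump in custom format, which pg_restore can read."""
    return ["pg_dump", "--host", host, "--username", user,
            "--format=custom", "--file", dump_file, dbname]


def restore_command(host: str, user: str, dbname: str, dump_file: str,
                    jobs: int = 4) -> List[str]:
    """pg_restore into an existing, empty database using parallel jobs."""
    return ["pg_restore", "--host", host, "--username", user,
            "--dbname", dbname, "--no-owner", f"--jobs={jobs}", dump_file]


def analyze_command(host: str, user: str, dbname: str) -> List[str]:
    """Recompute planner statistics without vacuuming."""
    return ["vacuumdb", "--host", host, "--username", user,
            "--dbname", dbname, "--analyze-only"]


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    old, new = "old-pg96.example.internal", "new-pg11.example.internal"
    steps = [
        dump_command(old, "sonar", "sonarqube", "sonarqube.dump"),
        restore_command(new, "sonar", "sonarqube", "sonarqube.dump"),
        analyze_command(new, "sonar", "sonarqube"),
    ]
    for argv in steps:
        print(" ".join(shlex.quote(arg) for arg in argv))
```

SonarQube would be stopped while the dump is taken so that the copy is consistent with what the application expects.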

We have restored production to 9.6.11 and unfortunately we don’t have time to experiment with Postgres 11.1.

Thank you for the support and please feel free to close this if needed.