Diagnose performance difference between two environments

Must-share information (formatted with Markdown):

  • which versions are you using (SonarQube, Scanner, Plugin, and any relevant extension)
    SonarQube v10.6 (92116)
  • how is SonarQube deployed: zip, Docker, Helm
    Helm
  • what are you trying to achieve
    Get our production instance to match the performance of our staging instance
  • what have you tried so far to achieve this

Do not share screenshots of logs – share the text itself (bonus points for being well-formatted)!

We’re running SonarQube EE and have a staging instance for testing and a production instance. Both are hosted using Helm in Kubernetes clusters in Azure.

Staging info:

  • Azure Database for PostgreSQL flexible servers, Standard_B1ms size
  • Node pool size Standard_D16ads_v5 using ephemeral disk
  • Node pool is tainted such that the only workload schedulable on it is SonarQube
  • Persistent volume via Azure storage, not sure on size but max 20k IOPS
  • Storage, Database and SonarQube in same Azure subscription
  • Helm chart resource requests/limits set to 16Gi and 8 cores

Production info:

  • Azure Database for PostgreSQL flexible servers, Standard_D4ds_v4 size
  • Node pool size Standard_D32ads_v5 using ephemeral disk
  • Node pool is tainted such that the only workload schedulable on it is SonarQube
  • Persistent volume via Azure storage, not sure on size but max 20k IOPS
  • Storage and SonarQube in same Azure subscription, Database in different Azure subscription (will be moved but haven’t been able to do so yet)
  • Helm chart resource requests/limits set to 16Gi and 8 cores

When analyzing a mixed C and C++ project of around 209k lines of code, the staging instance is completing the background task in ~9.5 minutes, but the production instance is taking ~18-20 minutes.

I’ve attached extracts from the ce.log files on both staging and production, and also put them into a chart to highlight the differences (filtered to events taking longer than 10 seconds).
staging_ce.log (11.7 KB)
prod_ce.log (11.9 KB)

| Event | Staging (ms) | Production (ms) |
| --- | ---: | ---: |
| Extract report | 358343 | 483608 |
| Build tree of components | 9633 | 12876 |
| Load file hashes and statuses | 9733 | 12191 |
| Compute size measures | 9864 | 11383 |
| Compute new coverage | 13488 | 16732 |
| Execute component visitors | 50078 | 449638 |
| Persist live measures | 4269 | 103235 |
| Persist duplication data | 379 | 10110 |
| Persist sources | 34450 | 43079 |
| Time between final event and “Executed task” | 67439 | 79821 |

Hey there.

Ultimately, it’s hard to do a direct comparison since you have some variables (sizing, being located on the same Azure subscription).

However, my first recommendation for any performance issues on a Postgres database would be this:

If that doesn’t help – I would like to know more about the dataset on staging. Is it a clone of production as it is today? Something else?

Hi Colin,

> However, my first recommendation for any performance issues on a Postgres database would be this:

Autovacuum was on but I tried running that anyway; didn’t make a difference.
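What I ran was along these lines (a sketch only; the connection details are placeholders, and the query just surfaces the largest tables before forcing a manual vacuum):

```bash
# Check when (auto)vacuum last touched the biggest SonarQube tables, then run a
# manual VACUUM ANALYZE. Connection details are placeholders.
psql "host=<db-host> user=sonarqube dbname=sonarqube sslmode=require" <<'SQL'
SELECT relname, n_live_tup, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC
LIMIT 20;
VACUUM (VERBOSE, ANALYZE);
SQL
```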

> Ultimately, it’s hard to do a direct comparison since you have some variables (sizing, being located on the same Azure subscription).

You’re right that the Azure resource sizes differ, but the production resources are essentially the staging ones “and then some”: more cores and/or more RAM, higher throughput limits, and so on. What I don’t have a good feel for is the performance impact, if any, of the production instance being in a different subscription from the production database.

> If that doesn’t help – I would like to know more about the dataset on staging. Is it a clone of production as it is today? Something else?

The staging database is a clone of production. The number of analyses run on the project will have diverged by now, as I’ve been testing back and forth between staging and production. We’ve tried playing around with the resources given to the SonarQube pod (more cores, more RAM, etc.), but none of that has made any difference so far.
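For reference, the resource changes were applied through the chart values, roughly like this (a sketch; the release name, repo alias and `resources` value keys are my best guesses for the official SonarSource chart, so check them against your chart version):

```bash
# Bump the pod's requests/limits via the chart values. Release, namespace and
# value keys below are assumptions; adjust to your deployment.
helm upgrade sonarqube sonarqube/sonarqube -n sonarqube --reuse-values \
  --set resources.requests.cpu=8 \
  --set resources.requests.memory=16Gi \
  --set resources.limits.cpu=8 \
  --set resources.limits.memory=16Gi
```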

Thanks,
Sean

Just a follow up… more testing today strongly suggests that the issue is network related. The Azure vnet configuration is different in the staging environment. I ran an analysis this morning before switching anything around, and the background analysis took ~22 minutes. I shut down both production and staging instances, pointed the staging instance at the production database, purged the staging elasticsearch index (AFAIK this is required when swapping databases?), started the staging instance back up, and repeated the test. This time it took ~11 minutes.
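By “purged the elasticsearch index” I mean removing the local index data so SonarQube rebuilds it from the (new) database on startup. Roughly this (a sketch; the path assumes a default layout, and `es8` is my understanding of the directory name for SonarQube 10.x – on Kubernetes the directory lives on the persistent volume):

```bash
# With SonarQube stopped, delete the on-disk Elasticsearch index so it is
# rebuilt from the database on the next start. Path is an assumption for a
# default install (SONARQUBE_HOME=/opt/sonarqube, SonarQube 10.x -> data/es8).
rm -rf /opt/sonarqube/data/es8
```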

@Colin does that make sense to you? If the network connection to the database is poor (unsure if poor throughput or poor latency), would you expect to see issues like this?
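In case it helps anyone comparing the two environments, one way to sanity-check the database round trip from inside the cluster would be something like this (a sketch; host, credentials and namespace are placeholders, and single-digit-millisecond timings would be the healthy end):

```bash
# Throwaway postgres client pod in the cluster; \timing reports per-statement
# round-trip times, which approximate network latency for trivial queries.
# All connection details below are placeholders.
kubectl -n sonarqube run pg-latency --rm -i --restart=Never \
  --image=postgres:16 --env=PGPASSWORD='<password>' --command -- \
  psql "host=<db-host> user=sonarqube dbname=sonarqube sslmode=require" \
  -c '\timing on' -c 'SELECT 1;' -c 'SELECT 1;' -c 'SELECT 1;'
```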

Hey there.

Yes, definitely! As documented:

> Hosts and locations
>
> For optimal performance, the SonarQube server and database should be installed on separate hosts, and the server host should be dedicated. The server and database hosts should be located on the same network.

Just an update, and this will conclude the issue for us… it really does all revolve around disk and network (latency and/or throughput). The culprits on our side were:

  • Database in different network
  • (helm) Using Azure Fileshare for persistent storage
  • (VM) Slow disks for temporary storage and elasticsearch storage

Going forward, we are going to switch over to a dedicated VM with fast storage. Azure ephemeral disks are a good fit for this, and with a VM we don’t have to keep chasing Kubernetes probe timeouts as our database grows, as we did with the Helm deployment.
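If anyone wants to reproduce the disk side of the comparison, a quick way to compare the storage options (ephemeral disk vs. Azure Files vs. managed disk) is a small random-I/O benchmark along these lines (a sketch; the path, size and runtime are just examples):

```bash
# Random 4k read/write test against the SonarQube data path; compare latency
# and IOPS between storage backends. Flag values here are examples only.
fio --name=sq-randrw --directory=/opt/sonarqube/data --size=1g \
    --rw=randrw --bs=4k --iodepth=16 --numjobs=1 --ioengine=libaio \
    --direct=1 --runtime=60 --time_based --group_reporting
```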
