Issue upgrading 8.9 -> 9.9 (LTS to LTS)

I recently tried upgrading out sonarqube server from 8.9 LTS to 9.9 LTS (zip file deployment method). The server ran through its migrations and came online, but the quality gates tab of the web ui was unable to load and threw an empty String error:

I also tried running our pipeline workflows against the upgraded server, and hit similar issues with qualitygate-related APIs returning similar empty String errors. I enabled trace-level logging on the server and tried clicking onto the quality gates tab again a couple times, but I didn’t see any errors that were obviously related to the issue I’m describing here. I am seeing quite a few connection refused errors in sonar.log:

2023.04.16 16:50:27 TRACE app[][tracer] curl -iX GET 'http://127.0.0.1:9001/'
2023.04.16 16:50:27 DEBUG app[][o.e.c.RestClient] updated [[host=http://127.0.0.1:9001]] already in blacklist
2023.04.16 16:50:27 DEBUG app[][o.a.h.i.n.c.MainClientExec] [exchange: 8] start execution
2023.04.16 16:50:27 DEBUG app[][o.a.h.c.p.RequestAddCookies] CookieSpec selected: default
2023.04.16 16:50:27 DEBUG app[][j.io.serialization] Builtin factory: null -> new: null
2023.04.16 16:50:27 DEBUG app[][o.a.h.c.p.RequestAuthCache] Re-using cached 'basic' auth scheme for http://127.0.0.1:9001
2023.04.16 16:50:27 DEBUG app[][o.a.h.c.p.RequestAuthCache] No credentials for preemptive authentication
2023.04.16 16:50:27 DEBUG app[][o.a.h.i.n.c.InternalHttpAsyncClient] [exchange: 8] Request connection for {}->http://127.0.0.1:9001
2023.04.16 16:50:27 DEBUG app[][o.a.h.i.n.c.PoolingNHttpClientConnectionManager] Connection request: [route: {}->http://127.0.0.1:9001][total kept alive: 0; route allocated: 0 of 10; total allocated: 0 of 30]
2023.04.16 16:50:27 DEBUG app[][o.a.h.i.n.c.PoolingNHttpClientConnectionManager] Connection request failed
java.net.ConnectException: Connection refused
	at java.base/sun.nio.ch.Net.pollConnect(Native Method)
	at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
	at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:946)
	at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:174)
	at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:148)
	at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:351)
	at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221)
	at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
	at java.base/java.lang.Thread.run(Thread.java:833)

But there aren’t any of these errors logged during the timeframe where I was trying to observe this quality gates error in the logs (I was testing and looking for logged events in this timeframe: from 2023.04.16 16:39:30 to 2023.04.16 16:40:00).

The only thing I can think of that is non-standard about our server setup is that we’re using the Qualinsight code smells plugin (v4.0.0). So, I tried rolling back to a pre-upgrade database snapshot and reverting the server setup to 8.9 LTS, and I verified that the plugin was loaded and things were behaving normally. Next, I turned off the sonar server, removed the plugin jar file from the extensions/plugins directory, and turned the server back on and verified that the plugin was no longer loaded. I then tried upgrading this setup to 9.9 LTS, but again hit the same issue with the quality gates tab not loading and related APIs returning an empty String error.

I’ve attached trimmed logs from the tested timeframe, I’ve omitted the ce.log file because it’s extremely large but if you think it would be helpful I can find a way to provide it.
access.log (7.4 KB)
es.log (219.5 KB)
sonar.log (11.7 KB)
web.log (869.8 KB)
Thanks in advance for any assistance or insight you’re able to provide.

1 Like

Hi,

You’ve set your server to TRACE logging, which isn’t generally necessary. It generally results in a very low signal to noise ratio.

What I do see, in your sonar.log is this:

2023.04.16 16:49:29 INFO app[][o.s.a.SchedulerImpl] Sonarqube has been requested to stop

I think that means the problem isn’t in SonarQube itself.

 
Ann

I believe that’s just from me shutting down the server after I validated that the upgrade path was still broken for our instance. I don’t believe it has anything to do with the issue I’m trying to resolve here, I was just trying to note that I had observed those errors in the logfile. I enabled trace level logging to try to figure out what the issue with the quality gates view/api was, after seeing nothing useful logged at the debug level.

Hi,

Can you reset your server log level to INFO, clean out your logs, try starting again, and send those full logs from the very beginning.

 
Ann

I spun up a staging server using an image of our current main server + a snapshot of its database (running sonarqube 8.9), ran through the upgrade to 9.9, and reproduced the issue with the quality gates tab. Here are all the log files:

access.log (515.7 KB)
ce.log (4.5 MB)
es.log (8.3 KB)
sonar.log (3.3 KB)
web.log (115.5 KB)

Hi,

Thanks for the new logs. I don’t see anything at all unusual in them. What shows up in your browser console when this happens?

 
Ann

This is what comes up in the console:

UpdateNotification.tsx:131 Uncaught (in promise) TypeError: Cannot read properties of undefined (reading '9')
    at hOe.isPatchUpdate (UpdateNotification.tsx:131:35)
    at hOe.fetchSystemUpgradeInformation (UpdateNotification.tsx:184:21)
isPatchUpdate	@	UpdateNotification.tsx:131
fetchSystemUpgradeInformation	@	UpdateNotification.tsx:184
await in fetchSystemUpgradeInformation (async)		
componentDidMount	@	UpdateNotification.tsx:83
GVt	@	react-dom.production.min.js:212
ZVt	@	react-dom.production.min.js:255
pi.unstable_runWithPriority	@	scheduler.production.min.js:19
i1	@	react-dom.production.min.js:122
pI	@	react-dom.production.min.js:248
Dde	@	react-dom.production.min.js:239
r1	@	react-dom.production.min.js:230
oY	@	react-dom.production.min.js:281
(anonymous)	@	react-dom.production.min.js:284
n6e	@	react-dom.production.min.js:240
gY	@	react-dom.production.min.js:284
Jy.render	@	react-dom.production.min.js:290
tUt	@	startReactApp.tsx:245
Ikr	@	index.ts:44
await in Ikr (async)		
(anonymous)	@	index.ts:30
(anonymous)	@	index.ts:57
request.ts:114 
 GET https://sonarqube-upgrade-test.es.8x8.com/api/qualitygates/list 400
submit	@	request.ts:114
Vb	@	request.ts:196
ke	@	request.ts:204
Jne	@	quality-gates.ts:38
fetchQualityGates	@	App.tsx:77
componentDidMount	@	App.tsx:59
GVt	@	react-dom.production.min.js:212
ZVt	@	react-dom.production.min.js:255
pi.unstable_runWithPriority	@	scheduler.production.min.js:19
i1	@	react-dom.production.min.js:122
pI	@	react-dom.production.min.js:248
Dde	@	react-dom.production.min.js:239
r1	@	react-dom.production.min.js:230
oY	@	react-dom.production.min.js:281
(anonymous)	@	react-dom.production.min.js:284
n6e	@	react-dom.production.min.js:240
gY	@	react-dom.production.min.js:284
Jy.render	@	react-dom.production.min.js:290
tUt	@	startReactApp.tsx:245
Ikr	@	index.ts:44
await in Ikr (async)		
(anonymous)	@	index.ts:30
(anonymous)	@	index.ts:57

And API requests to api/qualitygates/list return {"errors":[{"msg":"empty String"}]}%

Hi,

Thanks. I’ve flagged this for more expert attention.

 
Ann

OK, thanks for the update

Warm welcome @ckeleher ,

I am trying to reproduce the issue you are having in my environment.

If you have access to the database could you run these 2 queries and share the results here? They will not contain any confidential data, but if you think so, feel free to remove any confidential information from the results:

  1. select * from quality_gates

  2. select text_value from properties where ‘qualitygate.default’ = prop_key

The results will help with the investigation. Thank you in advance.

and hit similar issues with qualitygate-related APIs returning similar empty String errors

At the same time, could you please provide some concrete examples of other /api/qualitygates/* endpoints that also return HTTP Status code 400? This will help us to narrow down the issue.

Here’s the results of the first query:
sonarqube-qg.txt (164.8 KB)
Redacted the names of the quality gates since most are named after corresponding projects, just to be on the safe side.

The second query returned zero results.

I did some more digging and, at least among the GET qualitygate methods, it seems that list is actually the only one that’s failing, my mistake in assuming it was all of them. get_by_project, project_status, search, and show all worked as expected. I didn’t try any of the POST methods via curl, but I was able to navigate to the page of an individual quality gate via the corresponding project’s webpage, and it mostly loaded as expected and I was able to update conditions on it. The only part that didn’t load was the list of other QGs on the side of the page, and again I saw that empty String banner pop up at the top of the page. So it seems that only the list method is impacted, based on those findings.

1 Like

Sorry, forgot to append a semicolon at the end of the second query, it actually did have some output:

sonar=> select text_value from properties where 'qualitygate.default' = prop_key;
      text_value
----------------------
 AXJD2TGSPWUcWP28Tcjv
(1 row)

Hi Clayton,

Thank you very much for the additional information that you provided. There is one thing that worries me a lot, namely somehow you have duplicated value in the uuid column of quality_gates table. Because of that SonarQube can behave unpredictably. Normally this should never happen as there should be unique constraint on this column (primary key).

The duplicated uuid is your default quality gate: AXJD2TGSPWUcWP28Tcjv.

To find out more about the issue we should:

  1. Check if the duplication in the uuid column was already there on the 8.9 backup of your database
  2. Check if the unique constraint on the uuid column of the quality_gates table was there on the 8.9 backup
  3. Double check that this constraint doesn’t exist in your database after the upgrade to 9.9
  4. Are the values in name column the same for this duplicated uuid ? Does any of these values ring any bells to you? Does any of these values seem obsolete?
  5. What DB vendor are you using?

This could be a symptom that your database schema is not in a great shape after the upgrade and we need to do a little bit more digging to try to get closer to the answer ‘why’.

  1. I see the same duplication in our main instance which is currently running 8.9
  2. on the 8.9 instance:
sonar=> \d+ quality_gates;
                                                   Table "public.quality_gates"
   Column    |            Type             | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
-------------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 name        | character varying(100)      |           | not null |         | extended |             |              |
 created_at  | timestamp without time zone |           |          |         | plain    |             |              |
 updated_at  | timestamp without time zone |           |          |         | plain    |             |              |
 is_built_in | boolean                     |           | not null |         | plain    |             |              |
 uuid        | character varying(40)       |           | not null |         | extended |             |              |
Indexes:
    "pk_quality_gates" PRIMARY KEY, btree (uuid)
Access method: heap
  1. on the 9.9 instance:
sonar=> \d+ quality_gates;
                                                   Table "public.quality_gates"
   Column    |            Type             | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
-------------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 name        | character varying(100)      |           | not null |         | extended |             |              |
 created_at  | timestamp without time zone |           |          |         | plain    |             |              |
 updated_at  | timestamp without time zone |           |          |         | plain    |             |              |
 is_built_in | boolean                     |           | not null |         | plain    |             |              |
 uuid        | character varying(40)       |           | not null |         | extended |             |              |
Indexes:
    "pk_quality_gates" PRIMARY KEY, btree (uuid)
Access method: heap

I don’t see any unique constraint on either version. If there’s another command that would yield more useful info, please let me know.

  1. One of the entries w/ duplicate uuid is a QG for one of our projects, the other is named “Sonar way Updated”. I’d imagine that they are both user created but I can’t say for sure. If you think it would help, I could try to create another copy of our 8.9 instance, delete one or both of those QGs, and then run the update to 9.9 and see if that changes anything.

  2. We’re using postgresql, version 14.4 on aws rds. We recently updated from an older version, 10.x I think, I don’t remember the exact minor version. That database version update was done prior to the sonarqube version update attempt

1 Like

Just to see, I tried destroying one of the offending entries via the api and restarted the server (this being on our 9.9 staging instance), and unfortunately it’s still unable to list quality gates

Hi Clayton,

I am getting closer to finding out the root cause of the issue. I managed to reproduce the exact issue by modifying data in my database.

It seems to me for now that the issue is caused by empty (or less likely null) value in the quality_gate_conditions.value_error column in your database.

Could you:

  1. Share the results of the following query:
    select uuid, value_error from quality_gate_conditions where qgate_uuid='AXJD2TGRPWUcWP28Tcir' order by created_at asc
  2. Search the column quality_gate_conditions.value_error for any values that don’t look like numbers (including nulls or empty strings). You can achieve that in postgres in example by running:
    select uuid, qgate_uuid from quality_gate_conditions where value_error !~ '^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$';
sonar=> select uuid, value_error from quality_gate_conditions where qgate_uuid='AXJD2TGRPWUcWP28Tcir' order by created_at asc;
         uuid         | value_error
----------------------+-------------
 AXolbh7801QwETeDn-qh | 5
 AXolbh7801QwETeDn-qG | 75
 AXolbh7801QwETeDn-qs | 95
 AXolbh6x01QwETeDn-b2 |
 AXolbh8V01QwETeDn-2u | 0
 AYSJ23PBhPMgR6WhlGro | 95
(6 rows)
sonar=> select uuid, qgate_uuid from quality_gate_conditions where value_error !~ '^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$';
         uuid         |      qgate_uuid
----------------------+----------------------
 AXolbh6x01QwETeDn-b2 | AXJD2TGRPWUcWP28Tcir
(1 row)

Thanks for including the exact queries, I appreciate it. I modified the second query to include returning the value_error, and it looks like it has an empty value as you described:

sonar=> select uuid, qgate_uuid, value_error from quality_gate_conditions where value_error !~ '^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$';
         uuid         |      qgate_uuid      | value_error
----------------------+----------------------+-------------
 AXolbh6x01QwETeDn-b2 | AXJD2TGRPWUcWP28Tcir |
(1 row)

I ran those queries on both of our instances (current prod running 8.9, and staging running 9.9) and the results look identical across both databases.

It is hard to say, without knowing the whole history of your usage of SonarQube how this value over there got emptied. As a temporary fix you can put over there 0 and your quality profiles page should be unblocked.

If this is true, then we can work together to understand what the expected value should be there.

1 Like

Ok, looks like that was the problem, things seem to be working now on the 9.9 staging server! Thanks so much for your help Łukasz and Ann, this has been driving me crazy for a while. I have no clue what happened to cause that empty cell, but I might have to do some access review on our setup just in case. I’m not too worried about fixing the value, it looks like it belongs to a QG that was only ever used for testing (maybe some test managed to mangle the entry?). Anyway, thanks again for your help

3 Likes