OK, I believe I have solved this issue.
I ran an ANALYZE on the sonarDB database but did not see any improvement.
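For anyone following along, that statistics refresh is essentially just this (the database name and psql invocation are illustrative, adjust for your setup):

# connect as a superuser or the sonar user and refresh planner statistics for every table
psql -d sonarDB -c "ANALYZE VERBOSE;"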
I also enabled the postgres slow query log, and during the analysis the only query that took over 5 seconds was this one, which suggests postgres is not having to do large non-indexed joins:
2021-05-13 11:25:22.219 GMT [2365] LOG: duration: 5237.327 ms execute <unnamed>: SELECT
p.uuid,
p.path,
p.module_uuid as moduleUuid,
fs.src_hash as srcHash,
fs.revision
FROM components root
INNER JOIN components p on
p.project_uuid=root.uuid
and p.enabled=true
and p.scope='FIL'
INNER JOIN file_sources fs ON
fs.file_uuid=p.uuid
where
root.uuid=$1
2021-05-13 11:25:22.219 GMT [2365] DETAIL: parameters: $1 = 'AXkiaT-5v9lU_N2D-Wog'
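To double-check that the joins really are indexed, the same statement can be run through EXPLAIN with the parameter from the log substituted in, something along these lines (the column aliases are dropped since only the plan matters here):

psql -d sonarDB -c "EXPLAIN (ANALYZE, BUFFERS)
SELECT p.uuid, p.path, p.module_uuid, fs.src_hash, fs.revision
FROM components root
INNER JOIN components p ON p.project_uuid = root.uuid
  AND p.enabled = true
  AND p.scope = 'FIL'
INNER JOIN file_sources fs ON fs.file_uuid = p.uuid
WHERE root.uuid = 'AXkiaT-5v9lU_N2D-Wog';"

If the plan shows index scans on components and file_sources rather than sequential scans, the 5 seconds is mostly row volume and commit/IO cost rather than a missing index.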
I then used YourKit to analyse the compute engine process during an analysis, which produced the following flamegraph:
It’s clear from the staircase of functions on the left that “persisting live measures” accounted for the majority of the runtime, and that it was slowed down mostly by time spent in org.apache.ibatis.session.defaults.DefaultSqlSession.commit(). There’s a chunk of time missing, though, that isn’t visible here: hovering over the lines inside YourKit showed only about 5 minutes of actual time spent in the CE code path.
I then looked at the filesystem that postgres is using, an EFS volume. It doesn’t seem terribly slow (it’s not great), but it isn’t bad enough to cause such a slowdown, especially given that the slow query log isn’t showing anything particularly slow:
postgresql@sonar-postgresql-0:/$ pg_test_fsync -f /tmp/file
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.
Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync 1129.834 ops/sec 885 usecs/op
fdatasync 1241.245 ops/sec 806 usecs/op
fsync 786.327 ops/sec 1272 usecs/op
fsync_writethrough n/a
open_sync 994.000 ops/sec 1006 usecs/op
Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync 579.749 ops/sec 1725 usecs/op
fdatasync 984.110 ops/sec 1016 usecs/op
fsync 735.919 ops/sec 1359 usecs/op
fsync_writethrough n/a
open_sync 467.525 ops/sec 2139 usecs/op
Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write
open_sync sizes.)
1 * 16kB open_sync write 814.711 ops/sec 1227 usecs/op
2 * 8kB open_sync writes 421.977 ops/sec 2370 usecs/op
4 * 4kB open_sync writes 243.189 ops/sec 4112 usecs/op
8 * 2kB open_sync writes 124.718 ops/sec 8018 usecs/op
16 * 1kB open_sync writes 53.923 ops/sec 18545 usecs/op
Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different
descriptor.)
write, fsync, close 953.955 ops/sec 1048 usecs/op
write, close, fsync 761.787 ops/sec 1313 usecs/op
Non-sync'ed 8kB writes:
write 183792.449 ops/sec 5 usecs/op
So this was a bit of a dead end (although some postgres tuning can likely get that commit time down some).
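By tuning I mean the usual commit-latency levers, roughly along these lines (note that synchronous_commit = off trades a small window of lost commits on a crash for faster commits, which may be acceptable when the worst case is re-running an analysis):

# relax commit durability just for the SonarQube database (illustrative)
psql -d sonarDB -c "ALTER DATABASE \"sonarDB\" SET synchronous_commit = off;"
# or instance-wide, then reload the config
psql -d sonarDB -c "ALTER SYSTEM SET synchronous_commit = off;"
psql -d sonarDB -c "SELECT pg_reload_conf();"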
I then had a look at the events tab in YourKit and saw a large number of tiny writes of temp files to /opt/sonarqube/temp/ce during zip file extraction. I have now mounted an in-memory filesystem there, and the zip extraction step went from ~700,000 ms to ~4,000 ms. The runtime of my entire report went from ~30 minutes to ~5 minutes.
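For reference, the in-memory mount itself is just a tmpfs over the CE temp directory; on a plain host it looks roughly like this (the size is an arbitrary example, and the files there are disposable scratch data; on Kubernetes the equivalent would be an emptyDir volume with medium: Memory at the same path):

# mount an in-memory filesystem over the compute engine temp dir
mount -t tmpfs -o size=2g tmpfs /opt/sonarqube/temp/ce
# equivalent /etc/fstab entry to survive reboots
# tmpfs /opt/sonarqube/temp/ce tmpfs size=2g 0 0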
Is this a safe change to keep in place going forward?
Thanks!