DB filling up on table "file_sources"

  • SonarQube Enterprise 2025.5, zip deployed

file_sources is currently the biggest table on our SQ’s SQL server by far with 12+ GB and growing.

a quick check shows that it seems most of the information in it
supposedly orphaned entries not covered by the built in SQ maintenance tasks.

SELECT TOP 1000
fs.project_uuid,
fs.line_count,
DATEADD(SECOND, fs.created_at / 1000, CAST('1970-01-01 00:00:00' AS DATETIME2)) AS created_at_human,
DATEADD(SECOND, fs.updated_at / 1000, CAST('1970-01-01 00:00:00' AS DATETIME2)) AS updated_at_human
FROM file_sources fs
LEFT JOIN projects p
ON fs.project_uuid = p.uuid
WHERE p.uuid IS NULL
ORDER BY fs.line_count DESC;

Deleting them could be a bit risky without proper background undestanding of the usage.
Both dates shown are actually mostly not very old, so I wonder why those project UUIDs are not existing anymore.

Hi,

I think this is not unusual.

Uhm… yes.

I don’t have the schema in front of me, but I suspect some other FK is in play here.

How does the size of this table compare to the other tables in your DB?

Also how many project and how many branches and PRs do you have under analysis?

 
Ann

Thank you for your feedback!

this is not the end of the world, but schema changes fast, a year ago we were fighting the infamous live_measures problems with dependency-check ( Long loading times at project dashboard since upgrade from 9.9.6 to 2025.1 ) so we have an eye on the table sizes, especially the ones rising fast and obviously with unused content.

the only FK I can see is the project_UUID and file_UUID. for projects I checked the orphan status with the SQL query, I could not find a table that fits the the 2nd FK

MS SQL, check if other tables have foreign keys pointing to your table’s primary key, use
EXEC sp_fkeys @pktable_name = ‘file_sources’;
→ 0 (no pointers)

EXEC sp_help ‘file_sources’;
no hints I could see in schema etc.

we have ~ active projects and branches below 1000.

Hi,

Since we fixed the measures storage, as you alluded, I suspect that your distribution is not unusual. Still the orphans are a bit curious. I’m going to flag this for more expert eyes.

 
Ann

1 Like

Hello @Falco_Alexander,

Despite the confusing column name (our bad for that), file_sources.project_uuid actually points to the branch UUID, not the project UUID. You will need to join on the project_branches table instead of projects in your query to retrieve the actual orphaned entries.

If that returns few or no rows, your data is fine. The 12GB consists of legitimate source snapshots for active branches and PRs.

If you want to reduce growth, you can optimize the values under Administration → General Settings → Housekeeping → Branches and Pull Requests by lowering the sonar.dbcleaner.daysBeforeDeletingInactiveBranchesAndPRs setting and checking your configured exceptions for the sonar.dbcleaner.branchesToKeepWhenInactive setting.

If you have any other issues on this matter, feel free to reply to this topic :slight_smile:

1 Like

Hi @hatem.amairi

thanks a lot for your correction.
indeed, orphaned went down to zero.

Housekeeping is on default values and seem to work, but I wonder why:

SELECT COUNT(*) AS total_count
FROM (
SELECT DATEADD(SECOND, fs.updated_at / 1000, CAST('1970-01-01 00:00:00' AS DATETIME2)) AS updated_at_human,
fs.project_uuid,
pb.kee
FROM file_sources fs
JOIN project_branches pb
ON fs.project_uuid = pb.uuid
) subquery
WHERE subquery.kee NOT IN ('main', 'master', 'develop', 'trunk')
AND subquery.updated_at_human < DATEADD(DAY, -31, GETDATE());

is still at ~15% of the table. But must admit I’d not consider this as a big problem anymore.
thank you!

2 Likes

Hello @Falco_Alexander

I’m glad that everything is working as expected for your SQS instance :slight_smile:

Regarding your query, housekeeping does not rely on when files were last updated on a branch to decide whether a branch should be deleted.

Branch cleanup is based on branch inactivity, essentially if the branch’s last analysis is older than the configured threshold in daysBeforeDeletingInactiveBranchesAndPRs plus the branchesToKeepWhenInactive setting to protect specific branches (note that PRs are always cleaned up and main branches are always kept).

If you want to dig more into this you can you can check the snapshots table and look at the last analysis for the branch (islast = true). Also, to check if a branch is excluded from the housekeeping you can look into the project_branches table (exclude_from_purge, branch_type, is_main etc…) :wink:

Best regards,