How to use a custom Quality Gate on Pull Request analysis for a stricter threshold

Hi,

The PR analysis always uses the quality gate specified for the current Sonar project. However, the thresholds in that gate are defined for a much larger volume of code (overall code, or new code per version or over a set duration), so applying them at the PR level doesn’t really help improve code quality, because the gate mostly reports Passed. For instance, our gate has a condition that Duplication on New Code (per version) must be below 10%, but at the PR scope we want that condition to be 3%.

Is this a valid need, having a dedicated quality gate for the PR scope? Or are we using the tool wrongly (if so, please guide us)?

Duy

Hi @duy.lam,

I’m not entirely convinced that having a higher threshold for Duplication on New Code in your overall new code period, versus evaluating it per-PR, is helpful.

If you were to change your QG (Quality Gate) condition to 3% tomorrow, the project might immediately fail its current QG if the duplication in the present new code period exceeds that threshold. However, if you reset the New Code Period to start from now, and all future PRs meet the QG, then you’ll never fail on this condition again.

In other words, raising the threshold for the overall period doesn’t really add value, because you can effectively “reset” the period and only enforce the limit on each PR going forward. Does that make sense?

Hi Colin (sorry, I can’t mention your username)

Thanks for your response

In other words, raising the threshold for the overall period doesn’t really add value, because you can effectively “reset” the period and only enforce the limit on each PR going forward. Does that make sense?

I’m not sure “reset” is the right way to use the QG. Per my understanding, the New Code is the code changed between particular “start” and “end” points (defined either by version or by duration). That concept means the New Code (whatever the boundary type) applies at a much larger scale than a single Pull Request. And that’s the right design, IMO.

Therefore, I don’t see how it would work when the boundary for the New Code is very short (like a 5-day duration, or using the build number as the app version). That sounds suitable for the PR level, but then the New Code becomes equivalent to the PR, while the nature of the New Code is larger and longer-lived than a single PR!

The ideal situation would be another type of QG (it might be called an “Instant QG” to reflect its small scope), which the scanner would respect when the scan context is a PR. IMO it would address the right need:

  • For the actual new code (PR level), we have a real quality gate
  • For the current new code (per version or duration), it still reports the accurate quality, and the team can address them before the New Code ends

Is it a valid concern?

Duy

But the New Code is going to be the collection of PRs you’ve merged since the new code period started. Why would you want 10% to be the threshold for “all the PRs merged since x date” but 3% for the PR you’re reviewing?

Keep in mind the end of the New Code Period is always now (well, the latest analysis).

Colin

Why would you want 10% to be the threshold for “all the PRs merged since x date” but 3% for the PR you’re reviewing?

Our team’s operating model balances the need for consistent feature delivery with maintaining high code quality. Consequently, when the Quality Gate (QG) reports a check failure on an individual Pull Request (PR) (indicating at least one violated condition), and the resolution effort significantly impacts the progress (for instance, by delaying testing), we are comfortable deferring the fix.

We can prioritise addressing these issues in a subsequent PR, provided it remains within the designated ‘New Code’ period, which, per our settings, is defined by the application version. These “refactoring” PRs are reserved for exceptional cases where the required fix is either complex or substantial. IMO, this is a common scenario when adapting a tool like Sonar to an existing, mature project.

Given this operating model, setting the QG to a very small threshold at the PR level (e.g., 2% duplication) creates significant confusion among developers. It results in subsequent PRs reporting the same pre-existing failure, which is outside of their direct scope and responsibility.

As our development process involves many distributed teams, we look for a clear and simple quality model. This model could effectively differentiate ‘New Code’ defined at the application version level from ‘New Code’ defined at the PR level, allowing each team to autonomously control and manage their quality.

Since I am new to Sonar, please kindly advise if you notice any missteps or areas for improvement in our adoption process.

Duy

Hi @duy.lam,

I’m a bit confused by your comments regarding the duplication thresholds. From what I gathered earlier, you want the PR scope to use a stricter threshold than the one set for new code on your branch. However, I noticed some conflicting statements:

  • In one comment, you mention that setting a very small threshold (e.g., 2%) at the PR level causes confusion among developers.
  • In another, you state that while the gate for duplication on new code is below 10% per version, you want the PR condition to be 3%.

Which is it?

Colin

Yes, it’s still correct. The problem we’re facing is that the Sonar scanner uses the same QG (the one for New Code) to assert the PR changes.

As explained in prior comments, we configure the New Code setting as the app version (like 9.7.7, 9.7.8, and so on) so that we can monitor quality across versions. This also means the quality thresholds (code duplication, coverage, security rating, etc.) in the QG for New Code are sized for a scope much larger than an individual PR, e.g. most PRs will get a Passed result from the scanner.
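For reference, a sketch of how that setup might look in the scanner configuration. All values are placeholders; this assumes the project’s New Code definition is set to “Previous version”, so bumping sonar.projectVersion starts a new New Code period:

```properties
# sonar-project.properties -- placeholder values, sketching the setup above.
# With New Code defined as "Previous version", bumping sonar.projectVersion
# starts a new New Code period on the branch.
sonar.projectKey=my-mobile-app
sonar.projectVersion=9.7.8

# For a PR analysis, the CI passes PR coordinates instead, e.g.:
#   sonar.pullrequest.key=42
#   sonar.pullrequest.branch=feature/my-change
#   sonar.pullrequest.base=main
```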

You recommend adjusting the New Code setting to PR level (like a 3-day duration), and that confuses me on two main points:

  • It sounds different from the New Code concept, where the “start” and “end” points (of the New Code period) should span much more than a single PR
  • The Failed result on a PR might not come from that PR’s changes themselves (as explained in the earlier comment, since we sometimes merge a PR to keep to the schedule)

I think the main questions are: is it correct to use the app version for the New Code (which implies a set of PRs, not a single one)? And then, given the New Code is an app version, is it correct that the QG’s thresholds should evaluate the code over that whole period (i.e. the set of PRs, not just one)?

Initially, I assumed the answers to both are Yes, and it confuses me to then use the QG (meant for app-version changes) at the PR level.

Duy

Hi Duy,

I think there might be some confusion about how PR analysis works in SonarQube, which might help clarify the approach here.

You mentioned that subsequent PRs are “reporting the same pre-existing failure, which is outside of their direct scope and responsibility.” This shouldn’t actually happen with PR analysis—PRs only report issues on lines that were changed or added in that specific PR. If you’re seeing issues from previous PRs appearing in unrelated PRs that don’t touch that code, something might be misconfigured with your PR analysis setup.

Regarding the workflow you described: if a PR introduces 5% duplication but you plan to fix it in a subsequent “refactoring PR” within the same version period, that refactoring PR will still be analyzed against the same New Code period. SonarQube will see the cumulative duplication from both PRs until the fix lands. So the 10% threshold doesn’t really give you the flexibility you’re describing—the Quality Gate will still reflect the elevated duplication for the branch until it’s resolved.
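Colin’s point about accumulation can be put in numbers (all figures below are illustrative, not taken from the actual project):

```python
# Illustrative figures: duplication density = duplicated lines / new lines * 100.
pr1_new, pr1_dup = 200, 10   # PR #1 introduces 10 duplicated lines (5%)
pr2_new, pr2_dup = 100, 0    # follow-up "refactoring" PR adds clean code,
                             # but the duplicated lines are not removed yet

pr1_density = 100 * pr1_dup / pr1_new    # 5.0 -> fails a 3% PR gate
pr2_density = 100 * pr2_dup / pr2_new    # 0.0 -> passes its own PR gate

# The branch's New Code period accumulates both PRs, so until the
# duplicated lines are actually removed, the branch still carries them:
branch_density = 100 * (pr1_dup + pr2_dup) / (pr1_new + pr2_new)
print(round(branch_density, 2))          # 3.33 -> still above a 3% condition
```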

If the goal is to occasionally allow PRs that exceed your quality standards with the understanding you’ll fix them soon after, that’s really just accepting that the Quality Gate will temporarily fail on the branch.

Could you clarify what you’re actually seeing in subsequent PRs, or an example of a PR that fails on changes introduced in another PR? That might help us understand if there’s a configuration issue or a different approach that would work better for your workflow.

Colin

My apologies for the confusion. When I saw the Failed result (from the QG with stricter thresholds) on PRs, I thought the issues came from previous PRs, but actually they were rule violations within that PR’s own scope (the same issue, just in different code locations).

If the goal is to occasionally allow PRs that exceed your quality standards with the understanding you’ll fix them soon after, that’s really just accepting that the Quality Gate will temporarily fail on the branch

This is our use case. Through our conversation, I realized that the root cause of this thread is that the New Code settings are applied to the PR (in order to evaluate the QG), even though the New Code is designed for a wider scope of change (app version or duration).

To summarize: because the PR scanner uses the thresholds defined for the New Code, we need strict threshold values (like only 1% code duplication allowed) to enforce high quality on the PR (even though the New Code “start” and “end” points may be much wider).

Am I correct?

Duy

Hi Duy,

Yes, you’re correct. Your Quality Gate condition should be set to the standard you expect on PRs. If each PR passes its Quality Gate before merge, then the branch it merges into will also, naturally, pass the Quality Gate.
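For what it’s worth, tightening that condition can be scripted as well as done in the UI. A rough sketch against the Web API (server URL, gate name, and token are placeholders, and exact parameters can vary between SonarQube versions, so check your instance’s api/qualitygates documentation):

```shell
# Sketch only: add a stricter duplication-on-new-code condition to a gate.
# Placeholders: server URL, gate name, token. Requires "Administer Quality Gates".
curl -s -u "$SONAR_TOKEN:" \
  -X POST "https://sonarqube.example.com/api/qualitygates/create_condition" \
  -d "gateName=My Team Gate" \
  -d "metric=new_duplicated_lines_density" \
  -d "op=GT" \
  -d "error=3"
```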

Your only problem will come on those occasions when your PR doesn’t meet the standard and you decide to merge it anyway (:warning:). Then your branch Quality Gate will likely* fail until you fix the underlying problem you merged.

 
HTH,
Ann

* Conditions based on percentages might fail in the PR but pass once merged based on how much code is “new” versus how big the failure was in the PR.
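The footnote’s effect is easy to see with made-up numbers (all figures below are illustrative):

```python
# Made-up numbers for Ann's footnote: a percentage condition can fail on
# the PR but pass on the branch once merged, because the branch's New Code
# is a much bigger denominator.
threshold = 3.0                    # condition: duplication on New Code < 3%

pr_new, pr_dup = 50, 4             # small PR: 8% duplication -> PR gate fails
branch_new, branch_dup = 2000, 10  # New Code already merged this period

pr_density = 100 * pr_dup / pr_new                              # 8.0
merged_density = 100 * (branch_dup + pr_dup) / (branch_new + pr_new)

print(pr_density > threshold)      # True  -> PR fails its gate
print(merged_density > threshold)  # False -> branch still passes
```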
