When Quality Gates Punish Real Coverage Gains

I recently tried to meet an 80% new-code coverage quality gate on a legacy shared library with overall coverage of 43%. The repo's history goes back to at least 2012, and SonarQube quality gates have only been enforced for the last five years.

My plan was to use AI to generate unit tests and fix some Sonar issues in the process. When my PR went in, the build failed: I had changed 129 lines of code but achieved only 70% coverage on those new lines. I pushed that up to 75%, but some of the modified code was so hard to test that refactoring it would have been too risky, given how old and widely used the code is and that it had no existing unit tests.

The “solution” I ended up with was reverting one file to its original state (reintroducing the Sonar issues I had fixed) just to reduce the count of new lines. That bumped the percentage over 80%, but it felt wrong.

Meanwhile, my new tests increased overall coverage by 650 lines of code, far more than the 129 lines I touched. If the quality gate metric had been:

(new LOC covered + old LOC newly covered) / (new LOC)

…my coverage would have been 504%, a much better reflection of the overall improvement to the code base. A metric like this could be offered as an optional per-repository configuration, rewarding meaningful coverage improvements without penalizing developers for touching hard-to-test code.
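To make the comparison concrete, here is a minimal Python sketch of both calculations using the numbers from this PR. The function names are hypothetical, and the 97 covered changed lines are inferred from the 75% figure above; none of this reflects SonarQube's actual implementation.

```python
def new_code_coverage(covered_new_loc: int, new_loc: int) -> float:
    """Current gate metric: coverage measured on changed lines only."""
    return covered_new_loc / new_loc


def coverage_yield(total_newly_covered_loc: int, new_loc: int) -> float:
    """Proposed metric: every line the PR newly covers, old or new,
    divided by the lines it changed."""
    return total_newly_covered_loc / new_loc


new_loc = 129                  # lines changed in the PR
covered_new_loc = 97           # inferred from the 75% figure above
total_newly_covered_loc = 650  # overall coverage gain from the new tests

print(f"current gate metric: {new_code_coverage(covered_new_loc, new_loc):.0%}")
print(f"proposed metric:     {coverage_yield(total_newly_covered_loc, new_loc):.0%}")
# current gate metric: 75%  -> fails the 80% gate
# proposed metric:     504% -> reflects the real improvement
```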

Just a thought for how to make the product even better.