How does SonarCloud compute the code duplications?

Hi,

After the recent update to SonarCloud, we’ve noticed something unusual with the code analysis for some of our projects. The analysis was failing because of an unexpectedly high percentage of code duplication, which is one of the conditions in our Quality Gate. The branch passed the day before, but when we analyzed it again today the duplication went up to 4% with only the README file changed. Is this a bug?

Also, where does the “since 28 days ago” come from if the most recent analysis was yesterday?

Hey there.

SonarCloud’s calculation of duplication is fairly… set in its ways (a kind way of saying old). I wouldn’t expect that something has changed there. Just to get a little more information – what are the primary languages being used in your project?

It’s more likely that something is happening with your New Code Period.

If you have a sliding New Code Period (which here you have, of 28 days), the definition of what lines fall under “New Code” will change as some code “ages out” of that window. This may be a little more clear if you check the Activity section of your project.

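Say, for example, you had very low code duplication on some code that fell in the New Code Period. As that code ages out of the window, it no longer balances out newer code with greater duplication, so the duplication percentage on New Code goes up even though nothing new was duplicated.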
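To make that concrete, here is a rough sketch of the arithmetic. It is a simplification and not SonarCloud's actual implementation: the commit dates, line counts, and the `new_code_duplication` helper are all made up for illustration, and it assumes duplication on New Code is simply duplicated new lines divided by total new lines.

```python
from datetime import date, timedelta

# Hypothetical history: (commit date, new lines added, how many of those are duplicated)
commits = [
    (date(2024, 1, 1), 6000, 0),    # large, clean change early in the window
    (date(2024, 1, 20), 2000, 80),  # smaller change with some duplication
]

def new_code_duplication(commits, today, window_days=28):
    """Duplication % over lines whose commit falls inside the sliding window."""
    cutoff = today - timedelta(days=window_days)
    in_window = [(lines, dup) for (when, lines, dup) in commits if when >= cutoff]
    total_lines = sum(lines for lines, _ in in_window)
    duplicated = sum(dup for _, dup in in_window)
    return 100.0 * duplicated / total_lines if total_lines else 0.0

# Both commits are inside the 28-day window: 80 / 8000 lines
print(new_code_duplication(commits, date(2024, 1, 28)))  # 1.0

# Two days later the first commit has aged out: 80 / 2000 lines
print(new_code_duplication(commits, date(2024, 1, 30)))  # 4.0
```

No code changed between the two runs. The only difference is which lines still count as New Code, and that alone is enough to move the percentage.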

I would suggest that you:

  • Make sure your New Code Period is defined in a way that matches your expectations. A rolling window doesn’t work for everyone.
  • Check the 8.2k New Lines that the dashboard says are being considered for duplication (as best you can) and make sure that no lines have fallen into the New Code Period that shouldn’t have. A good place to start would be to make sure the files that have changed look as expected, and there’s no large number of New Lines reported on files that aren’t new.
  • Check the duplication being reported. Is it valid, and only being reported on lines introduced in the last 28 days?

Thanks for the reply! As for the code, the projects are in Swift and Kotlin. The devs usually dump a sprint’s worth of code into our staging environment before a release, but they constantly use the lower environments for code analysis. Does that affect the code duplication, or does it not matter?

We’ll check your suggestions and also relay the information that you gave to my lead.