Line duplication detection in Kotlin file way too aggressive

Today one of our PRs was blocked by the line duplication check. The developer came to me with the problem, so I logged in to see what was wrong. This is what was detected:

This duplication detection is wrong in so many places that I’m surprised it was reported at all. Order aside, the differences are:

  • start_date fields are not the same
  • end_date fields are not the same
  • title is marked as duplicated, although it is marked in only one block and not the other; even if it were marked in both places, the values are different
  • light_theme.icon_url is different
  • in dark_theme, only text_color is the same
  • pt and PT are not the same

In fact I counted only 3 lines that are the same, if you don’t count bracket lines. I’m not really sure why we got this warning in the first place, even considering partial line comparison the dates are totally different.

Can this be fixed in the future?

Hey there.

Duplication Detection ignores string literals.

If you believe there’s no value in refactoring the code, you can exclude files from duplication detection by configuring Analysis Scope > Scope of duplication detection.
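For a Gradle build, that exclusion could look roughly like the fragment below in build.gradle.kts. This is only a sketch: it assumes the org.sonarqube Gradle plugin is already applied (recent versions expose the sonar extension, older ones call it sonarqube), and the file pattern is a hypothetical example. sonar.cpd.exclusions is the analysis property behind that setting.

```kotlin
// build.gradle.kts fragment; assumes the org.sonarqube plugin is applied elsewhere.
sonar {
    properties {
        // Hypothetical pattern: exclude only the file holding the JSON defaults from CPD.
        property("sonar.cpd.exclusions", "**/RemoteConfigDefaults.kt")
    }
}
```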

As for refactoring by moving those JSONs to separate files: I don’t disagree with that, but I still think this should be allowed, since Kotlin encourages (or at least allows) this syntax. It’s in the official docs and tutorials.

Duplication Detection ignores string literals.

If it ignores literals, then why does it compare them? The duplicated section doesn’t even cover the whole string here.

Thanks for the feedback. I’m not a Kotlin expert, so I’ll call some in (we have some great ones :smiley:)

Hey there Iwo, I need some more information to replicate the issue, could you tell me which rule is reporting the issue here? Also could you provide a code snippet (text instead of image) of an example that reproduces your issue?

Thank you very much!

Code snippet below (current case).

Reported as
13.5% Duplication

In Sonar I see

Duplications:
Density 55.7%
Duplicated Lines 39
Duplicated Blocks 2

object RemoteConfigDefaults {
    @JvmField
    val METERING_REWARDED_VIDEO_CONFIG: String = """
    {
        "us": {
            "logged_user": 
            {
                "is_enabled":false, 
                "internal_ad_unit_id": "/2165551/brainly_android_app/RewardedadUnit_house_ads",
                "rewarded_videos_threshold": 5
            },
            "unlogged_user": 
            {
                "is_enabled":false, 
                "internal_ad_unit_id": "/2165551/brainly_android_app/RewardedadUnit_house_ads",
                "rewarded_videos_threshold": 3
            }
        }
    }
    """.trimIndent()
    @JvmField
    val METERING_BASE_CONFIG: String = """
    {
        "us":
        {
            "logged_user":
            {
                "free_questions": 13,
                "reset_metering_after_in_hours": 168,
                "basic_banner":
                {
                    "is_enabled": "true",
                    "visible_after_visit": 0
                },
                "counter_banner":
                {
                    "is_enabled": "true",
                    "visible_after_visit": 8
                }
            },
            "unlogged_user":
            {
                "free_questions": 4,
                "reset_metering_after_in_hours": 168,
                "basic_banner":
                {
                    "is_enabled": "true",
                    "visible_after_visit": 0
                },
                "counter_banner":
                {
                    "is_enabled": "false",
                    "visible_after_visit": 3
                }
            },
            "posted_answers_award_threshold": 25
        },
        "pl":
        {
            "logged_user":
            {
                "free_questions": 15,
                "reset_metering_after_in_hours": 168,
                "basic_banner":
                {
                    "is_enabled": "true",
                    "visible_after_visit": 0
                },
                "counter_banner":
                {
                    "is_enabled": "true",
                    "visible_after_visit": 10
                }
            },
            "unlogged_user":
            {
                "free_questions": 15,
                "reset_metering_after_in_hours": 168,
                "basic_banner":
                {
                    "is_enabled": "true",
                    "visible_after_visit": 0
                },
                "counter_banner":
                {
                    "is_enabled": "true",
                    "visible_after_visit": 10
                }
            },
            "posted_answers_award_threshold": 10
        },
        "pt":
        {
            "logged_user":
            {
                "free_questions": 13,
                "reset_metering_after_in_hours": 168,
                "basic_banner":
                {
                    "is_enabled": "true",
                    "visible_after_visit": 0
                },
                "counter_banner":
                {
                    "is_enabled": "true",
                    "visible_after_visit": 8
                }
            },
            "unlogged_user":
            {
                "free_questions": 13,
                "reset_metering_after_in_hours": 168,
                "basic_banner":
                {
                    "is_enabled": "true",
                    "visible_after_visit": 0
                },
                "counter_banner":
                {
                    "is_enabled": "true",
                    "visible_after_visit": 8
                }
            },
            "posted_answers_award_threshold": 5
        }
    }
    """.trimIndent()
    @JvmField
    val APP_ONBOARDING_CONFIG: String = """
    {
        "us": {
            "is_enabled": false,
             "steps": [
                "ginny",
                "scan_to_solve",
                "textbooks",
                "community",
                "tutoring"
            ],
            "show_offer_page_on_close": true
        }
    }
    """.trimIndent()
    @JvmField
    val TUTORING_ONBOARDING_CONFIG: String = """
    {
        "us": {
            "onboarding_version": "A"
        }
    }
    """.trimIndent()
    @JvmField
    val TUTORING_NEW_SUBJECTS_CONFIG: String = """
    {
        "us": {
            "new_subject_ids": "8,18"
        }
    }
    """.trimIndent()
    @JvmField
    val PROMO_CAMPAIGNS_OFFER_PAGE_CONFIG: String = """
    {
        "pt":{
            "start_date":"2022-11-04T07:00:00-03:00",
            "end_date":"2022-11-21T22:59:59-03:00",
            "light_theme":{
                "background_color":"#0C114D",
                "tint_color":"#FFFFFF",
                "icon_url":"https://s3.eu-west-1.amazonaws.com/mobile-static.z-dn.net/brainly-week-2022-promo-banner-gift-icon.png"
            },
            "dark_theme":{
                "background_color":"#0C114D",
                "tint_color":"#FFFFFF",
                "icon_url":"https://s3.eu-west-1.amazonaws.com/mobile-static.z-dn.net/brainly-week-2022-promo-banner-gift-icon.png"
            },
            "title":"Get up to 72% off annual plans for a limited time!",
            "subtitle":"Your discount will be applied automatically.",
            "campaign_id":"EOTY_2022",
            "hides_other_savings_text":true
        }
    }
    """.trimIndent()
    @JvmField
    val PROMO_CAMPAIGNS_PROFILE_CONFIG: String = """
    {
        "pt": {
            "start_date": "2022-10-24T06:00:00-04:00",
            "end_date": "2022-11-28T22:00:00-04:00",
            "title": "LIMITED TIME OFFER: Get up to 72% off annual plans w/ unlimited access!",
            "light_theme":{
                "background_color": "#0C114D",
                "tint_color": "#FFFFFF",
                "icon_url": "https://s3.eu-west-1.amazonaws.com/mobile-static.z-dn.net/brainly-week-2022-promo-banner-gift-small-icon.png"
            },
            "dark_theme":{
                "background_color": "#163BF3",
                "tint_color": "#FFFFFF",
                "icon_url": "https://s3.eu-west-1.amazonaws.com/mobile-static.z-dn.net/brainly-week-2022-promo-banner-gift-small-icon.png"
            },
            "campaign_id": "BLACK_FRIDAY2022"
        }
    }
    """.trimIndent()
    @JvmField
    val BRAINLY_PLUS_FREE_TRIAL_OFFER_PAGE_CONFIG: String = """
    {
         "us": {
            "is_enabled": true,
             "benefits": [
                "verified_answers",
                "math_solver",
                "textbooks",
                "no_interruptions"
            ]
        }
    }
    """.trimIndent()
    @JvmField
    val METERING_CONFIG_BASE: String = """
    {
        "logged_user": {
            "free_questions":13,
            "reset_metering_after_in_hours":168,
            "steps": {
                "0": {
                    "type":"counter_banner"
                },
                "1":{
                    "type":"counter_banner"
                },
                "2":{
                    "type":"counter_banner"
                },
                "3": {
                    "type":"counter_banner"
                },
                "4":{
                    "type":"counter_banner"
                },
                "5":{
                    "type":"counter_banner"
                },
                "6": {
                    "type":"counter_banner"
                },
                "7":{
                    "type":"counter_banner"
                },
                "8":{
                    "type":"counter_banner"
                },
                "9": {
                    "type":"counter_banner"
                },
                "10":{
                    "type":"counter_banner"
                },
                "11":{
                    "type":"counter_banner"
                },
                "12":{
                    "type":"counter_banner"
                }
            }
        },
        "unlogged_user": {
            "free_questions":4,
            "reset_metering_after_in_hours":168,
            "steps": {
                "0": {
                    "type":"basic_banner"
                },
                "1":{
                    "type":"basic_banner"
                },
                "2": {
                    "type":"basic_banner"
                },
                "3":{
                    "type":"basic_banner"
                }
            }
        },
        "posted_answers_award_threshold":5
    }
    """.trimIndent()
}

Any chance to take a look at this?

Hey Iwo, I tried to reproduce your issue on the snippet of code you provided but I had no luck.
Can I ask you to provide detailed information on your environment? Which version of SonarQube are you using? Also, can you confirm the rule that is raising the issue is S1192?

I’m having the same problem. I also got 13.5% duplication for a file where basically no line is a duplicate. I even extracted most of the common strings within the lines into constants. It’s completely broken.

Hi @Iwo_Polanski,

We deeply apologize for the delay. We were finally able to dedicate the resources to properly investigate this issue and get to the bottom of it.

TL;DR
You are right, something is not working properly in Copy Paste Detection, specifically on long """ string templates in Kotlin.

We are going to fix it right away and release the fix in the next version of the Sonar Kotlin analyzer.
After that, depending on the product you are using, you may get the fix in days (SonarQube Cloud), or in the following release of SonarQube Server.
We will inform you via this ticket when the issue is fixed in the analyzer.

Notice that a very long list of strings may still trigger duplication: that is the way Copy Paste Detection works. In your scenario, however, and in many others you may encounter, the problem should be fixed for good.

Technical details
Here are some technical details that may help you and others understand the current behavior of the duplication detection.

As @Colin mentioned earlier, Copy Paste Detection (CPD for short) ignores string literals, and that’s true in most if not all languages where we implemented it: it replaces the actual string content with a placeholder. In the case of Kotlin, the placeholder is LITERAL.

String templates, in languages that support them, have a complex structure, since expressions can be injected into them: e.g. """a $x b""".

In Kotlin, a single string template is made of many literal string template entries, which are fragments of the overall string template.

For example, if we take the first property in the RemoteConfigDefaults object you provided:

val METERING_REWARDED_VIDEO_CONFIG: String = """
    {
        "us": {
            "logged_user": 
            {
                "is_enabled":false, 
                "internal_ad_unit_id": "/2165551/brainly_android_app/RewardedadUnit_house_ads",
                "rewarded_videos_threshold": 5
            },
            "unlogged_user": 
            {
                "is_enabled":false, 
                "internal_ad_unit_id": "/2165551/brainly_android_app/RewardedadUnit_house_ads",
                "rewarded_videos_threshold": 3
            }
        }
    }
    """.trimIndent()

the string template """ ... """ translates into a very long series of literal string template entries:

PsiElement(OPEN_QUOTE): """
LITERAL_STRING_TEMPLATE_ENTRY x78
...
PsiElement(CLOSING_QUOTE): """

For the METERING_BASE_CONFIG string template there are 498 LITERAL_STRING_TEMPLATE_ENTRY entries!

When implementing the tokenization of string templates used for CPD, we emitted a LITERAL placeholder token for each entry instead of a single one for the entire string template, as we assumed that entries would only be created when expressions were injected in the template (e.g. for $x in """a $x b""").

However, that’s not the case: 78 entries are created for the string template above, even though there is not a single expression injection.
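To make the per-entry behavior concrete, here is a minimal sketch. The types and the function name are hypothetical and only model the idea; this is not the Kotlin PSI API nor the analyzer's code, and keeping the expression text as a token is an assumption made for illustration.

```kotlin
// Hypothetical, simplified model of a string template; not the Kotlin PSI API.
sealed interface TemplateEntry
data class LiteralEntry(val text: String) : TemplateEntry
data class ExpressionEntry(val expression: String) : TemplateEntry

// Per-entry tokenization as described above: one LITERAL per literal entry,
// wrapped by the opening and closing quote tokens.
fun tokenizePerEntry(entries: List<TemplateEntry>): List<String> =
    listOf("\"\"\"") + entries.map { entry ->
        when (entry) {
            is LiteralEntry -> "LITERAL"
            // Assumption for illustration: the expression text is kept as a token.
            is ExpressionEntry -> entry.expression
        }
    } + "\"\"\""
```

With 78 literal entries and no injected expression, this model produces 78 LITERAL tokens between the two quote tokens, which is the long run that CPD ends up comparing.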

The detection basically compares sequences of tokens, and when there are at least 100 equal tokens over at least 10 lines (docs here), a duplication is detected.

This also explains why you observed a duplication only covering part of the string template: the tokenized string looks like the following:

object RemoteConfigDefaults { 
@ JvmField val METERING_REWARDED_VIDEO_CONFIG : String =
""" LITERAL x78 times ... """ . trimIndent ( ) 
@ JvmField val METERING_BASE_CONFIG : String = 
""" LITERAL x495 times """ . trimIndent ( ) 
@ JvmField val APP_ONBOARDING_CONFIG : String = """
...

So the detection would find any sequence of more than 100 LITERAL tokens over 10+ lines and report it as a duplication.
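The threshold rule can be sketched naively as follows. This is not the actual CPD algorithm, only an illustration of the rule stated above; the Token type and the function name are hypothetical.

```kotlin
// Hypothetical token model: the token text plus the source line it sits on.
data class Token(val text: String, val line: Int)

// Naive sketch: true if `a` and `b` share a run of at least `minTokens`
// equal token texts that spans at least `minLines` lines on both sides.
fun hasDuplicate(
    a: List<Token>,
    b: List<Token>,
    minTokens: Int = 100,
    minLines: Int = 10
): Boolean {
    fun lineSpan(window: List<Token>) = window.last().line - window.first().line + 1
    for (i in 0..a.size - minTokens) {
        val wa = a.subList(i, i + minTokens)
        for (j in 0..b.size - minTokens) {
            val wb = b.subList(j, j + minTokens)
            val sameText = wa.indices.all { wa[it].text == wb[it].text }
            if (sameText && lineSpan(wa) >= minLines && lineSpan(wb) >= minLines) return true
        }
    }
    return false
}
```

Because every entry is reduced to the same LITERAL text, two long templates with completely different JSON content compare as equal in this model, while a template with only 78 entries is too short to match on its own.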

The possible solutions we are thinking about are:

  1. either emitting a single LITERAL token at the string template level and ignoring injected expressions altogether
  2. or finding a better compromise: keeping entries when they are injected expressions, and dropping them when they are not relevant for CPD
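As a rough illustration of the two options, under a simplified and hypothetical model of template entries (not the analyzer's actual code, and option 2 is only one possible reading of the compromise):

```kotlin
// Hypothetical, simplified model of template entries; not the Kotlin PSI API.
sealed class Entry
data class Literal(val text: String) : Entry()
data class Injected(val expression: String) : Entry()

// Option 1: the whole template becomes a single LITERAL token;
// injected expressions are ignored altogether.
fun tokenizeWholeTemplate(entries: List<Entry>): List<String> = listOf("LITERAL")

// Option 2 (one possible reading): collapse each run of consecutive literal
// entries into one LITERAL and keep a token for every injected expression.
fun tokenizeCollapsingLiterals(entries: List<Entry>): List<String> {
    val tokens = mutableListOf<String>()
    for (entry in entries) {
        when (entry) {
            is Literal -> {
                if (tokens.lastOrNull() != "LITERAL") {
                    tokens.add("LITERAL")
                }
            }
            is Injected -> {
                tokens.add(entry.expression)
            }
        }
    }
    return tokens
}
```

In both variants a template without injected expressions, such as the JSON strings in this thread, is reduced to a single LITERAL token regardless of how many entries it is split into.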

Notice that, even going for the most conservative approach (1), you could still have duplications when there is a long series of LITERAL, potentially coming from strings with different content. The typical example is an array of more than 100 strings, over more than 10 lines:

val strings = listOf( 
  "line 1 string 1", "line 1 string 2", ... "line 1 string 10",
  ...
  "line 10 string 1", "line 10 string 2", ... "line 10 string 10",
)

that would generate:

"listOf", "(", ("LITERAL", ",") x 100, ")"

These scenarios, however, are much more unlikely.
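The token count for the list example can be checked with a crude tokenizer. This is a sketch for illustration only, not the analyzer's tokenizer: it assumes plain double-quoted strings without escapes, and the function name is hypothetical.

```kotlin
// Crude sketch: double-quoted strings become LITERAL; identifiers and single
// punctuation characters are kept as separate tokens. No escape handling.
fun roughTokens(source: String): List<String> =
    Regex("\"[^\"]*\"|\\w+|[^\\s\\w]").findAll(source)
        .map { if (it.value.startsWith("\"")) "LITERAL" else it.value }
        .toList()
```

Applied to a listOf(...) call holding 10 lines of 10 strings each, this yields 100 LITERAL tokens separated by commas, whatever the strings contain.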

Best regards,
Antonio

Hi @rupert-jung-mw,

I suspect you have the same issue as @Iwo_Polanski, with long multiline string templates. The fix we are working on should fix your problem too.

Hope it helps,
Antonio