I’m quite new to SonarQube and currently exploring its features. As part of my evaluation, I created two simple Python scripts that share some duplicated code. SonarQube successfully detects duplicated lines and code blocks between these two files, which is great.
However, I noticed that it doesn’t detect similar duplications within the same file. Is this behavior intentional? And if not, is there a specific setting I need to adjust to enable duplication detection within a single Python script?
For testing, I run the analysis with the following command and view the results on my local SonarQube server (localhost):
sonar-scanner.bat -D"sonar.projectKey=python-test" -D"sonar.sources=." -D"sonar.host.url=http://localhost:9000" -D"sonar.token=sqp_..."
Yes, SonarQube is capable of detecting code duplication within the same file. You can see an example of this in my SonarQube Cloud organization.
This functionality works out of the box and does not require any special configuration. However, SonarQube does provide several configuration parameters related to duplication detection that you can adjust if you want to fine-tune its behavior or results. You can find detailed information about these parameters in the documentation: Duplication Check Configuration.
Unfortunately, I’m unable to access the first link you shared due to restrictions behind my company firewall. However, I did go through the documentation you referred to regarding duplication detection.
Let me explain my use case from the beginning.
Here is an example Python file I’m analyzing:
import math, random, datetime, os # unnecessary imports

class BaseClass:
    def greet(self):
        print("Hello from BaseClass")

class SubClassA(BaseClass):
    def greet(self):
        # duplicate code within a function
        print("Hello from SubClassA")
        print("Hello from SubClassA")
        # duplicate code across multiple classes
        a = 5 + 5
        b = a * 2
        c = b - 3
        d = b / 4
        e = a * 4
        f = e * e
        g = a + b
        h = 2 + e
        i = f + g
        print("A calc:", d)

class SubClassB(BaseClass):
    def greet(self):
        print("Hello from SubClassB")
        print("Hello from SubClassB")
        a = 5 + 5
        b = a * 2
        c = b - 3
        d = b / 4
        e = a * 4
        f = e * e
        g = a + b
        h = 2 + e
        i = f + g
        print("B calc:", d)

class SubClassC(BaseClass):
    def greet(self):
        print("Hello from SubClassC")
        print("Hello from SubClassC")
        a = 5 + 5
        b = a * 2
        c = b - 3
        d = b / 4
        e = a * 4
        f = e * e
        g = a + b
        h = 2 + e
        i = f + g
        a = 5 + 5
        b = a * 2
        c = b - 3
        d = b / 4
        e = a * 4
        f = e * e
        g = a + b
        h = 2 + e
        i = f + g
        print("C calc:", d)

def overly_complicated_function(
    a: int, b: str, c: float
): # only 'a' and 'b' are used
    if a > 0:
        if a < 10:
            if a != 5:
                if b == "test":
                    print("test passed")
                else:
                    if b != "":
                        print("not empty")
                    else:
                        print("empty string")
            else:
                print("a is 5")
        else:
            print("a is too big")
    else:
        print("a is non-positive")

def uses_typing(x: int, y: str) -> str:
    print(y)
    return str(x)

def unused_function(z, w, u): # only z is used
    return z + 1

def main() : # a lot of spaces
    objA = SubClassA()
    objB = SubClassB()
    objC = SubClassC()
    objA.greet()
    objB.greet()
    objC.greet()
    result1 = overly_complicated_function(3, "hello", "not used") # c is unnecessary
    result2 = uses_typing("not an int", 42) # type annotations ignored
    print(result2)
    math.sqrt(16) # calculated but not used

main()
My goals are:
Detect all code duplications (within functions, across classes, etc.).
Catch formatting issues according to PEP8.
Identify type inconsistencies using function type hints.
Report unused imports, variables, functions, and classes.
Detect unused statements like math.sqrt(16) when the result is unused.
Receive refactoring suggestions, e.g., extracting repeated logic into a method in the base class.
Currently, I’m using the following SonarScanner command:
Despite this, I’m not seeing these issues reflected in the SonarQube dashboard on localhost:9000. SonarQube tells me that there are no code duplications at all. So my questions are:
Can SonarQube detect all of the above-mentioned issues?
If yes, what additional configuration or setup is required?
I’d really appreciate some guidance on how to ensure these kinds of issues are detected and visualized properly in SonarQube.
I’ve reproduced the duplication behavior you describe, both within a file and across files. It’s not clear to me what’s going on here, but I suspect it comes down to how the algorithm works. Per the docs:
For a block of code to be considered as duplicated:
Non-Java projects:
There should be at least 100 successive and duplicated tokens.
Those tokens should be spread at least on:
…
10 lines of code for other languages
I’ve also reproduced your missing issues.
Normally, we try to keep it to one topic per thread. Otherwise it can get messy, fast. But I suspect there’s some underlying mechanism here, and finding it will solve most of this at once. So while I (we) reserve the right to ask you to create other topics, for now I’ll let it ride as is.
I wanted to check in and see if there have been any updates regarding the issues I raised—particularly the duplication behavior and the missing issues you were able to reproduce. Have you or the language experts been able to identify the root cause or determine whether this behavior is expected?
Additionally, I’m wondering if some of the issues might stem from the fact that I’m currently using the community version of SonarQube rather than a paid edition. If that’s a factor, it would be important for me to know.
I really need to analyze whether SonarQube is able to meet the previously mentioned goals I outlined in my earlier post, so any insight you can provide would be greatly appreciated.
For your code duplication issue, the duplicated block
a = 5 + 5
b = a * 2
c = b - 3
d = b / 4
e = a * 4
f = e * e
g = a + b
h = 2 + e
i = f + g
is made of around 40 tokens spread across 9 lines, which is below the threshold for detection.
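To make the threshold concrete, here is a rough sketch that counts the tokens in that block with Python's own `tokenize` module and compares the result to the documented defaults (100 tokens, 10 lines). This is only an approximation: SonarQube's duplication detector has its own tokenizer, so its count may differ, and the helper names `count_tokens` and `meets_default_threshold` are made up for this illustration.

```python
import io
import tokenize

# The duplicated block from the example file.
BLOCK = """\
a = 5 + 5
b = a * 2
c = b - 3
d = b / 4
e = a * 4
f = e * e
g = a + b
h = 2 + e
i = f + g
"""

# Token types that carry layout information rather than code content.
_LAYOUT = {
    tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
    tokenize.DEDENT, tokenize.COMMENT, tokenize.ENDMARKER,
}


def count_tokens(source):
    """Count code tokens as seen by Python's tokenizer (an approximation)."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return sum(1 for tok in tokens if tok.type not in _LAYOUT)


def meets_default_threshold(source, min_tokens=100, min_lines=10):
    """Check a block against the documented defaults for non-Java languages."""
    lines = [line for line in source.splitlines() if line.strip()]
    return count_tokens(source) >= min_tokens and len(lines) >= min_lines


print(count_tokens(BLOCK), meets_default_threshold(BLOCK))
```

By this count the block has 45 tokens on 9 lines, so it misses both default limits.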
We support some PEP8 formatting rules, but that has not been our focus. You could import issues from other linters instead.
For the type inconsistencies, rule S6555 triggers on your code on both "not an int" and 42; the second one is reported as a secondary location. You can get more details on an issue by clicking on it.
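For anyone who wants to see the mismatch outside of SonarQube, the sketch below compares call arguments with the function's annotations at runtime. It is not how the rule is implemented (the analyzer works statically), and `mismatched_arguments` is a hypothetical helper that only handles plain class annotations such as `int` and `str`.

```python
import inspect


def uses_typing(x: int, y: str) -> str:
    print(y)
    return str(x)


def mismatched_arguments(func, *args):
    """Return (parameter, expected type, actual type) for each mismatch."""
    signature = inspect.signature(func)
    bound = signature.bind(*args)
    problems = []
    for name, value in bound.arguments.items():
        expected = signature.parameters[name].annotation
        if expected is inspect.Parameter.empty:
            continue  # parameter has no annotation
        # Only plain classes are checked; generics like list[int] are skipped.
        if isinstance(expected, type) and not isinstance(value, expected):
            problems.append((name, expected.__name__, type(value).__name__))
    return problems


print(mismatched_arguments(uses_typing, "not an int", 42))
```

Both arguments of the call from the example file are reported, which matches the primary and secondary locations mentioned above.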
For the unused imports, our rule S1128 is not in the default Sonar Way quality profile. It needs to be included manually, and it only supports the from a import b syntax.
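Since the plain `import a, b` form is not covered, a quick check outside SonarQube can be sketched with the standard `ast` module. This is a simplified illustration, not the logic of S1128: the helper name `unused_imports` is invented here, and it ignores cases such as `__all__`, re-exports, and names used only inside strings.

```python
import ast


def unused_imports(source):
    """List imported names that are never read anywhere in the module."""
    tree = ast.parse(source)
    imported = set()
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the top-level name "a".
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            used.add(node.id)
    return sorted(imported - used)


SAMPLE = "import math, random, datetime, os\nmath.sqrt(16)\n"
print(unused_imports(SAMPLE))
```

On the first line of the example file this reports `datetime`, `os`, and `random`, while `math` counts as used because of the `math.sqrt(16)` call.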
For the unused statements, we have rules such as S905 and S2201. In this case, you are right and I have created SONARPY-2949 to fix this false negative.
There is currently no feature for refactoring suggestions as complicated as this, but many of our rules do have quickfixes. We also have AI Codefix to help for more complicated fixes.
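For reference, the manual refactoring the original poster had in mind could look roughly like the sketch below. It is a simplified version of the example file: `shared_calc` is a hypothetical method name, only the values that actually feed the printed result are kept, and the repeated greeting lines are reduced to one.

```python
class BaseClass:
    def greet(self):
        print("Hello from BaseClass")

    def shared_calc(self):
        """Arithmetic that was copy-pasted into every subclass."""
        a = 5 + 5
        b = a * 2
        return b / 4  # the value the subclasses printed as "d"


class SubClassA(BaseClass):
    def greet(self):
        print("Hello from SubClassA")
        print("A calc:", self.shared_calc())


class SubClassB(BaseClass):
    def greet(self):
        print("Hello from SubClassB")
        print("B calc:", self.shared_calc())


SubClassA().greet()
SubClassB().greet()
```

With the block living in one place, there is nothing left for duplication detection to report, whatever thresholds are in effect.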
To follow up on the code duplication issue: is there a way to lower the threshold for detection so that smaller duplicated blocks like the one I mentioned (around 40 tokens over 9 lines) are also detected?
Despite this, I’m not seeing these issues reflected in the SonarQube dashboard on localhost:9000. SonarQube tells me that there are no code duplications at all.
Thanks for your message — using "py" instead of "python" indeed solved the issue with detecting duplicated lines of code within the same file! It worked exactly as expected. Thanks a lot!
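For anyone landing on this thread later, here is a small sketch of how the scanner arguments might be put together. It assumes the parameters in question are the `sonar.cpd.${language}.minimumTokens` and `sonar.cpd.${language}.minimumLines` pair from the duplication documentation, with `py` as the language key. The helper `cpd_threshold_args` and the values 30 and 5 are purely illustrative.

```python
def cpd_threshold_args(language_key, min_tokens, min_lines):
    """Build -D arguments that lower the duplication thresholds for one language."""
    return [
        f"-Dsonar.cpd.{language_key}.minimumTokens={min_tokens}",
        f"-Dsonar.cpd.{language_key}.minimumLines={min_lines}",
    ]


# Base command taken from the first post; host URL and token are omitted here.
command = [
    "sonar-scanner.bat",
    "-Dsonar.projectKey=python-test",
    "-Dsonar.sources=.",
]
# The language key is "py", not "python" (illustrative threshold values).
command += cpd_threshold_args("py", 30, 5)

print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` or pasted into a terminal once the host URL and token are added back.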
That said, I find it a bit confusing that "py" is used in this case, while the documentation elsewhere consistently uses "sonar.python...". For example:
Python code is analyzed by default as compatible with Python 2 and Python 3. Some issues will be automatically silenced to avoid raising False Positives. In order to get a more precise analysis, you can specify the Python versions your code supports via the sonar.python.version parameter.
The accepted format is a comma-separated list of versions having the format "X.Y". Here are some examples:
sonar.python.version=2.7
sonar.python.version=3.8
sonar.python.version=2.7, 3.7, 3.8, 3.9
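As a side illustration of that format, the sketch below parses such a value into a list and rejects entries that are not of the form "X.Y". The function name `parse_python_versions` is made up, and the scanner's own validation may well behave differently.

```python
import re

# One version entry: digits, a dot, digits (for example "3.8").
_VERSION = re.compile(r"\d+\.\d+")


def parse_python_versions(value):
    """Split a comma-separated sonar.python.version value into version strings."""
    versions = [part.strip() for part in value.split(",")]
    for version in versions:
        if not _VERSION.fullmatch(version):
            raise ValueError(f"not in X.Y format: {version!r}")
    return versions


print(parse_python_versions("2.7, 3.7, 3.8, 3.9"))
```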
Absolutely, there are a few tricky cases here. This is some very old SonarQube logic (see SONAR-1501 from 2010)! The language key isn’t really used in other parts of SonarQube.
Since this is an advanced configuration that’s rarely used, it’s unlikely we’ll change how the analysis parameters work.
However, we could certainly improve our documentation to make it clearer where to find the language keys when they’re required. I’ll make sure to flag this for review—thanks for pointing it out!