I’m quite new to SonarQube and currently exploring its features. As part of my evaluation, I created two simple Python scripts that share some duplicated code. SonarQube successfully detects duplicated lines and code blocks between these two files, which is great.
However, I noticed that it doesn’t detect similar duplications within the same file. Is this behavior intentional? And if not, is there a specific setting I need to adjust to enable duplication detection within a single Python script?
For testing, I am using the following command to run the analysis and view the results on my local SonarQube server (localhost):
    sonar-scanner.bat -D"sonar.projectKey=python-test" -D"sonar.sources=." -D"sonar.host.url=http://localhost:9000" -D"sonar.token=sqp_..."
Yes, SonarQube is capable of detecting code duplication within the same file. You can see an example of this in my SonarQube Cloud organization.
This functionality works out of the box and does not require any special configuration. However, SonarQube does provide several configuration parameters related to duplication detection that you can adjust if you want to fine-tune its behavior or results. You can find detailed information about these parameters in the documentation: Duplication Check Configuration.
Unfortunately, I’m unable to access the first link you shared due to restrictions behind my company firewall. However, I did go through the documentation you referred to regarding duplication detection.
Let me explain my use case from the beginning.
Here is an example Python file I’m analyzing:
    import math, random, datetime, os # unnecessary imports

    class BaseClass:
        def greet(self):
            print("Hello from BaseClass")

    class SubClassA(BaseClass):
        def greet(self):
            # duplicate code within a function
            print("Hello from SubClassA")
            print("Hello from SubClassA")
            # duplicate code across multiple classes
            a = 5 + 5
            b = a * 2
            c = b - 3
            d = b / 4
            e = a * 4
            f = e * e
            g = a + b
            h = 2 + e
            i = f + g
            print("A calc:", d)

    class SubClassB(BaseClass):
        def greet(self):
            print("Hello from SubClassB")
            print("Hello from SubClassB")
            a = 5 + 5
            b = a * 2
            c = b - 3
            d = b / 4
            e = a * 4
            f = e * e
            g = a + b
            h = 2 + e
            i = f + g
            print("B calc:", d)

    class SubClassC(BaseClass):
        def greet(self):
            print("Hello from SubClassC")
            print("Hello from SubClassC")
            a = 5 + 5
            b = a * 2
            c = b - 3
            d = b / 4
            e = a * 4
            f = e * e
            g = a + b
            h = 2 + e
            i = f + g
            a = 5 + 5
            b = a * 2
            c = b - 3
            d = b / 4
            e = a * 4
            f = e * e
            g = a + b
            h = 2 + e
            i = f + g
            print("C calc:", d)

    def overly_complicated_function(
        a: int, b: str, c: float
    ): # only 'a' and 'b' are used
        if a > 0:
            if a < 10:
                if a != 5:
                    if b == "test":
                        print("test passed")
                    else:
                        if b != "":
                            print("not empty")
                        else:
                            print("empty string")
                else:
                    print("a is 5")
            else:
                print("a is too big")
        else:
            print("a is non-positive")

    def uses_typing(x: int, y: str) -> str:
        print(y)
        return str(x)

    def unused_function(z, w, u): # only z is used
        return z + 1

    def main() : # a lot of spaces
        objA = SubClassA()
        objB = SubClassB()
        objC = SubClassC()
        objA.greet()
        objB.greet()
        objC.greet()
        result1 = overly_complicated_function(3, "hello", "not used") # c is unnecessary
        result2 = uses_typing("not an int", 42) # type annotations ignored
        print(result2)
        math.sqrt(16) # calculated but not used

    main()
My goals are:
Detect all code duplications (within functions, across classes, etc.).
Catch formatting issues according to PEP8.
Identify type inconsistencies using function type hints.
Report unused imports, variables, functions, and classes.
Detect unused statements like math.sqrt(16) when the result is unused.
Receive refactoring suggestions, e.g., extracting repeated logic into a method in the base class.
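To make the last goal concrete, here is a rough sketch of the kind of refactoring I have in mind. The method name `compute_result` is hypothetical, and the sketch keeps only the values that feed the printed result, so it is an illustration of the idea rather than a drop-in replacement for the file above:

```python
class BaseClass:
    def greet(self):
        print("Hello from BaseClass")

    def compute_result(self):
        # Hypothetical helper: holds the arithmetic that was copied into
        # every subclass; only the values needed for the result are kept.
        a = 5 + 5
        b = a * 2
        return b / 4


class SubClassA(BaseClass):
    def greet(self):
        print("Hello from SubClassA")
        print("A calc:", self.compute_result())


class SubClassB(BaseClass):
    def greet(self):
        print("Hello from SubClassB")
        print("B calc:", self.compute_result())
```

Each subclass then keeps only its own greeting, and the shared calculation lives in one place.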
Currently, I’m using the same SonarScanner command as in my first post.
Despite this, I’m not seeing these issues reflected in the SonarQube dashboard on localhost:9000. SonarQube tells me that there are no code duplications at all. So my questions are:
Can SonarQube detect all of the above-mentioned issues?
If yes, what additional configuration or setup is required?
I’d really appreciate some guidance on how to ensure these kinds of issues are detected and visualized properly in SonarQube.
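One way to check what the server actually computed, independent of the dashboard, is to ask the Web API for the duplication measures directly. The sketch below assumes the `api/measures/component` endpoint with the standard duplication metric keys, the host and project key from the scanner command earlier in this thread, and a token sent as the Basic-auth username. The helper names are made up for illustration, and the token value is a placeholder:

```python
import base64
import json
import urllib.parse
import urllib.request

# Metric keys assumed to be available on the server.
DUPLICATION_METRICS = ["duplicated_lines", "duplicated_blocks", "duplicated_lines_density"]


def build_measures_request(host, project_key, token, metrics):
    """Build (but do not send) a request for the given project measures."""
    query = urllib.parse.urlencode(
        {"component": project_key, "metricKeys": ",".join(metrics)}
    )
    request = urllib.request.Request(f"{host}/api/measures/component?{query}")
    # The token is passed as the user name with an empty password.
    credentials = base64.b64encode(f"{token}:".encode()).decode()
    request.add_header("Authorization", f"Basic {credentials}")
    return request


def fetch_measures(request):
    """Send the request and return a {metric: value} mapping."""
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return {m["metric"]: m.get("value") for m in payload["component"]["measures"]}


# Example usage against a local server (requires a running instance):
# request = build_measures_request(
#     "http://localhost:9000", "python-test", "<your token>", DUPLICATION_METRICS
# )
# print(fetch_measures(request))
```

If the returned values are all zero, the analysis itself found no duplication, so the dashboard is not hiding anything.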
I’ve reproduced your duplication behavior, both within a file and across files. It’s not clear to me what’s going on here, but I suspect it’s about how the algorithm works. Per the docs:
For a block of code to be considered as duplicated:
Non-Java projects:
There should be at least 100 successive and duplicated tokens.
Those tokens should be spread at least on:
…
10 lines of code for other languages
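As a rough sanity check against those thresholds, the repeated block from the sample file can be counted with Python's own `tokenize` module. This is only an approximation: the analyzer's tokenizer may count differently, and the choice of which token types to skip is my assumption, not the documented rule. The function names are made up for this sketch.

```python
import io
import tokenize

# Token types that carry no code content. Skipping them is an assumption
# about what a duplication detector counts, not the analyzer's actual rule.
_SKIPPED = {
    tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT,
    tokenize.COMMENT, tokenize.ENDMARKER, tokenize.ENCODING,
}

# The arithmetic block that is repeated in the sample file.
REPEATED_BLOCK = """\
a = 5 + 5
b = a * 2
c = b - 3
d = b / 4
e = a * 4
f = e * e
g = a + b
h = 2 + e
i = f + g
"""


def count_tokens_and_lines(source):
    """Return (number of counted tokens, number of lines they span)."""
    tokens = [
        tok
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type not in _SKIPPED
    ]
    lines = {tok.start[0] for tok in tokens}
    return len(tokens), len(lines)


def meets_threshold(source, min_tokens=100, min_lines=10):
    """Check a block against the token and line minimums quoted above."""
    n_tokens, n_lines = count_tokens_and_lines(source)
    return n_tokens >= min_tokens and n_lines >= min_lines


print(count_tokens_and_lines(REPEATED_BLOCK))
print(meets_threshold(REPEATED_BLOCK))
```

By this count the block has 45 tokens on 9 lines, so on its own it falls short of both minimums. That may be part of the explanation, though it does not account for everything reproduced here.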
I’ve also reproduced your missing issues.
Normally, we try to keep it to one topic per thread. Otherwise it can get messy, fast. But I suspect there’s some underlying mechanism here, and finding it will solve most of this at once. So while I (we) reserve the right to ask you to create other topics, for now I’ll let it ride as is.