I’m writing a custom class that follows the ML Pipelines API in PySpark. For an example of one of the provided classes, see LogisticRegression here. You’ll see that the __init__() and setParams() methods have many type-hinted arguments with default values.
Unfortunately, when I write my own class that follows this pattern, SonarQube flags the keyword arguments that are repeated across __init__() and setParams() as duplicate code and fails the Quality Gate. I could use sonar.cpd.exclusions to exclude my entire class from duplicate-code detection, but I’d rather not: that detection has proved useful in the past for flagging areas that should be refactored.
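For reference, the file-level workaround mentioned above would look something like this in sonar-project.properties (the path is just a placeholder):

```properties
# sonar-project.properties — exclude this file from duplicate-code (CPD)
# detection only; other analyses still run. The path is a placeholder.
sonar.cpd.exclusions=src/my_pipeline/my_estimator.py
```

The drawback is exactly as described: it is all-or-nothing at the file level, so genuine duplication in the same file also goes undetected.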
So: can the duplicate-code detection algorithm for Python be configured to exclude the keyword arguments of methods?
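For concreteness, here is a minimal sketch of the pattern that gets flagged. The class and parameter names are hypothetical, and the real PySpark plumbing (e.g. the @keyword_only decorator and self._set()) is omitted; the point is the repeated keyword-argument lists:

```python
# Minimal sketch of the Pipelines-API style that trips CPD: the same
# keyword arguments, with the same defaults, appear in both __init__()
# and setParams(). All names here are illustrative, not real PySpark code.
class MyEstimator:
    def __init__(self, maxIter: int = 100, regParam: float = 0.0, tol: float = 1e-6):
        self.setParams(maxIter=maxIter, regParam=regParam, tol=tol)

    def setParams(self, maxIter: int = 100, regParam: float = 0.0, tol: float = 1e-6):
        # Store the parameters; real PySpark classes forward them to self._set().
        self.maxIter, self.regParam, self.tol = maxIter, regParam, tol
        return self
```

The two signatures are deliberately identical, which is precisely what the copy-paste detector sees as duplication.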
First of all, sorry for the late reply.
I agree that in this case duplicate-code detection really doesn’t make sense.
This is a tricky issue to handle: the duplicate-code detection component is shared across analyzers and, as of today, does not distinguish between many token types. In other words, I don’t think it’s possible to simply exclude keyword arguments from CPD detection, even though that would make a lot of sense in my opinion.
Still, I think this case should be fixed somehow, so I have created the following ticket to handle it.
Thanks very much for replying and for raising the ticket!
I’m not sure how best to handle it either. Perhaps sonar.cpd.exclusions could be extended to accept a regular expression for excluding certain code patterns within specified files or matching file patterns. However, it could be fiddly to apply this correctly to only the intended areas of interest, such as the keyword arguments of certain functions.
Ideally, Python would let me define a list of common keyword arguments once and re-use it across multiple functions in the first place.
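One way to approximate this today is to keep the shared defaults in a single dict and accept **kwargs, at the cost of losing the explicit type-hinted signatures that the Pipelines API pattern provides. A sketch, with all names hypothetical:

```python
# Sketch: shared keyword defaults defined once and reused by both methods.
# Trades away the explicit, type-hinted signatures of the original pattern.
_COMMON_DEFAULTS = {"maxIter": 100, "regParam": 0.0, "tol": 1e-6}

class MyEstimator:
    def __init__(self, **kwargs):
        self.setParams(**kwargs)

    def setParams(self, **kwargs):
        unknown = set(kwargs) - set(_COMMON_DEFAULTS)
        if unknown:
            raise TypeError(f"Unexpected parameters: {sorted(unknown)}")
        # Merge the shared defaults with caller-supplied overrides.
        for name, value in {**_COMMON_DEFAULTS, **kwargs}.items():
            setattr(self, name, value)
        return self
```

Since the duplicated keyword lists disappear, CPD has nothing to flag, but IDE autocompletion and static type checking for the individual parameters are lost, which is presumably why PySpark spells them out.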