Duplicate Code Metric for Terraform Code missing

We are a DevOps team doing a lot of IaC with Terraform, with a heavy engineering background in Java/Python.

We are currently evaluating SonarQube (still the Community Edition, on EKS with Helm) and are quite impressed.

I am used to seeing duplication metrics for code, but apart from Scala I don't see any for our Terraform code.

I played with sonar.cpd.terraform.minimumTokens and …Lines, but in the logs I don't see any 'DEBUG: Detection of duplications for …' entries for Terraform files.
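For reference, this is roughly what I put in sonar-project.properties, following the generic sonar.cpd.&lt;language&gt;.* pattern used for other languages (the values below are just examples; I can't tell whether these properties are honored for Terraform at all):

```properties
# CPD thresholds I tried for Terraform, mirroring the generic
# sonar.cpd.<language>.* pattern; the values are just examples.
sonar.cpd.terraform.minimumTokens=70
sonar.cpd.terraform.minimumLines=5
```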

Is it missing in implementation?

Thx in advance!
Lars

From what I read in the SonarSource/sonar-iac repository on GitHub (the static code analyser for Infrastructure-as-Code languages such as CloudFormation and Terraform, as well as DevOps platforms like Docker and Kubernetes), there is no implementation of CPD, right?

I compared with PythonCpdAnalyzer.java (sonar-python-plugin/src/main/java/org/sonar/plugins/python/cpd/PythonCpdAnalyzer.java in SonarSource/sonar-python on GitHub). Would implementing a TerraformCpdAnalyzer the same way be a good start?
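For anyone following along: as far as I understand it, CPD is at its core just sequence matching over normalized tokens, and the per-language analyzer mostly feeds those tokens to the platform. Here is a rough, self-contained sketch of the underlying idea in Python (a naive stand-in tokenizer for illustration, not the real plugin API and not a real HCL lexer):

```python
import re

def tokenize(source: str) -> list[str]:
    # Naive stand-in for a real HCL lexer: split into string literals,
    # identifiers, and punctuation. String literals are normalized so
    # that resources differing only in literals still count as copies.
    tokens = re.findall(r'"[^"]*"|[A-Za-z_][\w.-]*|[{}=\[\],()]', source)
    return ['"<str>"' if t.startswith('"') else t for t in tokens]

def duplicated_blocks(a: str, b: str, minimum_tokens: int = 10) -> bool:
    # Report a duplication if the two files share any normalized
    # token sequence of at least `minimum_tokens` tokens.
    ta, tb = tokenize(a), tokenize(b)
    windows = {tuple(tb[i:i + minimum_tokens])
               for i in range(len(tb) - minimum_tokens + 1)}
    return any(tuple(ta[i:i + minimum_tokens]) in windows
               for i in range(len(ta) - minimum_tokens + 1))

dp1 = '''
resource "aws_s3_bucket" "input" {
  bucket = "dp1-input"
  tags   = { pipeline = "dp1" }
}
'''
dp2 = '''
resource "aws_s3_bucket" "input" {
  bucket = "dp2-input"
  tags   = { pipeline = "dp2" }
}
'''
print(duplicated_blocks(dp1, dp2))  # True: the blocks differ only in string literals
```

In the actual plugins the matching is done by the SonarQube core; as far as I can tell, an analyzer like PythonCpdAnalyzer only registers the token stream, which is why it is so small.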

Hi @Lars2

Welcome to the community, thanks for evaluating SonarQube and for your question!

So far we haven’t considered implementing Copy Paste Detection for Terraform. It can cause a lot of false positives that we would like to avoid.

Please provide us with some examples where you would expect copy-paste detection, so we can better understand your case and reconsider.

Best Regards
Marcin Stachniuk

Hi @Marcin_Stachniuk

Thx for replying.

As an example, let me describe the context we work in, to make the problem easier to understand.

We implement “datalakes” in AWS consisting of multiple data pipelines.
Each data pipeline

  • gets data from somewhere (S3/…)
  • mangles the data (Glue/Lambda/…)
  • places the results somewhere (S3/RDS/…).

One can imagine that, for routine implementations, one way to proceed is to copy and adapt (adjust the Terraform state, change a base name, and implement the logic in a Lambda). In an oversimplified example, you might copy 10 files, change 3 lines, and end up with another 300 lines of duplicated code.

Example file hierarchy:

datapipelines:

  • dp1
    – /lambda
      – lambda.py
    – input_s3.tf
    – process_lambda.tf
    – output_s3.tf
    – provider.tf
    – versions.tf
  • dp2
    – /lambda
      – lambda.py
    – input_s3.tf
    – process_lambda.tf
    – output_s3.tf
    – provider.tf
    – versions.tf
  • dp3
    – /lambda
      – lambda.py
    – input_s3.tf
    – process_lambda.tf
    – output_s3.tf
    – provider.tf
    – versions.tf

A duplication metric could give us a clue that we should consider refactoring, helping us stay DRY and improve in the field of clean code.
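To make the refactoring target concrete: the duplicated per-pipeline files could collapse into one parameterized module. A sketch of what that might look like (the module path and variable name are made up for illustration):

```hcl
# Hypothetical shared module replacing the copied dp1/dp2/dp3 files;
# "./modules/data_pipeline" and "pipeline_name" are illustrative names.
module "dp1" {
  source        = "./modules/data_pipeline"
  pipeline_name = "dp1"
}

module "dp2" {
  source        = "./modules/data_pipeline"
  pipeline_name = "dp2"
}
```

A duplication metric would be exactly what tells us it is time to do this instead of copying a fourth pipeline.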

Without it, we only wake up one day and say 'Oh nooo'.

Hope I made it clear.

Thx.

Hi @Lars2,

This seems like a nice improvement to our offering. I recorded it in our system and we will evaluate it for inclusion in our roadmap.

Denis
