Duplicate Code Metric for Terraform Code missing

We are a DevOps team doing a lot of IaC with Terraform, with a heavy engineering background in Java/Python.

We are currently evaluating SonarQube (still the Community Edition, on EKS with Helm) and are quite impressed.

We're used to seeing duplication metrics for our code, but apart from Scala I don't see any for our Terraform code.

I played with sonar.cpd.terraform.minimumtokens and … …Lines, but in the logs I don't see any ‘DEBUG: Detection of duplications for …’ entries for Terraform files.

Is it missing from the implementation?

Thx in advance!
Lars

From what I read in the SonarSource/sonar-iac repository on GitHub (Static Code Analyser for Infrastructure-as-Code languages such as CloudFormation and Terraform, as well as DevOps platforms like Docker and Kubernetes), there is no implementation of CPD, right?

I compared it to PythonCpdAnalyzer.java (sonar-python-plugin/src/main/java/org/sonar/plugins/python/cpd/PythonCpdAnalyzer.java in the SonarSource/sonar-python repository on GitHub). Would implementing a TerraformCpdAnalyzer the same way be a good start?

Hi @Lars2

Welcome to the community, thanks for evaluating SonarQube and for your question!

So far we haven’t considered implementing Copy Paste Detection for Terraform. It can cause a lot of false positives that we would like to avoid.

Please provide us with some examples where you expect copy-paste detection, so we can better understand your case and think about it again.

Best Regards
Marcin Stachniuk


Hi @Marcin_Stachniuk,

Thx for replying.

As an example, let me describe the context we work in, to help understand the problem.

We implement “datalakes” in AWS consisting of multiple data pipelines.
Each data pipeline

  • gets data from somewhere (S3/…)
  • mangles the data (Glue/Lambda/…)
  • places the results somewhere (S3/RDS/…).

One can imagine that, when it comes to routine implementations, one way to proceed is to copy and adapt (adjust the Terraform state, change a base name and implement the logic in a Lambda). In an oversimplified example you might copy 10 files, change 3 lines, and end up with another 300 lines of duplicated code.

Example file hierarchy (a sketch of the refactoring we would aim for follows below):

datapipelines:

  • dp1
    – /lambda
      — lambda.py
    – input_s3.tf
    – process_lambda.tf
    – output_s3.tf
    – provider.tf
    – versions.tf
  • dp2
    – /lambda
      — lambda.py
    – input_s3.tf
    – process_lambda.tf
    – output_s3.tf
    – provider.tf
    – versions.tf
  • dp3
    – /lambda
      — lambda.py
    – input_s3.tf
    – process_lambda.tf
    – output_s3.tf
    – provider.tf
    – versions.tf
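
To make the refactoring concrete: a minimal sketch, assuming a hypothetical shared data_pipeline module (the module path and the base_name/lambda_src variables are invented for illustration):

module "dp1" {
  # Hypothetical shared module capturing the common input -> process -> output
  # wiring; only the pipeline-specific parts vary per instantiation.
  source     = "../modules/data_pipeline"
  base_name  = "dp1"
  lambda_src = "${path.module}/dp1/lambda"
}

module "dp2" {
  source     = "../modules/data_pipeline"
  base_name  = "dp2"
  lambda_src = "${path.module}/dp2/lambda"
}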

Duplication detection could give us a clue to consider such refactoring, to keep DRY and to improve in the field of clean code.

Without it, we only wake up one day and say ‘Oh nooo’.

I hope I made it clear.

Thx.


Hi @Lars2,

This seems like a nice improvement to our offering. I recorded it in our system and we will evaluate it for inclusion in our roadmap.

Denis


Hi,
I have just started evaluating the use of SonarQube for detecting duplicate Terraform code and am disappointed to discover that this is currently not supported.

Quoting the request above:

“Please provide us with some examples where you expect copy-paste detection, so we can better understand your case and think about it again.”

I would like to add an example of where this would be useful in my organisation.

It's generally good practice to write Terraform code that is agnostic of the environment it will be applied against. One way this can be accomplished is by having a variable:

variable "environment" {
  type = string
  condition     = contains(["live", "staging"], var.environment)
}

and that variable can then be referenced in Terraform resource blocks.
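
A minimal sketch (the bucket name is invented for illustration):

resource "aws_s3_bucket" "data" {
  # One definition serves every environment; only the name varies.
  bucket = "dq-data-${var.environment}"
}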

Another way is by using for_each meta-arguments.
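
Again as a minimal sketch (names invented), a single block can be stamped out once per environment:

resource "aws_s3_bucket" "per_env" {
  # for_each instantiates this block once for each element of the set.
  for_each = toset(["live", "staging"])
  bucket   = "dq-data-${each.key}"
}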

Sadly, in our organisation we have lots of Terraform code and much of it doesn't exhibit these simple good practices. Hence we have code like the following:

resource "aws_s3_bucket" "dq-args-staging" {
  bucket        = "dq-args-staging"
  force_destroy = false
  versioning {
    enabled    = false
    mfa_delete = false
  }
  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm = "AES256"
      }
      bucket_key_enabled = true
    }
  }
  lifecycle_rule {
    id      = "default_lifecycle_rules"
    enabled = true
    transition {
      days          = 0
      storage_class = "INTELLIGENT_TIERING"
    }
    abort_incomplete_multipart_upload_days = 7
    expiration {
      expired_object_delete_marker = true
    }
    noncurrent_version_expiration {
      days = 14
    }
  }
  # tflint-ignore: aws_resource_missing_tags
  tags = merge(
    module.tags.squad_data_infrastructure_staging,
    {
      Squad = "core-data-platform"
    }
  )
}
# tflint-ignore: terraform_naming_convention
resource "aws_s3_bucket" "dq-development" {
  bucket        = "dq-development"
  force_destroy = false
  versioning {
    enabled    = false
    mfa_delete = false
  }
  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm = "AES256"
      }
      bucket_key_enabled = true
    }
  }
  lifecycle_rule {
    id      = "default_lifecycle_rules"
    enabled = true
    transition {
      days          = 0
      storage_class = "INTELLIGENT_TIERING"
    }
    abort_incomplete_multipart_upload_days = 7
    expiration {
      expired_object_delete_marker = true
    }
    noncurrent_version_expiration {
      days = 14
    }
  }
  # tflint-ignore: aws_resource_missing_tags
  tags = merge(
    module.tags.squad_data_infrastructure_staging,
    {
      Squad = "core-data-platform"
    }
  )
}

(This is a real example lifted from our code)

As you can see, the two resources are virtually identical. I would have really liked SonarQube to detect this duplication.
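
To sketch what the de-duplicated version could look like (the resource address is invented, and the shared blocks are elided into a comment), the two resources collapse into one for_each block:

resource "aws_s3_bucket" "dq" {
  # Instantiated once per bucket name; the name is the only difference
  # between the two original resources.
  for_each      = toset(["dq-args-staging", "dq-development"])
  bucket        = each.key
  force_destroy = false

  # ... the shared versioning, server_side_encryption_configuration and
  # lifecycle_rule blocks from the originals would move here unchanged ...

  tags = merge(
    module.tags.squad_data_infrastructure_staging,
    {
      Squad = "core-data-platform"
    }
  )
}

(Adopting this for existing buckets would also need terraform state mv to the new indexed addresses, to avoid recreating them.)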

Hope that helps.
