Broken indices due to ECS drain

Must-share information (formatted with Markdown):

  • Which versions are you using (SonarQube, Scanner, Plugin, and any relevant extension)
    • Enterprise Edition, Version 9.9.4 (build 87374)
  • How is SonarQube deployed: zip, Docker, Helm
    • Docker (AWS ECS cluster)
  • what are you trying to achieve
    • Keep indices consistent
  • what have you tried so far to achieve this
    • See below

Hey Team,

We run SonarQube on an AWS ECS cluster and drain tasks to refresh the underlying EC2 instances. Each time a new task starts, the SonarQube indices end up inconsistent and require refreshing.

As a workaround, we set up a Lambda that checks daily for broken projects: it compares the issue count from /api/issues/search?componentKeys=${project} against the corresponding measure from /api/measures/component?component=${project}, and whenever the two disagree, we trigger a reindex via /api/issues/reindex. A sketch of the check follows.
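For illustration, a minimal Python sketch of that daily check might look like the following. `SONAR_URL`, `SONAR_TOKEN`, and `PROJECTS` are hypothetical configuration values, and the comparison assumes the `violations` measure is the database-side counterpart of the index-backed issue count:

```python
import os
import requests

# Hypothetical configuration; adjust to your environment.
SONAR_URL = os.environ["SONAR_URL"]            # e.g. https://sonarqube.example.com
SONAR_TOKEN = os.environ["SONAR_TOKEN"]        # user token
PROJECTS = os.environ["PROJECTS"].split(",")   # comma-separated project keys

AUTH = (SONAR_TOKEN, "")  # SonarQube accepts a token as username, empty password


def index_issue_count(project: str) -> int:
    """Issue count as reported by the Elasticsearch-backed issues index."""
    r = requests.get(
        f"{SONAR_URL}/api/issues/search",
        params={"componentKeys": project, "ps": 1},
        auth=AUTH,
        timeout=30,
    )
    r.raise_for_status()
    return r.json()["paging"]["total"]


def db_issue_count(project: str) -> int:
    """Issue count via the 'violations' measure, computed from the database."""
    r = requests.get(
        f"{SONAR_URL}/api/measures/component",
        params={"component": project, "metricKeys": "violations"},
        auth=AUTH,
        timeout=30,
    )
    r.raise_for_status()
    measures = r.json()["component"]["measures"]
    return int(measures[0]["value"]) if measures else 0


def handler(event, context):
    """Daily Lambda entry point: reindex if any project shows a mismatch."""
    broken = [p for p in PROJECTS if index_issue_count(p) != db_issue_count(p)]
    if broken:
        # Endpoint taken from the workaround described above.
        requests.post(
            f"{SONAR_URL}/api/issues/reindex", auth=AUTH, timeout=30
        ).raise_for_status()
    return {"broken_projects": broken}
```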

From our DevOps team:

Regarding the ECS drain mechanism: the process sends a SIGTERM to each running container, and once an EC2 instance has no containers left on it, it is terminated and replaced. In parallel, additional compute is started for the containers to migrate to. If a container has not shut down gracefully after 20 minutes, the instance is terminated, along with any containers still running on it. We have no logs of that happening to SonarQube. This is essentially chaos engineering, except we wait up to 20 minutes for containers to migrate; the flow is sketched below.
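For context, the drain flow above roughly corresponds to this boto3 sketch. The cluster name and identifiers are hypothetical; the real process is driven by our infrastructure tooling, not this exact code:

```python
import time

import boto3

ecs = boto3.client("ecs")
ec2 = boto3.client("ec2")

CLUSTER = "sonarqube-cluster"  # hypothetical name


def drain_and_replace(container_instance_arn: str, instance_id: str,
                      grace_minutes: int = 20) -> None:
    """Mark the container instance DRAINING (ECS stops its tasks, which sends
    SIGTERM), wait up to the grace period for tasks to migrate, then
    terminate the EC2 instance either way."""
    ecs.update_container_instances_state(
        cluster=CLUSTER,
        containerInstances=[container_instance_arn],
        status="DRAINING",
    )
    deadline = time.time() + grace_minutes * 60
    while time.time() < deadline:
        state = ecs.describe_container_instances(
            cluster=CLUSTER,
            containerInstances=[container_instance_arn],
        )["containerInstances"][0]
        if state["runningTasksCount"] == 0:
            break
        time.sleep(30)
    # Past the grace period, any containers still running die with the instance.
    ec2.terminate_instances(InstanceIds=[instance_id])
```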

The index issue is with Elasticsearch, which runs inside the SonarQube container. Since the container has no stateful datastore/volume, it should be ephemeral, and its data should be pulled from RDS/Postgres on startup. I suspect something goes wrong when a new SonarQube container starts: it could be a lack of resources or a software issue.

While researching a solution, we found a post about running an independent Elasticsearch instance; that might help address the symptoms.

Please advise us on the best solution or workaround to keep the indices consistent after provisioning fresh SonarQube tasks.

Regards,
Przemek

Hey there.

I think this is probably the root of the problem. Having two SonarQube instances running against the same database (not in a cluster configuration, as in Data Center Edition) is simply not supported and is prone to causing corrupt/incomplete Elasticsearch indices.

Is that more or less what's happening here? At some point, do you have two SQ instances started against the same database?

Hey @Colin,

Thank you for getting back to me.

This could be the case: we scale up before scaling down, so there may be a moment when two instances talk to the same database.

The fix would be to scale down first, but in that case we cause an outage; a configuration sketch for that approach follows.
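If the overlap is the real culprit, one way to enforce scale-down-before-scale-up at the ECS service level is the deployment configuration. A boto3 sketch, with hypothetical cluster/service names:

```python
import boto3

ecs = boto3.client("ecs")

# maximumPercent=100 means ECS never runs more tasks than desiredCount during
# a deployment, and minimumHealthyPercent=0 lets it stop the old task before
# the replacement is healthy. No overlap, but a brief outage.
ecs.update_service(
    cluster="sonarqube-cluster",  # hypothetical name
    service="sonarqube",          # hypothetical name
    deploymentConfiguration={
        "maximumPercent": 100,
        "minimumHealthyPercent": 0,
    },
)
```

With this in place, a task refresh should never leave two SonarQube instances talking to the same database, at the cost of downtime during each deployment.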

Disregarding for a moment the fact that running multiple instances is against the license: what is the impact of corrupted indices? Does reindexing resolve the issue? Are there any severe side effects to repeatedly breaking the indices?

Thanks for sharing about Data Center Edition, which includes HA; it feels too big for us right now.
Would you recommend a different solution?