Broken indices due to ECS drain

Must-share information (formatted with Markdown):

  • Which versions are you using (SonarQube, Scanner, Plugin, and any relevant extension)
    • Enterprise Edition, Version 9.9.4 (build 87374)
  • How is SonarQube deployed: zip, Docker, Helm
    • Docker (AWS ECS cluster)
  • what are you trying to achieve
    • Keep indices consistent
  • what have you tried so far to achieve this
    • See below

Hey Team,

We run SonarQube on an AWS ECS cluster and drain tasks to refresh the underlying EC2 instances. Each time a new task starts, the SonarQube indices end up inconsistent and require refreshing.

As a workaround, we set up a Lambda that checks daily for broken projects: it compares the issue count from /api/issues/search?componentKeys=${project} against the corresponding measure from /api/measures/component?component=${project}, and whenever the two disagree, we trigger a reindex via /api/issues/reindex. A sketch of the check follows.
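For illustration, a minimal Python sketch of that daily check might look like the following. `SONAR_URL`, `SONAR_TOKEN`, and `PROJECTS` are hypothetical configuration values, and the comparison assumes the `violations` measure is the database-side counterpart of the index-backed issue count:

```python
import os
import requests

# Hypothetical configuration; adjust to your environment.
SONAR_URL = os.environ["SONAR_URL"]            # e.g. https://sonarqube.example.com
SONAR_TOKEN = os.environ["SONAR_TOKEN"]        # user token
PROJECTS = os.environ["PROJECTS"].split(",")   # comma-separated project keys

AUTH = (SONAR_TOKEN, "")  # SonarQube accepts a token as username, empty password


def index_issue_count(project: str) -> int:
    """Issue count as reported by the Elasticsearch-backed issues index."""
    r = requests.get(
        f"{SONAR_URL}/api/issues/search",
        params={"componentKeys": project, "ps": 1},
        auth=AUTH,
        timeout=30,
    )
    r.raise_for_status()
    return r.json()["paging"]["total"]


def db_issue_count(project: str) -> int:
    """Issue count via the 'violations' measure, computed from the database."""
    r = requests.get(
        f"{SONAR_URL}/api/measures/component",
        params={"component": project, "metricKeys": "violations"},
        auth=AUTH,
        timeout=30,
    )
    r.raise_for_status()
    measures = r.json()["component"]["measures"]
    return int(measures[0]["value"]) if measures else 0


def handler(event, context):
    """Daily Lambda entry point: reindex if any project shows a mismatch."""
    broken = [p for p in PROJECTS if index_issue_count(p) != db_issue_count(p)]
    if broken:
        # Endpoint taken from the workaround described above.
        requests.post(
            f"{SONAR_URL}/api/issues/reindex", auth=AUTH, timeout=30
        ).raise_for_status()
    return {"broken_projects": broken}
```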

From our DevOps team:

Regarding the ECS drain mechanism: the process sends a SIGTERM to each running container, and once an EC2 instance has no containers left on it, it is terminated and replaced. In parallel, additional compute is started for the containers to migrate to. If a container has not shut down gracefully after 20 minutes, the instance is terminated, along with any containers still running on it. We have no logs of that happening to SonarQube. This is essentially chaos engineering, except we wait up to 20 minutes for containers to migrate; the flow is sketched below.
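For context, the drain flow above roughly corresponds to this boto3 sketch. The cluster name and identifiers are hypothetical; the real process is driven by our infrastructure tooling, not this exact code:

```python
import time

import boto3

ecs = boto3.client("ecs")
ec2 = boto3.client("ec2")

CLUSTER = "sonarqube-cluster"  # hypothetical name


def drain_and_replace(container_instance_arn: str, instance_id: str,
                      grace_minutes: int = 20) -> None:
    """Mark the container instance DRAINING (ECS stops its tasks, which sends
    SIGTERM), wait up to the grace period for tasks to migrate, then
    terminate the EC2 instance either way."""
    ecs.update_container_instances_state(
        cluster=CLUSTER,
        containerInstances=[container_instance_arn],
        status="DRAINING",
    )
    deadline = time.time() + grace_minutes * 60
    while time.time() < deadline:
        state = ecs.describe_container_instances(
            cluster=CLUSTER,
            containerInstances=[container_instance_arn],
        )["containerInstances"][0]
        if state["runningTasksCount"] == 0:
            break
        time.sleep(30)
    # Past the grace period, any containers still running die with the instance.
    ec2.terminate_instances(InstanceIds=[instance_id])
```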

The index issue is with Elasticsearch, which runs inside the SonarQube container. Since the container has no stateful datastore/volume, it should be ephemeral, and its data should be pulled from RDS/Postgres on startup. I suspect something goes wrong when a new SonarQube container starts: it could be a lack of resources or a software issue.

While researching a solution, we found a post about running an independent Elasticsearch instance; that might help address the symptoms.

Please advise us on the best solution or workaround to keep the indices consistent after provisioning fresh SonarQube tasks.

Regards,
Przemek

Hey there.

I think this is probably the root of the problem. Having two SonarQube instances running against the same database (not in a cluster configuration, as in Data Center Edition) is simply not supported and is prone to causing corrupt/incomplete Elasticsearch indices.

Is that more or less what's happening here? At some point, do you have two SQ instances started against the same database?

Hey @Colin,

Thank you for getting back to me.

This could be the case: we scale up before scaling down, so there may be a moment when two instances talk to the same database.

The fix would be to scale down first, but in that case we cause an outage; a configuration sketch for that approach follows.
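If the overlap is the real culprit, one way to enforce scale-down-before-scale-up at the ECS service level is the deployment configuration. A boto3 sketch, with hypothetical cluster/service names:

```python
import boto3

ecs = boto3.client("ecs")

# maximumPercent=100 means ECS never runs more tasks than desiredCount during
# a deployment, and minimumHealthyPercent=0 lets it stop the old task before
# the replacement is healthy. No overlap, but a brief outage.
ecs.update_service(
    cluster="sonarqube-cluster",  # hypothetical name
    service="sonarqube",          # hypothetical name
    deploymentConfiguration={
        "maximumPercent": 100,
        "minimumHealthyPercent": 0,
    },
)
```

With this in place, a task refresh should never leave two SonarQube instances talking to the same database, at the cost of downtime during each deployment.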

Disregarding for a moment the fact that running multiple instances is against the license: what is the impact of corrupted indices? Does reindexing resolve the issue? Are there any severe side effects to repeatedly breaking the indices?

Thanks for sharing about Data Center Edition, which includes HA; it feels too big for us right now.
Would you recommend a different solution?