Failed Analysis - Unrecoverable Indexation Failures

Greetings,

Our environment:
SonarQube: Community Edition - Version 8.9.6 (build 50800)
Java: openjdk version "11.0.5" 2019-10-15 LTS
Database: PostgreSQL 10.10 / PostgreSQL JDBC Driver 42.2.19
Elasticsearch: SonarQube's embedded es7
SonarQube host configuration:

vm.max_map_count=524288
fs.file-max=131072
ulimit -u = 8192
ulimit -n = 131072
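For completeness, this is roughly how we apply those settings on the host (the file locations and the "sonarqube" service user are specific to our setup, so treat this as a sketch rather than a recipe):

# /etc/sysctl.d/99-sonarqube.conf -- kernel settings, matching the values above
vm.max_map_count=524288
fs.file-max=131072

# /etc/security/limits.d/99-sonarqube.conf -- assumes SonarQube runs as the "sonarqube" user
sonarqube   -   nofile   131072
sonarqube   -   nproc    8192

# reload the kernel settings without a reboot
sudo sysctl --system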

After upgrading SonarQube from 8.6.1 to 8.9.6, we started getting the following error out of the blue on some of our projects. It results in a FAILED project analysis, so no analysis is generated for those projects. All failed projects show the same error:

Error Details
java.lang.IllegalStateException: Unrecoverable indexation failures: 2 errors among 2 requests
	at org.sonar.server.es.IndexingListener$1.onFinish(IndexingListener.java:39)
	at org.sonar.server.es.BulkIndexer.stop(BulkIndexer.java:131)
	at org.sonar.server.component.index.ComponentIndexer.delete(ComponentIndexer.java:170)
	at org.sonar.ce.task.projectanalysis.purge.IndexPurgeListener.onComponentsDisabling(IndexPurgeListener.java:41)
	at org.sonar.db.purge.PurgeDao.purgeDisabledComponents(PurgeDao.java:101)
	at org.sonar.db.purge.PurgeDao.purge(PurgeDao.java:64)
	at org.sonar.ce.task.projectanalysis.purge.ProjectCleaner.purge(ProjectCleaner.java:63)
	at org.sonar.ce.task.projectanalysis.purge.PurgeDatastoresStep.execute(PurgeDatastoresStep.java:54)
	at org.sonar.ce.task.step.ComputationStepExecutor.executeStep(ComputationStepExecutor.java:81)
	at org.sonar.ce.task.step.ComputationStepExecutor.executeSteps(ComputationStepExecutor.java:72)
	at org.sonar.ce.task.step.ComputationStepExecutor.execute(ComputationStepExecutor.java:59)
	at org.sonar.ce.task.projectanalysis.taskprocessor.ReportTaskProcessor.process(ReportTaskProcessor.java:81)
	at org.sonar.ce.taskprocessor.CeWorkerImpl$ExecuteTask.executeTask(CeWorkerImpl.java:212)
	at org.sonar.ce.taskprocessor.CeWorkerImpl$ExecuteTask.run(CeWorkerImpl.java:194)
	at org.sonar.ce.taskprocessor.CeWorkerImpl.findAndProcessTask(CeWorkerImpl.java:160)
	at org.sonar.ce.taskprocessor.CeWorkerImpl$TrackRunningState.get(CeWorkerImpl.java:135)
	at org.sonar.ce.taskprocessor.CeWorkerImpl.call(CeWorkerImpl.java:87)
	at org.sonar.ce.taskprocessor.CeWorkerImpl.call(CeWorkerImpl.java:53)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)

We did some searching and found that the issue is related to Elasticsearch; the embedded version in use is es7. These are the es7 logs:

 ERROR es[][o.e.m.f.FsHealthService] health check failed
org.apache.lucene.store.AlreadyClosedException: Underlying file changed by an external force at 2022-02-02T08:42:12Z, (lock=NativeFSLock(path=/opt/sonarqube-8.9.6.50800/data/es7/nodes/0/node.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=2022-02-02T08:42:12.362101Z))
        at org.apache.lucene.store.NativeFSLockFactory$NativeFSLock.ensureValid(NativeFSLockFactory.java:191) ~[lucene-core-8.10.1.jar:8.10.1 2f24e6a49d48a032df1f12e146612f59141727a9 - mayyasharipova - 2021-10-12 15:13:05]
        at org.elasticsearch.env.NodeEnvironment.assertEnvIsLocked(NodeEnvironment.java:1103) ~[elasticsearch-7.16.2.jar:7.16.2]
        at org.elasticsearch.env.NodeEnvironment.nodeDataPaths(NodeEnvironment.java:865) ~[elasticsearch-7.16.2.jar:7.16.2]
        at org.elasticsearch.monitor.fs.FsHealthService$FsHealthMonitor.monitorFSHealth(FsHealthService.java:156) [elasticsearch-7.16.2.jar:7.16.2]
        at org.elasticsearch.monitor.fs.FsHealthService$FsHealthMonitor.run(FsHealthService.java:144) [elasticsearch-7.16.2.jar:7.16.2]
        at org.elasticsearch.threadpool.Scheduler$ReschedulingRunnable.doRun(Scheduler.java:214) [elasticsearch-7.16.2.jar:7.16.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:777) [elasticsearch-7.16.2.jar:7.16.2]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) [elasticsearch-7.16.2.jar:7.16.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:834) [?:?]
2022.02.03 08:52:23 WARN  es[][o.e.c.c.ClusterFormationFailureHelper] this node is unhealthy: health check failed due to broken node lock

We read other related threads; some said this is related to disk space, and all of those reporters had only a few GB left free on their disks. That doesn't seem to be our case, as we have 130 GB free (200 GB total size).
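In case it helps, this is roughly how we double-checked the free space and what es7 itself reports (the install path comes from the log above; the 127.0.0.1:9001 endpoint is an assumption based on the default sonar.search.port of our install):

# free space on the volume that holds the es7 data directory
df -h /opt/sonarqube-8.9.6.50800/data

# disk usage and cluster health as seen by the embedded es7 node
curl -s 'http://127.0.0.1:9001/_cat/allocation?v'
curl -s 'http://127.0.0.1:9001/_cluster/health?pretty'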

Others said that trashing /opt/sonarqube/data/es7 and restarting might be a possible fix. We did that yesterday and it fixed the issue temporarily. Unfortunately, when we came back this morning, the issue had returned.
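For reference, the cleanup we performed was roughly the following (the install path is taken from the log above; adjust if you start SonarQube through a service manager instead of the bundled wrapper). The es7 indices are rebuilt from the database on the next startup:

# stop SonarQube, wipe the embedded Elasticsearch data, start again
/opt/sonarqube-8.9.6.50800/bin/linux-x86-64/sonar.sh stop
rm -rf /opt/sonarqube-8.9.6.50800/data/es7
/opt/sonarqube-8.9.6.50800/bin/linux-x86-64/sonar.sh start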

Any other suggestions or observations?

Thanks for your time!

Hi,

Are you running a virus scanner or the like on the SonarQube box? The second error you’ve shown indicates some other process is at work, interfering.

Ann

Greetings Ann,

Thanks for your reply. As far as we can tell, there is no virus scanner in our environment. Apart from SonarQube, there's a Prometheus node_exporter, which has been up for months and has not caused any trouble so far.

Upon searching, we concluded that the node.lock file is being used only by the sonar process.
For your information, SonarQube is fully functional during the day; the errors seem to occur once per day, at around 23:00. Is there any scheduled es7 operation that takes place during that period?

Lastly, if there's anything else related that I can investigate, please point me to it.

Thanks for your time!
Best regards,
Chris

Hi Chris,

IIRC, there's only one process on a timer. It checks to see if it's time to send out a license reminder & fires off some emails if it is.

I can say that Elasticsearch will lock its indices if available disk space falls below a certain percentage of disk size. That's not what the logs indicate is happening, but I don't suppose you have some process that temporarily gobbles a lot of disk at 23:00? Or other processes that run at ~23:00?
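If you want to rule that out, something like the following against the embedded node should show whether the disk-based flood-stage watermark ever kicked in, and clear the resulting read-only block if it did (the 127.0.0.1:9001 address assumes the default sonar.search.port):

# show the disk-based allocation watermarks (flood_stage is the one that makes indices read-only)
curl -s 'http://127.0.0.1:9001/_cluster/settings?include_defaults=true&flat_settings=true' | grep watermark

# clear a read-only block left behind by the flood-stage watermark, if any
curl -s -X PUT 'http://127.0.0.1:9001/_all/_settings' -H 'Content-Type: application/json' -d '{"index.blocks.read_only_allow_delete": null}'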

Ann

Hey Ann,

Sorry for the late reply. Nope, there's no such process. I've enabled auditd to watch that node.lock file. I'll get back to you once I have new findings.
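For anyone following along, the watch rule we added looks roughly like this (the path matches our install from the log above; the key name is just a label we picked):

# audit every read/write/attribute change on the es7 node lock, tagged with a searchable key
sudo auditctl -w /opt/sonarqube-8.9.6.50800/data/es7/nodes/0/node.lock -p rwa -k es7_node_lock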

Chris

Good morning Ann,

Although I reassured you that there was no such process when you asked "you have some process that temporarily gobbles a lot of disk at 23:00?", you were right.

It turned out that our sysadmins were taking a backup of the VM remotely. We were completely unaware of that during our first investigation, and it was extremely difficult to track down.

So the backup process was holding the node.lock file, which resulted in the errors. Thanks to auditd (the Linux Auditing System), we "watched" the file and managed to capture that. We contacted our sysadmins and asked them to exclude /sonarqube-*/data/es7 from the daily backup routine (it ran daily at the same time the errors occurred, ~23:00). We came back this morning and the errors are gone.
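For completeness, this is roughly how we pulled the culprit out of the audit log once the watch had triggered (the key matches the auditctl rule from my earlier post):

# list the recorded accesses with resolved fields; the comm=/proctitle= entries reveal the offending process
sudo ausearch -k es7_node_lock -i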

Ann, thank you so much for your time and effort. I hope this solution helps other people as well.

Best regards,
Chris

