How to deal with hanging tasks?

Hi @dmeneses ,

Here are thread dumps from the 4 java processes running in the docker container (#1 is the main process).

stack1.txt (37.5 KB)
stack2.txt (66.7 KB)
stack3.txt (50.4 KB)
stack4.txt (38.9 KB)

And the support info, if that is relevant.

sonarqube-support-info-B16BF065-AXDI2EJE8iwh_7V0_e4H-2020-7-6-8-22.json (21.0 KB)

Please let me know if you want me to do something else.

Cheers,
Jesper

PS: Around 200 jobs were pending when restarting, and it hung again. So I got another thread dump for comparison stack1-reset.txt (36.9 KB)

Thanks.
Based on the thread dumps, it looks like the Compute Engine had one active task that was busy writing logs at the time the thread dump was taken.
This might indicate that the volume where logs are written is running out of space, or is blocking for another reason.
Could you please check the volume and maybe try creating files in it with the same user that is running SonarQube to see if it’s in a writable state?
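
If it helps, here is a rough sketch of such a check. The default path below is an assumption; point it at wherever your logs volume is mounted, and run it inside the container as the same user that runs SonarQube:

import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Rough sketch: report free space on the logs volume and try a small write
// as the current user. The default path is an assumption; pass the real
// mount point as the first argument. Requires Java 11+ (Files.writeString).
public class LogVolumeCheck {
    public static void main(String[] args) throws IOException {
        Path logsDir = Paths.get(args.length > 0 ? args[0] : "/opt/sonarqube/logs");

        FileStore store = Files.getFileStore(logsDir);
        System.out.printf("Usable space: %d MB%n", store.getUsableSpace() / (1024 * 1024));

        Path probe = Files.createTempFile(logsDir, "write-probe", ".tmp");
        Files.writeString(probe, "probe");
        Files.delete(probe);
        System.out.println("Write test OK in " + logsDir);
    }
}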

Sure, will do next time there is a hang. Thanks!

Hi @dmeneses

I tried updating to the new 8.4, and also updated the JVM to 11.0.7.

I also updated the heap/file limits to the new recommendations for 8.4.

Sadly, it hangs in the same way: _hung-2020.07.13.txt (34.6 KB)

I tried connecting to the container, and I can write to the logs folder without problems.
access.log in the same folder also gets updated just fine.

Maybe something in the Logback configuration is causing problems?

Seems like others experience the same problem with logback: https://jira.qos.ch/projects/LOGBACK/issues/LOGBACK-1406

Workarounds include reducing logging and avoiding use of the console appender.

The first suggestion is not worth much in practice.
But if the log output is going into a file anyway, maybe allow console log output to be suppressed?
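
Just to illustrate what I mean by suppressing it, here is a rough sketch against the plain Logback API. I don't know how SonarQube actually wires its appenders, so treat this purely as an illustration:

import java.util.Iterator;

import ch.qos.logback.classic.Logger;
import ch.qos.logback.classic.LoggerContext;
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.Appender;
import ch.qos.logback.core.ConsoleAppender;
import org.slf4j.LoggerFactory;

// Illustration only: detach any console appender from the root logger at
// runtime so nothing is written to stdout/stderr. Assumes Logback is the
// SLF4J binding; SonarQube's own logging setup may not expose a hook like this.
public class SuppressConsoleOutput {
    public static void main(String[] args) {
        LoggerContext ctx = (LoggerContext) LoggerFactory.getILoggerFactory();
        Logger root = ctx.getLogger(Logger.ROOT_LOGGER_NAME);

        for (Iterator<Appender<ILoggingEvent>> it = root.iteratorForAppenders(); it.hasNext(); ) {
            Appender<ILoggingEvent> appender = it.next();
            if (appender instanceof ConsoleAppender) {
                root.detachAppender(appender); // stop writing log events to the console
            }
        }
    }
}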

The last logs you sent seem to be from the start process, not from the Compute Engine.

Anyway, good find about the Logback issue. Unfortunately, the bug report and the comments don’t really explain why the thread holding the lock is blocked writing to the file. I actually added a comment to it, since a lot of the comments are misleading.

Looking at the logs of the CE that you sent before (stack4.txt), it appears that you’re experiencing the same problem, except that there are no other threads waiting for the lock to be released.

"Worker 0 (UUID=AXMf7TQqAK9iwH23_ROA) on ce-worker-0" #33 prio=1 os_prio=0 cpu=92818.66ms elapsed=48441.33s tid=0x00007f2654fb0800 nid=0x138 runnable  [0x00007f26156f4000]
   java.lang.Thread.State: RUNNABLE
	at java.io.FileOutputStream.writeBytes(java.base@11.0.5/Native Method)
	at java.io.FileOutputStream.write(java.base@11.0.5/FileOutputStream.java:354)
	at java.io.BufferedOutputStream.write(java.base@11.0.5/BufferedOutputStream.java:123)
	- locked <0x00000000e07945e0> (a java.io.BufferedOutputStream)
	at java.io.PrintStream.write(java.base@11.0.5/PrintStream.java:559)
	- locked <0x00000000e07945b8> (a java.io.PrintStream)
	at java.io.FilterOutputStream.write(java.base@11.0.5/FilterOutputStream.java:108)
	at ch.qos.logback.core.joran.spi.ConsoleTarget$1.write(ConsoleTarget.java:37)
	at ch.qos.logback.core.OutputStreamAppender.writeBytes(OutputStreamAppender.java:199)
	at ch.qos.logback.core.OutputStreamAppender.subAppend(OutputStreamAppender.java:231)
	at ch.qos.logback.core.OutputStreamAppender.append(OutputStreamAppender.java:102)
	at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:84)
	at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
	at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:270)
	at ch.qos.logback.classic.Logger.callAppenders(Logger.java:257)
	at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:421)
	at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:383)
	at ch.qos.logback.classic.Logger.info(Logger.java:579)
	at org.sonar.api.utils.log.LogbackLogger.doInfo(LogbackLogger.java:90)
	at org.sonar.api.utils.log.BaseLogger.info(BaseLogger.java:76)
	at org.sonar.core.util.logs.DefaultProfiler.logInfo(DefaultProfiler.java:259)
	at org.sonar.core.util.logs.DefaultProfiler.log(DefaultProfiler.java:228)
	at org.sonar.core.util.logs.DefaultProfiler.doStop(DefaultProfiler.java:197)
	at org.sonar.core.util.logs.DefaultProfiler.stopInfo(DefaultProfiler.java:131)
	at org.sonar.ce.task.step.ComputationStepExecutor.executeStep(ComputationStepExecutor.java:85)
	at org.sonar.ce.task.step.ComputationStepExecutor.executeSteps(ComputationStepExecutor.java:72)
	at org.sonar.ce.task.step.ComputationStepExecutor.execute(ComputationStepExecutor.java:59)
	at org.sonar.ce.task.projectanalysis.taskprocessor.ReportTaskProcessor.process(ReportTaskProcessor.java:81)
	at org.sonar.ce.taskprocessor.CeWorkerImpl$ExecuteTask.executeTask(CeWorkerImpl.java:209)
	at org.sonar.ce.taskprocessor.CeWorkerImpl$ExecuteTask.run(CeWorkerImpl.java:191)

It’s writing to a file, so I don’t see it being related to console output.
If you'd like to try changing the logging configuration, you can do so by modifying the commented-out LOGGING section in the file conf/sonar.properties.
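
For reference, the LOGGING section exposes properties along these lines (names quoted from memory; verify them against the comments in your own conf/sonar.properties):

# Assumed names; check the comments shipped in conf/sonar.properties.
# Global log level: INFO (default), DEBUG or TRACE.
sonar.log.level=INFO
# Per-process override, e.g. for the Compute Engine only.
sonar.log.level.ce=INFO
# Directory the *.log files are written to.
sonar.path.logs=logs
# Rotation of the log files.
sonar.log.rollingPolicy=time:yyyy-MM-dd
sonar.log.maxFiles=7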

Ah, I misunderstood.

In the log I posted, I see:
"Gobbler[ce]" #46 prio=5 os_prio=0 cpu=580.18ms elapsed=1534.72s tid=0x00007f8970004800 nid=0x101 runnable  [0x00007f89b84e5000]
   java.lang.Thread.State: RUNNABLE
	at java.io.FileOutputStream.writeBytes(java.base@11.0.7/Native Method)

I thought that was what you meant by the Compute Engine hanging.

But is this the root of the problem? Do the two processes write to the same file and cause some kind of lock perhaps?

I will try to tweak the config file to see if it makes a difference.
However, I am taking 3 weeks of vacation starting Friday, so it will take a while before I can look at it again.

Yeah, that wasn't much help. We did notice the problem when our cluster got upgraded from Ubuntu 16 -> 18. Does that help at all? Nothing about our firewall changed; that upgrade was the only thing.

That’s just a thread in the start process waiting for logs to be sent out to the stdout of the Compute Engine process. Nothing abnormal there.

If you see a task in the CE hanging again, get the thread dump of the Compute Engine process.
Please let us know if you have any new information. It’s hard to find the root cause without a reproducer.
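
For next time, something along these lines inside the container should capture the right process (assuming the JDK tools are on the PATH; the CE usually shows up as org.sonar.ce.app.CeServer):

# list the Java processes and note the pid of the Compute Engine
jps -l
# then dump its threads (replace <ce-pid> with the pid from jps)
jstack -l <ce-pid> > ce-thread-dump.txt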

We did end up changing the following:
SONAR_JDBC_MAXIDLE=0
and
SONAR_JDBC_MINIDLE=0

Basically to keep it from trying to keep connections idle. But there does appear to be some sensitivity to idle connections getting closed that changed when our nodes were upgraded from Ubuntu 16 -> 18.
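
In case it helps anyone not using the env variables, the same change expressed as conf/sonar.properties entries (assumed property names; we only set the env variables above):

# Assumed equivalents of SONAR_JDBC_MINIDLE / SONAR_JDBC_MAXIDLE.
# Keeps the pool from holding idle connections open.
sonar.jdbc.minIdle=0
sonar.jdbc.maxIdle=0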

New ce thread dump: ce-aug9.txt (40.9 KB)

Let me know if you want me to try something else.

Thanks. Once again, the logging appender is blocked trying to write logs. It’s not a lock problem, since it’s actually holding the lock.
There are some reports that the receiving end of pipes can stop consuming data causing the writer to block: http://www.jppf.org/tracker/tbg/jppf/issues/JPPF-343 https://issues.apache.org/jira/browse/LOG4J2-880
It's unclear under what conditions that would happen, but I'd try to make sure nothing is logged to the console (stdout/stderr). Did you set any property related to logging in sonar.properties? I think nothing is logged to stdout/stderr by default, only to files.

I did not change any of the logging configurations.

I will try to find some time to play with the log settings.

Looking at ConsoleTarget, it uses System.out for its output.

So if something in the CE “hijacks” System.out, the problem may be similar to https://jira.qos.ch/browse/LOGBACK-504

Not sure if related. But that bug came up when searching.
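
Roughly what I mean by hijacking, as a toy example (purely hypothetical; I have no evidence any CE task actually does this):

import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Toy example of "hijacking" System.out. Once replaced, anything that resolves
// System.out at call time, which is what Logback's ConsoleTarget does, ends up
// writing to the substitute stream instead of the real stdout.
// Hypothetical; not something observed in the CE.
public class HijackStdout {
    public static void main(String[] args) {
        PrintStream original = System.out;
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        System.setOut(new PrintStream(sink, true));

        System.out.println("captured instead of printed");

        System.setOut(original);
        System.out.println("bytes diverted from stdout: " + sink.size());
    }
}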

Anyway, the curious thing is that the Logback hang is reported to trigger when there are large amounts of output.
And I am looking at log files of a couple of MB. Hardly a lot of data these days.

I will try to increase the log level to see if it triggers the bug in day-to-day operations (instead of just weekends when we build all projects).

Oops! I did change the logging.

But not via the properties file, which is where I looked.
I start the Java process with -Dsonar.log.console=true. I will try removing that for a while.

Ok, that could be it.
What I suspect is that the reader of the pipe is not picking up the data, the pipe gets full and the logger hangs trying to write to it. The logger could probably do a better job handling that without blocking while holding a lock, but ultimately the problem is caused by a mismanagement of the pipes attached to the process.
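
A small self-contained sketch of that failure mode (the "yes" command is just a stand-in for any child process that logs a lot to its stdout):

import java.util.concurrent.TimeUnit;

// Sketch of the suspected failure mode: the child writes to its stdout pipe,
// the parent never reads it, the OS pipe buffer (typically 64 KB on Linux)
// fills up and the child's writes block indefinitely. That is the same state
// as the CE worker stuck in FileOutputStream.writeBytes in the thread dump.
public class PipeBackpressure {
    public static void main(String[] args) throws Exception {
        Process child = new ProcessBuilder("yes", "some log line")
                .redirectErrorStream(true)
                .start(); // child stdout is a pipe we deliberately never drain

        TimeUnit.SECONDS.sleep(5); // give the child time to fill the pipe buffer

        // The child is still alive but stuck in write(2).
        System.out.println("child alive and blocked on write: " + child.isAlive());
        child.destroy();
    }
}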

Too early to tell if it makes a difference.

But FYI, you have the same option in your official Docker image; that is where I copied it from.

I will return with status next week.

SonarQube kept running over the weekend.

So the container option for console log may be the culprit.

I will trigger a full build (like weekends) tonight to see if it keeps running.

@dmeneses SonarQube survived another build storm without hanging.

So for my money, that console output option was the problem.

You should probably change your official Docker image.

Cheers!

