Question

I think I found a situation where mixed usage of log4j a) directly and b) via commons-logging causes some kind of class-loading deadlock. I'm not sure if such a situation is possible at all (shouldn't the JVM detect that?) and what to do about it.

The problem

In our build system, we currently are running our unit tests sequentially - to make the build faster, we obviously can change that to run our unit tests in parallel. However, if we do so, some builds run into an execution timeout. When analysing the thread dump of such "hanging builds", we find ourselves in different modules with different tests involved most of the time. But it always boils down to two threads trying the initialize a Logger: one with Logger.getLogger (directly using log4j), the other with LogFactory.getLog (using commons-logging).

My analysis

So we have one thread (the one using log4j directly) waiting at this place:

"pool-1-thread-3" prio=10 tid=0x00007f6528ca6000 nid=0x6f8f in Object.wait() [0x00007f64d9ca6000]
  java.lang.Thread.State: RUNNABLE
at org.apache.log4j.LogManager.<clinit>(LogManager.java:82)
at org.apache.log4j.Logger.getLogger(Logger.java:117)
at de.is24.platform.contacts.domain.PhoneNumberFormat.<clinit>(PhoneNumberFormat.java:21)

which, unfortunately, is a rather "crowded" line:

Hierarchy h = new Hierarchy(new RootLogger((Level) Level.DEBUG));

And another thread (using commons-logging) waiting here:

"pool-1-thread-2" prio=10 tid=0x00007f6528bf9800 nid=0x6f8e in Object.wait() [0x00007f64d9da7000]
  java.lang.Thread.State: RUNNABLE
at org.apache.log4j.Priority.<clinit>(Priority.java:45)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:171)
at org.apache.commons.logging.impl.Log4JLogger.class$(Log4JLogger.java:37)
at org.apache.commons.logging.impl.Log4JLogger.<clinit>(Log4JLogger.java:45)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:529)
at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:209)
at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:351)

which is straightforward:

final static public Priority FATAL = new Level(FATAL_INT, "FATAL", 0);

So to me, it seems like the second thread is in the process of initializing class Priority and waits to load the Level class. The first thread probably attempts to load the Level class (the other stuff in that line seems unrelated), and as the Level class extends Priority, waits for the Priority class to be loaded.
There we have our deadlock.

So, can you tell me if this analysis is correct? Or did I miss something?

UPDATE I

I wrote some test cases, you can find them here: https://github.com/sebastiankirsch/classloading

There are several test cases demonstrating the problem:

  1. TestLoadingByClassForName should cause a deadlock rather quickly (every third run on my machine)
  2. TestLoadingMixed represents a version of the problem being stripped down to the minimum of the observed stack trace; this one fails much more infrequently (roughly by factor 4)
  3. TestMixedLoggerInstantiation mimics the behaviour: one class instantiates a logger using log4j, another using commons-logging. The deadlock is hard to catch here, as there is much more code involved - I needed to add a random sleep which certainly needs to be adapted to your machine

Here's a stack trace of the TestMixedLoggerInstantiation test case hanging.

Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed mode):

"UseLog4JLogger" prio=10 tid=0x00007fa1f017d800 nid=0x7bd8 in Object.wait() [0x00007fa1e5ba4000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.log4j.LogManager.<clinit>(LogManager.java:82)
    at org.apache.log4j.Logger.getLogger(Logger.java:117)
    at net.tcc.classloading.UseLog4JLogger.run(UseLog4JLogger.java:23)

"UseCommonsLoggingLogFactory" prio=10 tid=0x00007fa1f00e5000 nid=0x7bd7 in Object.wait() [0x00007fa1e5ca4000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.log4j.Priority.<clinit>(Priority.java:45)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:169)
    at org.apache.commons.logging.impl.Log4JLogger.class$(Log4JLogger.java:37)
    at org.apache.commons.logging.impl.Log4JLogger.<clinit>(Log4JLogger.java:45)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:529)
    at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
    at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:209)
    at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:351)
    at net.tcc.classloading.UseCommonsLoggingLogFactory.run(UseCommonsLoggingLogFactory.java:13)

"ReaderThread" prio=10 tid=0x00007fa1f00ed000 nid=0x7bd6 runnable [0x00007fa1e5da6000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
    - locked <0x00000007d7088a00> (a java.io.InputStreamReader)
    at java.io.InputStreamReader.read(InputStreamReader.java:167)
    at java.io.BufferedReader.fill(BufferedReader.java:136)
    at java.io.BufferedReader.readLine(BufferedReader.java:299)
    - locked <0x00000007d7088a00> (a java.io.InputStreamReader)
    at java.io.BufferedReader.readLine(BufferedReader.java:362)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner$ReaderThread.run(RemoteTestRunner.java:140)

"Low Memory Detector" daemon prio=10 tid=0x00007fa1f009d800 nid=0x7bd4 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x00007fa1f009b800 nid=0x7bd3 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00007fa1f0098800 nid=0x7bd2 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00007fa1f0096800 nid=0x7bd1 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00007fa1f007a000 nid=0x7bd0 in Object.wait() [0x00007fa1e6c54000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000007d7001300> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
    - locked <0x00000007d7001300> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x00007fa1f0078000 nid=0x7bcf in Object.wait() [0x00007fa1e6d55000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000007d70011d8> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0x00000007d70011d8> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x00007fa1f000c000 nid=0x7bc5 in Object.wait() [0x00007fa1f50b0000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000007d730dfd8> (a net.tcc.classloading.UseCommonsLoggingLogFactory)
    at java.lang.Thread.join(Thread.java:1186)
    - locked <0x00000007d730dfd8> (a net.tcc.classloading.UseCommonsLoggingLogFactory)
    at java.lang.Thread.join(Thread.java:1239)
    at net.tcc.classloading.TestMixedLoggerInstantiation.test(TestMixedLoggerInstantiation.java:21)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)

"VM Thread" prio=10 tid=0x00007fa1f0071800 nid=0x7bce runnable 

"GC task thread#0 (ParallelGC)" prio=10 tid=0x00007fa1f001f000 nid=0x7bc6 runnable 

"GC task thread#1 (ParallelGC)" prio=10 tid=0x00007fa1f0021000 nid=0x7bc7 runnable 

"GC task thread#2 (ParallelGC)" prio=10 tid=0x00007fa1f0022800 nid=0x7bc8 runnable 

"GC task thread#3 (ParallelGC)" prio=10 tid=0x00007fa1f0024800 nid=0x7bc9 runnable 

"GC task thread#4 (ParallelGC)" prio=10 tid=0x00007fa1f0026800 nid=0x7bca runnable 

"GC task thread#5 (ParallelGC)" prio=10 tid=0x00007fa1f0028000 nid=0x7bcb runnable 

"GC task thread#6 (ParallelGC)" prio=10 tid=0x00007fa1f002a000 nid=0x7bcc runnable 

"GC task thread#7 (ParallelGC)" prio=10 tid=0x00007fa1f002c000 nid=0x7bcd runnable 

"VM Periodic Task Thread" prio=10 tid=0x00007fa1f00a8800 nid=0x7bd5 waiting on condition 

JNI global references: 1168

Reproducing the deadlock

Download the code from https://github.com/sebastiankirsch/classloading.
TestLoadingByClassForName should easily cause a deadlock for you (just run it a few times), this is a prerequisite that your system/JVM will eventually run into a deadlock for the complex scenario.

Now run TestMixedLoggerInstantiation several times. Note the average run time, open up UseLog4JLogger and set the sleep timer sum to a little less than the average run time. This will eventually cause a deadlock, but it's occurring rarely.

Therefore, run it on the command line like this:

for (( ; ; )) do
  testExectution
done

Instead of putting the testExecution together manually, just set a break boint in the test, debug, type ps -ef | grep TestMixedLoggerInstantiation in a shell, and copy the command your IDE uses.

Was it helpful?

Solution

I finally found the answer in the Java Language Specification, specifically in chapter 12.4.2 Detailed Initialization Procedure.
There it says

[...]
2) If the Class object for C indicates that initialization is in progress for C by some other thread, then [...] block the current thread until informed that the in-progress initialization has completed, [...]
7) Next, if C is a class rather than an interface, and its superclass SC has not yet been initialized, then recursively perform this entire procedure for SC
10) If the execution of the initializers completes normally, [...] label the Class object for C as fully initialized, notify all waiting threads, [...]

So the behaviour observed is exactly as specified by the JLS. I'm still a bit baffled that there's no means to detect such a deadlock. And why the threads are marked as RUNNABLE - but I guess this isn't your typical bytecode to be executed, so who knows...

How to fix the issue

The solution for the problem was to get rid of commons-logging. As @Robert Johnson pointed out, this can easily be done by using org.slf4j:jcl-over-slf4j instead. I also checked the SLF code: it does not "take advantage" of the unlucky log4j design.

OTHER TIPS

Your analysis is correct.
You can try to run your concurrent tests in different classloaders, see the discussion here on how to do it. There's is an open bug in Surefire and discussion in JUnit groups on this issue.
As a workaround you can use org.apache.myfaces.test.runners.TestPerClassLoaderRunner as described in the link above.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top