parking to wait for <0x00000007dfdfd5f0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:386)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)

parking to wait for <0x00000007dfddab88> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:386)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)

parking to wait for <0x00000007dfdff3b0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:386)
    at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:103)

parking to wait for <0x00000007dfdffc10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:386)
    at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:94)
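The four traces above are not themselves deadlocked: a thread parked inside LinkedBlockingQueue.take() (via AbstractQueuedSynchronizer and LockSupport.park) is simply an idle consumer waiting for work. A minimal sketch of that state, with hypothetical names not taken from Flume or ZooKeeper:

```java
import java.util.concurrent.LinkedBlockingQueue;

public class ParkedConsumer {
    public static void main(String[] args) throws InterruptedException {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread consumer = new Thread(() -> {
            try {
                // take() parks the thread via AQS/LockSupport.park until an
                // element arrives -- the same state as in the traces above.
                String event = queue.take();
                System.out.println("got " + event);
            } catch (InterruptedException ignored) {
            }
        });
        consumer.start();
        Thread.sleep(200);  // give the consumer time to park on the empty queue
        System.out.println("state: " + consumer.getState());  // WAITING, not BLOCKED
        queue.put("event"); // unparks the consumer
        consumer.join();
    }
}
```

A thread in this state shows up as WAITING in a dump; the actual deadlock is in the monitor report below, not in these parked queue consumers.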

Found one Java-level deadlock:
=============================
"pool-1-thread-205":
  waiting to lock monitor 0x0000000001a19438 (object 0x00000007dfdd77b0, a com.cloudera.flume.master.logical.LogicalConfigurationManager),
  which is held by "pool-1-thread-18"
"pool-1-thread-18":
  waiting to lock monitor 0x00000000021eb3e8 (object 0x00000007dfdf5380, a java.util.HashMap),
  which is held by "pool-1-thread-9"
"pool-1-thread-9":
  waiting to lock monitor 0x0000000001a19438 (object 0x00000007dfdd77b0, a com.cloudera.flume.master.logical.LogicalConfigurationManager),
  which is held by "pool-1-thread-18"
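The report shows a classic lock-ordering cycle: pool-1-thread-18 holds the LogicalConfigurationManager monitor and wants the HashMap monitor, while pool-1-thread-9 holds the HashMap monitor and wants the LogicalConfigurationManager monitor. A minimal sketch of that cycle, using ReentrantLock stand-ins (names are hypothetical, not Flume's code) and tryLock so the demo terminates instead of hanging:

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockCycle {
    // Hypothetical stand-ins for the two monitors in the report:
    // the LogicalConfigurationManager instance and the shared HashMap.
    static final ReentrantLock managerLock = new ReentrantLock();
    static final ReentrantLock mapLock = new ReentrantLock();
    static final CyclicBarrier barrier = new CyclicBarrier(2);
    static final AtomicBoolean aBlocked = new AtomicBoolean();
    static final AtomicBoolean bBlocked = new AtomicBoolean();

    static Runnable worker(ReentrantLock first, ReentrantLock second, AtomicBoolean blocked) {
        return () -> {
            first.lock();
            try {
                barrier.await();                 // both threads now hold their first lock
                boolean got = second.tryLock();  // a plain lock() here would hang forever
                blocked.set(!got);
                if (got) second.unlock();
                barrier.await();                 // keep first lock held until both have tried
            } catch (Exception e) {
                throw new RuntimeException(e);
            } finally {
                first.unlock();
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        // Like pool-1-thread-18: manager first, then map.
        Thread a = new Thread(worker(managerLock, mapLock, aBlocked));
        // Like pool-1-thread-9: map first, then manager -- the opposite order.
        Thread b = new Thread(worker(mapLock, managerLock, bBlocked));
        a.start(); b.start();
        a.join(); b.join();
        System.out.println("cycle: " + (aBlocked.get() && bBlocked.get()));
    }
}
```

With real `synchronized` blocks (as in the report) there is no tryLock escape hatch, so both threads wait forever and the JVM's deadlock detector prints the report above.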

Dan Everton added a comment - 04/Aug/11 05:06 - edited

We've seen this particular deadlock twice. We have applications that create and tear down their own Flume configuration at startup and shutdown, which seems to exacerbate the issue. I've attached a stack trace of the most recent incident.

RenHaojie added a comment - 25/Sep/11 08:28 - edited

Just changing the order of one line can solve the deadlock. Although the status of some logical nodes could be wrong while mapping physical nodes to logical nodes, it will be corrected after the mapping completes. This is acceptable, isn't it?
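The proposed fix amounts to enforcing a single global lock order: every path acquires the manager's monitor before the map's monitor, so the cycle from the report can never form. A minimal sketch of that discipline, with hypothetical stand-in locks (not Flume's actual fields):

```java
public class OrderedLocking {
    // Hypothetical stand-ins for the LogicalConfigurationManager
    // monitor and the HashMap monitor from the deadlock report.
    static final Object managerLock = new Object();
    static final Object mapLock = new Object();
    static int updates = 0;

    // Every code path takes managerLock before mapLock; with one
    // agreed order there is no way to form a hold-and-wait cycle.
    static void update() {
        synchronized (managerLock) {
            synchronized (mapLock) {
                updates++;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) update();
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println("updates: " + updates);  // all threads finish: no deadlock
    }
}
```

The trade-off RenHaojie describes is real: between the reordered operations another thread may briefly observe stale logical-node status, but once the mapping step runs the state converges.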

locked <0x00007f0494a4a310> (a java.io.BufferedInputStream)
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
at com.cloudera.flume.handlers.thrift.TStatsTransport.read(TStatsTransport.java:59)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
at com.cloudera.flume.conf.thrift.ThriftFlumeClientServer$Processor.process(ThriftFlumeClientServer.java:626)
at org.apache.thrift.server.TSaneThreadPoolServer$WorkerProcess.run(TSaneThreadPoolServer.java:280)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)