Some of my brokers have a "duplex" network connection without a transport connector. Others have a non-duplex connection with the corresponding network connector, in which case they are connected to each other over TCP.

It is a complex network-of-brokers architecture.

The problem is that, sometimes, when active network connections are broken in the ordinary course of operation (by a network fault, for example), the broker fails to re-establish them once the fault is corrected.
If I restart the faulty broker, some of the faulty connections come back up, while others (that were previously up) can no longer be established... very strange.
The number of active connections is not the same from one restart to the next.

I don't understand it. We have tried both ActiveMQ 5.2 and fuse-5.3-0.6.

I just wonder whether there is a "normal" reason that would explain why ActiveMQ refuses to connect to a distant broker it has discovered.

We have our own external mechanism which transforms multicast frames into TCP frames, and TCP frames back into multicast frames (a gateway between different sub-networks). Perhaps the problem is there...

Thank you for your answer.
Eric-AWL

Joe Fernandez wrote

What does the transport connector for your 'distant' broker look like? Reason I ask is that the wildcard (0.0.0.0) vs localhost IP address issue has been biting lots of folks.
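
To illustrate the wildcard vs localhost question, here is a minimal sketch (plain JDK, not ActiveMQ code) that flags a connector URI whose host a remote broker could not use to connect back. The class and method names are made up for this example, the URIs are placeholders, and 61616 is simply the usual default OpenWire port.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;

public class AdvertisedAddressCheck {

    /**
     * Returns true when the host part of the URI is a wildcard (0.0.0.0),
     * a loopback address, missing, or unresolvable, i.e. an address that a
     * broker on another machine cannot use to reach this one.
     */
    static boolean isUnreachableFromRemote(String connectorUri) {
        String host = URI.create(connectorUri).getHost();
        if (host == null) {
            return true;
        }
        try {
            InetAddress address = InetAddress.getByName(host);
            return address.isAnyLocalAddress() || address.isLoopbackAddress();
        } catch (UnknownHostException e) {
            // A name that does not resolve here is treated as unreachable.
            return true;
        }
    }

    public static void main(String[] args) {
        // Hypothetical URIs, only for illustration.
        System.out.println(isUnreachableFromRemote("tcp://0.0.0.0:61616"));
        System.out.println(isUnreachableFromRemote("tcp://127.0.0.1:61616"));
        System.out.println(isUnreachableFromRemote("tcp://192.168.1.10:61616"));
    }
}
```

If the address that ends up being advertised through discovery falls into the first two categories, a remote broker may well discover the broker and still be unable to connect to it.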

Re: MultiCast Discovery and refusal of connection

Hi

With 5.2.0, I have some explicit log traces which can help to find the origin of the problem. I have just opened issue AMQ-2774.

The problem appears when:
- the multicast discovery process discovers a distant broker
  (a line "Establishing ..." appears in the log file)
- but the connection is not established, for whatever reason
  (no line "has been established" appears in the log file)
- the connection is broken
  (a line "bridge to Unknown stopped" appears in the log file)

After that, even if the distant broker is restarted, the current broker doesn't try to establish a connection with it (no new "Establishing ..." line appears).
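
The log sequence above can be checked mechanically. The following is only a sketch: it matches on the three fragments quoted in this post ("Establishing", "has been established", "bridge to Unknown stopped") and assumes nothing else about the real log format; the class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.List;

public class StuckBridgeLogCheck {

    /**
     * Returns true when the most recent "Establishing" line was followed by
     * "bridge to Unknown stopped" with no "has been established" in between
     * and no later "Establishing" line, i.e. the broker gave up and never
     * retried.
     */
    static boolean lastAttemptFailed(List<String> logLines) {
        boolean establishing = false;
        boolean failed = false;
        for (String line : logLines) {
            if (line.contains("Establishing")) {
                establishing = true;   // a new attempt supersedes an earlier failure
                failed = false;
            } else if (line.contains("has been established")) {
                establishing = false;  // the attempt succeeded
                failed = false;
            } else if (establishing && line.contains("bridge to Unknown stopped")) {
                establishing = false;  // the attempt ended before the remote broker was known
                failed = true;
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Shortened, made-up log lines containing only the quoted fragments.
        List<String> log = Arrays.asList(
                "Establishing ...",
                "bridge to Unknown stopped");
        System.out.println(lastAttemptFailed(log));
    }
}
```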

I'm sure that I have identified a latch problem in the multicast network discovery mechanism on duplex connections.

The multicast notifier thread is blocked. Here is the trace:

"Notifier-MulticastDiscoveryAgent-listener:DiscoveryNetworkConnector:NOCSupervisorP5-ADMIN-OUT-IN:BrokerService[SIBBusModule-NOCP5-tpnocp08s-bus]" daemon prio=10 tid=0x0000000044ff2400 nid=0x1389 waiting on condition [0x0000000044c26000..0x0000000044c26b90]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00002aaab3dd66f0> (a java.util.concurrent.CountDownLatch$Sync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:905)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207)
    at org.apache.activemq.network.DemandForwardingBridgeSupport.start(DemandForwardingBridgeSupport.java:231)
    at org.apache.activemq.network.DiscoveryNetworkConnector.onServiceAdd(DiscoveryNetworkConnector.java:114)
    at org.apache.activemq.transport.discovery.multicast.MulticastDiscoveryAgent$2.run(MulticastDiscoveryAgent.java:484)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

The problem appears when the network between the two components goes up and down in quick succession.
The bridge is created in one direction, but the answer cannot be received.

The thread then stays blocked on the CountDownLatch. Even if multicast frames are still received, the component cannot establish a new network connection.
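
To make the failure mode concrete, here is a small self-contained sketch using only java.util.concurrent. It is not the ActiveMQ code: it assumes, for illustration, that discovery events are handled one after another on a single notifier thread, and it models the missing answer as a latch that nobody ever counts down. All names are invented for the example, and the bounded wait is shown only for contrast, not as the actual fix.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BlockedNotifierSketch {

    /**
     * Queues two "discovery events" on one single-threaded executor.
     * The first waits for a remote answer that never comes; a negative
     * firstAwaitTimeoutMillis means it waits without any timeout.
     * Returns true if the second event was handled within waitMillis.
     */
    static boolean secondEventHandled(long firstAwaitTimeoutMillis, long waitMillis) {
        ExecutorService notifier = Executors.newSingleThreadExecutor();
        CountDownLatch remoteAnswer = new CountDownLatch(1); // never counted down
        try {
            notifier.submit(() -> {
                try {
                    if (firstAwaitTimeoutMillis < 0) {
                        remoteAnswer.await(); // parks forever, as in the trace above
                    } else {
                        remoteAnswer.await(firstAwaitTimeoutMillis, TimeUnit.MILLISECONDS);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // The second event sits in the queue behind the first one.
            Future<?> second = notifier.submit(() -> { });
            try {
                second.get(waitMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false;
            } catch (InterruptedException | ExecutionException e) {
                return false;
            }
        } finally {
            notifier.shutdownNow(); // interrupts the parked worker so the JVM can exit
        }
    }

    public static void main(String[] args) {
        System.out.println("unbounded await, second event handled: "
                + secondEventHandled(-1, 300));
        System.out.println("bounded await, second event handled:   "
                + secondEventHandled(50, 2000));
    }
}
```

With the unbounded await, the second event never gets a chance to run, which matches what I see: multicast frames keep arriving but no new "Establishing ..." line is logged.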

Hi, as you can see, this is a complicated area of the code. The best
approach is to try and produce a test case for your scenario. Take a
look at the test: BrokerQueueNetworkWithDisconnectTest in
activemq-core. This can simulate network failures and can use
multicast (bridgeAllBrokers). Getting a reproducible test case is the
best way to validate your changes and protect them into the future.