We have two HornetQ multicast cluster groups deployed in JBoss AS 7.1.1. Each group has its own multicast address, but both use the same multicast port.

Group A: multicast address 224.2.2.1, port 9876

Host A - 127.0.0.1 (bind address)

Host B - 127.0.0.2 (bind address)

Group B: multicast address 224.2.2.2, port 9876

Host A - 127.0.0.1 (bind address)

Host C - 127.0.0.3 (bind address)

We have multiple clustered queues. In both groups, all server instances share the same cluster name, "my-cluster". Host A is shared between the groups, i.e. the application is deployed on Host A for both groups.
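For context, a topology like this would typically be declared in the messaging subsystem of each instance's standalone.xml. The sketch below is illustrative only; the group names and timing values are assumptions, not our actual configuration:

```xml
<!-- Illustrative sketch for a Group A instance; Group B instances would use
     224.2.2.2 instead. Names, connector-ref and timeouts are example values. -->
<broadcast-groups>
    <broadcast-group name="bg-group-a">
        <group-address>224.2.2.1</group-address>
        <group-port>9876</group-port>
        <broadcast-period>5000</broadcast-period>
        <connector-ref>netty</connector-ref>
    </broadcast-group>
</broadcast-groups>
<discovery-groups>
    <discovery-group name="dg-group-a">
        <group-address>224.2.2.1</group-address>
        <group-port>9876</group-port>
        <refresh-timeout>10000</refresh-timeout>
    </discovery-group>
</discovery-groups>
```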

When we start the instances of both groups, I get the following error:

"most likely cause for this is you have a loop in your cluster due to cluster max-hops being too large or you have multiple cluster connections to the same nodes using overlapping addresses"

Is the problem due to both instances on Host A using the same bind address (127.0.0.1)?

Apart from multicast, everything else works fine, as we have set different port numbers for the two instances deployed on Host A.

Problem #1: wrong binding. You are binding to 127.0.0.x addresses. These are loopback addresses (see http://www.rfc-editor.org/rfc/rfc3330.txt), which means the nodes cannot communicate with each other directly, and direct communication is needed in many places. Bind to addresses that are accessible from the other nodes.

Problem #2: unsupported HornetQ topology. All nodes should cluster on one UDP multicast address.

Hi, although I listed the bind addresses as 127.0.0.x, that was only an example; it is not the actual setup. In reality we have bound the host names to their real IP addresses.

One more thing: Group A and Group B are two entirely different clusters (two different environments we use), which is why I assigned two different multicast addresses. Even when the multicast address is the same, I get the same problem.

Host A is common to both environments; call them Dev and Test. Our application is deployed on Host A in two separate instances. The Netty connector port numbers are different, but both instances on Host A share the same bind address, namely the Host A IP address.

So for Dev:

the Netty connector will be Host A IP : 2345 (example port number)

For Test:

the Netty connector will be Host A IP : 2348 (example port number)
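To illustrate, each instance's Netty connector is tied to its own socket-binding port; the names and ports below are examples, not our actual configuration:

```xml
<!-- Illustrative: two instances on the same host, same bind address,
     different Netty ports via per-instance socket-bindings. -->

<!-- Dev instance standalone.xml -->
<connectors>
    <netty-connector name="netty" socket-binding="messaging"/>
</connectors>
<!-- in <socket-binding-group>: -->
<socket-binding name="messaging" port="2345"/>

<!-- Test instance standalone.xml -->
<connectors>
    <netty-connector name="netty" socket-binding="messaging"/>
</connectors>
<!-- in <socket-binding-group>: -->
<socket-binding name="messaging" port="2348"/>
```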

Can you please confirm why I am getting this error? Is it a bug in JBoss HornetQ?

"most likely cause for this is you have a loop in your cluster due to cluster max-hops being too large or you have multiple cluster connections to the same nodes using overlapping addresses"

at org.hornetq.core.server.cluster.impl.ClusterConnectionImpl$MessageFlowRecordImpl.doConsumerCreated(ClusterConnectionImpl.java:1458) [hornetq-core-2.2.13.Final.jar:]

at org.hornetq.core.server.cluster.impl.ClusterConnectionImpl$MessageFlowRecordImpl.onMessage(ClusterConnectionImpl.java:1225) [hornetq-core-2.2.13.Final.jar:]

at org.hornetq.core.client.impl.ClientConsumerImpl.callOnMessage(ClientConsumerImpl.java:983) [hornetq-core-2.2.13.Final.jar:]

at org.hornetq.core.client.impl.ClientConsumerImpl.access$400(ClientConsumerImpl.java:48) [hornetq-core-2.2.13.Final.jar:]

at org.hornetq.core.client.impl.ClientConsumerImpl$Runner.run(ClientConsumerImpl.java:1113) [hornetq-core-2.2.13.Final.jar:]

at org.hornetq.utils.OrderedExecutorFactory$OrderedExecutor$1.run(OrderedExecutorFactory.java:100) [hornetq-core-2.2.13.Final.jar:]

at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [rt.jar:1.6.0_19]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [rt.jar:1.6.0_19]

at java.lang.Thread.run(Thread.java:619) [rt.jar:1.6.0_19]

08:22:45,603 WARN [org.hornetq.core.server.cluster.impl.ClusterConnectionImpl] (Thread-9 (HornetQ-client-global-threads-1527318062)) MessageFlowRecordImpl [nodeID=f3dc24d4-d728-11e1-9b50-00163eaa0f5a, connector=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=55070&host=abc.server.com, queueName=sf.my-cluster.f3dc24d4-d728-11e1-9b50-00163eaa0f5a, queue=QueueImpl[name=sf.my-cluster.f3dc24d4-d728-11e1-9b50-00163eaa0f5a, postOffice=PostOfficeImpl [server=HornetQServerImpl::serverUUID=baa6a3f8-da1f-11e1-a7a5-00163eaa0f5a]]@6099210f, isClosed=false, firstReset=true]::Remote queue binding 54a7c877-7adf-4bdb-b67c-dd52b9108e0fbaa6a3f8-da1f-11e1-a7a5-00163eaa0f5a has already been bound in the post office. Most likely cause for this is you have a loop in your cluster due to cluster max-hops being too large or you have multiple cluster connections to the same nodes using overlapping addresses

08:22:45,603 WARN [org.hornetq.core.server.cluster.impl.ClusterConnectionImpl] (Thread-19 (HornetQ-client-global-threads-1527318062)) MessageFlowRecordImpl [nodeID=f63a634e-d7f0-11e1-aa43-00163eaa0f5a, connector=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=55070&host=abc.server.com, queueName=sf.my-cluster.f63a634e-d7f0-11e1-aa43-00163eaa0f5a, queue=QueueImpl[name=sf.my-cluster.f63a634e-d7f0-11e1-aa43-00163eaa0f5a, postOffice=PostOfficeImpl [server=HornetQServerImpl::serverUUID=baa6a3f8-da1f-11e1-a7a5-00163eaa0f5a]]@4aa168c, isClosed=false, firstReset=true]::Remote queue binding 54a7c877-7adf-4bdb-b67c-dd52b9108e0fbaa6a3f8-da1f-11e1-a7a5-00163eaa0f5a has already been bound in the post office. Most likely cause for this is you have a loop in your cluster due to cluster max-hops being too large or you have multiple cluster connections to the same nodes using overlapping addresses

I tried different cluster names, multicast addresses, and port numbers, and I even changed my queue names. It throws the same error.

The issue seems close to what Justin mentioned, although I could not confirm it 100%. Host A is common to the Dev and Test environments, so the bind address of Host A is the same, while the Netty connector port numbers differ between the two instances on Host A.

In the Test environment, when HornetQ performs auto-discovery over the multicast cluster, the Test instance on Host A first picks up the port number of the Dev instance on Host A and connects to it. The Test instance on Host A then also tries to cluster on its own, different port number, and that throws the overlapping-address error.

I could confirm this from the logs: because of this cross-connection, the Test environment cluster ends up connected to all of the Dev environment hosts.

I am looking at testing this scenario with the latest build. In the meantime, would setting <local-bind-port> in the broadcast group give me a solution?
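To illustrate what I mean, the setting would go in the broadcast group roughly like this (sketch only; the address and port values are examples):

```xml
<!-- Sketch: pinning the broadcast group's local (source) address and port.
     All values here are illustrative, not our actual configuration. -->
<broadcast-group name="bg-group1">
    <local-bind-address>10.0.0.1</local-bind-address>
    <local-bind-port>5432</local-bind-port>
    <group-address>224.2.2.1</group-address>
    <group-port>9876</group-port>
    <broadcast-period>5000</broadcast-period>
    <connector-ref>netty</connector-ref>
</broadcast-group>
```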

If I replace just the HornetQ module (the whole hornetq module folder), I get these exceptions. Can you please suggest whether I can replace specific JARs within the HornetQ module, or whether I need to replace more modules?

It looks like there have been changes in the AS 7 messaging subsystem since 7.1.1 that went along with the HornetQ upgrade, so using that version of HornetQ doesn't appear to be as straightforward as it first seemed.

Have you actually tried 7.2.0.Alpha to see if it resolves the problem?

From the link you gave me, I found that the change is only in hornetq-core.jar. So I replaced the hornetq-core-2.2.13 JAR in JBoss 7.1.1.Final with the hornetq-core-2.2.18 JAR. The change is reflected in the message below. Everything works fine, except that the overlapping issue was not solved.

Can you please clarify whether I need to give a different node name (-Djboss.node.name=node1) as a runtime parameter for every node in the multicast cluster, as specified in the below link? Is it mandatory?
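For reference, a unique node name could presumably also be set per instance in standalone.xml instead of passing the -D flag at startup (sketch; the value "node1" is just an example):

```xml
<!-- Illustrative: setting the node name per instance via system properties
     in standalone.xml; each instance would need a unique value. -->
<system-properties>
    <property name="jboss.node.name" value="node1"/>
</system-properties>
```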

08-08-2012 15:14:41,030 WARN [org.hornetq.core.server.cluster.impl.ClusterConnectionImpl] (Thread-3 (HornetQ-client-global-threads-1288110412)) MessageFlowRecordImpl [nodeID=2076c198-e0bd-11e1-9639-00163eaa0f5a, connector=org-hornetq-core-remoting-impl-netty-NettyConnectorFactory?port=40297&host=our-server-com, queueName=sf.myconfig-cluster.2076c198-e0bd-11e1-9639-00163eaa0f5a, queue=QueueImpl[name=sf.myconfig-cluster.2076c198-e0bd-11e1-9639-00163eaa0f5a, postOffice=PostOfficeImpl [server=HornetQServerImpl::serverUUID=c4b0d943-e16b-11e1-9972-634e4086895c]]@10e80317, isClosed=false, firstReset=true]::Remote queue binding 92e18543-5dd4-424f-8059-6adbdf48b6bdc4b0d943-e16b-11e1-9972-634e4086895c has already been bound in the post office. Most likely cause for this is you have a loop in your cluster due to cluster max-hops being too large or you have multiple cluster connections to the same nodes using overlapping addresses