Zen uses multicast to locate other nodes and by using tcpdump I could see that some nodes weren’t receiving the multicast traffic. This lead me straight to the network layer and I found the culprit; a Cisco Nexus 5000 switch.

Normally a switch often treats multicast traffic much like broadcast traffic, it floods the packet to every port on the same VLAN. However Cisco switches try to be clever and learn which ports are interested in receiving the traffic by listening for IGMP packets, (referred to as snooping), but IGMP is only sent if there’s a multicast router on the network, (note also I’m not trying to route multicast, all nodes are on the same VLAN).

The solution was to enable a feature on the Nexus known as an “IGMP querier”. What this does is mimic enough of a multicast router that nodes report which multicast groups they’re interested in receiving traffic for and the switch can then learn which ports to forward multicast traffic on.

On the Nexus 5000, I needed to add the following configuration:

vlan configuration 1234
ip igmp snooping querier 1.2.3.4

vlan configuration 1234
ip igmp snooping querier 1.2.3.4

(If you have a Catalyst switch running IOS, the configuration should be very similar)

The VLAN should match whatever VLAN the nodes are attached to, and you can basically make up the IP address used here, the switch sends IGMP packets with it as the source, but it’s never used as the destination for packets, nodes use a specific multicast group instead.

As soon as I added this configuration my Elasticsearch cluster sprung into life.

This entry was posted
on Friday, May 31st, 2013 at 9:31 am
and is filed under Networks.
You can follow any responses to this entry through the RSS 2.0 feed.
You can leave a response, or trackback from your own site.