Elasticsearch unplugged - Networking changes in 2.0

You start Elasticsearch on your laptop. You issue a quick DELETE * to clear out yesterday’s experiments. Then you notice the plaintive cries issuing from the mouths of your developer colleagues and you wonder what pain they are suffering…

Elasticsearch has always been friendly and approachable. Testing out how a multi-node cluster works is as easy as starting up a few instances on your laptop: they auto-discover each other using multicast, form a cluster, and start sharing the load. But at times Elasticsearch has been a bit too friendly. Try starting Elasticsearch on your laptop at a conference, and you could easily find yourself participating in a 100-node cluster.

The soon-to-be-released 2.0.0-beta1 comes with some networking changes that make Elasticsearch choosier about who it talks to, while still maintaining the easy out-of-the-box developer experience.

Bind to localhost

Previously, Elasticsearch would bind to all available network interfaces by default, and try to choose the “best” interface as the publish_host — the address that Elasticsearch publicises to other nodes in the cluster.

Now, Elasticsearch will only bind to localhost by default. It will try to bind to both 127.0.0.1 (IPv4) and [::1] (IPv6), but will work happily in environments where only IPv4 or IPv6 is available. This change prevents Elasticsearch from trying to connect to other nodes on your network unless you specifically tell it to do so. When moving to production you should configure the network.host parameter, either in the elasticsearch.yml config file or on the command line.
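
As a minimal sketch of the config-file form, the setting could look like the fragment below. The address 192.168.1.5 is a placeholder chosen for illustration, not a value from this post; substitute an address that is actually assigned to the machine. The same setting can likely be passed at startup instead (for example as --network.host 192.168.1.5), though the exact command-line form depends on the version you are running.

```yaml
# elasticsearch.yml
# 192.168.1.5 is a placeholder address for illustration only.
# Binding to a non-localhost address makes the node reachable from other machines.
network.host: 192.168.1.5
```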

Multicast removed

Elasticsearch 1.x uses multicast to discover other nodes on the network. Multicast made discovery almost magical… when it worked. Unfortunately, support for multicast is very patchy. Linux doesn’t allow multicast listening on localhost, while OS X sends multicast packets across all interfaces regardless of the configured bind address. On top of that, some networks have multicast disabled by default.

Elasticsearch 2.0 takes a different approach. Multicast has been removed (although it is still provided as a plugin for now). Instead, and only when bound to localhost, Elasticsearch will use unicast to contact the first 5 ports in the transport.tcp.port range, which defaults to 9300-9400.
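
To make the port range concrete, here is the default described above written out explicitly. This is only an illustration of the stated default, not a setting you need to add.

```yaml
# elasticsearch.yml
# Default transport port range, written out explicitly.
# A node bound to localhost contacts only the first 5 ports of this range
# (9300 through 9304 here) when looking for other local nodes.
transport.tcp.port: 9300-9400
```

With this range, a handful of instances started on the same laptop can still find each other without any configuration.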

This preserves the zero-config auto-clustering experience for the developer, but it means that you will have to provide a list of unicast hosts when going to production, for instance:

discovery.zen.ping.unicast.hosts: [ 192.168.1.2, 192.168.1.3 ]

You don’t need to list all of the nodes in your cluster as unicast hosts, but you should specify at least a quorum (majority) of master-eligible nodes. A big cluster will typically have three dedicated master nodes, in which case we recommend listing all three of them as unicast hosts.
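
For the three-dedicated-masters case, a sketch of the setting might look like this. The hostnames are hypothetical and stand in for your own master-eligible nodes; the same list would go into the config of every node in the cluster.

```yaml
# elasticsearch.yml (same list on every node)
# master1..master3.example.com are hypothetical hostnames for the
# three dedicated master nodes.
discovery.zen.ping.unicast.hosts: [ master1.example.com, master2.example.com, master3.example.com ]
```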

This also brings the development experience more in line with our recommended production networking configuration.

Node info changes

Finally, we have removed that weird inet[/127.0.0.1:9200] syntax that Elasticsearch used to use for IP addresses in the nodes-info API and elsewhere. Now, IP addresses are rendered according to the RFCs, e.g. 127.0.0.1:9200 (IPv4) or [::1]:9200 (IPv6).

Please ask any questions you may have about these changes in the Elasticsearch forum, and get ready for Elasticsearch 2.0.0. The first beta will be out soon!