I've recently started looking at elasticsearch, specifically deploying it
to AWS. I have a few questions about the clustering setup which I hope you
can help with. They mostly stem from a bit of confusion about what the
'master' node does.

We are currently using the flume elasticsearch sink to push log data into
our elasticsearch cluster. The cluster sits behind a load balancer, so
flume is configured with a CNAME rather than a list of master nodes.
We've noticed that when we redeploy our cluster, flume loses its network
connection to the cluster and does not recover when the cluster comes
back. We have to restart flume, and then everything works fine.

We traced this to the fact that the sink and the underlying elasticsearch
library seem to keep a list of available IPs. The elastic load balancer
just returns the IP address of an instance in a round-robin way, which the
flume sink resolves and caches, so ultimately flume ends up with a fixed IP
of one of the master instances for the entire duration of its run. However,
this IP obviously changes every time we redeploy our cluster.
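
To make the caching problem concrete, here is a rough Python sketch of the
behaviour we would have expected from the client. It is not flume or
elasticsearch code; the class name RefreshingEndpoint, the helper
resolve_all, the hostname es.example.internal and the 60 second TTL are all
made up for illustration. The idea is simply that the resolved addresses
are only trusted for a short time, and are thrown away after a connection
failure, so a redeployed cluster is picked up without a restart.

```python
import socket
import time
from typing import Callable, List, Optional


def resolve_all(hostname: str, port: int) -> List[str]:
    """Return every IP address DNS currently reports for hostname."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})


class RefreshingEndpoint:
    """Caches resolved addresses for at most ttl_seconds, then asks again."""

    def __init__(
        self,
        hostname: str,
        port: int,
        ttl_seconds: float = 60.0,  # assumed value, tune to the deployment
        resolver: Callable[[str, int], List[str]] = resolve_all,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._hostname = hostname
        self._port = port
        self._ttl = ttl_seconds
        self._resolver = resolver
        self._clock = clock
        self._cached: Optional[List[str]] = None
        self._resolved_at = 0.0

    def addresses(self) -> List[str]:
        """Return cached addresses, re-resolving once the TTL has expired."""
        now = self._clock()
        if self._cached is None or now - self._resolved_at >= self._ttl:
            self._cached = self._resolver(self._hostname, self._port)
            self._resolved_at = now
        return self._cached

    def invalidate(self) -> None:
        """Drop the cache, e.g. after a connection error, to force a lookup."""
        self._cached = None
```

A client built this way would call invalidate() whenever a connection
attempt fails and then reconnect to whatever addresses() returns, instead
of holding on to the IP it resolved at startup.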

So my questions are:

What is the best configuration for the master nodes? Should they have
static IPs on AWS, with flume configured to use these directly,
effectively ignoring the load balancer?

What do we do in a multiple-zone scenario? How would we handle failover?
Should we have, for example, an extra set of static IPs for each zone, so
that if a zone disappears the remaining zones scale up and grab an IP from
this list of static IPs?

I suppose ultimately I'm trying to understand what the master node does and
how it should be used in the Amazon world.