Say I have 1 master node and I want to add 2 data nodes.
My master node is continuously indexing, but I have a backup made with Elasticsearch Curator available on Amazon S3. Say the backup is 10 minutes behind the data on the master.

When launching my new data nodes, should I just have them join the master node's cluster, and will they then automatically get the data from the master?

Or is it possible to restore the data from S3 onto the 2 new data nodes and then have them join the cluster? Will they then catch up with the data on the master? If some files that were backed up to S3 have since been deleted on the master node, will they be deleted on the data nodes too?
I am interested in this 2nd solution as it may be faster if the nodes are far from my master node.

When launching my new data nodes, should I just have them join the master node's cluster, and will they then automatically get the data from the master?

Yes.
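
If you want to watch the new nodes pick up their shards after they join, one option is to poll the cluster health API. Below is a minimal sketch, assuming the cluster is reachable at `http://localhost:9200` (a placeholder address) and that you expect 3 data nodes in total, i.e. the existing node plus the 2 new ones. The helper names `fetch_cluster_health` and `recovery_finished` are made up for illustration; only the `_cluster/health` endpoint and its response fields come from Elasticsearch.

```python
import json
import urllib.request


def fetch_cluster_health(base_url="http://localhost:9200"):
    """Fetch GET /_cluster/health from any node (base_url is a placeholder)."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health") as resp:
        return json.load(resp)


def recovery_finished(health, expected_data_nodes):
    """Return True once every expected data node has joined and no shard
    is relocating, initializing, or unassigned."""
    return (
        health["number_of_data_nodes"] >= expected_data_nodes
        and health["relocating_shards"] == 0
        and health["initializing_shards"] == 0
        and health["unassigned_shards"] == 0
    )
```

You would call `recovery_finished(fetch_cluster_health(), 3)` in a loop with a sleep between calls until it returns `True`. How many shards actually end up on the new nodes depends on your replica and allocation settings.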

Xavier_TROMP:

Or is it possible to restore the data from S3 onto the 2 new data nodes and then have them join the cluster? Will they then catch up with the data on the master? If some files that were backed up to S3 have since been deleted on the master node, will they be deleted on the data nodes too?

Don't do this.

Xavier_TROMP:

I am interested in this 2nd solution as it may be faster if the nodes are far from my master node.