Details

Description

Customer request: allow a node that has not suffered a catastrophic failure to reuse its data on disk when rejoining a cluster after it has been failed over.

This goes along with being able to support larger datasets on each node. By forcing a rebalance after failover, we are potentially copying hundreds of GB of data over the network when it is seemingly unnecessary. This takes quite a bit of time and consumes network bandwidth needlessly, potentially impacting performance. Resynchronizing from disk would alleviate this and speed up the process considerably.
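To put rough numbers on the network cost described above, here is a back-of-the-envelope sketch. The link speed and data sizes are illustrative assumptions, not measurements from any customer cluster, and `transfer_hours` is a hypothetical helper, not part of any Couchbase tool. It ignores protocol overhead, disk throughput, and concurrent client load, so real rebalance times would likely be longer.

```python
def transfer_hours(data_gb: float, link_gbit_per_s: float, efficiency: float = 1.0) -> float:
    """Estimate hours needed to copy `data_gb` gigabytes over a network link.

    link_gbit_per_s: nominal link speed in gigabits per second (assumed value).
    efficiency: fraction of the nominal speed actually achieved (assumed value).
    """
    if link_gbit_per_s <= 0:
        raise ValueError("link_gbit_per_s must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    gigabits = data_gb * 8  # 1 byte = 8 bits
    seconds = gigabits / (link_gbit_per_s * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical dataset sizes on an assumed 1 Gbit/s link at full efficiency.
    for data_gb in (100, 500, 1000):
        hours = transfer_hours(data_gb, link_gbit_per_s=1.0)
        print(f"{data_gb} GB over 1 Gbit/s: about {hours:.2f} h")
```

Under these assumptions the copy time grows linearly with the amount of data per node, which is why reusing data already on disk becomes more attractive as datasets get larger.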

Issue Links

is duplicated by

MB-9979 Delta node recovery after failover: a failed node should be able to catch up instead of being considered a new node

Activity

Perry Krug
added a comment - 21/Mar/13 10:42 AM

A similar request has also come up for much more efficient "swap" rebalances. At least one customer needs to perform regular security updates on their hardware, and with 96GB RAM and 1TB of disk, working through a 6-node cluster is going to take "too many" hours. Given that this needs to happen weekly and within a scheduled maintenance window, we should be able to provide a more efficient method. One rebalance at the beginning and maybe one at the end would be okay, but if we could re-sync during the swaps in the middle, it would greatly speed up the overall process.