Prevent operator mistakes due to simultaneous bootstrap

Description

Cassandra has always had the '2 minute rule': wait two minutes between beginning topology changes, so that each range announcement is known to all nodes before the next one begins. Trying to bootstrap a bunch of nodes simultaneously is a common mistake and seems to be on the rise lately.

We can prevent users from shooting themselves in the foot this way by looking for other joining nodes in the shadow round and comparing their generation against our own. If the difference isn't large enough, we either bail out or sleep until it is.
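
The check proposed above could be sketched roughly as follows. This is only an illustration, not Cassandra code: the class BootstrapGuard, the Peer type and the method shortfallSeconds are hypothetical names, and it assumes that the gossip generation is a startup timestamp in seconds and that the required gap is the two minutes from the rule above.

```java
import java.util.Map;

// Hypothetical sketch of the proposed guard; none of these names exist in Cassandra.
final class BootstrapGuard {
    // Assumption: the "2 minute rule" expressed in seconds.
    static final long MIN_GAP_SECONDS = 120;

    // Hypothetical view of one peer's gossip state as seen during the shadow round.
    static final class Peer {
        final boolean joining;   // true if the peer is itself bootstrapping
        final long generation;   // assumed to be the peer's startup time in seconds

        Peer(boolean joining, long generation) {
            this.joining = joining;
            this.generation = generation;
        }
    }

    // Returns how many seconds short of the minimum gap we are with respect to
    // the closest joining peer, or 0 if every joining peer is far enough away.
    // The caller can then either bail out or sleep when the result is non-zero.
    static long shortfallSeconds(long ownGeneration, Map<String, Peer> shadowRoundPeers) {
        long shortfall = 0;
        for (Peer peer : shadowRoundPeers.values()) {
            if (!peer.joining) {
                continue; // only other joining nodes matter here
            }
            long gap = Math.abs(ownGeneration - peer.generation);
            if (gap < MIN_GAP_SECONDS) {
                shortfall = Math.max(shortfall, MIN_GAP_SECONDS - gap);
            }
        }
        return shortfall;
    }
}
```

Whether a non-zero result should abort startup or trigger a sleep is the open choice in this proposal; the sketch only computes the number either option would need.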

T Jake Luciani
added a comment - 23/Apr/14 01:15
It will not throw an error but it defeats the purpose of the ticket. I need to think about it more deeply, but if two nodes are bootstrapping and both fall within the bounds of the original token range then you would, in the end, not have a consistent bootstrap.

Brandon Williams
added a comment - 23/Apr/14 01:19
"It will not throw an error but it defeats the purpose of the ticket"
We know from experience that telling people "don't do that" isn't good enough... what I'm proposing here is to either not allow it, or sleep long enough that it avoids any issues.

T Jake Luciani
added a comment - 23/Apr/14 01:32
Right, I agree. What I'm saying is we may need to error on any simultaneous bootstraps; they would need to happen fully one at a time.
Honestly I don't understand how the "shadow" round works well enough to know if two bootstraps placed N minutes apart will end up with consistency issues à la 2434, but I suspect it would be an issue.

Robert Coli
added a comment - 14/May/14 19:34
"We know from experience that telling people 'don't do that' isn't good enough... what I'm proposing here is to either not allow it, or sleep long enough that it avoids any issues."
+1 this, a lot.

Brandon Williams
added a comment - 15/Sep/14 14:19
Discussing this offline with Jake, we decided it's still possible to violate consistent range movement even when following the 2 minute rule, so leaving this as-is. If people don't care, they can simply disable consistent range movement.

Anthony Grasso
added a comment - 29/Mar/17 23:59 - edited
"Discussing this offline with Jake, we decided it's still possible to violate consistent range movement even following the 2 minute rule, so leaving this as-is. If people don't care, they can simply disable consistent range movement."
Brandon Williams and T Jake Luciani, just wondering how you both worked out that it is possible to violate consistent range movement even after following the 2 minute rule? Is it possible for tokens assigned to a first bootstrapping node to then be reassigned to a second bootstrapping node?