Description

CASSANDRA-10406 introduced the ability to rebuild specific ranges, and CASSANDRA-9875 extended that to allow specifying a set of hosts to stream from. It is not obvious why you would want to stream only a subset of ranges, but one possible use case for this functionality is rebuilding a node from targeted replicas.

When doing a DC migration with racks == RF, you can ensure that you rebuild from each copy of a replica in the source datacenter by specifying all the hosts in a single source rack, so that each rebuild streams exactly one copy. Repeating this for each rack in the new datacenter ensures that you have each copy of the replica from the source DC, and thus maintains consistency through rebuilds.

For example, with the following topology for DCs A and B, and an RF of A:3 and B:3:

DC A             DC B
Node   Rack      Node   Rack
A1     rack1     B1     rack1
A2     rack2     B2     rack2
A3     rack3     B3     rack3

The following set of actions will leave B with exactly one copy of every replica held in A, so B will be at least as consistent as A.

Rebuild B1 from only A1
Rebuild B2 from only A2
Rebuild B3 from only A3
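
The rack-to-rack pairing above can be derived mechanically from the topology. The following is a minimal sketch in Python, assuming racks are named identically in both datacenters; the function name `plan_rebuild_sources` and the dictionary layout are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def plan_rebuild_sources(source_dc, target_dc):
    """Map each target node to the source-DC nodes that share its rack name.

    Both arguments are {node_name: rack_name} dictionaries (hypothetical layout).
    """
    by_rack = defaultdict(list)
    for node, rack in source_dc.items():
        by_rack[rack].append(node)
    # A target node streams only from source nodes in the matching rack.
    return {node: sorted(by_rack.get(rack, [])) for node, rack in target_dc.items()}


dc_a = {"A1": "rack1", "A2": "rack2", "A3": "rack3"}
dc_b = {"B1": "rack1", "B2": "rack2", "B3": "rack3"}

plan = plan_rebuild_sources(dc_a, dc_b)
for target, sources in sorted(plan.items()):
    print(f"Rebuild {target} from only {', '.join(sources)}")
```

With more than one node per rack, each target node would get every source node in the matching rack, which corresponds to "specifying all the hosts from a single rack".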

Unfortunately, using this functionality is non-trivial at the moment, because you can only specify sources together with the node's set of token ranges to rebuild. To perform the above with vnodes or a large cluster, you have to list every token range in the -ts argument, which quickly becomes unwieldy or impossible.
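
To get a feel for how large the -ts argument becomes, here is a rough Python sketch that assembles a rebuild invocation for a node with 256 randomly generated tokens. The "(start,end]" range format, the -ks and -s flags, and the keyspace name `my_keyspace` are assumptions made for illustration (the ticket itself only mentions -ts), so check `nodetool help rebuild` on your version before relying on them. The random tokens are stand-ins, not the ranges a real node would own.

```python
import random


def format_token_ranges(ranges):
    """Join (start, end) pairs into one comma-separated "(start,end]" string."""
    return ",".join(f"({start},{end}]" for start, end in ranges)


def rebuild_command(source_dc, keyspace, ranges, sources):
    """Build an argv list for a targeted rebuild (flag names are assumptions)."""
    return [
        "nodetool", "rebuild",
        "-ks", keyspace,
        "-ts", format_token_ranges(ranges),
        "-s", ",".join(sources),
        "--", source_dc,
    ]


# Stand-in for one vnode-enabled node: 256 random tokens in the Murmur3 range.
rng = random.Random(0)
tokens = sorted(rng.randrange(-2**63, 2**63) for _ in range(256))
ranges = list(zip(tokens, tokens[1:]))

cmd = rebuild_command("A", "my_keyspace", ranges, ["A1"])
print(f"{len(ranges)} ranges, -ts argument is {len(cmd[5])} characters long")
```

Even for a single node the argument runs to thousands of characters, and it has to be recomputed for every node being rebuilt.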

A solution to this is to simply filter on sources first, before processing ranges.
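
The idea can be sketched as follows. This is an illustrative Python model, not the actual Cassandra change: `select_streams`, its parameters, and the range-to-replica map are all hypothetical. The point is that the source filter is applied to the replica list before ranges are considered, so the token ranges become optional.

```python
def select_streams(range_replicas, allowed_sources, requested_ranges=None):
    """Pick one source per range, filtering on sources before ranges.

    range_replicas:   {(start, end): [replica, ...]} for the ranges this node owns
    allowed_sources:  set of hosts the operator wants to stream from
    requested_ranges: optional subset of ranges; None means every owned range
    """
    # Step 1: drop every replica that is not an allowed source.
    filtered = {
        rng: [r for r in replicas if r in allowed_sources]
        for rng, replicas in range_replicas.items()
    }
    # Step 2: only now narrow to the requested ranges, if any were given.
    fetch = {}
    for rng, candidates in filtered.items():
        if requested_ranges is not None and rng not in requested_ranges:
            continue
        if candidates:
            fetch[rng] = candidates[0]
    return fetch


owned = {
    (0, 100): ["A1", "A2", "A3"],
    (100, 200): ["A2", "A3", "A1"],
}
# No token ranges supplied: every owned range is streamed from A1 only.
print(select_streams(owned, {"A1"}))
```

In this model, rebuilding B1 from only A1 needs just the source list, and a range subset can still be supplied when it is actually wanted.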