voldemort.utils
Class ClusterForkLiftTool

java.lang.Object
voldemort.utils.ClusterForkLiftTool

All Implemented Interfaces:

java.lang.Runnable

public class ClusterForkLiftTool

extends java.lang.Object

implements java.lang.Runnable

Tool to fork lift data over from a source cluster to a destination cluster.
When used in conjunction with a client that "double writes" to both
clusters, this can be used as a feasible store migration tool to move an
existing store to a new cluster.
There are two modes for how the divergent versions of a key are
consolidated from the source cluster:
1) Primary-only resolution
(ClusterForkLiftTool#SinglePartitionForkLiftTask): The entries on the
primary partition are moved over to the destination cluster with empty vector
clocks. If a key has multiple versions on the primary, they are resolved.
This approach is fast and is best suited when you deem the replicas to be
largely in sync with each other. This is the DEFAULT mode.
2) Global resolution
(ClusterForkLiftTool#SinglePartitionGloballyResolvingForkLiftTask):
The keys belonging to a partition are fetched from the primary replica, and
for each such key, the corresponding values are obtained from all the other
replicas using get(..) operations. These versions are then resolved and
written back to the destination cluster as before. This approach is slow,
since it involves several round trips to the server for each key (some
potentially cross-colo), and hence should be used when thorough version
resolution is necessary or the admin deems the replicas to be fairly
out of sync.
In both modes, the default chained resolver
(VectorClockInconsistencyResolver +
TimeBasedInconsistencyResolver) is used to determine the final resolved
version.
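The chained resolution described above, where vector-clock dominance is applied first and wall-clock timestamps break the remaining ties, can be sketched as follows. The Versioned record, the clock representation, and all method names here are illustrative stand-ins, not the actual voldemort.versioning API:

```java
import java.util.*;

// Hypothetical mini-model of the chained resolution. The real tool chains
// voldemort.versioning.VectorClockInconsistencyResolver with
// TimeBasedInconsistencyResolver; everything below is a simplified sketch.
public class ChainedResolutionSketch {

    // A versioned value: a vector clock (node id -> counter) plus a timestamp.
    record Versioned(Map<Integer, Long> clock, long timestampMs, String value) {}

    // true if a's clock dominates b's (>= on every node, > on at least one)
    static boolean dominates(Versioned a, Versioned b) {
        boolean strictlyGreater = false;
        for (var e : b.clock().entrySet()) {
            if (a.clock().getOrDefault(e.getKey(), 0L) < e.getValue()) return false;
        }
        for (var e : a.clock().entrySet()) {
            if (e.getValue() > b.clock().getOrDefault(e.getKey(), 0L)) strictlyGreater = true;
        }
        return strictlyGreater;
    }

    // Step 1 (vector clock resolution): drop every version whose clock is
    // dominated by some other version; concurrent versions all survive.
    static List<Versioned> vectorClockResolve(List<Versioned> versions) {
        List<Versioned> survivors = new ArrayList<>();
        for (Versioned v : versions) {
            boolean dominated = versions.stream().anyMatch(o -> o != v && dominates(o, v));
            if (!dominated) survivors.add(v);
        }
        return survivors;
    }

    // Step 2 (time-based resolution): among the surviving concurrent
    // versions, keep the one with the latest timestamp.
    static Versioned timeBasedResolve(List<Versioned> versions) {
        return versions.stream().max(Comparator.comparingLong(Versioned::timestampMs)).orElseThrow();
    }

    static Versioned resolve(List<Versioned> versions) {
        return timeBasedResolve(vectorClockResolve(versions));
    }
}
```

A dominated version is discarded even if its timestamp is newer; the timestamp is consulted only among causally concurrent survivors.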
NOTES:
1) If the tool fails for some reason in the middle, the admin can restart the
tool for the failed partitions alone. The keys that were already written in
the failed partitions will all experience an ObsoleteVersionException,
and the un-inserted keys will be inserted.
2) Since the forklift writes are issued with empty vector clocks, they will
always yield to online writes happening on the same key, before or during the
forklift window. Of course, after the forklift window, the destination
cluster resumes normal operation.
3) For now, we fall back to fetching the key from the primary replica,
fetching the values out manually, resolving them, and writing the result
back. Pitfall: the primary somehow does not have the key.
Two scenarios:
1) Key is active after double writes: this situation is the result of slop not
propagating to the primary. But double writes would write the key back to the
destination cluster anyway. We are good.
2) Key is inactive after double writes: this indicates a problem elsewhere,
since this is a base guarantee Voldemort should offer.
4) Zoned vs. non-zoned forklift implications:
When forklifting data from a non-zoned to a zoned cluster, both destination
zones will be populated with data by simply running the tool once with the
respective bootstrap URLs. If you need to forklift data from a zoned to
non-zoned clusters (i.e. your replication between datacenters is not handled
by Voldemort), then you need to run the tool once against each destination
non-zoned cluster. Zoned -> Zoned and Non-Zoned -> Non-Zoned forklifts are
trivial.
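Notes 1 and 2 above hinge on the same mechanism: a put carrying an empty vector clock is causally before any existing version of the key, so the server rejects it and the tool swallows the resulting ObsoleteVersionException. The following sketch models that behavior with a toy in-memory store; the store class and its methods are hypothetical stand-ins (real forklift writes go through Voldemort's client APIs, not this toy):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for the server-side put and for
// voldemort.versioning.ObsoleteVersionException, to illustrate why
// forklift writes yield to online writes and why restarts are safe.
public class EmptyClockForkLiftSketch {

    static class ObsoleteVersionException extends RuntimeException {}

    // Toy destination store: a key that already has ANY version rejects a
    // forklift put, because the forklift's empty vector clock is causally
    // "before" whatever clock the existing version carries.
    static class DestinationStore {
        private final Map<String, String> data = new HashMap<>();

        void onlineWrite(String key, String value) { data.put(key, value); }

        void forkLiftPut(String key, String value) {
            if (data.containsKey(key)) throw new ObsoleteVersionException();
            data.put(key, value);
        }

        String get(String key) { return data.get(key); }
    }

    // Forklift a batch of entries; keys already present (from online double
    // writes or a previous failed run) are skipped, so rerunning a failed
    // partition is idempotent.
    static int forkLift(Map<String, String> entries, DestinationStore dest) {
        int inserted = 0;
        for (var e : entries.entrySet()) {
            try {
                dest.forkLiftPut(e.getKey(), e.getValue());
                inserted++;
            } catch (ObsoleteVersionException alreadyPresent) {
                // an online write or an earlier forklift already wrote this key
            }
        }
        return inserted;
    }
}
```

With this model, a key double-written online before the forklift reaches it keeps its online value, and a restarted run inserts only the keys the failed run missed.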