How to use Gizzmo

Gizzmo is the command-line tool used to create and manage Gizzard clusters.

Gizzard decides where to store your data by hashing your input key into buckets. Each bucket forwards to a tree topology composed of virtual and concrete shards, which control how and where your data is stored.
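
As a rough illustration of the routing just described, the following Python sketch hashes a key, picks a bucket from a sorted forwarding table, and walks a small tree down to its concrete shards. Everything in it is an assumption made for illustration: the names (`Shard`, `FORWARDINGS`, `route`), the stand-in hash, the shard names, and the idea of identifying a bucket by the lowest hash value it covers. It is a conceptual model, not Gizzard's implementation.

```python
import bisect
import hashlib
from dataclasses import dataclass, field


@dataclass
class Shard:
    """A node in a hypothetical shard tree; a node without children is concrete."""
    name: str
    children: list["Shard"] = field(default_factory=list)

    def concrete_shards(self) -> list[str]:
        """Collect the names of all concrete (leaf) shards below this node."""
        if not self.children:
            return [self.name]
        names = []
        for child in self.children:
            names.extend(child.concrete_shards())
        return names


def key_hash(key: str, space: int = 2**16) -> int:
    """Stand-in hash: map a key to an integer in [0, space)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % space


# Two buckets, each forwarding to a virtual shard with two concrete children.
FORWARDINGS = [
    (0, Shard("replicating_0", [Shard("host1/data_0"), Shard("host2/data_0")])),
    (2**15, Shard("replicating_1", [Shard("host1/data_1"), Shard("host2/data_1")])),
]
LOWER_BOUNDS = [lower for lower, _ in FORWARDINGS]


def route(key: str) -> list[str]:
    """Return the concrete shards that would store `key` in this toy model."""
    index = bisect.bisect_right(LOWER_BOUNDS, key_hash(key)) - 1
    return FORWARDINGS[index][1].concrete_shards()
```

Calling `route("user:42")` returns the two concrete shards under whichever bucket the key hashes into, and the same key always lands in the same bucket.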

Gizzmo commands talk to Gizzard appservers to query and modify their representations of this data (stored persistently on the nameservers), and to create shards and copy data between shards on the storage layer.

If you were to query the application server above for the current topology of table 0 (tables will be explained later), you would receive something like this:

The number preceding each topology is the number of shards that fall under it. This topology representation is precise, but can become fairly verbose to view and specify. For a given application with user-specified weights and datatypes and a slightly more complex tree structure, it can look more like this:

For this reason there's an option to simplify the representation of these trees, sacrificing some information and taking the structure of your topology for granted for the sake of clarity. For the above command it might look like this:

The first step in administering a Gizzard cluster with Gizzmo is to create a table or tables with a set of shards. This is as simple as issuing a create-table command. Tables in Gizzard are separate datasets that live on the same cluster. They're a flexible construct you can choose to use for your application. If you only need one simple dataset, having one table with id 0 is probably sufficient. An example of the benefit of tables is FlockDB, a graph datastore that uses pairs of tables (e.g. 1,-1) to model the positive and negative directional edges in a graph store.

Creating a table

You can create a table by specifying pairs of weights and topologies (shown below in both complex and simple formats).

Remember that whenever you specify tables with -T, you can specify a comma-separated list of tables.

Gizzmo has commands for basic manipulation of topologies, like adding or removing links or creating and deleting shards. It also has some more powerful commands to transform topologies from one structure to another or change the entire composition of the cluster.

Transforming a topology

You can change all shards matching one topology into another topology using the transform command.

Rebalancing the cluster

The rebalance command allows you to specify an entirely new set of topologies for your cluster. The syntax is similar to create-table, except it acts on an existing cluster. Gizzmo will attempt to minimize data movement when rebalancing the cluster. This command is generally used to add capacity.

The above command would add the set of hosts DBa, DBb, and DBc to our previously created table, and balance the cluster such that each set of hosts owns 1/4 of all shards. The max-copies and copies-per-host options ensure that your machines don't become so busy with copy work that they can't serve regular traffic. Because of the way copies work in Gizzard, all shards involved in a copy should be included in this number. That means if you're copying from shard_1 to shard_2, you have 2 copies in progress.
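
The counting rule for copies is easy to get wrong, so here is a minimal Python sketch of it. The function names and the idea of checking a planned batch against a limit are illustrative assumptions, not Gizzmo's scheduler; the sketch also assumes each copy is counted independently, with its source and its destination each counting once.

```python
def copies_in_progress(copy_pairs: list[tuple[str, str]]) -> int:
    """Count every shard involved in a copy: source and destination both count."""
    return 2 * len(copy_pairs)


def fits_budget(copy_pairs: list[tuple[str, str]], max_copies: int) -> bool:
    """Return True if the planned copies stay within a max-copies style limit."""
    return copies_in_progress(copy_pairs) <= max_copies


# One copy from shard_1 to shard_2 counts as 2 copies in progress.
print(copies_in_progress([("shard_1", "shard_2")]))
```

Under this model, a limit of 10 allows at most five simultaneous shard-to-shard copies.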

Adding new sets of machines to a cluster for capacity is a common task, and having to respecify all existing topologies is cumbersome. Because of this, there is an add-partition command, which acts exactly like rebalance except that it doesn't require you to specify existing topologies. It is useful to think of a set of machines in a topology as a partition. Many Gizzard clusters have partitions consisting of completely mirrored machines, though at the cost of managerial complexity you may choose to stripe machines across partitions.

You can add multiple partitions at once. There is also an equivalent remove-partition command.

Note that adding or removing partitions reweights all topologies evenly. Most Gizzard clusters already weight all partitions evenly, so this should not be an issue. If you wish to add or maintain unevenly weighted partitions, use the rebalance command.
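
To make the effect of even reweighting concrete, the Python sketch below computes how many shards each partition would own after a new partition joins, and how many shards would have to move if only the surplus is taken from the existing partitions. The partition names and shard counts are hypothetical, and this simplified model is not the algorithm Gizzmo uses.

```python
def even_targets(total_shards: int, partitions: list[str]) -> dict[str, int]:
    """Split total_shards as evenly as possible; earlier partitions get any remainder."""
    base, extra = divmod(total_shards, len(partitions))
    return {name: base + (1 if i < extra else 0) for i, name in enumerate(partitions)}


def shards_to_move(current: dict[str, int], new_partitions: list[str]) -> int:
    """Shards that must move when empty partitions join and weights become even."""
    partitions = list(current) + new_partitions
    targets = even_targets(sum(current.values()), partitions)
    # Only the surplus above each existing partition's target has to move.
    return sum(max(0, owned - targets[name]) for name, owned in current.items())


# Three hypothetical partitions with 400 shards each, plus one new partition.
current = {"P1": 400, "P2": 400, "P3": 400}
print(even_targets(1200, ["P1", "P2", "P3", "P4"]))  # each partition: 300
print(shards_to_move(current, ["P4"]))               # 300
```

In this example each existing partition gives up 100 shards, so a quarter of the data moves and the rest stays where it is.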

Repairing shards

To help with the eventually consistent model of Gizzard, you may want to occasionally run repairs on shards from Gizzmo. A repair is an action to compare a set of shards that should be identical, identify any missing or outdated data between them, and write in the correct data. Repair is conveniently encapsulated in the copy command, as copies in general (in Gizzard) are a special case of repairing.
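
As a conceptual sketch of what a repair does, the Python below compares replicas that should be identical, picks the newest version of each row, and writes it to any replica that is missing the row or holds an older version. The in-memory data model (a dict mapping a key to a `(timestamp, value)` pair) and the newest-timestamp-wins rule are assumptions for illustration; how real shards store and reconcile data depends on your application.

```python
Row = tuple[int, str]  # (timestamp, value); a hypothetical row format


def repair(replicas: list[dict[str, Row]]) -> int:
    """Bring all replicas to the newest version of every key; return rows written."""
    # Pass 1: find the newest version of each key across all replicas.
    newest: dict[str, Row] = {}
    for replica in replicas:
        for key, row in replica.items():
            if key not in newest or row[0] > newest[key][0]:
                newest[key] = row

    # Pass 2: write the newest version wherever it is missing or outdated.
    writes = 0
    for replica in replicas:
        for key, row in newest.items():
            if replica.get(key) != row:
                replica[key] = row
                writes += 1
    return writes


shard_a = {"k1": (1, "old"), "k2": (5, "x")}
shard_b = {"k1": (2, "new")}
print(repair([shard_a, shard_b]))  # 2 rows written
print(shard_a == shard_b)          # True
```

Running the repair a second time writes nothing, which is the property that makes it safe to run occasionally.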

Using a forwarding shard, like one of those we saw in the transforms section above, you can identify all of its concrete subshards using the subtree command.

Transform options

--batch-finish

When used with transforms, the --batch-finish flag causes an alternative plan to be used, whose goal is to copy all data that is going to move before making the destination shards active. The plan is executed as follows:

1. Add new shards as blocked

2. Copy data to new shards and mark them write-only

3. When all copies have finished, ask the operator whether to move to the next step, and then...

4. Mark added shards readable

5. Finally, when the operator confirms that it is safe, remove any shards that are no longer needed
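
The ordering of this plan can be sketched as a small Python routine. The state names mirror the wording of the steps, while the function, the `confirm` callback, and the log strings are hypothetical; this models the sequence only and is not Gizzmo's code.

```python
from typing import Callable


def batch_finish(
    new_shards: list[str],
    old_shards: list[str],
    confirm: Callable[[str], bool],
) -> list[str]:
    """Return the ordered actions of a batch-finish style plan."""
    actions = []
    # Add every new shard as blocked before any data moves.
    for shard in new_shards:
        actions.append(f"add {shard} (blocked)")
    # Copy data into each new shard, then mark it write-only.
    for shard in new_shards:
        actions.append(f"copy to {shard}")
        actions.append(f"mark {shard} write-only")
    # All copies are done: ask the operator before making anything readable.
    if not confirm("All copies finished. Continue?"):
        return actions
    for shard in new_shards:
        actions.append(f"mark {shard} readable")
    # Remove old shards only once the operator confirms it is safe.
    if confirm("Safe to remove shards that are no longer needed?"):
        for shard in old_shards:
            actions.append(f"remove {shard}")
    return actions


for action in batch_finish(["new_1"], ["old_1"], confirm=lambda prompt: True):
    print(action)
```

Passing a `confirm` callback that returns `False` stops the plan after the copies, leaving the new shards write-only and the old shards untouched.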

--rollback-log

The --rollback-log=NAME flag causes a rollback log to be written to the nameserver database, as individual operations, for any transform operation. For example, with transform-tree: gizzmo --rollback-log="my-transform" transform-tree .... The log of the transform can be rolled back manually with the gizzmo log-rollback NAME command, which rolls the transform back to the last safe state. When the flag is in use, it is (theoretically) safe to kill the gizzmo process at any time during a transform, provided you manually run the gizzmo log-rollback command afterward.

The rollback log works by adding 'commit' entries to the gizzmo plan, which mark the positions in the transform that it would be safe to roll back to. In particular, when used with the --batch-finish flag, all of steps 1, 2, 3, and 4 could be rolled back (although the copying portion of step 2 becomes a no-op). When not using the --batch-finish flag, only the last partial shard creation can be rolled back: this means that killing a transform that adds 100 shards during the 25th shard will cause only the partial effort for shard 25 to be rolled back.
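
The commit-entry idea can be modeled in a few lines of Python. This sketch uses hypothetical names and an in-memory list rather than the nameserver database: each applied operation is appended to the log, 'commit' markers record safe points, and a rollback undoes everything after the most recent commit. It illustrates the concept only and does not reflect the actual log format.

```python
COMMIT = "commit"  # hypothetical marker for a safe rollback point


def rollback_to_last_commit(log: list[str]) -> list[str]:
    """Pop operations after the last commit marker; return them newest first."""
    undone = []
    while log and log[-1] != COMMIT:
        undone.append(log.pop())
    return undone


# A transform killed while working on shard 25, with a commit after shard 24.
log = ["create shard_24", "copy shard_24", COMMIT, "create shard_25"]
print(rollback_to_last_commit(log))  # ['create shard_25']
print(log[-1])                       # commit
```

Here only the partial work for shard 25 is undone, which matches the behavior described for transforms run without --batch-finish.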