Data seem sometimes to have their own life and will, and they refuse to behave as we wish.
Then, you need a firm hand to tame the wild data and turn them into quiet and obeying pets.

Friday, February 25, 2011

Advanced replication for the masses - Part II - Parallel replication

I hope you liked the first part of this series of lessons. And I really hope that you have followed the instructions and got your little replication cluster up and working.

If you haven't done that, thinking that you would spare your energies for more juicy matters, I have news for you. What I explained in the previous part is exactly what you need to do to set up parallel replication. With just a tiny additional detail.
For the sake of the diligent readers who have followed the instructions with the first lessons, I won't repeat them, but I'll invite you to set the environment as explained in the first part.
Once you have a cluster up and running, and you can confirm that replication is indeed working with Tungsten, you can remove all with the clear_cluster.sh script, and launch again the set_installation.sh script, with the tiny detail we have mentioned above.
The astute readers may have noticed that the installation script contains these lines:

This means that you can start set_replication.sh with one additional option, which will be passed to the creation of the Tungsten service. Without further suspense, the addition that you need is --channels=5.
Yep. It was that easy.

./set_replication.sh --channels=5

This little addition will start your Tungsten replicator, apparently in the same way it did before. But there is a substantial difference. While the data is funneled from the master to the slaves in the usual way, the applier splits the data by database. You can see the difference as soon as you send some data through the pipeline.

Notice, at first sight, that there are five rows instead of one. Each row is a channel. Since the master has used three databases, you see three channels occupied, each one showing the latest sequence that was applied. Now, if we do something to database mydb2, we should see one of these channels change, while the others stay still.

The channel used by mydb2 had previously applied the sequence number 5. The latest sequence number was previously 6, used in another channel. After two more events in this database, the sequence number has jumped to 8.
The eventID has also changed. The first part of the eventID is the binary log number (as in mysql-bin.000002), the second is the log position (1124), and the third one is the session ID (45).
Enough of peeking over the replicator's shoulder. There are more tools that let you inspect the status of the operations.
We have seen trepctl services, which keeps some of its usefulness also with parallel replication. In the master, it says:

Which is mostly all we need to know.
Since the slave has more than one channel, though, we need more specialized information on that side of the applier. For this reason, we use a more specialized view. We may start with trepctl status, which has information that is roughly equivalent to "SHOW SLAVE STATUS" in MySQL native replication.

Also this command, which is perfectly useful in single channel replication, lacks the kind of detail that we are after. Tungsten 2.0 introduces two variations of this command, with more detailed metadata.

The -name tasks command gives you a list of the latest tasks that were happening.
This is probably more information that you want to know about, but in case of troubleshooting it may become a blessing. Let's follow for a moment what's going on to appliedLastSeqno 8. You will find three tasks with this sequance number. The first one has stage "remote-to-thl", which is the stage where the transaction is transported from the master to the Transaction History List (THL, which is Tungsten lingo to what you may also call a relay log.). The second task that mentions appliedLastSeqno 8 is in stage "thl-to-q", which is the phase where a transaction is assigned to a given shard. The third occurrence happens in stage "q-to-dbms", which is where the transaction is executed in the slave.
For a different view of what is going on, you may use trepctl status -name shards. A Shard, in this context, is the criteria used to split the transactions across channels. By default, it happens by database. We will inspect its mechanics more closely in another post. For now, let's have a look at what shards we have in our slave:

You may read the information quite easily. Each shard tells you by which key it was identified (shardID), and this is the same as the database name. The appliedLastSeqno and stage we have met already. The appliedLatency is roughly equivalent to MySQL's seconds behind master, but more granular than that. And eventCount tells you how many transactions went through this shard.
If you are the adventurous type, you may have a look at the THL itself, and get a glimpse of how the replication and the parallelism works.
In the slave, type the following

# slave
thl -service logos list |less

Then look for "SEQ#" and you will find the global transaction IDs, or look for "shard=", and you will see the split by database.

In general yes. We commit current locations in the trep_commit_seqno table as well as persistently in the disk storage. However, there's no two-phase commits--it's more like sliding window messaging protocol. When there's a recovery we propagate recovery points backwards through the messaging flow then re-extract transactions where in doubt. Appliers are taught to discard things they have already seen.

The particular "no" would be if you had a crash on the master *and* slave disk failure simultaneously. In this case some transaction loss might occur. However, that's thankfully very rare.