
Thursday, June 09, 2011

Getting started with Tungsten Replicator and Tungsten Sandbox

We have been busy at Continuent. In addition to our usual work on high-performance replication, we have addressed usability issues, since we know that a hard-to-use product, no matter how powerful, has low adoption. Thus, it is with some personal satisfaction that I can announce the release of Tungsten Replicator 2.0.3, which comes with several major improvements in user friendliness. The new installation procedure is so user friendly, in fact, that I was able to build a sophisticated tungsten-sandbox with a 150-line shell script. (The corresponding features for MySQL Sandbox required 4,500 lines of Perl.)

Enough self-celebration, though. Let's get started, as the title of this post suggests, with the practical steps.

Requirements

Before we begin, there are a few requirements to meet.

You need to be on a Unix-like operating system. Our main choice is Linux. If you want to test on Mac OS X, it works, but we don't recommend it.

A Java JRE must be installed, and it must be the original one, not OpenJDK. Update: the requirement against OpenJDK has been lifted. It works fine in my tests.

Ruby 1.8 must be installed. It is needed mainly during the installation phase, but it is required nonetheless.

The user account that will install and run Tungsten must have ssh access to the other hosts involved in the cluster.

The same user must have sudo access. This is only needed if you want to use Tungsten Replicator to run backups that involve root access (like xtrabackup). We may lift this requirement later, but for now you need to enable it, at least during the installation, and remove the access when you are done.

This user must also have read access to the MySQL binary logs. Usually you achieve this by making sure that the binary logs are readable by users belonging to the "mysql" group, and by adding your user to that group.

There must be a MySQL user for Tungsten replication. This user must have full access to the database server, with the grant option.

The MySQL server must have binary logging enabled.

If you have MySQL native replication running, you must stop it.
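
If you want to verify these requirements before running the installer, here is a minimal sketch in POSIX shell. The function names (have_cmd, user_in_group, binlog_enabled, grant_statement) are hypothetical helpers, not part of Tungsten. The '%' client host in the GRANT statement is an assumption you should narrow to your own servers, and the GRANT ... IDENTIFIED BY form works in MySQL 5.x but not in MySQL 8.0.

```shell
#!/bin/sh
# Hypothetical preflight helpers for the requirements listed above.
# None of these functions are part of Tungsten.

# Succeeds if the named command (java, ruby, ssh, ...) is on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Succeeds if user $1 belongs to group $2, e.g. "mysql" for binlog access.
user_in_group() {
    id -Gn "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

# Succeeds if the MySQL server has binary logging enabled.
# Extra arguments are passed to the mysql client (e.g. -u tungsten -pmypwd).
binlog_enabled() {
    log_bin=$(mysql -N -B "$@" -e 'SELECT @@log_bin' 2>/dev/null)
    [ "$log_bin" = "1" ] || [ "$log_bin" = "ON" ]
}

# Prints a GRANT giving user $1 (password $2) full access with grant option.
# Assumption: '%' as the client host; MySQL 5.x syntax (not valid in 8.0).
grant_statement() {
    printf "GRANT ALL ON *.* TO '%s'@'%%' IDENTIFIED BY '%s' WITH GRANT OPTION;\n" "$1" "$2"
}
```

For example, `have_cmd java && have_cmd ruby && user_in_group "$USER" mysql || echo "requirement missing"` checks the local prerequisites, and `mysql -u root -p -e "$(grant_statement tungsten mypwd)"` creates the replication user.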

Getting the code and installing

The code is released in the downloads section of Tungsten's home page. The current recommended version is 2.0.3, but if you want to be really up to date, we also publish a list of recent builds from our build server, which you can use to have a go at the replicator. For this simple installation, I will use four servers from our server farm. The servers are named R1, R2, R3, and R4. The first piece of good news about the new installation process is this: you need to install on one server only! More details follow. First off, create a directory where you want to install. Use a non-root account. Just make sure that it's the same user on all the servers, and that this user can access the directory where you want to install. I am going to call this directory planet.

I already have a MySQL user named tungsten with password "mypwd" (but it can be anything you like, as long as it has the required privileges). Now we have all the components. If you have read the Tungsten documentation, please ignore the ./configure script. It is kept for compatibility reasons and will be deprecated soon. Instead, to install the cluster of our 4 servers, we run the new installer with a single command.
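
Here is a minimal sketch of such a command, written as a dry-run helper that only prints it. It uses the options explained in the next paragraph and the values from this post (service dragon, home directory $HOME/planet, hosts r1 to r4, master r1). The installer path ./tools/tungsten-installer and the --name=value form are assumptions, and the options for the MySQL credentials (tungsten / mypwd) are deliberately left out: check the installer's help for their exact names.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the master/slave installer command
# instead of running it.
# Assumptions: the installer is ./tools/tungsten-installer inside the
# expanded release, options use the --name=value form, and the options for
# the MySQL credentials (not shown here) still need to be added.
install_cmd() {
    service=$1    # e.g. dragon
    home_dir=$2   # e.g. $HOME/planet
    hosts=$3      # comma-separated, e.g. r1,r2,r3,r4
    master=$4     # must be one of the hosts, e.g. r1
    case ",$hosts," in
        *",$master,"*) ;;
        *) echo "master host $master is not in $hosts" >&2; return 1 ;;
    esac
    printf '%s' "./tools/tungsten-installer --master-slave"
    printf ' %s' "--service-name=$service" "--home-directory=$home_dir" \
        "--cluster-hosts=$hosts" "--master-host=$master"
    printf '\n'
}
```

Running `install_cmd dragon "$HOME/planet" r1,r2,r3,r4 r1` prints the command for review. The host check mirrors the rule that the master must be one of the cluster hosts, while all the others become its slaves.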

Some comments on this command: --master-slave is the installation mode (see below for more info). --service-name can be anything you want. --home-directory is where all the installation subdirectories will go. --cluster-hosts is the list of servers you want to install, and finally, --master-host is the host that will be installed as a master, while all the others will be slaves of that one. If you have followed the instructions carefully, the installer will bring up the Tungsten cluster silently, without any fuss, Unix style. If you hate silent installations, you can get the full monty by adding a few more options.

If you run the installer in verbose mode, you will see an extremely long list of validation checks that the installer performs on the current server and on the ones listed in the --cluster-hosts option. If everything went well, you will find the following directories in $HOME/planet (on all servers in your cluster):

configs, containing the configuration file created by the installer. This file describes your cluster.

releases, containing the Tungsten binaries.

thl, containing Tungsten's Transaction History Logs. These logs are like MySQL binary logs, but with much more metadata, including a global transaction ID, which is missing in MySQL native replication.

relay, which should be empty, unless you install in "direct" mode (see below).

tungsten, which is a symlink to the Tungsten directory inside releases.
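
To confirm that the installer left this layout on a server, you can run a small check like the following. It is a minimal sketch: check_layout is a hypothetical helper that only tests for the five names listed above, with tungsten expected to be a symbolic link.

```shell
#!/bin/sh
# Hypothetical helper: reports which of the installer's directories are
# missing from an installation directory (default: $HOME/planet).
# Prints one missing name per line; succeeds only if nothing is missing.
check_layout() {
    base=${1:-$HOME/planet}
    missing=0
    for d in configs releases thl relay; do
        if [ ! -d "$base/$d" ]; then
            echo "$d"
            missing=1
        fi
    done
    # "tungsten" should be a symlink to the Tungsten directory in releases
    if [ ! -L "$base/tungsten" ]; then
        echo "tungsten"
        missing=1
    fi
    [ "$missing" -eq 0 ]
}
```

Run it as `check_layout "$HOME/planet"` on each server; no output and a zero exit status mean the layout is complete.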

In addition to the above-mentioned directories, Tungsten Replicator creates a database for each service. Since we have only one service in this topology, you will find a database named "tungsten_dragon". (If you had called your service "bunny", you would instead find "tungsten_bunny".) This database holds the replication metadata necessary for making the servers fault tolerant. Only a small amount of data is kept in it, roughly corresponding to what you get from the .info files in MySQL native replication. To test that the system is OK, let's find our tools. The first one is trepctl, which, among other things, can give us an overview of the running services.
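
If you prefer to inspect that database and the services from a script, here is a minimal sketch. The function names are hypothetical; the trepctl location (next to the replicator script under tungsten/tungsten-replicator/bin) and its `services` subcommand are assumptions for this installation, not something this post shows.

```shell
#!/bin/sh
# Hypothetical helpers around the per-service metadata database and trepctl.

# Prints the metadata database name for a service: dragon -> tungsten_dragon.
tungsten_schema() {
    printf 'tungsten_%s\n' "$1"
}

# Lists the tables in that database on one MySQL server.
# Extra arguments go to the mysql client (e.g. -h r1 -u tungsten -pmypwd).
show_metadata_tables() {
    service=$1
    shift
    mysql "$@" -N -B -e "SHOW TABLES FROM $(tungsten_schema "$service")"
}

# Asks the replicator on a remote host for its services overview.
# Assumption: trepctl lives in $HOME/planet/tungsten/tungsten-replicator/bin,
# the same directory as the replicator script.
remote_services() {
    ssh "$1" '$HOME/planet/tungsten/tungsten-replicator/bin/trepctl services'
}
```

For instance, `show_metadata_tables dragon -h r2 -u tungsten -pmypwd` lists the metadata tables on R2, and `remote_services r2` shows the services as seen by the replicator on that host.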

After the installation, trepctl reported the last applied sequence number (appliedLastSeqno) as 0. After we executed two commands on the master, that number became 2. If you want to know more about what was happening, you can use the thl command, which corresponds roughly to using mysqlbinlog with MySQL native replication logs.
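
To watch that number on every server at once, you can extract it from the status output. This is a minimal sketch: applied_seqno and seqno_on are hypothetical helpers, the "appliedLastSeqno : N" line format is an assumption about trepctl status output, and the trepctl path is assumed to be next to the replicator script.

```shell
#!/bin/sh
# Hypothetical helpers to compare appliedLastSeqno across servers.

# Reads "trepctl status" output on stdin and prints appliedLastSeqno.
# Assumption: the field appears as "appliedLastSeqno   : N".
applied_seqno() {
    awk -F: '$1 ~ /^[ \t]*appliedLastSeqno[ \t]*$/ {
        gsub(/[ \t]/, "", $2)
        print $2
        exit
    }'
}

# Prints appliedLastSeqno for a remote host.
# Assumption: trepctl lives in $HOME/planet/tungsten/tungsten-replicator/bin.
seqno_on() {
    ssh "$1" '$HOME/planet/tungsten/tungsten-replicator/bin/trepctl status' | applied_seqno
}
```

With `for h in r1 r2 r3 r4; do echo "$h $(seqno_on $h)"; done`, all four servers should report the same number once the slaves have caught up with the master.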

Once we are satisfied that replication is working, we can clean up the cluster and try other installation experiments. To clean up a cluster, you need to do the following:

stop the replicator on all servers (run this from $HOME/planet, which has the same path on every server):
for N in 1 2 3 4; do ssh r$N $PWD/tungsten/tungsten-replicator/bin/replicator stop; done

remove the thl files from all servers.

remove the tungsten_SERVICE_NAME database from all MySQL servers

run "RESET MASTER" on the master database server

remove the directories created by the installer in all servers
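
The same steps can be printed as a plan that you review before running anything. This is a minimal dry-run sketch: cleanup_plan is a hypothetical helper that assumes the installation lives in $HOME/planet on every host, that the first host given is the master, and that the MySQL client options default to the tungsten/mypwd account used in this post (override them with MYSQL_OPTS).

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the cleanup commands for a service and
# a list of hosts (first host = master). Nothing is executed.
# Assumptions: installation in $HOME/planet on every host; MySQL client
# options taken from MYSQL_OPTS, defaulting to the tungsten/mypwd account.
cleanup_plan() {
    service=$1
    shift
    master=$1
    opts=${MYSQL_OPTS:-"-u tungsten -pmypwd"}
    # 1. stop the replicator everywhere first
    for h in "$@"; do
        echo "ssh $h '\$HOME/planet/tungsten/tungsten-replicator/bin/replicator stop'"
    done
    # 2-3. remove the THL files and the metadata database on every server
    for h in "$@"; do
        echo "ssh $h 'rm -rf \$HOME/planet/thl/*'"
        echo "ssh $h 'mysql $opts -e \"DROP DATABASE IF EXISTS tungsten_$service\"'"
    done
    # 4. reset the binary logs on the master only
    echo "ssh $master 'mysql $opts -e \"RESET MASTER\"'"
    # 5. remove the directories created by the installer
    for h in "$@"; do
        echo "ssh $h 'rm -rf \$HOME/planet/configs \$HOME/planet/releases \$HOME/planet/thl \$HOME/planet/relay \$HOME/planet/tungsten'"
    done
}
```

`cleanup_plan dragon r1 r2 r3 r4` prints the commands; once you are happy with them, `cleanup_plan dragon r1 r2 r3 r4 | sh` runs them. Stopping every replicator before touching any data keeps a running slave from reconnecting while its THL and metadata are removed.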

Installation types

The procedure described above was, until a few months ago, the only thing you could do with Tungsten. Now you can broaden your horizons with a wider range of possibilities.
Master/slave is, of course, the main option, and it's the one you have seen in the previous section. This method gives you the full set of Tungsten features and performance. It is the recommended method for production use and for benchmarking. In this scenario, the Tungsten replicator on the master extracts transactions from the binary log, writes them to the THL, and shares them with the slaves. The slaves read from the THL and apply the transactions to the database. There are a few more steps in between, but for the sake of brevity I will skip them. You can have a look at Robert Hodges's blog for more info.
Slave "direct" is the alternative that you can use in production. It has been designed for users who only want some particular benefits on the slave side and don't care about global transaction IDs. If you are interested in parallel apply, this is probably a setup you want to try. In this scenario, there is no replicator on the master. The slave pulls data remotely from the binary logs, copies them locally, and extracts data to the THL. Here's an example of how to start a slave-direct system:

If your purpose is testing Tungsten, the Tungsten Sandbox is probably what you should try. This system is based on MySQL Sandbox, a framework that lets you install more than one MySQL server on the same host. Building on top of MySQL Sandbox, and leveraging the new flexibility of the Tungsten installer, tungsten-sandbox allows you to build a master/slave system inside a single host. Let's give it a try. You need to have MySQL Sandbox installed, and at least one MySQL tarball expanded under $HOME/opt/mysql/X.X.XX (where X.X.XX is the MySQL version, such as 5.5.12).
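
Since the sandbox expects the expanded tarball in that location, a quick check before running it may be useful. A minimal sketch: mysql_tarball_ready is a hypothetical helper, and looking for an executable bin/mysqld inside the version directory is an assumption about what a usable expanded tarball contains.

```shell
#!/bin/sh
# Hypothetical check: succeeds if MySQL <version> is expanded under
# <base>/<version> (default base: $HOME/opt/mysql) and contains bin/mysqld.
mysql_tarball_ready() {
    version=$1
    base=${2:-$HOME/opt/mysql}
    case $version in
        [0-9]*.[0-9]*.[0-9]*) ;;     # X.X.XX style, e.g. 5.5.10
        *) echo "unexpected version: $version" >&2; return 2 ;;
    esac
    [ -x "$base/$version/bin/mysqld" ]
}
```

For example, `mysql_tarball_ready 5.5.10 && ./tungsten-sandbox -m 5.5.10 -t ~/tsb` only starts the sandbox when the 5.5.10 binaries are in place.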

cd $HOME/planet
mkdir sb
cd tungsten-replicator-2.0.3
wget http://tungsten-replicator.googlecode.com/files/tungsten-sandbox
./tungsten-sandbox -h
USAGE: ./tungsten-sandbox [flags] args
flags:
-n,--nodes: how many nodes to install (default: 3)
-m,--mysql_version: which MySQL version to use (default: '5.1.56')
-t,--tungsten_base: where to install the sandbox (default: '/home/tungsten/tsb2')
-d,--group_dir: sandbox group directory name (default: 'tr_dbs')
-s,--service: how the service is named (default: 'tsandbox')
-P,--base_port: port base for MySQL sandbox nodes (default: 710)
-l,--thl_port: port for the THL service (default: 1211)
-r,--rmi_port: port for the RMI service (default: 1010)
-v,--[no]version: show Tungsten sandbox version (default: false)
-h,--[no]help: show Tungsten sandbox help (default: false)

On my server, I have already expanded MySQL 5.5.10, and I want to install inside $HOME/tsb. So, here is what I do:

./tungsten-sandbox -m 5.5.10 -t ~/tsb

This command installs three instances of MySQL under $HOME/sandboxes and three of Tungsten under $HOME/tsb. Inside this directory, in addition to the running instances, we find some more goodies:

This script creates two services (Castor and Pollux), with only one instance of Tungsten replicator, with all the servers (MySQL and Tungsten ones) in the same host.

Conclusions

There is much more to say, but I will leave it for the coming days. In the meantime, I encourage everyone to try the new Tungsten and submit bug reports when things don't work as expected. As always, happy hacking!
P.S. Today at 10am PT there is a webinar on this very topic!

@Ryan, thanks for trying! Indeed, shortly after posting this article, I started testing with OpenJDK, and I haven't found any reason why it shouldn't work with Tungsten. The installer validator no longer complains about it.

For EC2 folks using the Amazon AMI: create /etc/redhat-release as a symlink to /etc/system-release (sudo ln -s /etc/system-release /etc/redhat-release); otherwise the installer will complain about an unknown Linux architecture.
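
The same workaround can be written so that running it twice does no harm. A minimal sketch: link_release_file is a hypothetical helper, and its optional root-prefix argument exists only so it can be tried outside the real /etc.

```shell
#!/bin/sh
# Hypothetical, idempotent version of the workaround above: links
# /etc/redhat-release to /etc/system-release only when the former is missing.
# Optional argument: a root prefix (default: empty, i.e. the real /etc).
# On a real system, run it with sudo.
link_release_file() {
    root=${1:-}
    src="$root/etc/system-release"
    dst="$root/etc/redhat-release"
    if [ -e "$dst" ]; then
        return 0                  # already present: nothing to do
    fi
    [ -e "$src" ] || return 1     # no /etc/system-release to link to
    ln -s "$src" "$dst"
}
```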