Introduction

Replication is an interesting feature of MySQL that can be used for a variety of purposes. It can help to balance server load across multiple machines, ease backups, provide a workaround for the lack of fulltext search capabilities in InnoDB, and much more.

The basic idea behind replication is to reflect the contents of one database server (this can include all databases, only some of them, or even just a few tables) to more than one instance. Usually, those instances will be running on separate machines, even though this is not technically necessary.

Traditionally, MySQL replication is based on the surprisingly simple idea of repeating the execution of all statements issued that can modify data—not SELECT—against a single master machine on other machines as well. Provided all secondary slave machines had identical data contents when the replication process began, they should automatically remain in sync. This is called Statement Based Replication (SBR).

With MySQL 5.1, Row Based Replication (RBR) was added as an alternative method for replication, targeting some of the deficiencies SBR brings with it. While at first glance it may seem superior (and more reliable), it is not a silver bullet—the pain points of RBR are simply different from those of SBR.

Even though there are certain use cases for RBR, all recipes in this chapter will be using Statement Based Replication.

While MySQL makes replication generally easy to use, it is still important to understand what happens internally to be able to know the limitations and consequences of the actions and decisions you will have to make. We assume you already have a basic understanding of replication in general, but we will still go into a few important details.

Statement Based Replication

SBR is based on a simple but effective principle: if two or more machines have the same set of data to begin with, they will remain identical if all of them execute the exact same SQL statements in the same order.

Executing all statements manually on multiple machines would be extremely tedious and impractical. SBR automates this process. In simple terms, it takes care of sending all the SQL statements that change data on one server (the master) to any number of additional instances (the slaves) over the network.

The slaves receiving this stream of modification statements execute them automatically, thereby effectively reproducing the changes the master machine made to its data originally. That way they will keep their local data files in sync with the master's.

One thing worth noting here is that the network connection between the master and its slave(s) need not be permanent. In case the link between a slave and its master fails, the slave will remember up to which point it had read the data last time and will continue from there once the network becomes available again.

In order to minimize the dependency on the network link, the slaves will retrieve the binary logs (binlogs) from the master as quickly as they can, storing them on their local disk in files called relay logs. This way, the connection, which might be some sort of dial-up link, can be terminated much sooner while executing the statements from the local relay-log asynchronously. The relay log is just a copy of the master's binlog.

The following image shows the overall architecture:

Filtering

In the image you can see that each slave can be configured individually to execute either all of the statements coming in from the master or only a selection of them. This can be helpful when some slaves are dedicated to special tasks and do not need all the information from the master.

All binary logs have to be sent to each slave, even if it then discards most of their contents. Depending on the size of the binlogs, the number of slaves, and the bandwidth of the connections in between, this can be a heavy burden on the network, especially if you are replicating across wide area networks.

Even though the general idea of transferring SQL statements over the wire is rather simple, there are lots of things that can go wrong, especially because MySQL offers some configuration options that are quite counter-intuitive and lead to hard-to-find problems.

In practice, this means that the only filtering rules that produce intuitive results are those based on the replicate-do-table and replicate-ignore-table configuration options. This includes their wildcard variants, but specifically excludes the whole-database options replicate-do-db and replicate-ignore-db. These directives are applied on the slave side to all incoming relay logs.
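To illustrate (the schema and table names here are invented for this example, not taken from the recipes), a slave-side configuration that replicates only certain tables could look like this in the [mysqld] section:

```ini
# execute only statements targeting this exact table
replicate-do-table=shop.orders
# wildcard variant: all tables whose names start with "customer"
replicate-wild-do-table=shop.customer%
```

Both directives can be repeated as often as needed, one table or pattern per line.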

The master-side binlog-do-* and binlog-ignore-* configuration directives influence which statements are written to the binlog and which are not. We strongly recommend against using them, because apart from producing hard-to-predict results, they make the binlogs unsuitable for server backup and restore. They are of limited use anyway, as they do not allow individual configurations per slave but apply to all of them.

Setting up automatically updated slaves of a server based on a SQL dump

In this recipe, we will show you how to prepare a dump file of a MySQL master server and use it to set up one or more replication slaves. These will automatically be updated with changes made on the master server over the network.

Getting ready

You will need a running MySQL master database server that will act as the replication master and at least one more server to act as a replication slave. This needs to be a separate MySQL instance with its own data directory and configuration. It can reside on the same machine if you just want to try this out. In practice, a second machine is recommended because this technique's very goal is to distribute data across multiple pieces of hardware, not place an even higher burden on a single one.

For production systems you should pick a time when the load on the master machine is lighter, often during the night when fewer users are accessing the system. Taking the SQL dump uses some extra resources, but unless your server is already maxed out, the performance impact is usually not a serious problem. How long the dump takes depends mostly on the amount of data and the speed of the I/O subsystem.

You will need an administrative operating system account on the master and the slave servers to edit the MySQL server configuration files on both of them. Moreover, an administrative MySQL database user is required to set up replication.

We will just replicate a single database called sakila in this example.

Replicating more than one database

In case you want to replicate more than one schema, just add their names to the commands shown below. To replicate all of them, simply leave out any database name from the command line.

How to do it...

At the operating system level, connect to the master machine and open the MySQL configuration file with a text editor. Usually it is called my.ini on Windows and my.cnf on other operating systems.

On the master machine, make sure the following entries are present and add them to the [mysqld] section if not already there:

server-id=1000
log-bin=master-bin

If one or both entries already exist, do not change them but simply note their values. The log-bin setting need not have a value, but can stand alone as well.

Restart the master server if you need to modify the configuration.

Create a user account on the master that can be used by the slaves to connect:
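A statement along the following lines will do; the account name, host mask, and password are placeholders of our choosing—substitute values appropriate to your network:

```sql
-- on the master: account the slaves will use to connect
GRANT REPLICATION SLAVE ON *.*
    TO 'repl'@'192.168.1.%' IDENTIFIED BY 'slavepass';
```

The REPLICATION SLAVE privilege is all this account needs; do not grant it more.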

How it works...

Some of the instructions discussed in the previous section make sure that master and slave are configured with different server-id settings. This is of paramount importance for a successful replication setup. If you fail to provide unique server-id values to all your server instances, you might see strange replication errors that are hard to debug.

Moreover, the master must be configured to write binlogs—a record of all statements manipulating data (this is what the slaves will receive).

Before taking a full content dump of the sakila demo database, we create a user account for the slaves to use. This needs the REPLICATION SLAVE privilege.

Then a data dump is created with the mysqldump command line tool. Notice the provided parameters --master-data and --single-transaction. The former is needed for mysqldump to include the master's binary log coordinates at the precise moment the dump was created in the resulting output. The latter is important when using InnoDB tables, because only then will the dump be created from a transactional snapshot of the data. Without it, statements changing data while the tool is running could lead to an inconsistent dump.
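The full command could look like this (user name and output path are the ones used in this recipe; adjust them to your setup):

```shell
# on the master: dump sakila with binlog coordinates
# and a transactional snapshot
mysqldump -uroot -p --master-data --single-transaction \
    --databases sakila > /tmp/sakila_master.sql
```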

The output of the command is redirected to the /tmp/sakila_master.sql file. As the sakila database is not very big, you should not see any problems. However, if you apply this recipe to larger databases, make sure you send the data to a volume with sufficient free disk space—the SQL dump can become quite large. To save space here, you may optionally pipe the output through gzip or bzip2 at the cost of a higher CPU load on both the master and the slaves, because they will need to unpack the dump before they can load it, of course.

If you open the uncompressed dump file with an editor, you will see a line with a CHANGE MASTER TO statement. This is what --master-data is for. Once the file is imported on a slave, it will know at which point in time (well, rather at which binlog position) this dump was taken. Everything that happened on the master after that needs to be replicated.
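Near the top of the dump, the line will look similar to the following; the file name and position are illustrative values and will differ on your system:

```sql
CHANGE MASTER TO MASTER_LOG_FILE='master-bin.000042',
    MASTER_LOG_POS=154;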

Finally, we configure the slave to connect using the credentials set up on the master before and then start the replication. Notice that the CHANGE MASTER TO statement used for this does not include the log positions or file names, because these were already taken from the dump file just read in.
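On the slave, this boils down to statements like the following (host name, user, and password are the placeholder values from this recipe):

```sql
-- on the slave, after importing /tmp/sakila_master.sql
CHANGE MASTER TO
    MASTER_HOST='master',
    MASTER_USER='repl',
    MASTER_PASSWORD='slavepass';
START SLAVE;
```

SHOW SLAVE STATUS can then be used to verify that both the I/O and SQL threads are running.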

From here on the slave will go ahead and record all SQL statements sent from the master, store them in its relay logs, and then execute them against the local data set.

This recipe is important because the following recipes build on it. In case you have not fully understood the steps above, we recommend going through them again before trying more complicated setups.

Setting up automatically updated slaves of a selection of tables based on a SQL dump

Often you might not need to replicate everything, but only a subset of tables in a database. MySQL allows exercising fine-grained control over what to replicate and what to ignore. Unfortunately, the configuration settings are not as obvious as they might seem at first glance.

In this recipe, you will see how to replicate only a few select tables from a database.

Getting ready

The setup for this recipe is the same as for the previous one, Setting up automatically updated slaves of a server based on a SQL dump. Only the configuration options on the slave need to be changed. So instead of repeating everything here, we just present the important differences.

How to do it...

Follow the steps of the previous recipe up to the point where the mysqldump tool is used to extract the initial data set from the master. Use this command instead:
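A command along these lines will limit the dump to the three tables used in this recipe (user name and output path are placeholders as before):

```shell
# dump only the address, city, and country tables of sakila
mysqldump -uroot -p --master-data --single-transaction \
    sakila address city country > /tmp/sakila_master.sql
```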

Continue with the rest of the instructions as in the Setting up automatically updated slaves of a server based on a SQL dump recipe.

How it works...

The SQL dump file taken on the master is limited to three tables: address, country, and city. The slave's configuration also tells it to only execute statements coming from the master that target one of these three tables (the replicate-do-table directives), while explicitly ignoring any other changes to the sakila database (replicate-wild-ignore-table). Even though all other statements are still retrieved from the master and stored temporarily in the relay log files on the slave, only those modifying one of the three explicitly configured tables are actually run. The rest are discarded.
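The slave-side configuration just described could look like this in the [mysqld] section (a sketch; the directives are the ones named above):

```ini
# execute only changes to these three tables...
replicate-do-table=sakila.address
replicate-do-table=sakila.city
replicate-do-table=sakila.country
# ...and ignore everything else in sakila
replicate-wild-ignore-table=sakila.%
```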

You can choose any subset of tables, but you need to make sure to take foreign key relationships between tables into account. In this example, the address table has a reference to the city table via the city_id column, while city in turn has a relationship with country. If you excluded either of the latter and your storage engine on the slave were InnoDB, replication would break with foreign key violations when trying to insert an address, because its dependencies were not fulfilled.

MySQL does not help you in this respect; you must make sure to identify all tables and their relationships manually before setting up the replication.

Setting up automatically updated slaves using data file copy

Even though replication is designed to keep your data in sync, circumstances might require you to set up slaves afresh. One such scenario might be severely changing the master data, making replication too expensive. Using a SQL dump to re-initialize the slaves might be too time-consuming, depending on the size of the data set and the power of the slave machines.

In cases where master and slave databases are the same size anyway (meaning, you do not have filters in place to sync data only partially) and if you can afford a downtime on the master database, there is another way of providing slaves with a fresh starting point: copying the master's data files to the slave.

Beware that this approach will lose all data that was changed on the slave alone. So make sure that this is what you want!

Getting ready

To follow along with this recipe you will need privileges to shut down both master and slave MySQL instances and access the data and log directories on both machines. Depending on the size of your database, you will have to judge which method of copying data between the machines will be best. If both are part of a local area network, copying via a shared drive or something like FTP will probably be fastest. When only limited bandwidth is available, you might need to resort to other means of transfer, such as external hard disks.

Moreover, you will need administrative MySQL user accounts on both sides to execute the necessary statements to control replication.

How to do it...

Open the master's configuration file with a text editor. Locate the line controlling the name of the binlog files. It is located in the [mysqld] section and will look similar to this:

[mysqld]
...
log-bin=master-bin
...

Change the value of that setting to a different name. In this example, we will use log-bin=new-master-bin. This will cause the master MySQL server to start with a new sequence of binlogs upon its next launch, making a convenient starting point for the replication.

Shut down the master database.

Navigate to the MySQL data directory. The exact location can be found in the MySQL configuration file. Make sure to find both InnoDB data and log locations.

Optionally, copy data and log files to another location locally on the master. This might be faster than copying via network or USB drives, and allows for a quick restart of the master. If you do this, use this temporary location to copy the data in the next step.

Copy the data to the slave machine. We recommend a temporary target location on the slave because this allows the slave to continue running for the time the copying takes. Unless you want to bring along all the user accounts and privilege information from the master to the slaves, you should exclude the mysql folder from the data directory. You also need not copy the binary logs (in this example called master-bin.*).

Stop the slave server.

You can restart the master once the original data and transaction log files have been copied. Make sure it starts with a new sequence of binlogs called new-master-bin.000001.

Write down the names and sizes of the InnoDB data and log files you copied to the slave. These will have to be entered into the slave's configuration, because otherwise InnoDB will not start up. Also pay attention to the autoextend option that the last of the data files might have attached; make sure you carry it over, should it be present on the master. You can also take these values from the master's configuration file, of course.

Replace the original slave data and log files with those of the master. Make sure you keep the mysql database directory if you decided not to copy it from the master.

Make sure you delete the master.info file and any relay-logs from the slave—those do not match the current state anymore and would cause trouble when the slave is restarted.

Edit the slave's configuration file to match the names and sizes of the data files you wrote down a minute ago.

Start the slave server again. It should come up without any problems using the new data files. If not, make sure you got everything right in the config file regarding names and sizes of the data files.

Re-initialize the replication on the slave. This is rather easy because we altered the master's configuration to log any changes that occurred after the snapshot copy was taken to a new series of binlog files. Fill in the appropriate host name, user, and password values for your master:
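The statements on the slave could look like this; host name, user, and password are placeholders, while the log file name follows from the renamed binlog series, whose first file begins at position 4:

```sql
-- on the slave: start from the beginning of the
-- master's new binlog series
CHANGE MASTER TO
    MASTER_HOST='master',
    MASTER_USER='repl',
    MASTER_PASSWORD='slavepass',
    MASTER_LOG_FILE='new-master-bin.000001',
    MASTER_LOG_POS=4;
START SLAVE;
```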

How it works...

The principle of this recipe is very simple: replication needs a common starting point on master and slave. What could be better than a 1:1 copy of the original master's data? As the master is shut down during the process, no more writes can happen. Configuring it to start with a new binlog file on its next start makes it trivial to point the slave to the right position because it is right at the new file's beginning.

If you cannot change the master binlogs' file names, the process is slightly more complicated. First you need to make sure nobody can modify any data for a short period of time. You do so with a FLUSH TABLES WITH READ LOCK; statement. Then issue a SHOW MASTER STATUS; and note the values. Now, without closing the client connection or releasing the lock, shut down the master server. Only if the lock is kept while shutting down the master can you be sure no write operations take place and invalidate the binlog position you just gathered.
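The sequence on the master looks like this; note that the client session holding the lock must stay open while the server is shut down from elsewhere:

```sql
-- client session on the master; keep this connection open!
FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;
-- note the File and Position columns from the output, then
-- shut the server down from a different shell, for example
-- with: mysqladmin -uroot -p shutdown
```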

Copy the data and transaction log files as described above. The remaining steps are the same, except of course, when you issue the CHANGE MASTER TO on the slave. Here you need to insert the MASTER_LOG_FILE and MASTER_LOG_POS you got from SHOW MASTER STATUS.

There's more...

The steps described above require you to take down both master and slave databases, albeit not necessarily at the same time. Nevertheless, this might not be an option if you are dealing with a production system that cannot be shut down easily.

In these cases, you have some other options that are, however, not explained in detail here.

Conserving the data files using LVM snapshots

If your data and log files are stored on a logical volume managed by LVM, you can use its snapshot feature to conserve the data files' state once you have the SHOW MASTER STATUS information. As soon as the snapshot has been taken, you can release the lock again and proceed as described above, copying not the most current version but the snapshot files. Be advised, however, that this approach can significantly hurt the I/O performance of your master!

Backing up data using Percona xtrabackup

At the time of writing, an open-source alternative to the commercial innobackup tool (available from http://www.innodb.com/products/hot-backup/) is under active development. While primarily a backup tool for backing up InnoDB databases while the server is up and running, its documentation contains a (currently empty) section on setting up a replication slave from a backup. Experience shows that Percona, the company behind xtrabackup, is very engaged in the MySQL ecosystem and may well have completed this set of instructions by the time you read this. To check on the current status of the project, go to https://launchpad.net/percona-xtrabackup.

Sharing read load across multiple machines

Often you have a very unequal distribution of read and write operations on a database. Websites usually get many more visitors browsing and reading content than actually contributing it. As a result, the database server is mainly busy reading information rather than adding or modifying material.

Replication can be used to alleviate scalability issues when your site reaches a certain size and a single machine might reach the limits of its performance reserves.

Unfortunately, MySQL does not offer this load-balancing functionality itself, so you will need to take appropriate actions on the application level.

In this recipe, we will show you the general procedure to follow when sharing read accesses between two slave machines while still aiming writes at the master. Beware that due to the asynchronous nature of MySQL replication, your application must be able to handle slightly out-of-date results because issuing an INSERT, UPDATE, or DELETE against the master will not mean that you can read the modified data back immediately as the slave might take some time to catch up. Usually, on a local network this should be a couple of seconds at most, but nevertheless the application code must be ready for that.

To simplify the scheme, you should design your application to read exclusively from the slaves and use the master only for modifications. This brings the additional benefit of being able to keep the overall system up and running while temporarily switching to read-only mode, for example to back up the master server. That, however, is not part of this recipe.

The example used in this recipe uses three database servers. The sample application is written in Java, using the MySQL Connector/J JDBC database driver. Depending on what application platform you are using, syntax and function names will differ, but the general principle is language independent.

The source code shown later has been abbreviated to show only the most relevant portions. You can find the complete file on the website.

Getting ready

Depending on your application infrastructure, you will need privileges to change its database connectivity configuration and the source code. This is usually a task that requires cooperation with application developers.

To follow along with this example you should be familiar with the Java language and its basic constructs.

Moreover, you will need three MySQL servers—one configured as the master and two others as slaves. They will be referred to as master, slave1, and slave2 in this example. Substitute your concrete host names appropriately.

You will also need the Java Standard Edition development tools available from http://java.sun.com, and the MySQL Connector/JDBC driver available from http://dev.mysql.com. Download and install both if you do not already have them.

How to do it...

Download the file called MySQLBalancingDemo.java from the website. It contains the following code:
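In case you cannot get hold of the file, the following sketch shows the relevant idea. It uses Connector/J's ReplicationDriver; the host names, database, and credentials are placeholders, and details may differ from the original file:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class MySQLBalancingDemo {
    public static void main(String[] args) throws Exception {
        // The replication-aware driver routes statements to master or slaves
        com.mysql.jdbc.ReplicationDriver driver =
            new com.mysql.jdbc.ReplicationDriver();
        Properties props = new Properties();
        props.put("user", "repl_demo");
        props.put("password", "demopass");
        // distribute read-only connections across the slaves round-robin
        props.put("roundRobinLoadBalance", "true");

        // the first host is the master, the remaining ones are slaves
        Connection conn = driver.connect(
            "jdbc:mysql:replication://master,slave1,slave2/sakila", props);

        // read-write mode: the query goes to the master
        conn.setReadOnly(false);
        System.out.println("Master: " + queryServerId(conn));

        // read-only mode: the query is balanced across the slaves
        conn.setReadOnly(true);
        System.out.println("Slave: " + queryServerId(conn));
        conn.close();
    }

    private static String queryServerId(Connection conn) throws Exception {
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT @@server_id");
        rs.next();
        return rs.getString(1);
    }
}
```

Because each server was configured with a unique server-id, printing @@server_id reveals which machine actually answered the query.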

How it works...

You just compiled and ran a small program that demonstrates round-robin load balancing.

The first line of output is the master's server-ID setting, because the first connection was not set to read only. The connection is then declared to be targeted at the slaves via setReadOnly(true). The next query will then return the server ID of the particular slave it was balanced to. You might need to run the demo a few times to see a different slave being used because the algorithm that balances the load does not strictly toggle each time, but might direct a few connections against the same slave.

There's more...

While the JDBC driver makes it relatively easy to balance read load across several slaves, it only takes you part of the way. You must take care that the application knows which connection to use for write operations and which for reads. It must also cope with slaves and master possibly being slightly out of sync at any time. Concentrating this special logic in a class of its own is advisable to limit its effect on the rest of the application.

Working with connection pools

When working with connection pooling, make sure to initialize any connection you get from the pool to the correct mode using the setReadOnly() method, so that you know what state it is in. You might be handed a connection that was in the wrong mode when it was put back into the pool.

Working in other programming environments

In development environments not using Java, you might have to take care of managing the cycling between slaves yourself. Independent of the actual language or environment you are using, a good practice is to channel all database operations through a set of functions or methods that centrally manage the balancing. You could provide functions that handle INSERT, UPDATE, and DELETE operations, always connecting to the master and a SELECT function going to the slaves for data reads.

In case you need to read back something you just wrote and cannot tolerate the replication lag, you might also provide a read function that queries the master machine. Use it sparingly, however, because it counteracts the intention of relieving the master of read load.
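In Java, such a central access point could be sketched as follows (building on a replication-aware connection as above; names are our own):

```java
// Central access point: writes go to the master, reads to the slaves.
public class BalancedDb {
    private final java.sql.Connection conn; // replication-aware connection

    public BalancedDb(java.sql.Connection conn) {
        this.conn = conn;
    }

    public void update(String sql) throws java.sql.SQLException {
        conn.setReadOnly(false);          // route to the master
        conn.createStatement().executeUpdate(sql);
    }

    public java.sql.ResultSet query(String sql)
            throws java.sql.SQLException {
        conn.setReadOnly(true);           // balanced across the slaves
        return conn.createStatement().executeQuery(sql);
    }

    // for reads that must not lag behind a preceding write
    public java.sql.ResultSet queryMaster(String sql)
            throws java.sql.SQLException {
        conn.setReadOnly(false);          // deliberately hit the master
        return conn.createStatement().executeQuery(sql);
    }
}
```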

Considering efficiency while adding slaves

Of course, the slaves have to perform write operations as well to keep up with the master. This means their performance is not fully available for reads, limiting scalability.

So adding more slaves does not proportionally improve performance of the overall system.
