Friday, November 13, 2009

MySQL Cluster has come a long way in the 4 years since I experimented with it. Compared to when I first got the cluster working on my home computer, I didn't have to change much get the latest version up and running.

So, what is MySQL Cluster? It is a database solution which tries to solve high availability issues by using multiple synchronous masters (for the lack of a better phrase) in a shared-nothing environment.

In order to solve a lot of the latency issues associated with multiple masters, it keeps all the indexed data in memory to enable fast processing of the data.

So then what is the purpose of this article? To let you setup a MySQL Cluster for fun, on a single box, so you can test out its features1. The main things I wanted to test out are:

Setting up a cluster on one or two boxes. People tell me this is impossible. For personal and business reasons, this article attempts to squash those rumors so I can just point people to this article instead of explaining the details to them. There are advantages to setting up a cluster on 3 boxes as compared to 2 boxes, however. There is a difference between a network outage and a single server becoming unavailable and 3 servers help with this.

Using other storage engines and setup dangerous replication between the frontend MySQL servers.

Replicating data to a cluster slave as well as a normal slave.

There is also one important item to keep in mind about MySQL Cluster. I have no idea why they make "MySQL Cluster" as a separate product from the main MySQL server.

In the past, the "MySQL Cluster" was just the NDB engine running in MySQL. It seems to be the same way even now and I suspect that it might just be due to politics and marketing that they have something called a "MySQL Cluster" when it is really just MySQL with the NDB engine.

The mysqld server running in "MySQL Cluster" can still store data in InnoDB and MyISAM formats.

Row Level Replication and why it frustrates me

Please read Comparison of Statement-Based and Row-Based Replication. MySQL copies data from one computer to another computer through MySQL Replication. The old way of replicating data was to copy over and execute all the SQL commands from one MySQL server to the other MySQL servers. This type of replication was called "Statement Level Replication.". This copying and executing of queries was NOT real-time and had some issues. Basically, it was like this:

Query is executed on the master. Let's say the query took 30 seconds to finish.

Query is saved to a log and then each slave server downloads the query. There is a time delay in downloading the query. Usually it isn't very much.

Query gets executed on the slave servers. From the point the slaves start to execute the query, they will take an equal amount of time to execute the query as the master did. If the master took 30 seconds to execute an insert, update, or delete command, it will take 30 seconds for each slave to do the same (usually). Thus, right before a slave finishes the 30 second query, it is behind in replication by 30 seconds, or rather its data won't be in sync until 30 seconds later (after the query is finished).

For most practical purposes, most small queries finish very fast and replication is also very fast.

So what is Row level Replication? Basically, the SQL query which executes the changes on the master is forgotten and we only record the actual row differences and copy them over.

For example, let us say that a query updates a table and sets the country code to 'US' for all entries that are NULL.

In a table with 1 million rows, let's say only 10 rows were NULL and thus 10 rows get updated. The difference between Statement and Row Level Replication is the following:

Statement Replication will copy the query and execute it.

Row Replication will copy only the changes for those 10 rows and ignore what the original query is.

Row Level Replication doesn't care what the SQL query was. It only looks at the data that has changed and copies those changes to the slaves.

Why does this frustrate me? For a few reasons:

Everything is moving towards Row Level Replication and Statement Level Replication is being made incompatible under some conditions. I think both are very powerful and you should let the end user decide what they want to use. I understand that Statement Level Replication can cause problems (under some circumstances), so my attitude is that they should work around the problems. The power retained by keeping both as an option at all times is the ability to customize how you do things. I have been in many situations where a slight improvement in performance meant a big deal. Thus, you want all your options open.

Also, Row Level Replication is MUCH HARDER to debug. Statement Level Replication is very easy to debug. I have seen weird replication issues with MySQL in the past and I don't like to be in a situation where I can't debug things easily. Everything about being a DBA is debugging database problems and finding out where the problem is. All the advantages of Row Level Replication are almost irrelevant if you can't debug issues in replication.

I am glad we have Row Level Replication as an option and it can be nice to use depending on your needs. I just wish Statement Level Replication could be used if handled correctly under all all configurations.

To give you a better perspective of why Statement Level Replication is nice to debug, we had a weird case one time where 1 byte in a replication stream would change for no reason over a WAN. If it changed in the data, nobody would notice (that's really bad). But if it changed in one of the SQL words, replication would break.

We were easily able to compare the master's binlogs to the slave's binlogs and find out what the problem was.

I remain sceptical how easy it will be to verify why things break with row level replication. Maybe I am just paranoid, but you have to be when managing databases.

Misc Setup

I have two systems I am using to test MySQL Cluster on.

First computer: Laptop with 3 GB of RAM running Ubuntu.

Second computer: Desktop with 2 GB of RAM running Ubuntu 9 and also using a Vortex SSD hard drive. I have an external drive enclosure so I could test the SSD hard drive on other systems.

Third system: 2 GB server with CentOS.

Some things to NOTE:

It's interesting to note I only get a max of 30 MB/sec transfer rate from my laptop hard drive to the Vortex drive. I think it's because of the speed of USB and also the internal hard drive. I've read that USB 2.0 maxes out at 40 MB/sec. In theory, if I attach the Vortex drive directly to the motherboard on my desktop computer, it should go a lot faster. In case you wonder why I am mentioning this, at the time of this article, the Intel X25-e SSD drive looks very promising in regard to database use. I expect that 90% of the databases using MySQL will be using SSD drives in the next few years, because most MySQL servers have less than 100 GB of data. Also, the SSD hard drives will get faster and bigger. I bought the Vortex because it was cheaper and I just wanted to test SSD out in general and later use it for my laptop. If it works out well, I'll buy the Intel X25-e and put it in my desktop and beat the daylights out of it running database simulations.

Also, when I connected the Vortex SSD hard drive to the motherboard on my desktop, I could only get about 35 MB/sec written to the SSD hard drive when reading files from cache. It seems to go faster than the new 1.5 TB SATA hard drive I bought though, which was about 23 MB/sec if I remember right.

Have a Gb free of diskspace. You don't need all of that, maybe 300 MB at most.

Of course, you should have Python installed. You don't really need Python installed for anything in this article, but you should have it installed anyways. (353, 413, 409, 521, 482)

Also, a management node doesn't do anything most of the time, so if you use a separate server for the management node, it is a waste.

For each IP address, bind a data node, mysqld node, and management node to its default ports. Each IP address will have its own set of nodes.

One management node will be a backup management node.

Define NDB databases with a prefix to make it easy to separate NDB databases from InnoDB and MyISAM databases.

Setup replication between the MySQL servers on 127.0.0.101 and 127.0.0.102 in dual master mode excluding replication of the NDB databases (the cluster takes care of the NDB databases).

Setup replication to a cluster slave and a normal slave using Row Level Replication. You could do Statement Level Replication, but then all writes would have to go to exactly one of the mysqld servers in the Cluster.

Replication Type

Row Level
Replication

Special
Rep Config

Hacks
necessary

Problems

Notes

Weird Purpose

Cluster

Other
Engines

MySQLD
Master
Replication

Slave

Cluster
Slave

1.

•

Basic configuration.

2.

•

•

1

Non-ndb data is split across mysqld nodes

1

3.

•

•

•

Y

Y

Y

2

Can use other storage engines.

1

5.

•

•

•

•

Y

Y

Y

2

Convert tables to non-NDB tables on Slave.

1,2

6.

•

•

Y

Standard Cluster Replication.

7.

•

•

•

•

Y

Y

Y

2

Cluster Replication with non-NDB tables.

1

8.

•

•

•

•

•

Y

Y

Y

2

Convert replication to non-NDB tables on Slave.

1,2

Special Rep Config: The MySQLd nodes need to replicate non-NDB data.

Hacks Necessary: The stored procedure I use is needed to start replication if a MySQLd node gets restarted.

Problems:

Inconsistent data on non-NDB tables.

Replication configurations are needed to avoid duplicate entries in the binlogs. Non-NDB data replicated on the MySQLd servers will need to be setup in a Replication Circle.

Weird Purposes:

Keep log data out of cluster to keep the data small in the cluster.

Use Read-Repeatable mode in InnoDB on the normal slave. This can be useful for *snapshot* reports.

Configuration Files

Place all of these files in "/etc/mysql". I make NO APOLOGIES for the configuration of these files. I haven't finished tuning the configurations yet. I just did the bare minimum to get it to work. Use at your own risk.

There is one "config.ini" file which is needed by all the servers. It is here: config.ini.txt.

Download and Install MySQL

It should be safe to download and untar them in the working source directory of MySQL right after you have downloaded and untarred the source code for MySQL.
Note: The download steps below only work for a specific version of MySQL. Please refer to http://dev.mysql.com/downloads/cluster for newer versions.

Use my install script to install MySQL. You can read the various comments in the script. Please read the comments before you execute each section in the script. In general, it will attempt to setup three things:

A MySQL Cluster

A regular slave

A MySQL cluster slave

No replication will be setup however. This install script downloads, compiles, and sets up the basic environment. To execute this script:

You must have sudo access. If not, just log in as the root user. For example: su -l root

Download the script.

mv install.txt install.sh

bash install.sh # However, I recommend only doing sections at a time. I got the script to work all the way through, but this script doesn't allow for errors, so it is safer to just do sections at a time. Copy and paste sections into a terminal session.

Setting up and Testing Replication to a Slave and Cluster Slave

We assume the MySQL cluster and the slaves are running. Replication is not setup yet, but it will be. Now we will setup replication where each of the two MySQL nodes will replicate non-NDB tables to each other.

Also, we will setup replication from the first MySQL node to a MySQL slave and the 2nd MySQL node to the cluster slave. The important thing to note is that the slaves will replicate all the data from all databases.

NOTE: We only use Row Level Replication. It is possible to do this with Statement Level Replication, but it gets more complicated.

Add the following to the config files if it is not already done: "event_scheduler=ON". The config files in this article already have that setting. If you had to add this to your config, restart the MySQLd servers.

"Show slave status" should have both the io_thread and sql_thread as 'Yes' after the stored procedure kicks in.

Conclusion and Future Topics

Well, there's nothing much to conclude except that you can setup a MySQL Cluster on a single box. It works for Ubuntu and CentOS pretty well.

Here are some future topics:

Stress test MySQL Cluster on a single box and get it to break. Hint: The config files in this article were made so that I could load real production data on a server at work without having the cluster kick out the data nodes.

Break MySQL Cluster by pounding it to death or filling up diskspace or filling up the index in memory and see what happens.

Split the configurations out to two servers.

Setup Statement Level Replication.

Perform backups via cluster or slave.

Managing the cluster and Nagios alarms for the cluster to detect when things go bad.

Adding and dropping nodes manually.

Creating Ubuntu and CentOS packages.

Testing my SSD drive.

Some SSD Preliminary Testing

I ran a quick test on the performance of my SSD drive compared to the brand new SATA drive that I bought.

However, it is not a very good test because my computer is 4.5 years old. The important thing to keep in mind is that SSD drives perform much better with random writes and reads.

They are somewhat better for sequential writes and reads. For a database server, random writes and reads are the most important.

When you look at the tables below, you might except the sequential copying of cds should go about as fast as sequential inserts into MySQL Cluster. However, even sequential inserts, are not exactly sequential in all respects.

As I expected, straight inserts went a lot faster on the SSD drive and looked closer to random write speed improvements.

I loaded data onto both the SSD and the SATA drives I have. I have separate MySQL Cluster installations on each hard drive.

My computer is from late 2004, so it probably only a SATA I bus. So this is a really bad test for testing real performance, but I just wanted to get a rough idea of what I am dealing with. Both the SATA drive and the SSD drive were brand new.

I was using an SSD Vertex 60 gig drive. I loaded a 200 MB file onto the system, but, there were 4 MySQL instances running.

The two data nodes for the cluster, a slave, and a cluster slave. In addition, the binlogs would have a copy of all the data for the MySQLd instances.

Thus, there are 8 copies of the data (4 in the databases, 4 in logs). I expect on a system with just one MySQL instance it would run a lot faster.

For cdrom copying times of 700 megs, look at this table. The SSD drives appears to be about 66% faster. However, when you compare this to the next table, SSD shows to be even better (as expected).

Cd copy count

SATA seconds

SSD seconds

1

28

20

2

29

19

3

34

20

4

32

19

5

30

19

6

32

20

7

32

18

8

32

20

We can see that the MySQL Cluster is running 2 to 5 times faster on the SSD drive. This is not surprising if the latency on the SSD drive is better.

The inserts below were at 1000 rows at a time. The python script has the detailed schema information. I compared 3 different points in the log files for the inserts.

I compared the recent time for 1000 rows being inserted and the total time up to that point. The loading stopped at 1 million rows.

As a side note, SLC SSD drives last 10 times longer for writes comapared to the MLC SSD drives, from what I understand.

Thus, the Intel x25-e is far more important than the Intel x25-m for database servers. You want the drives to last as long as possible and have the best random write performance.

On one of my contracting jobs, because I showed them how the Vortex behaved on my home computer, I will get to test an Intel x25-e at their place.

That's the second time when I invested in my computer at home it helped me in my career. The first time was getting an AMD 64 bit system before most companies had it and using it at home with Linux and MySQL.

The ex complained it was a waste of money, but it was a hell of an impressive thing to say at the time during interviews that I had 64 bit Linux and MySQL running at home, and it proved to be true.

If you get a good raise because you show initiative when investing money into your home equipment to prove something, it pays for itself.

1 comment:

Not so far I have found new cool tool to work with mySQL on linux - Valentina Studio. Its free edition can do things more than many commercial tools!!I very recommend check it. http://www.valentina-db.com/en/valentina-studio-overview