Archive for the 'Engineering' Category

I don’t think this experience is unique. It has been shared repeatedly among those starting a job as a DBA (database administrator) at a new company, especially when the organization has never had a dedicated DBA. The conversation usually goes something like this:

– “Welcome aboard <insert name here>! Here is the list of what we never had time to do and need you to complete.”

Or

– “Welcome aboard <insert name here>! As we discussed in your interview, we have some databases that are performing poorly and we need you to fix them ASAP.”

You get access, start looking at the databases, and, of course, perform your own diagnosis of the systems and uncover problems. Likely, you'll find issues other than those brought to your attention. What do you do if you find a critical issue the business deems less critical than the issues they want addressed? Multiple strong forces must be brought into a delicate balance. So, where do you begin?

–With what is critical to your company and its business?

–With what is critical to you, based on your own diagnosis?

–With where you can achieve the most with the least effort/time?

You need to handle what is most critical to your company first. That's obvious. But there are times when resolving an issue you uncovered will also resolve one or more of the issues on the company's list. You MUST allocate the time to do your own analysis; it is for your good and the good of the company.

There’s a New DBA in Town.

Let’s go back in time to my first week at 3Ci. I knew some of the pain points from discussions with my manager. I found out we were beginning our busiest time of year, and in years past, the databases had not performed as well as expected. My skills and diagnosis were immediately put to the test to solve these urgent needs:

1. Manage the growth of our OLTP data, which was increasing at an average rate of about 10 GB/day. This growth was resulting in:

–Our disks reaching capacity

–Poor query performance, since many of the heavily accessed tables had grown unreasonably large

–Inability to perform any DDL changes without significant downtime to our systems

2. Separate reporting from the MySQL OLTP databases to take load off the OLTP servers. There were other needs uncovered during my diagnosis, but they are not relevant to this discussion.

I then added my own requirement:

3. Automate as much as possible. Automation removes the risk of manual error. Also, it would free me up to work on more exciting projects!

How I Used MySQL as the Solution — the Details…

My investigation into the data in our OLTP system highlighted that the fastest-growing tables were log-based. I verified that we had to keep writing these logs to our databases, and that during the upcoming busy season those inserts would increase significantly. Of course, I also had to account for recording more logs as our business grew. By asking the right questions and digging deeper into the data, I learned our OLTP system was not required to keep old logs.

Given those facts, I now had a place to start for managing the data footprint. I knew I could delete log data older than <insert date here>. But was this really the best solution? The finer details pointed to a problem: deleting rows from InnoDB tables does not return space to the operating system. To reclaim the newly freed space, I would have needed to rebuild each table. I needed a new approach… Partitioning!

Partitioning the log-based tables on a date field would allow me to drop partitions holding data older than <insert date here>. Dropping a partition is effectively a file deletion, which immediately reclaims the disk space.

Partitioned tables can also improve query performance. If MySQL can determine which partitions a query needs (partition pruning), it reads only those. Partitioned correctly, you can significantly reduce the amount of data MySQL has to scan. You can think of the partitioning column as a coarse index on the table.
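To make this concrete, here is a minimal sketch of a daily-partitioned log table. The table and column names are hypothetical, and this uses RANGE partitioning as one common approach; the exact scheme matters less here than the two payoffs: a cheap DROP PARTITION and partition pruning.

```sql
-- Hypothetical daily-partitioned log table (names are illustrative).
CREATE TABLE event_log (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  created_at DATETIME NOT NULL,
  message    VARCHAR(255) NOT NULL,
  PRIMARY KEY (id, created_at)   -- the partitioning column must be part of every unique key
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(created_at)) (
  PARTITION p734905 VALUES LESS THAN (734906),  -- rows from 2012-02-07
  PARTITION p734906 VALUES LESS THAN (734907)   -- rows from 2012-02-08
);

-- Dropping a partition removes its data file: fast, and the disk space comes back.
ALTER TABLE event_log DROP PARTITION p734905;

-- Pruning: a date-bounded query touches only the matching partition(s).
EXPLAIN PARTITIONS
SELECT COUNT(*) FROM event_log
WHERE created_at >= '2012-02-08' AND created_at < '2012-02-09';
```

The EXPLAIN PARTITIONS output lists which partitions MySQL will actually read, which is an easy way to confirm that pruning is happening for your queries.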

So Far So Good.

Let’s move on to the reporting server. The reporting server solution appears to be simple enough, right? Just create a slave off of the master OLTP server. Again, when you really get into the finer details, this generic solution does not completely solve the problem. Unlike the OLTP server, the Reporting server will keep log data for a longer duration.

Let's start by creating a slave from the OLTP server. What happens to the slave when a partition is dropped on the master? When I drop a partition on the OLTP databases, that command will replicate and drop the same partition on the slave/reporting server. This is not what we want, since the Reporting server must keep its data longer. To remedy this, the partition maintenance commands must not replicate. I cover how to keep these commands out of replication later in this post.
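The mechanism for keeping a statement out of replication is the sql_log_bin session variable. A minimal sketch (the table and partition names are illustrative):

```sql
-- Run partition maintenance on each server independently, outside replication.
-- sql_log_bin is a session-scoped setting and requires the SUPER privilege.
SET SESSION sql_log_bin = 0;

-- This DROP is not written to the binary log, so the slave keeps its copy.
ALTER TABLE event_log DROP PARTITION p734905;

SET SESSION sql_log_bin = 1;   -- restore normal logging for the session
```

Because the setting is per-session, other connections continue to replicate normally; only the maintenance session's statements are skipped.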

Now that we have a reporting server, I need to move reporting access and reporting tasks to this server (I will leave the solution to this challenge to the reader).

So, where do we stand with the requirements?

1. Manage growth of the OLTP data and improve the performance of some SQL: addressed with partitioning.

2. Move the reporting load off of the OLTP server: addressed with the replicated Reporting server.

I needed to automate the management of the partitions on both the OLTP and Reporting servers. Partitions would be dropped and added, and I needed to automate both the DROP and the ADD. In this process I had to consider that each table could have its own data retention policy, and that the policy on the OLTP server might differ from the policy on the Reporting server.

I implemented a daily partition scheme for the log-based tables by using LIST partitioning with the TO_DAYS() function. To simplify the automation, I named each partition p{number resulting from TO_DAYS() function}. For example, the partition p734905 is for 2012-02-07.
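Here is a sketch of that scheme; the table and column names are hypothetical, but the naming convention and the LIST/TO_DAYS() pairing follow the description above:

```sql
-- The naming convention: partition p{TO_DAYS(day)} holds exactly that day.
SELECT TO_DAYS('2012-02-07');   -- 734905, hence partition p734905

-- Hypothetical log table using LIST partitioning on TO_DAYS(created_at).
CREATE TABLE app_log (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  created_at DATETIME NOT NULL,
  message    VARCHAR(255) NOT NULL,
  PRIMARY KEY (id, created_at)
) ENGINE=InnoDB
PARTITION BY LIST (TO_DAYS(created_at)) (
  PARTITION p734905 VALUES IN (734905),   -- 2012-02-07
  PARTITION p734906 VALUES IN (734906)    -- 2012-02-08
);
```

One caveat of LIST partitioning: an INSERT for a day with no matching partition fails outright, which is exactly why the ADD side of the automation must run ahead of the data.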

Now, the Solution.

I created a table, partition_detail, that lists each database.table and the number of partitions to be managed automatically. A process reads the partition_detail table, then finds the corresponding partition records in the information_schema.partitions table (MySQL's metadata). Using the partition_detail.number_partitions value and the TO_DAYS() function, the process ADDs or DROPs partitions as needed. Further, I run the process on both the OLTP and Reporting databases with SET sql_log_bin=0 so that it does not replicate.
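A sketch of what that control table and lookup might look like. The column names here are my guesses at a reasonable shape, not 3Ci's actual schema; only partition_detail and number_partitions are named in the post.

```sql
-- Hypothetical control table: one row per managed table.
CREATE TABLE partition_detail (
  db_name           VARCHAR(64)  NOT NULL,
  table_name        VARCHAR(64)  NOT NULL,
  number_partitions INT UNSIGNED NOT NULL,   -- days of data to retain
  PRIMARY KEY (db_name, table_name)
);

-- Find partitions older than each table's retention window.
-- Since partitions are named p{TO_DAYS(day)}, the cutoff is a simple comparison.
SELECT p.table_schema, p.table_name, p.partition_name
FROM information_schema.partitions AS p
JOIN partition_detail AS d
  ON  d.db_name    = p.table_schema
  AND d.table_name = p.table_name
WHERE p.partition_name IS NOT NULL
  AND CAST(SUBSTRING(p.partition_name, 2) AS UNSIGNED)
      < TO_DAYS(CURDATE()) - d.number_partitions;

-- For each row returned, the process issues the DDL with logging disabled:
--   SET SESSION sql_log_bin = 0;
--   ALTER TABLE ... DROP PARTITION p... ;
-- and the ADD side works the same way, creating partitions for upcoming days.
```

Because the process runs locally on both the OLTP and Reporting servers with sql_log_bin disabled, each server enforces its own number_partitions value without the DDL leaking through replication.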

Now I have the process automated!

I have successfully automated these MySQL systems using MySQL Replication and Partitioning!

About the Author:

Gabe Tucker is a senior data architect at 3Cinteractive, where he is responsible for the integrity of 3Ci’s data and the performance of its data systems. Tucker has 20 years of experience as a software engineer and database administrator.