pt-online-schema-change

This is one of many perl script tools from Percona that makes life easier administrating, monitoring and testing mysql, this one solves the need to alter a table (adding a key, column, partitioning, etc…) on a production system that can’t be taken down. Many of the current Percona tools came from maatkit, developed by Baron Schwartz, this particular one from my understanding was initiated by the Facebook development team for the myriad of db servers they administrate, however, there seems to be two distinct OSC products, the one we are going to talk about is pt-online-schema-change.

The description above will tell you the basics of how it works, essentially an empty altered copy of the table is created, then filled by selecting data from from the active table in manageable chunks as a precaution to overloading a busy server. Triggers on the active table are temporarily created to keep the altered copy up to date with real time changes. You may question how the trigger can update a row on the new table that may not be there yet. This is done by using ‘replace’ (an insert or delete and insert) for all insert, update and delete statements in the triggers. Inserts are of course inserted, updates are now inserts on the new table (or delete and inserts if the row is already there), and deletes do the delete portion of the replace and have nothing to insert after the delete. Now to avoid conflict with the rows that are being selected and inserted from the active table, an ‘ignore’ flag in used on those bulk inserts to leave already inserted rows from the trigger as is. once the copy is completed, an atomic rename of the two tables is done.

safety first

The are many safeguards to ensure your data doesn’t get screwed up or server overloaded, among them options are the ability to control the chunk size of the selects, setting a max amount of threads connected and running due to triggered queries and general load, a dry run vs. execute option which you have to explicitly specify, a rollback feature if the process is terminated at any point along the line, no dropping the old table, no swapping the table and so on. The link above has specific implementation details.

things to keep in mind (as of pt 2.1.2)

a couple of things that I’ve run across are,

1) when dropping foreign keys, you will need to specify the altered table foreign key name, which will not be the same as the active copy foreign key name. The system uses ‘_’ in front on the altered copy, so you’ll need to do the same when specifying those foreign keys.

2) As per mysql, partitioning does not support foreign keys, and you will not be able to both partition and drop foreign keys in the same operation. You’ll most likely need to run it twice, separately.

3) If you are running bin-logs with statement based replication and without tx_isolation=READ-COMMITTED or innodb_locks_unsafe_for_binlog, the ‘insert into select * from…’ for each chunk in the copying process will need a read lock on the table, and depending on what else it going on, such as backups, potentially can cause a lock wait timeout on the process and exiting without completing.

4) there can be a momentary stall while applying the switch of tables. see below for details.

production results

we successfully used pt-online-schema-change to indirectly alter a 1.7 billion row table handling 50,000 queries per second with no downtime, taking about 15 hours to complete. There was a momentary stall (a fraction of a second) resulting in a load spike on switching out the tables, although max_used_connections does not seem to corroborate query build up as the reason. We’ll take a closer look next time.

Post navigation

about me

My name is Michael Morse, I’m the principal database & performance architect for Upsight Inc., the world’s largest independent analytics and mediation provider. I work primarily with our mediation platform architecting and managing our infrastructure with tools and technologies such as Percona server/toolkit, memcached, redis, galera and Mariadb columnstore to store and serve our data in a reliable and performant manner.