Wednesday, April 13, 2011

On Tuesday I presented "Automated, Non-Stop MySQL Operations and Failover" at the MySQL Conference and Expo 2011, and published the slides on SlideShare. The talk covered a lot of ground, and it was not easy to digest everything in a 45-minute session. Now that the slides are online, I hope attendees will find it easier to understand what steps are needed for master failover and slave promotion. As I mentioned during the talk, I'm planning to release the tool (monitoring the master for failures, promoting a slave automatically or manually, and optionally switching masters manually while the current master is alive) as open source software soon. The tool performs all the steps that I covered in the talk, so you don't actually need to do them manually. If you find any missing steps that should be covered, I'd appreciate it if you pointed them out. I'm also writing English documentation (a user's guide), so please stay tuned if you're interested.

Wednesday, April 6, 2011

Sometimes I want to trace transactions that take a long time to execute. If a transaction holds row locks for 30 seconds, all other clients requesting locks on the same rows will be blocked for up to 30 seconds (or will get "Lock wait timeout" errors if you set innodb_lock_wait_timeout lower). This is serious.

The first important step in solving such issues is identifying what kinds of queries the problematic transactions execute. But tracking long-running transactions is not as easy as tracking slow queries. Suppose you execute the following transaction.
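Below is a minimal sketch of such a transaction in Python (the table t1 and the MySQLdb driver are assumptions for illustration; any client library behaves the same way):

    import time
    import MySQLdb  # assumption: any MySQL client library shows the same behavior

    conn = MySQLdb.connect(host="dbserver", user="app", db="test")
    cur = conn.cursor()
    cur.execute("BEGIN")
    cur.execute("SELECT * FROM t1 WHERE id=2 FOR UPDATE")  # fast, but takes a row lock
    time.sleep(10)  # application-side work; no query is running during this time
    cur.execute("UPDATE t1 SET value=100 WHERE id=2")      # fast
    conn.commit()   # the row lock is released only here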

This transaction takes more than 10 seconds to complete, but each individual query finishes very quickly, so nothing is written to the slow query log. Analyzing slow query logs is not helpful in this case.

If you enable the general query log, all queries, including the ones above, are written. But this hurts performance (writes to the general query log are serialized inside MySQL), and logging every query makes the log file huge. Analyzing very large log files is not much fun. In addition, neither query execution time nor error code is written to the general query log, so analyzing it alone is not enough to verify whether the above SELECT ... FOR UPDATE took 10 seconds or finished quickly.
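For reference, the general query log can at least be switched on and off at runtime in MySQL 5.1 and later, so you can limit logging to a short sampling window. A minimal sketch, assuming a connected DB-API cursor named cur:

    cur.execute("SET GLOBAL general_log = ON")   # every statement from every client is logged
    # ... reproduce the problem here ...
    cur.execute("SET GLOBAL general_log = OFF")  # stop logging after sampling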

How about analyzing statement-based binary logs? Unfortunately this is not always helpful, either. First, uncommitted transactions are not written to the binary log, and there are many cases where a transaction takes a long time and finally rolls back. Second, SELECT statements (including SELECT ... FOR UPDATE / LOCK IN SHARE MODE) are not written to the binary log. When you use locking reads, many transactions are likely to be blocked, and you probably want to identify which SELECT statement caused the problem.

SHOW ENGINE INNODB STATUS gives some hints. From its TRANSACTIONS section, you can identify that a client from 192.168.0.2 was executing "update t1 set value=100 where id=2" and had been waiting 24 seconds to get some locks. But there is no information about which queries held the lock. A client from 192.168.0.1 (transaction 1B43B4F) had been running a transaction for 34 seconds, so that one was probably holding the lock. But there is no information about why it had been running for 34 seconds. In this case I want to see all the queries that 1B43B4F executed so far.
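If you run the InnoDB Plugin (MySQL 5.1) or MySQL 5.5, the INFORMATION_SCHEMA tables expose the same transaction list programmatically. A minimal sketch in Python (connection parameters are placeholders); note that trx_query shows only the statement running right now, not the queries the transaction executed earlier:

    import MySQLdb

    conn = MySQLdb.connect(host="dbserver", user="monitor", db="information_schema")
    cur = conn.cursor()
    cur.execute("SELECT trx_id, trx_started, trx_mysql_thread_id, trx_query"
                " FROM INNODB_TRX ORDER BY trx_started")
    for trx_id, started, thread_id, query in cur.fetchall():
        # only the currently running statement is visible here, not past ones
        print("%s %s %s %s" % (trx_id, started, thread_id, query))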

If you can modify your application programs, tracing long-running transactions is not so difficult. Adding the following logic will help (a code sketch follows the list).

1. Checking the transaction start time (t1)
2. Pushing each query (with the current timestamp) executed by the transaction into an in-memory queue
3. Checking the transaction end time (t2)
4. If (t2 - t1) exceeds N seconds, printing all queries that the transaction executed
5. Deleting the in-memory queue
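Here is a minimal sketch of this logic in Python, assuming a DB-API style cursor (the class and variable names are made up for illustration):

    import time

    SLOW_TRAN_THRESHOLD = 10.0  # N seconds; report transactions longer than this

    class TransactionTracer:
        def __init__(self, cursor):
            self.cursor = cursor
            self.start = None
            self.queries = []          # in-memory queue of (timestamp, sql)

        def begin(self):
            self.start = time.time()   # step 1: record transaction start time (t1)
            self.queries = []
            self.cursor.execute("BEGIN")

        def execute(self, sql):
            self.queries.append((time.time(), sql))  # step 2: push query + timestamp
            self.cursor.execute(sql)

        def commit(self):              # a rollback path would do the same check
            self.cursor.execute("COMMIT")
            elapsed = time.time() - self.start       # step 3: transaction end time (t2)
            if elapsed > SLOW_TRAN_THRESHOLD:        # step 4: print all queries if too slow
                for ts, sql in self.queries:
                    print("%.6f %s" % (ts, sql))
            self.queries = []                        # step 5: delete the in-memory queue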

The problem is that this approach is often not feasible. Because long-running transactions may come from any client, you have to implement the tracing logic on every application server. Modifying core database access code and re-deploying it on all application servers is not fun. If you use O/R mappers, things become even more difficult. Modifying database driver code on all application servers? I don't want to do that.

For these reasons, I think the most practical approach is tracing slow transactions on the MySQL server side, without modifying any existing programs (neither client programs nor MySQL servers). Fortunately there is a way to do that: capturing MySQL packets and tracking transactions and queries. The five-step tracing algorithm above works here as well. I implemented a tool, "MySlowTranCapture", and published it on GitHub. Here is an example output.

It is easy to identify that the first transaction caused the problem. The first transaction held an exclusive lock on diary_id=100 from 09:12:17.264797 and didn't release it until 09:13:17.360993. All of its queries completed within a millisecond, so the transaction was probably sleeping, or spending a long time on other logic (e.g. accessing remote servers and waiting there) before closing.

MySlowTranCapture uses libpcap to capture network packets, like tcpdump. Since libpcap sometimes drops packets, it is not guaranteed to capture 100% of transactions. MySlowTranCapture uses roughly 10-30% of the CPU resources of the mysqld process, which is acceptable for short-term analysis (when I tested tcpdump, it used 140% of mysqld's CPU resources, and most of the CPU time seemed to be spent writing network packets to local files). I sometimes use this tool for debugging, too. Developers sometimes execute unnecessarily many SQL statements within a single transaction. MySlowTranCapture is useful for tracing such transactions on development servers, by setting -t lower (e.g. 50ms).
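For example, a hypothetical invocation for that case (assuming the built binary is named myslowtrancapture and that -t takes the threshold in milliseconds, as the 50ms example above suggests):

    # report transactions that take longer than 50 milliseconds
    myslowtrancapture -t 50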

These talks will be based on my experiences as a former MySQL consultant and our experiences running real-world social gaming platforms (DeNA is not well known in the US yet, but our services get 2-3 billion page views per day, so it's not small). Speaking for 4.5 hours is not easy for a non-native English speaker, but I'll do my best to make the sessions interesting and beneficial. Due to the recent disasters and ongoing rolling blackouts in Japan, most of my colleagues had to cancel their flights, and I also have to go back to Tokyo right after the UC (I booked a flight leaving at 6:55pm on Apr 14), so I can't attend some interesting events scheduled for Apr 15 or later. But it's confirmed that I can stay from Apr 10 to 14, so I look forward to seeing ex-MySQL colleagues and friends!

About Me

I am a database engineer at Facebook.
Before joining Facebook, I was a principal database and infrastructure architect at DeNA. My primary responsibility at DeNA was to make our database infrastructure more reliable, faster and more scalable. Before joining DeNA, I worked at MySQL/Sun/Oracle as a lead MySQL consultant in APAC for four years.
You can contact me at Yoshinori.Matsunobu_at_gmail.com (replace _at_ with @).