Mozilla DB News, Fri 7 Sep – checksums

Well, it is getting closer and closer to the end of the 3rd quarter of 2012, and so we are starting to hunker down and make sure our goals get met. One of those goals is to start running regular checksums on our databases, so that when doing a switchover (a manual “failover” for maintenance purposes) or when taking backups we can ensure we have maintained the same level of data integrity that the master has. And, of course, monitor the output of those checksums with Nagios. So this week has seen a lot of fiddling with pt-table-checksum to get the appropriate values….without further ado, here’s the list of what the database team did this week:

Found out the hard way that slow_query_log=ON is silently ignored by MySQL, and that the syntax to turn on the slow query logs is slow_query_log=1 (even when log_output and slow_query_log_file are set properly).

Fixed the problem where a getpersonas table locked up intermittently. The table locked up because it is MyISAM, but it had a FULLTEXT index on it. As it turns out, that FULLTEXT index was replaced by Elastic Search, so there was no problem dropping the FULLTEXT index and converting the table to InnoDB.

Fixed a problem where cron was not running on one of our backup servers.

Created development and stage database environments for the Bouncer application.

Converted all the Affiliates tables that had a default character set of latin1 to have a default character set of utf8 in dev, stage and production (character_set_server=utf8 has been set on that server for a while, but it was after those tables were created).

Checked the size of the Tinderboxpushlog database after last month’s defragmentation to see how much larger it became.

Used pt-query-digest to parse 40G (about 1/3 of what we have total) of general and slow query logs from one day of Bugzilla logging to come up with some metrics for which Bugzilla queries we are going to optimize in Q4 2012.

Tuned the innodb buffer pool on an auxiliary database for Addons database cluster to fix a bug.