Usual outage today (the one that happens every Tuesday for mysql database compression/backup). It ran really long - I guess we've been busy inserting/deleting all last week. We went back to an older policy of doing the compression simultaneously on both the master and the replica, which should vastly speed up post-outage recovery. Until today we'd been letting the compression commands (i.e. "alter table user type = innodb") pass from the master to the replica via the usual channels, but they wouldn't happen in parallel - the loooong queries had to complete successfully on the master before the replica would even start processing them. That left the replica as much as four hours behind when the project started up again in the afternoon. The benefit of doing it that way was less work/management, and accidental updates/inserts during the outage wouldn't get lost. Going back to doing it in parallel, we have to stop the replica before we start and reset the master after we're done, which increases the chance of such lost queries - but so far we've had zero such incidents during these weekly outages in all the years we've been using mysql.
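For the curious, the parallel version boils down to something like the following (a rough sketch with placeholder names - "thedb", the table, and the binlog file name aren't our actual setup):

  # on the replica, before any compression starts:
  mysql -e "STOP SLAVE;"

  # then the same alter runs on the master and the replica at the same time:
  mysql thedb -e "ALTER TABLE user TYPE = InnoDB;"

  # on the master, once both sides finish, throw away the old binlogs:
  mysql -e "RESET MASTER;"

  # back on the replica, point at the master's fresh binlog and resume:
  mysql -e "CHANGE MASTER TO MASTER_LOG_FILE='master-bin.000001', MASTER_LOG_POS=4;"
  mysql -e "START SLAVE;"

Any stray writes that hit the master between the alters and the reset never make it into the new binlog - that's the "lost queries" risk mentioned above.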

A weekly planned outage is usually a good time to take care of some offline chores. Today I cleaned up lots of unnecessary mounts in an effort to reduce our automounter maps as much as possible (so we don't have such a tangled web, which can be quite painful when one server disappears). I also made vader the sole download server, thus freeing bane up to be whatever we want - which will be useful for temporarily handling certain services as we go around upgrading the out-of-date operating systems on lots of these machines. I think vader can handle the load alone.

I hear the presentations from the 10th anniversary celebration have all been converted to mpegs. It's a few gigs' worth of stuff on a computer down on campus. A flash drive containing all that will appear up here at our lab sometime in the near future. Or it may be hosted on an interim server. We shall see.

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

The DB purge is now running, but the scheduler process is still not running. Maybe we're waiting for the splitters to pick up the slack before work is sent out.
____________
What if Fiction was Fact and Fact was Fiction and vice versa?

Matt, just stop slave on the replica and temporarily remove alter privileges from the replication account.
Do the alter command on the slave locally as root, then start slave when finished - it'll carry on from the last position, and you won't have to wait for the master to finish.
The alter command from the master will be ignored on the slave (or can be made to be); if it causes replication to stop instead, a "set global sql_slave_skip_counter=1; start slave;" will skip over it and continue.
Once the slave has read past the alter command, just reset the privs for the repl account on the slave (before any other needed alter commands come along!).
This way, you shouldn't lose any queries at all, nor have to make notes of master pointers.
The entire thing could be done as a "tuesday_backup.sh" script on a cron job - something like the sketch below!
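A very rough, untested sketch of the idea (the "repl" account, "thedb" database, and "user" table are placeholders, not anyone's real setup):

  #!/bin/sh
  # tuesday_backup.sh - untested sketch of the approach above

  # 1. pause replication and take ALTER away from the repl account
  mysql -e "STOP SLAVE;"
  mysql -e "REVOKE ALTER ON thedb.* FROM 'repl'@'%';"

  # 2. run the compression locally (the master runs its own copy in parallel)
  mysql thedb -e "ALTER TABLE user TYPE = InnoDB;"

  # 3. resume from the saved position
  mysql -e "START SLAVE;"
  # if the replicated alter stops the slave anyway, skip past it:
  # mysql -e "SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE;"

  # 4. hand the privilege back
  mysql -e "GRANT ALTER ON thedb.* TO 'repl'@'%';"

Step 4 is the fiddly bit - you'd want the script (or a human checking SHOW SLAVE STATUS) to confirm the slave has actually executed past the master's alter before re-granting.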