Friday, June 22, 2007

After a nearly 24-hour restore process that wound up at about 2am this morning (watching paint dry would have been thrilling by comparison and, frankly, I'd have had much more sleep), we're back. We're making sure today's posts get done before we complete the catch-up from yesterday.

Things go wrong with technology, and the database failure we experienced was the culmination of a bad day of continued hosting service and connectivity problems - problems that ultimately led to our cooperative load balancing and recovery architecture overwhelming the database. We take daily backups and were able to go back to that night's snapshot, so that part worked out. What we didn't know was that it would take basically a full day to apply the backup to the database using the tools provided with the software. Clearly that's unacceptable, and we're buying a much faster solution so that, if and when we need to do this again, we'll be back up and running far sooner.

Unfortunately, we didn't escape 100% unscathed - none of the per-subscriber open-tracking data could be recovered. We also found some other issues affecting a very small number of publishers (fewer than 20), whom I'll be contacting directly. The tracking metrics for yesterday and today are unhappy at the moment as well, so don't read too much into them. The messages are going out as we get all this catching up done; it's just going to take a while.

So we're speeding up the backup/restore side of things, and we're also in the market for a new hosting service. Mail me if you want the business.