Behind the Scenes of the Bloglines Datacenter Move (Part 4)

With two weeks to go before the move, we started holding daily status meetings with everyone involved: people from site ops, net ops and the entire Bloglines team. These ran only about 10-15 minutes each, but were invaluable in getting issues taken care of quickly. We were still working through problems with blog article migration, but we thought we could make the December 16th date. We came up with estimates of how long it would take to transfer the other databases to the new co-lo, and arrived at a total of 4 hours of downtime, with the crawlers turned off for an additional 2 hours ahead of time.

Unfortunately, it later became clear that we wouldn’t hit the December 16th date; we’d most likely be ready two days later, on Sunday, December 18th. Ask Jeeves has a winter shutdown, which this year started on December 23rd and runs through January 2nd. We had a couple of options at that point: do the move on Sunday or one of the weekdays before December 23rd, or push the move out to the new year, most likely to January 6th. In my experience, user-based Internet services have two slow periods during the year: July/August and the last half of December. Because of this, and because moving to the new datacenter would greatly improve the user experience, we decided to push for the move to happen in December. We targeted Monday, December 19th to give us an extra day past when we estimated we’d be ready. And we decided to start the process at 2pm, which would hopefully let us finish without extending too far into the evening, while still avoiding the site’s peak traffic time.

On Sunday, December 18th we put up a blog post announcing the upcoming downtime, and also inserted a link at the top of every page on the site alerting users to the downtime. One of the last things we did before the move was to have Ben, our UI/graphics guru, modify the Bloglines Plumber, giving him a pirate makeover. It was going to be a special downtime, and we wanted to make sure he looked good (we’re fans of both Talk Like a Pirate Day and the Flying Spaghetti Monster).

At this point, I want to get a little technical. Don’t worry, it’ll only last a paragraph and you won’t be quizzed. When planning a move like this, where a site will end up with a new IP address, you need to take some DNS issues into consideration. DNS is like the white pages of the Internet. It maps domain names like www.bloglines.com to IP addresses, which identify the actual machines. Each DNS record has a Time To Live (TTL): how long the record is valid for, and so how long you can cache it before asking for it again. DNS records are cached all over the Internet, and many of these caches are broken: some keep serving a record well after its TTL has expired. When planning this move, we did a few things:

A week before the move, we turned the TTLs down to 5 minutes.

Before the move itself, we put the Bloglines Plumber downpage up at the new datacenter.

To take down the site, we configured the webservers at the old datacenter to proxy to the new datacenter.

We then changed the DNS records to point to the new datacenter.
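To make the first step concrete, here is a small Python sketch of why the TTL matters. It is a toy model, not how any real resolver is implemented: the CachingResolver class, the www.example.com name, the documentation-range IP addresses and the one-day "before" TTL are all assumptions for illustration (the only value from our move is the 5 minutes). It also assumes a cache that honors TTLs, which is exactly what the broken ones don't do.

```python
from dataclasses import dataclass


@dataclass
class CachedRecord:
    ip: str
    expires_at: float  # seconds on a simulated clock


class CachingResolver:
    """Toy DNS cache that honors TTLs (hypothetical, for illustration only)."""

    def __init__(self, authoritative):
        # authoritative: dict mapping name -> (ip, ttl_seconds)
        self.authoritative = authoritative
        self.cache = {}

    def resolve(self, name, now):
        record = self.cache.get(name)
        if record is None or now >= record.expires_at:
            # Cache miss or expired entry: ask the authoritative source again.
            ip, ttl = self.authoritative[name]
            record = CachedRecord(ip, now + ttl)
            self.cache[name] = record
        return record.ip


def seconds_stale(ttl_seconds, step=60):
    """Worst case: how long a well-behaved cache keeps returning the old IP."""
    old_ip, new_ip = "192.0.2.10", "198.51.100.10"  # assumed example addresses
    zone = {"www.example.com": (old_ip, ttl_seconds)}
    resolver = CachingResolver(zone)

    # The cache is filled at the very moment before the DNS change (t = 0).
    resolver.resolve("www.example.com", now=0)
    zone["www.example.com"] = (new_ip, ttl_seconds)

    now = 0
    while resolver.resolve("www.example.com", now) == old_ip:
        now += step  # advance the simulated clock one minute at a time
    return now


print(seconds_stale(300))    # 5-minute TTL
print(seconds_stale(86400))  # assumed one-day TTL, for comparison
```

With a 5-minute TTL, a cache that plays by the rules can be wrong for at most 5 minutes after the switch; with the assumed one-day TTL it could be wrong for a full day. The remaining steps are there for the caches that don't play by the rules.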

By proxying, I mean that the webservers at the old datacenter would just act as a go-between, taking an incoming request, forwarding it to the webservers at the new datacenter, and returning the response. When we were ready to bring the site back up, we removed the downpage at the new datacenter, but kept the webservers running at the old datacenter, which, to this day, still proxy requests to the new datacenter. That way, even if a client tries to connect to the old datacenter because it has stale DNS records, it’ll still get the site running at the new datacenter.
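For the curious, the go-between idea can be sketched in a few lines of Python using only the standard library. This is not the configuration we actually ran (that lived in the webserver setup); make_relay_handler, serve and the upstream address are hypothetical names for illustration, and the sketch only handles GET requests and relays a single response header.

```python
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_relay_handler(upstream_base):
    """Build a handler that relays GET requests to upstream_base.

    upstream_base is an assumed address for the new datacenter,
    e.g. "http://198.51.100.10".
    """
    # Talk to the upstream directly, ignoring any proxy environment variables.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    class RelayHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Pass along the Host header so the upstream sees the original site name.
            headers = {}
            if self.headers.get("Host"):
                headers["Host"] = self.headers["Host"]
            request = urllib.request.Request(upstream_base + self.path, headers=headers)
            try:
                with opener.open(request, timeout=10) as upstream:
                    status, body = upstream.status, upstream.read()
                    content_type = upstream.headers.get("Content-Type", "text/html")
            except urllib.error.HTTPError as err:
                # The upstream answered with an error status; relay it as-is.
                status, body = err.code, err.read()
                content_type = err.headers.get("Content-Type", "text/html")
            except urllib.error.URLError:
                # The upstream could not be reached at all.
                status, body, content_type = 502, b"Bad Gateway", "text/plain"

            # Return the upstream's response to the client.
            self.send_response(status)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the sketch quiet

    return RelayHandler


def serve(upstream_base, port=8080):
    """Run the relay until interrupted (port 8080 is an arbitrary choice)."""
    server = ThreadingHTTPServer(("", port), make_relay_handler(upstream_base))
    server.serve_forever()
```

A client that still resolves the old address connects to this relay, which fetches the page from the new datacenter and hands it back, so the client never notices that it asked the wrong machine.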