Microsoft yawns at Google's chillerless data center antidote

Instant failover? We do that too

Chillerless v chillerless

Late last month, Microsoft unveiled its own chillerless data center, a 303,000-square-foot facility in Dublin, Ireland. But unlike Google's facility, Microsoft's data center includes a backup for its free-cooling setup: Direct eXpansion (DX) cooling units similar to ordinary air conditioners.

"These are fairly simple units, and we don't run them unless we absolutely have to," Josefsberg says. "It will be for only a very short period of time." According to Josefsberg, there will be only a few hours each year - spread out over a few days - when the Irish air is too hot to cool the server rooms, and that's when the DX units kick in.

Asked whether Microsoft had considered a Google-like setup with no cooling backup, Josefsberg indicated the company had not, going so far as to question the efficiency of such an arrangement. "If you're offloading data like that, essentially that would mean you would have to have a second data center with the same infrastructure somewhere else," he says. "And while each one can be energy efficient, you now need two of them. So the net is actually energy inefficient. You need a lot more infrastructure.

"If you make a reasonable investment in the reliability of the data center, you don't have to failover as much. Otherwise, you need more data centers. They're costly, and they're not good for the environment. We try to strike a balance. We don't want to invest in more data centers than we have to."

No doubt, Google would argue the other way. As Vijay Gill explains, the company's entire back-end philosophy is to create a unified infrastructure that spans all its data center facilities - an infrastructure that behaves as much as possible like a single machine.

In theory, this could lead to even greater levels of efficiency. Some so-called cloud evangelists have trumpeted a "follow the moon" setup, where workloads are constantly shifted to facilities where night has fallen. Night hours mean lower power costs.

But like Google, Microsoft needs a reliable means of maintaining service when a data center malfunctions. Whether it has back-up cooling or not, there will be times when Microsoft needs to shift workloads out of its shiny new Dublin data center. And Josefsberg says that Redmond can do so on the fly.

As an example, he points to the "fabric controller" built into Windows Azure. "That's essentially what it does," he says. "It measures events and incidents and moves processing for customers to alternate servers. These could be in the same data center, if it's a smaller localized problem, or to another data center if there's a problem with the data center itself."

And this can happen on a grand scale. "There was an earthquake in Asia that cut a lot of the undersea fibre optic networks," Josefsberg says. "This is the sort of situation where you have to be smart about failing over your services. The data center itself might be fine, but it might not be connected to the rest of the world. You've got to be able to quickly and automatically detect such situations and re-direct your customers to a failover situation."

And, he adds, Windows Azure is just one example. There's software that performs similar functions for other Microsoft services.

But there are limits to the automatic nature of Redmond's setup. "In some cases, if we have a potential problem in the data center, the decision isn't completely handled by software. We do have very highly trained staff on site that can determine if we really want to failover everything in the data center.

"Generally, we want the software to make as many of the decisions as possible. But there will be cases where trained engineers and architects will look at it and make the final determination."

Once again, the big difference here is that Redmond thinks in terms of disparate services. There's one setup for Windows Azure, and then there's a separate setup for the next service. Google squeezes all its services into the same unified infrastructure. This is meant to improve performance. But at least in theory, it can also handle failover on a much larger scale.

Of course, there's theory, and then there's practice. Over the past year, two much-discussed Gmail outages occurred when Google was moving workloads between data centers. ®