Google boasts of melting data center antidote

Structure 09 Google is developing some sort of back-end technology that automatically - and nearly instantly - redistributes live compute loads when a data center is in danger of overheating. Or maybe this is just talk. Google prefers to at least maintain the illusion of data-center nirvana.

During a panel discussion last week at Structure 09, an annual mini-conference dedicated to all things cloudy, Google senior manager of engineering and architecture Vijay Gill had a bit of fun with Google's well-known penchant for corporate secrecy - almost as much fun as he had at the expense of Microsoft.

When Facebook vice president of operations Jonathan Heiliger asked Gill to describe Google's now-famous back-end infrastructure, Gill couldn't help but respond with a deadpan "Unfortunately, I can’t talk about that at all."

Gill promptly admitted this was a joke, and his audience chuckled. "Everybody at Google loves to say that," grumbled Gill's Facebook counterpart. But much the same words would resurface later in the conversation.

After Gill turned his nose up at Microsoft's back-end, Heiliger asked the sort of question that so often turns up during these conference panel discussions: "If you could wave your magic wand," Heiliger asked, "to create some bit of kit or new technology [for infrastructure] that we don't have today, what would it be?"

"What we are building here - and of course this is the title of the [paper] - is warehouse-sized compute platforms," Gill said. "You have to have integration with everything right from the chillers down all the way to the CPU. Sometimes, there's a temperature excursion, and you might want to do a quick load-shedding - a quick load-shedding to prevent a temperature excursion because, hey, you have a data center with no chillers. You want to move some load off. You want to cut some CPUs and some of the processes in RAM."

Apparently, Google is working to detect temperature spikes and automatically respond to them in near real-time. Gill even seems to be saying that the company hopes to instantly redistribute workloads between data centers.

"How do you manage the system and optimize it on a global-level? That is the interesting part," Gill continued. "What we’ve got here [with Google] is massive - like hundreds of thousands of variable linear programming problems that need to run in quasi-real-time. When the temperature starts to excurse in a data center, you don’t have the luxury of sitting around for a half an hour... You have on the order of seconds."

Heiliger asked Gill whether this was a technology Google is using today. "I could not possibly comment on that," Gill replied, drawing more laughter from his audience. Another panel member - LinkedIn vice president of technical operations Lloyd Taylor, who was once director of global operations for Google - raised his hand as if to answer the question. But naturally, he didn't.

It's no secret that Google's distributed file system, GFS, and its distributed compute platform, MapReduce, are designed to deal with frequent hardware failures. When one machine goes down, its tasks are quickly picked up by another. But Gill indicated that Google is working to automatically respond not just to outright failures but to temperature spikes. He even seemed to be saying that the company is developing a way to sidestep the imminent breakdown of an entire data center ("a data center with no chillers") and to do so on "a global-level."
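
Gill gave no implementation details, so the following is only an illustrative sketch of the general idea of temperature-triggered load-shedding, not a description of anything Google runs. It uses a simple greedy heuristic rather than the linear programming Gill mentions, and every name, threshold, and unit in it (`DataCenter`, `shed_load`, `temp_limit_c`, `shed_fraction`) is a hypothetical chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str
    load: float      # current load, arbitrary units (assumed)
    capacity: float  # maximum load the site can carry (assumed)
    temp_c: float    # current temperature reading in Celsius (assumed)


def shed_load(sites, temp_limit_c=35.0, shed_fraction=0.5):
    """Greedily move a fraction of load off each hot site.

    Load goes to sites at or below the temperature limit, starting with
    the one that has the most spare capacity when the call begins.
    Returns a list of (source, destination, amount) moves.
    """
    moves = []
    hot = [s for s in sites if s.temp_c > temp_limit_c]
    cool = sorted(
        (s for s in sites if s.temp_c <= temp_limit_c),
        key=lambda s: s.capacity - s.load,
        reverse=True,
    )
    for src in hot:
        to_move = src.load * shed_fraction
        for dst in cool:
            if to_move <= 0:
                break
            headroom = dst.capacity - dst.load
            amount = min(to_move, headroom)
            if amount <= 0:
                continue  # destination is already full
            src.load -= amount
            dst.load += amount
            to_move -= amount
            moves.append((src.name, dst.name, amount))
    return moves


if __name__ == "__main__":
    sites = [
        DataCenter("A", load=80.0, capacity=100.0, temp_c=40.0),
        DataCenter("B", load=70.0, capacity=100.0, temp_c=25.0),
        DataCenter("C", load=90.0, capacity=100.0, temp_c=25.0),
    ]
    for move in shed_load(sites):
        print(move)
```

A real system of the kind Gill hints at would have to weigh network cost, latency, and cooling state across many sites at once, which is presumably where the large optimization problems come in. The greedy pass above ignores all of that and simply fills the coolest sites' spare capacity in order.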

If Google has indeed developed a technology capable of such responses, it hasn't perfected the thing. If you've used Gmail in the past few months, chances are you're aware of this.

Google's strict code of corporate secrecy is often balanced by a pathological need to tell the world how great it is. And Gill showed a bit of both last week. It was entertaining - if not enlightening. ®