Facebook friends open source hardware for data centers

The Open Compute Project Foundation aims to apply the Apache Foundation model to data center hardware.

The term “open source server” just took on a whole new meaning. This morning at an event in New York, Facebook director of hardware design and supply chain Frank Frankovsky announced the creation of a foundation to guide the Open Compute Project (OCP)—an effort initiated by Facebook engineers to bring the benefits of an open-source community to the problems faced in building efficient “Web-scale” data centers. Facebook, Intel, AMD, and Asus also have contributed intellectual property to the project, including motherboard and blade server specifications.

The OCP was launched by engineers at Facebook as a result of their experience building a highly efficient data center in Prineville, Oregon. Facebook says the Prineville data center is the most power-efficient in the world, using 38 percent less energy than the company’s existing data centers and costing 24 percent less to build. With a power usage effectiveness (PUE) rating of 1.07, cooling and other overhead add only about seven percent on top of the power that actually reaches the IT equipment. But getting there required Facebook’s engineers to custom-design servers, power supplies, battery backup systems, and server racks to accommodate a simplified power distribution system (bringing 480-volt power to the racks rather than stepping it down earlier in the chain, to reduce losses) and to minimize cooling requirements.
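For a concrete sense of what that PUE figure means, here is a minimal back-of-the-envelope sketch in Python; the 1.07 value comes from the article, while the 1,000 kW IT load is an invented example number used only for illustration:

```python
# Rough PUE arithmetic. PUE = total facility power / power delivered to IT gear.
pue = 1.07               # figure cited for Prineville
it_load_kw = 1000.0      # hypothetical IT load, for illustration only

total_facility_kw = pue * it_load_kw           # ~1,070 kW drawn by the facility
overhead_kw = total_facility_kw - it_load_kw   # ~70 kW for cooling, distribution, etc.

print(f"Overhead relative to IT load:        {overhead_kw / it_load_kw:.1%}")         # 7.0%
print(f"Overhead relative to total facility: {overhead_kw / total_facility_kw:.1%}")  # ~6.5%
```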

Facebook is hardly the only Web company that has had to design its own hardware. Google, Amazon, and others have had to follow a similar path, according to Arista Networks chief development officer and Sun cofounder Andy Bechtolsheim, who spoke at today’s event in New York. “Literally all the large-scale data centers in the world are built on off-the-shelf motherboards,” he said. “Because there was no standards, everyone had to do their own thing.” It would be better, he said, if there were a standard everyone could use for building out the sorts of systems used in Web data centers and cloud computing environments.

That’s the reasoning that led Facebook to launch the OCP in April and publish the specs and designs of the hardware developed in the Prineville effort under the project’s banner, in an effort to kickstart the kind of industry-wide collaboration that open source software development enjoys. Now, Facebook has put the OCP under the auspices of the Open Compute Project Foundation, a nonprofit organization modeled after the Apache Foundation, with the goal of getting rid of what Bechtolsheim, a foundation board member, calls “gratuitous differentiation” in hardware.

The other board members of the foundation include Goldman Sachs managing director Don Duet, Frankovsky, Rackspace chief operating officer Mark Roenigk, and Intel data center group general manager Jason Waxman. Frankovsky said that a set of bylaws for the OCP Foundation has been established to govern how organizations submit contributions. He also introduced some of the other members of the foundation, which include Amazon, Asus, Dell (which is contributing to management standards), AMD, Cloudera, and Red Hat; Red Hat’s role will include certifying OCP hardware for Red Hat Enterprise Linux. Digital Realty, the data center hosting company, is also on board.

Frankovsky also said that the foundation has formed a “strategic alignment” with the Open Data Center Alliance, a customer consortium made up of corporate IT organizations’ data center managers, and with a number of universities. “The University of North Carolina is looking at adding OpenCompute to their curricula,” he said, “and we’re also working with Georgia Tech.”

In addition to Facebook’s contributions, Dell, Asus, Intel, and AMD have contributed designs of their own. Intel and Facebook worked together to submit two Intel-designed motherboards, “Wildcat” and “Windmill,” according to Intel’s Waxman, who said the OCP Foundation would help “democratize” the process of how the industry optimizes hardware platforms.

“The whole industry has a proud tradition of how standards have accelerated innovation,” Bechtolsheim said. “What has been missing is a standard at the system level.” He cited the development of blade servers in particular, which started to address the issues that big data centers face in ease of management and hardware swapping, “but every company built their own blade chassis. Nothing is more frustrating to a customer than having a new box come in that has something different in it that doesn’t work with a particular application.”

James Hamilton, vice president and distinguished engineer at Amazon and a member of the Amazon Web Services team, said during the event’s kickoff that an open source approach to driving the efficiency of hardware in the data center is critical going forward as companies like Amazon scale up. "Every day we add enough capacity to support Amazon as a 2.7 billion dollar business—we bring in that much more capacity every day," he said, and saving money on that infrastructure is critical to staying profitable.

Hamilton said that the biggest cost associated with growing capacity isn’t data center space or power, but the hardware itself. And blade servers don’t solve that problem, because they cost more. “The floor space is four percent of the cost,” Hamilton said, but the servers are 57 percent, and “there’s no way you want to pay more on servers just to save on floor space.” He said the only innovation that vendors delivered with blade servers was “turning the servers 90 degrees.”

That’s a problem that Facebook was trying to address with one of its contributions to the OCP: the “Open Rack” specification, which Frankovsky called “blade servers done in open source.” The full 19-inch rack is a server blade chassis, in effect, with top-of-rack shared storage, and power distribution and battery backup integrated into the rack itself. By open-sourcing the specification and design, Frankovsky said, the hope is that systems vendors will “use the full chassis of the rack to innovate within that boundary.”

Scaling up in Europe

Facebook is applying the designs and standards developed in the Prineville effort to the other data centers it now has in the pipeline, according to Frankovsky, including the company’s first European data center in Lulea, Sweden, a town 60 miles south of the Arctic Circle.

That site, which will go live in 2014, will be three times the size of the Prineville facility, with three 300,000-square-foot server warehouses and two transformer buildings. It will draw 120 megawatts of electricity exclusively from a nearby hydroelectric power station that generates twice as much electric power as the Hoover Dam. “The national [power] grid is extremely reliable in Sweden,” Frankovsky said, “so we were able to eliminate 70 percent of the generators onsite from the design.”

Facebook hasn’t discussed the price of the new data center, but previous reports in local media put the construction costs at around $760 million, with a contribution of up to $16 million from the Swedish government.

So did they even hint as to when somebody would actually be selling real systems based on these standards?

Asus is already selling the motherboards, at least to Facebook. There's been no announcement from anyone on ship dates. However, a number of hardware manufacturers, including Dell's data center systems group, are deeply involved in this, so I think it's safe to assume they'll be selling hardware based on the standards.

It would be cool to create a data center from scratch. For instance, instead of the standard horizontal rack, switch to a vertical rack with open spacing from the bottom to the top. Considering the heat being generated and captured at the high point of the data center, the velocity of air moving up through the rack may be able to cool the systems with little or no fans. If you were building this in an area where it was cold enough, you might be able to use the building as a giant heat sink to cool the air and cycle it back to the lowest part of the building, basically making the entire data center into a giant convection cycle.

Hmm 480v mains, that means every computer has to step it down. Computer power supplies are notoriously inefficient. Would it not make more sense to go with -48V DC mains, and use more efficient power supplies in the systems? Or even 12V mains, and no power supplies in the systems?

I've seen this done, but sideways. The only problem is by the time you get to the top of the stack you can be 50deg higher than the bottom. That means you have to start pretty cold.

My experience was with devices that had fans that blew from left to right. By the time you got to the end of a row, the machines were melting.

EchtoGammut wrote:

It would be cool to create a data center from scratch. For instance, instead of the standard horizontal rack, switch to a vertical rack with open spacing from the bottom to the top. Considering the heat being generated and captured at the high point of the data center, the velocity of air moving up through the rack may be able to cool the systems with little or no fans. If you were building this in an area where it was cold enough, you might be able to use the building as a giant heat sink to cool the air and cycle it back to the lowest part of the building, basically making the entire data center into a giant convection cycle.

Hmm 480v mains, that means every computer has to step it down. Computer power supplies are notoriously inefficient. Would it not make more sense to go with -48V DC mains, and use more efficient power supplies in the systems? Or even 12V mains, and no power supplies in the systems?

They're stepping down at the rack level. There was some discussion of high-voltage DC distribution being something the OCP could drive, however. Hamilton brought it up.

I've seen this done, but sideways. The only problem is by the time you get to the top of the stack you can be 50deg higher than the bottom. That means you have to start pretty cold.

My experience was with devices that had fans that blew from left to right. By the time you got to the end of a row, the machines were melting.

I have seen that method and I never understood what the designers were thinking. The current Facebook data center pushes the heat from the front to the back via a positive pressure environment and traps it in alleys between the back of each row. There is still a fairly high variance from top to bottom, but it is supposed to be within tolerances. They also use a 1.5U standard to allow for enough space between the systems for adequate cooling. I would probably use a similar spacing if I were to engineer my vertical rack setup. You do lose space, but it would probably save enough in energy to justify the loss in density.

Hmm 480v mains, that means every computer has to step it down. Computer power supplies are notoriously inefficient. Would it not make more sense to go with -48V DC mains, and use more efficient power supplies in the systems? Or even 12V mains, and no power supplies in the systems?

480v is an odd number for DC mains; it's usually in the high 300s for various reasons. Anyhow, these days you can get highly integrated DC/DC converters that go from that high voltage straight to 12v to power the CPU DC/DC with upwards of 96% efficiency and very high reliability, e.g. Vicor's High Voltage BCM line.

If you go with 48v distribution you still have to convert to 12v or lower, and it is still 96% or so. Much better to go with a very high voltage to avoid IR losses in the distribution and minimize the required copper.
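To put rough numbers on that IR-loss point, here's an illustrative Python sketch; the 10 kW rack load and 10-milliohm feed resistance are made-up example values, not anything from the OCP spec:

```python
# Illustrative only: resistive loss (I^2 * R) for delivering the same power
# over the same cable at different distribution voltages.
load_w = 10_000.0    # example: 10 kW delivered to a rack
cable_ohms = 0.010   # example: total round-trip resistance of the feed

for volts in (480.0, 48.0, 12.0):
    amps = load_w / volts             # current required at this voltage
    loss_w = amps ** 2 * cable_ohms   # power burned in the cabling
    print(f"{volts:5.0f} V: {amps:7.1f} A, {loss_w:8.1f} W lost in the cable")

# ~4 W lost at 480 V, ~434 W at 48 V, ~6,900 W at 12 V for the same cable;
# low-voltage distribution needs far thicker copper to keep losses sane.
```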

The danger, of course, is that if you lick the 480VDC you die. 48v just tingles.

They also use a 1.5U standard to allow for enough space between the systems for adequate cooling. I would probably use a similar spacing if I were to engineer my vertical rack setup. You do lose space, but it would probably save enough in energy to justify the loss in density.

I don't see 1.5U specified, and I've never seen that done, but I'll take your word for it.

Some designers object to having any dead space between components. Instead, the idea is to make sure that no cold air is wasted; all air goes into the front-side intake of the actual equipment, so that it is useful for cooling internal components. The logic goes: cooling the top panel of the chassis of an individual racked component is far less efficient than pushing that same air across the internal heatsinks underneath that top panel. Therefore, -not- having any dead space above a racked component is ideal.

Glancing at the rack specifications at the open compute site, I think they're thinking something similar, although the chassis are more like trays really, which looks like it would be even better. No useless metal enclosures between air and heatsinks.

Edit: Nevermind — I think I misunderstood what you were saying. The rack -pitch- isn't 1.5U, but the trays are "a little taller than 1.5U". Forget I babbled anything.

In some ways it makes me sad - my children will only have iPads, and the only way to get close to a truly programmable CPU will be to work for a huge corporation owning huge data centres - exactly the kind of future that nobody wants but is very likely to happen.

In some ways it makes me sad - my children will only have iPads, and the only way to get close to a truly programmable CPU will be to work for a huge corporation owning huge data centres - exactly the kind of future that nobody wants but is very likely to happen.

They probably will not need anything else either. For most people a desktop makes no sense, and as ARM gets faster and x86 smaller that will be the future. And as said, for most people it will not matter.

For power users and in a production environment the powerful workstation will still have a place for some time.

480V is a common 3-phase distribution voltage and makes perfect sense to minimize I²R losses in a high power environment. Single phase distribution to each power converter is then 277V. Note that the converter efficiency target is >95% between 50 and 90% of max load. That's pretty impressive.
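For anyone checking the arithmetic, the 277 V figure is just the standard line-to-neutral value for a balanced 480 V three-phase feed; a trivial sketch:

```python
import math

line_to_line_v = 480.0
line_to_neutral_v = line_to_line_v / math.sqrt(3)  # phase-to-neutral voltage
print(f"{line_to_neutral_v:.0f} V")  # ~277 V single-phase feed to each converter
```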

They're stepping down at the rack level. There was some discussion of high-voltage DC distribution being something the OCP could drive, however. Hamilton brought it up.

I'm sure they've got a cost/benefit curve for what's doable today. With the motherboard operating on 12VDC, though, the design makes swapping out the power infrastructure relatively trivial in the future. A datacenter with 12V infrastructure could just substitute a plug for the PSU.

I'm interested to know why OCP decided to go with shared battery infrastructure rather than individual batteries on each server, a la Google. I'm sure the efficiency argument could go either way based on design, but it'd seem that putting the battery on the server chassis might be a way to tie together two items with roughly comparable lifespans.

Facebook *claims* to have the most efficient data center but their number is highly suspect. Their PUE was only measured over a very short period of time in winter when little cooling was needed. It would be a more valid claim if they measured PUE over full quarter-year periods.

I'm sure they've got a cost/benefit curve for what's doable today. With the motherboard operating on 12VDC, though, the design makes swapping out the power infrastructure relatively trivial in the future. A datacenter with 12V infrastructure could just substitute a plug for the PSU.

I'm interested to know why OCP decided to go with shared battery infrastructure rather than individual batteries on each server, a la Google. I'm sure the efficiency argument could go either way based on design, but it'd seem that putting the battery on the server chassis might be a way to tie together two items with roughly comparable lifespans.

The goal here is not just modularity and ease of upgrades, but also efficiency (and cost. Mostly cost). Distribution with 12VDC is quite inefficient over any significant distance, without investing in a couple of copper mines - and you'd still have to step down from whatever power is coming into the building.

It makes much more sense to use high voltage for power distribution throughout the building, with the final step down to 12 VDC much closer to the CPU. I could definitely see facilities using high-voltage DC mains for their servers if culus's statement about DC/DC converters is correct, although "upwards of 96%" and ">95%" (for traditional AC-to-DC PSUs) are not really that different, so the benefits might be marginal.
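One way to see why the gap between "upwards of 96%" and ">95%" matters less than the number of conversion stages is to multiply the stages out. The per-stage figures in the sketch below are illustrative assumptions, not measurements of any real product:

```python
# Illustrative end-to-end efficiency for a few hypothetical conversion chains.
chains = {
    "480 VAC -> 12 VDC, one 95% supply":     [0.95],
    "480 V -> 48 V -> 12 V, two 96% stages": [0.96, 0.96],
    "480 V -> 12 V, one 96% DC/DC stage":    [0.96],
}

for name, stage_efficiencies in chains.items():
    overall = 1.0
    for eff in stage_efficiencies:
        overall *= eff   # losses compound multiplicatively across stages
    print(f"{name}: {overall:.1%} end-to-end")

# Roughly 95.0%, 92.2%, and 96.0%: adding a conversion stage costs more
# than a point of per-stage efficiency buys back.
```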

Batteries on the rack vs centralized is an interesting debate. Batteries and computers are pretty wildly different technologies, and there's no real reason to expect the lifetime of a UPS battery to be exactly identical to the upgrade pattern of individual servers. It makes the most sense to keep the batteries close to wherever the DC power is generated - that minimizes the number of things that need an explicit UPS, while not wasting energy with a DC -> AC -> DC conversion process. If Google used off-the-shelf PSUs, it would have made sense to keep the battery with the server. Facebook is skipping that and doing rack-scale power conversion, so it should also do rack scale batteries.