MENLO PARK, CA—Building 17 of Facebook's headquarters sits on what was once a Sun Microsystems campus known fondly as "Sun Quentin." It now houses the company's electrical lab, where a team of Facebook engineers pushes forward the company's vision of how data center hardware should be built every day. These engineers constantly bench-test designs for the server hardware Facebook builds in-house—essentially putting an end to server hardware as we know it.

Ars recently visited Facebook's campus to get a tour of the server lab from Senior Manager of Hardware Engineering Matt Corddry, leader of Facebook's server hardware design team. What's happening at Facebook's lab isn't just affecting the company's own data centers; it's part of Facebook's contribution to the Open Compute Project (OCP), an effort to bring open-source design to data center server and storage hardware, infrastructure, and management interfaces across the world.

Facebook, Amazon, and Google are all very picky about their server hardware, and these tech giants mostly build it themselves from commodity components. Frank Frankovsky, VP of hardware design and supply chain operations at Facebook, was instrumental in launching the Open Compute Project because he saw the waste in big cloud players reinventing things they could share. Frankovsky felt that bringing the open-source approach Facebook has followed for software to the hardware side could save the company and others millions—both in direct hardware costs and in maintenance and power costs.

Just as the Raspberry Pi single-board computer and the Arduino open-source microcontroller have captured the imagination of small-scale hardware hackers, OCP aims to make DIY easier, more effective, and more flexible at macro scale. What Facebook and Open Compute are doing to data center hardware may not ultimately kill the hardware industry, but it will certainly turn it on its head. Yes, the open-sourced, commoditized motherboards and other subsystems used by Facebook were originally designed specifically for the "hyper-scale" world of data centers like those of Facebook, Rackspace, and other cloud computing providers. But these designs could easily find their way into other do-it-yourself hardware environments or into "vanity free" systems sold to small and large enterprises, much as Linux has.

And open-source commodity hardware could quickly make an impact beyond its original audience because it can be freely adopted by hardware makers, driving down the price of new systems. That's not necessarily good news for Hewlett-Packard, Cisco, and other big players in corporate IT. "Vanity free," open-source designed systems will likely drive rapid innovation while disrupting the whole model those companies are built upon.

“Open” as in “open-minded”

To be clear, Open Compute doesn't go open-source all the way down to the CPU. Even the Raspberry Pi isn't based on open-source hardware because there's no open-designed silicon that is capable enough (and manufacturers aren't willing to produce one in volume for economic reasons). The OCP hardware designs are "open" at a higher level: anyone can use standards-based components to build the motherboards, chassis, rack mountings, racks, and other components that make up servers.

"We focus on the simplest design possible," Corddry told me. "It's focused really tightly on really high scalability and driving out complexity and any glamor or vanity in the design." That makes it easy to maintain, cheap to buy and build, and simple to adapt to new problems as they emerge. It also makes it easy to build things on top of the designs that will help Facebook and others who buy into the OCP philosophy. The ideas that came out of the OCP Hardware Hackathon at the Facebook campus on June 18 are a primary example.

So far, Facebook and Rackspace are the main adopters of OCP hardware. But that could soon change as the dynamics of open-source hardware start to kick in. As Intel, AMD, and others start to turn out more components built to the OCP specification and contribute more intellectual property to the initiative, some involved with the effort believe it will snowball.

"I think our industry realized probably about a decade ago, when Linux took over for Unix, that open was actually a pretty positive thing for the suppliers as well as the consumers in large-scale computing," Corddry said. "Linux didn't kill the data center industry or the OS industry. I think we're looking at the same pattern and seeing that openness and hardware doesn't mean the death of hardware. It probably means the rebirth of hardware, where we see a greater pace of innovation because we're not always reinventing the wheel."

Deconstructing the server

Facebook's Senior Manager of Hardware Engineering Matt Corddry shows off some of the "sled" servers designed and built by Facebook. Photo: Sean Gallagher

At the center of Facebook's data center design philosophy is "disaggregation"—the breaking up of what has traditionally constituted a "server" into purpose-specific chunks of hardware interconnected largely by network hardware. It's ironic, in a way, that this is happening on the old Sun campus. In its heyday, Sun advertised with the slogan "The network is the computer." Now, the computer is the network both conceptually and physically.

The design principles behind Facebook's hardware come from direct hands-on experience. Corddry said that all his engineers spend time working as technicians at Facebook data centers "so everyone walks a mile in those shoes and understands what it is to work on this gear at scale."

The approach taken by Facebook and by the Open Compute Project is post-modernist deconstruction for the data center—the disaggregation of what usually makes up a server into functional pieces with as little complexity and as much efficiency as possible. There are few "servers" per se in Facebook's data center architecture. Instead, there are racks filled with "sleds" of functionality. "That's going to be a pattern you see from us over the next couple of years," Corddry said while showing off a few sled designs in the hardware lab. "A lot of our hardware designs will be focused on one class of problem."

The approach is already being rolled out in Facebook's newest data centers, where racks are filled with systems built from general-purpose compute sleds (motherboards populated with CPUs, memory, and PCI cards for specific tasks), storage sleds (high-density disk arrays), and "memory sleds" (systems with large quantities of RAM and low-power processors designed for handling large in-memory indexes and databases).
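
As a rough illustration of that composition (this is not Facebook's actual tooling—the sled types mirror the article, but the counts and descriptions are assumptions), a rack in this model is essentially a list of purpose-specific sleds:

```python
# Hypothetical sketch of the "disaggregated" rack model described above.
# Sled types follow the article; counts and capacities are made up for illustration.
from dataclasses import dataclass

@dataclass
class Sled:
    kind: str         # "compute", "storage", or "memory"
    description: str

rack = [
    Sled("compute", "two-socket board, 16 DIMM slots, PCIe 10GbE NIC"),
    Sled("compute", "two-socket board, 16 DIMM slots, PCIe 10GbE NIC"),
    Sled("storage", "high-density disk array"),
    Sled("memory",  "large RAM, low-power CPU for in-memory indexes and databases"),
]

for sled in rack:
    print(f"{sled.kind:>8}: {sled.description}")
```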

"We're not putting these into the network sites yet," Corddry said, referring to the colocation sites Facebook uses to connect to major Internet peering sites. "But we're putting them in all our data center facilities, including Ashburn, Virginia—where it's not a net new build, it's more of a classic data center environment. In fact, we've designed a variant of the original Open racks to go into that kind of facility that has the standard dual power so they can play nice with the colo environment."

The one thing all of these disaggregated modules of hardware have in common is that they're fully self-contained and can be yanked out, repaired, or replaced with minimum effort. Corddry pointed to one of the compute sleds on the lab's workbench. "If there's something to repair on this guy, all you have to do is grab the handle and pull. There are no screws, no need for a screwdriver, and just one cable in front."

Enlarge/ A "Windmill" based Facebook compute sled, in what Corddry calls the "sushi boat" form factor.

Sean Gallagher

Corddry demonstrated this with the "Windmill" compute sled, based on a second-generation open-source motherboard design. "It's a two-socket Intel motherboard in a tube," said Corddry. "We refer to it as the 'sushi boat' form factor." The compute sled has the barest bones of what you’d expect on a server motherboard: two processors, 16 DIMM slots for memory, and a few PCI slots. Facebook uses PCI 10-gigabit Ethernet cards instead of putting Ethernet directly on the motherboard. Corddry said this is largely so the company can get them from multiple suppliers.

There's no power supply on a compute sled; all the power is pulled from the rack for the sake of efficiency. "There's a 12 volt power connection in back," Corddry explained. "We send 12 volt regulated to the board, so we get rid of all the complexity of having power conversion and supply in the system. The principles of efficiency tell you to only convert the power the minimum number of times required; you’re losing two to five percent of the power every time you step it down. So we bring unregulated 480 volt 3-phase straight into our rack and have a power shelf that converts it to 12 volt that goes straight to the motherboard. We convert it once from when it comes in from the utility to when it hits the motherboard. A lot of data centers will convert power three or four times: from 480 to 208, into a UPS, back out of the UPS, into a power distribution unit, and into a server power supply."
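
Corddry's arithmetic is easy to check. Here is a minimal sketch (the per-step loss figures are assumptions drawn from his "two to five percent" estimate, and the specific conversion chains are illustrative rather than Facebook's exact topology) comparing how much utility power survives a single conversion versus a multi-step chain:

```python
# Illustrative sketch: cumulative loss from repeated power conversion.
# The 2-5% per-step loss range comes from Corddry's estimate; the specific
# chains below are assumptions for demonstration, not measured figures.

def chain_efficiency(losses_per_step):
    """Return overall efficiency given a fractional loss at each conversion step."""
    efficiency = 1.0
    for loss in losses_per_step:
        efficiency *= (1.0 - loss)
    return efficiency

# Open Compute approach: one conversion, 480 V three-phase -> 12 V at the power shelf.
ocp_chain = [0.05]

# A conventional chain: 480 V -> 208 V, into a UPS, out of the UPS,
# through a PDU, then the server's own power supply (each step assumed ~3-5% loss).
legacy_chain = [0.03, 0.05, 0.05, 0.03, 0.05]

print(f"Single conversion: {chain_efficiency(ocp_chain):.1%} of utility power reaches the board")
print(f"Multi-step chain:  {chain_efficiency(legacy_chain):.1%} of utility power reaches the board")
```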

The efficiency continues within the design of the cooling fans in each compute sled. "These guys are incredibly efficient," Corddry said. "It has nice big fan blades that turn slowly. It only takes three or four watts to move air through this guy to keep it cool, compared to a traditional 1U server, which can take 80 to 100 watts."
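
Taking those figures at face value (and assuming, purely for illustration, 30 sleds per rack), the fan-power difference adds up quickly:

```python
# Back-of-envelope fan-power comparison using the figures quoted above.
# The sled count per rack is an assumption for illustration.
sleds_per_rack = 30
ocp_fan_watts = 4        # "three or four watts" per sled
legacy_fan_watts = 90    # "80 to 100 watts" for a traditional 1U server

savings_per_rack = sleds_per_rack * (legacy_fan_watts - ocp_fan_watts)
print(f"Approximate fan-power savings per rack: {savings_per_rack} W")
```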

The one set of servers that breaks Facebook's pattern of simplification and disaggregation (slightly) is the company's database servers. "Database servers are usually the hardest thing to do with DIY hardware," Corddry said. "But we wanted to get rid of these expensive, hard-to-service OEM servers from our database tier. It was an interesting kind of quick development project, not a hack-a-thon exactly. But we said, 'Hey, we're still buying this OEM hardware; it's more expensive and harder to service—can we do something better?' Within a matter of months, engineers jumped in and said, 'Let's see if we can get rid of this last piece of OEM equipment in our inventory.'"

Facebook's hardware engineering team came up with a solution using Windmill motherboards, adding a power supply kit "to allow us to run off our high availability dual feed power," Corddry said. "Normally, this design would actually have a single power supply in the back and two motherboards."

While the majority of Facebook's servers run off a single power feed—"the power goes out, the generator kicks in, and we're OK," said Corddry—the database servers need extra power protection to prevent corruption caused by an outage. So the servers that support Facebook's User Database and other big databases need to have redundant power.

I applaud their effort in getting server hardware back to its bare-bones basics... everybody who ever fiddled with one of those blade systems from HP, Cisco, and the like (HP C7000, I'm looking at you!) has to wonder: why the everloving fuck do you have to make things so complicated?

Not a huge fan of Facebook... however I am a huge fan of Open Compute! Loved Frank's keynote at Interop Las Vegas regarding Open Compute and networking. Who better to push design for hardware (and protocols) for data centers than those whose business depends on it?

I came to look at the storage tray... Looks pretty similar to some current Engenio (NetApp) models that IBM and Dell resell, except for the part where the tray tilts down when fully extended. That lets it be just a little shorter from front to back.

What makes this work for them, as I see it, is that their demand is large enough that they can fund a building for hardware research and development and still end up saving money.

One thing I have seen smaller initiatives fall prey to is that when you put together a bunch of OEM parts and create a server or storage unit, you assume additional risks in testing/certifying/maintaining that particular configuration. That can suck a whole lot of time from what is, in many organizations, their 'better paid' labor.

> I applaud their effort in getting server hardware back to its bare-bones basics... everybody who ever fiddled with one of those blade systems from HP, Cisco, and the like (HP C7000, I'm looking at you!) has to wonder: why the everloving fuck do you have to make things so complicated?

It's called vertical lock-in. If you're an expert in Cisco, you encourage your company to buy Cisco. If you're an expert in OpenFlow, then you don't care who the vendor is and they have to compete harder for your purchase. That's why Facebook's effort here is a good one for everyone.

The insight is fantastic and the article reads well, but the picture of the rack is disconcerting—in their efforts to simplify things, they still wind up with an end result involving dozens of cables. They've made a great first stride, but they need to unify that part of the platform as well, down to one cable per rack-mount device.

I'm confused about the power part-- are they really not using any form of UPS for these racks?

No UPS for non-database servers, based on what I was told—the whole data center has power protection, but power is stepped down just once, so in-rack there's no UPS. The database servers have redundant power.

> I came to look at the storage tray... Looks pretty similar to some current Engenio (NetApp) models that IBM and Dell resell, except for the part where the tray tilts down when fully extended. That lets it be just a little shorter from front to back.

> What makes this work for them, as I see it, is that their demand is large enough that they can fund a building for hardware research and development and still end up saving money.

> One thing I have seen smaller initiatives fall prey to is that when you put together a bunch of OEM parts and create a server or storage unit, you assume additional risks in testing/certifying/maintaining that particular configuration. That can suck a whole lot of time from what is, in many organizations, their 'better paid' labor.

Yes--so it's essentially a trust issue...

Similar issues are apparent with FOSS vs. proprietary software. For many organizations, having a (third-party) neck to wring (and to pass the buck of responsibility) is worth the added cost and complexity of going third party.

Regardless of what people think about Facebook and Google, their technical staff are real rockstars. Also says a lot about their leadership's trust in them to allow them to forge these new paths.

> I'm confused about the power part-- are they really not using any form of UPS for these racks?

I think outside of the database servers, the idea is that if one data center loses power, the other data centers pick up the slack thanks to the distributed nature of their applications. They can effortlessly shift traffic around to hardware that is currently active.

>> I'm confused about the power part-- are they really not using any form of UPS for these racks?

> I think outside of the database servers, the idea is that if one data center loses power, the other data centers pick up the slack thanks to the distributed nature of their applications. They can effortlessly shift traffic around to hardware that is currently active.

That still leaves a lot of scope for data to get corrupted in transit and have to be fixed later, I would think?

Sean's update seems to just confirm that they really don't use any UPS for anything other than the database servers and even that is just a redundant power supply instead of a proper UPS. They must have tested the switch to generator power quite thoroughly to have that much faith in it kicking in fast enough.

So Facebook's engineers are designing this equipment and having an OEM (possibly Dell, as mentioned above, but not necessarily just Dell) build it? I assume they have no way to manufacture it in-house.

Sean Gallagher / Sean is Ars Technica's IT Editor. A former Navy officer, systems administrator, and network systems integrator with 20 years of IT journalism experience, he lives and works in Baltimore, Maryland.