Facebook Hacks Shipping Dock Into World-Class Server Lab

Michael pulls server from rack in Facebook's former shipping and receiving room

Amir Michael in the Facebook server lab that isn't

Amir Michael works for Facebook, so it’s no surprise he’s a hacker. But he doesn’t hack software. He hacks hardware — and the occasional shipping and receiving room.

By the end of 2010, the shipping and receiving dock at Facebook’s Palo Alto headquarters was no longer a shipping and receiving dock. Michael and a few other Facebook engineers moved in when they needed more space for the server lab they’d already built in the storage room next door.

You could tell it used to be a shipping dock because there was a huge scale built into the floor. At one point, the scale was used to weigh packages before they were shipped out, but after Michael and his crew moved in, they used it to weigh the server they’d built from scratch. Twelve months after they first sat down to design a machine for Facebook’s new data center in Prineville, Oregon, one of them picked up the finished product and stood on the scale. Then he put it down and picked up one of those mass-produced servers the rest of the world uses.

Michael’s built-from-scratch machine weighed 10 pounds less.

Like the other giants of the internet world, Facebook supports its online empire with a massive network of data centers and servers. And that costs money. Very large amounts of money. If you feed webpages to hundreds of millions of people, you’re spending mountains of cash not only on the machines themselves, but on the power feeding those machines. At a certain point, you realize you’re spending too much. You need something different from the stuff the rest of the world uses.

Facebook hired Amir Michael in the spring of 2009 to help the company run more efficiently. “My manager said: ‘Hey, come in, we have lots of infrastructure we’re building, and we need to make it more innovative, more economical, more energy efficient,’” Michael remembers. “I asked if there was anything specific he was looking for, and he said: ‘No. Why don’t you just come in and figure it out for yourself?’” And that’s what Michael did.

In keeping with Facebook’s now world-famous “hacking” culture, he started with a blank slate and worked at the engineering equivalent of breakneck speed, making the most of whatever he could get his hands on, including Facebook’s IT storage room and the shipping and receiving dock next door. The result was a brand new server that was not only energy-efficient and cost-efficient, but, well, physically efficient.

“Trying to optimize costs, we took out a lot of components you find in a standard server,” Michael says. “That made it easier to service. It made the thermals more efficient because you had less obstructions blocking cool air. And it made it ten pounds lighter. That’s ten pounds less material you’re buying, ten pounds less you have to lift every time you put it into the rack or pull it out, and ten pounds less you have to recycle when you’re done with it.”

Facebook isn’t alone in designing its own servers. Google has done it for years. The difference is that Facebook will invite you into its makeshift server lab and show you how it’s done. It will even give you the goods, including not only the designs for Michael’s servers, but the blueprints for the data center in Prineville, which was built to work in tandem with those servers.

Web giants aren’t the only ones running the sort of massive operations that require more efficient hardware. Financial houses and biomedical outfits and countless other businesses face the same conundrum. Facebook wants to help itself, but it wants to help them too. According to Amir Michael and the rest of Facebook’s hardware braintrust, the two go hand-in-hand.

The Best Experience Is Inexperience

Before joining Facebook, Amir Michael spent more than five and a half years as a hardware engineer at Google. He had debugged motherboards and power supplies, but he had never designed his own server. And that’s why he was suited for building one for Facebook. “My vision wasn’t cluttered,” he says. “I had worked on individual components, but I never sat back and looked at the whole system.”

He started by tinkering with some servers and other equipment in Facebook’s existing data centers — facilities where the company was merely leasing space from other outfits. He and a few other engineers made some improvements, but they soon realized they couldn’t make a big difference unless they redesigned both the servers and the data center from scratch. “We couldn’t just change the data center a little bit and then change the server a little bit,” Michael says. “We did that and we got some gains, but we wanted to go much further.”

So an engineer named Jay Park went to work on the data center, and Michael started on the server. But they also worked together. The idea was to design the two so they fit hand-in-glove.

The average data center wastes an awful lot of energy switching back and forth between AC and DC power and transforming from one voltage to another. But one night, Jay Park says, he dreamed of a new data center that didn’t. When he woke up, there was no paper handy, so he sketched the design on a napkin. Pop songwriters aren’t the only ones to live the old I-dreamt-it cliché.

Rather than step power down to the traditional 208 volts using the usual array of massive power distribution units, the design ran 277 volts straight into the server room. “We did this for the same reason the power companies have these big high-voltage transmission wires,” Michael says. “The higher the voltages, the fewer losses you have, the more efficient you are.” What’s more, it did away with the enormous uninterruptible power supply, or UPS, that typically backs up a data center in the event of a power grid outage. Instead, Park and crew put DC batteries right next to each server rack and plugged them straight into the servers. This meant not only that backup power traveled shorter distances, but that it wasn’t switched from DC to AC and back to DC again on its way to the server.
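The physics behind Michael’s remark is the familiar I²R rule: for a fixed load, the current falls as the voltage rises, and the power lost as heat in the wiring scales with the square of that current. The short Python sketch below compares the two voltages for a hypothetical 10-kilowatt load over wiring with a hypothetical 0.05-ohm resistance. Both numbers are assumptions chosen for illustration, and the model ignores three-phase distribution, power factor, and the transformer losses that Park’s design also avoided.

```python
def resistive_loss_watts(load_watts: float, volts: float, resistance_ohms: float) -> float:
    """I^2 * R loss in a conductor delivering load_watts at the given voltage.

    Simplified single-phase, unity-power-factor model (an assumption for illustration).
    """
    current_amps = load_watts / volts  # I = P / V
    return current_amps ** 2 * resistance_ohms  # P_loss = I^2 * R


LOAD_W = 10_000.0  # hypothetical rack load, in watts
R_OHMS = 0.05      # hypothetical wiring resistance, in ohms

loss_208 = resistive_loss_watts(LOAD_W, 208.0, R_OHMS)
loss_277 = resistive_loss_watts(LOAD_W, 277.0, R_OHMS)

print(f"208 V feed: {loss_208:.1f} W lost in the wiring")
print(f"277 V feed: {loss_277:.1f} W lost in the wiring")
print(f"277 V loss as a fraction of 208 V loss: {loss_277 / loss_208:.3f}")
```

Whatever load and resistance you plug in, the ratio of the two losses works out to (208/277)², or roughly 0.56, so under these simplifying assumptions the 277-volt feed loses a bit more than half as much power in the wiring as the 208-volt feed.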

But Park’s dream data center doesn’t work unless you have a server that can accommodate these changes. Michael’s contribution to this experiment in data center symbiosis was a server that included not one but two power connectors. There was one for 277-volt AC power, and there was another that accepted 48-volt DC power straight from the battery beside the rack. “The power supply is smart enough to know when the AC disappears, and it automatically switches over to the battery,” Michael says. “The server doesn’t even know that power was lost.”
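Michael describes the dual-input power supply’s behavior only in outline. As a rough illustration, the selection rule he describes can be modeled as below. This is a hypothetical Python sketch of the decision logic, not the actual circuitry or firmware of the Delta or Power-One supplies, and the names `Source`, `select_source`, and `simulate` are invented here.

```python
from enum import Enum
from typing import List, Optional


class Source(Enum):
    """Hypothetical labels for the two inputs on the server's power supply."""
    AC_277V = "277-volt AC"
    DC_48V = "48-volt DC battery"


def select_source(ac_present: bool, battery_present: bool) -> Optional[Source]:
    """Prefer utility AC; fall back to the rack-side battery when AC disappears."""
    if ac_present:
        return Source.AC_277V
    if battery_present:
        return Source.DC_48V
    return None  # neither input available: the server really does lose power


def simulate(ac_samples: List[bool], battery_present: bool = True) -> List[Optional[Source]]:
    """Return which input feeds the server at each sample of the AC line."""
    return [select_source(ac, battery_present) for ac in ac_samples]


if __name__ == "__main__":
    # A brief grid outage in the middle of the run.
    for step, source in enumerate(simulate([True, True, False, False, True])):
        print(step, source.value if source else "no power")
```

In real hardware the handoff also has to happen fast enough that the motherboard never sees a dropout, which is what Michael means when he says the server doesn’t know power was lost; a sketch like this one captures only the decision, not the timing.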

But he didn’t stop there. He also redesigned everything from the server chassis to the fans to the motherboard. Michael and his team didn’t just piece together existing parts. They built an entirely new machine.

A Facebook server bench. Which may occasionally double as something else.

Where in the World Is Amir Michael?

How do you build your own server if you’ve never built one before? You read some large manuals describing things like power supplies. Then you find the people who actually build things like power supplies.

When he set out to design his server, Michael cold-called Synnex, an outfit that has spent the last thirty years buying and selling computer hardware across the globe. Synnex is headquartered in Fremont, California, but it has deep ties to original device manufacturers, or ODMs, in Taiwan and China and other parts of the world. Michael asked the company to put him in touch with various power supply and motherboard manufacturers, and the company obliged.

“We were happy to,” says Steve Ichinaga, a Synnex senior vice president and general manager who worked closely with Michael. After all, this was Facebook. And in the end, Facebook became a Synnex customer. Today, Synnex tests the company’s servers before they’re shipped to the data center in Prineville.

Through Synnex, and via other routes, Michael made contact with several device manufacturers, and at least three agreed to help build his new server: Quanta, a motherboard and computer manufacturer based in Taiwan; Delta, another Taiwanese manufacturer that specializes in power supplies; and Power-One, a second power specialist headquartered here in the US. Power-One declined to participate in this story, and Delta and Quanta did not respond to requests for interviews. But according to Michael, all three worked not only with Facebook on the project, but with each other.

“We chose the companies that were the most open, who communicated the best, who shared their knowledge with us so we could optimize even more,” says Michael. “We shared our power supply design with the motherboard vendor, and we asked them to work together. As a result, things integrated well together. Everyone saw the bigger picture, and it made their engineers more efficient. It made them think the same way we were thinking.”

This is not the way other server designers work, he adds, apparently alluding to Google. “Other companies segregate those things, and that requires a lot more overhead as far as explaining and understanding. It takes the engineer’s focus away from building a very good server. A lot of the smaller details, people just have to figure them out themselves.”

Facebook also worked with Intel and AMD, whose CPUs would be used in the servers. Jason Waxman, the general manager of high-density computing in Intel’s data center group, declines to describe Intel’s role in detail, but he says the chip giant worked “very collaboratively” with Facebook on the design of the server.

In working with these partners, Michael’s aim was to build a “vanity-free” server — one that didn’t include all the stuff Facebook wouldn’t need. “We didn’t pay attention to how the servers looked,” he says. “There’s no paint. There’s no buttons on the front. There’s no fancy logos or emblems.” But this no-frills approach was also part of the effort to significantly reduce the cost of cooling the machine.

They eventually settled on a chassis that was significantly taller than the average server, so it could accommodate larger fans and larger heat sinks. The larger fans are more efficient at moving air, and with the larger heat sinks in place, they needn’t move as much air. The heat sinks have more surface area, so they’re more efficient at cooling the processor.

At the same time, Michael’s team actually rearranged the chips on the motherboard to improve the flow of air. “The idea was to spread things out,” he says, “so that cold air comes in and goes directly to the hot components. There aren’t any components ‘shadowing’ others. More or less, you get really cold air coming straight to the components that need it. By modifying the electrical design, we improved the thermal design.”

Michael sat down to design the new system in January 2010, and the first prototype arrived at his makeshift lab that summer.

The Facebook wind tunnel, where servers go for air

Pizza, Beer, Chips, and Motherboards

The pizza and beer came several weeks later. At a data center in Santa Clara, Michael held a “build party.” He served pizza and beer for an army of server technicians from across Facebook and beyond, and in between slices, they worked on the prototypes as Michael and his team watched. The technicians would put them together, and then take them apart. “It was fun, and it was exciting, but it was also a learning experience,” says Steve Ichinaga, who was part of the crowd that day. “It was a great way for people to understand how everything worked.”

That includes Amir Michael and his engineers. This was just one way they kicked the proverbial tires on their prototypes. They installed a thermal chamber in that former shipping and receiving dock, so they could heat the servers up and cool them down to extremes. At one point, they heated and cooled a server too quickly and, thanks to some serious condensation, it came out as a block of ice. In a third room, just off the shipping and receiving dock, they set up a wind tunnel for testing how air flowed across the machines. And on the desk next to the tunnel sat an oscilloscope, for tracing signals across the motherboard.

Facebook's thermal chamber, aka the server oven

After a good five months of tire kicking and three rounds of prototypes, they settled on a final design. That December, seven racks of machines were shipped to the new data center in Prineville, Oregon. Michael flew up with several other engineers, but they didn’t do much. “We powered on the seven racks, and everything worked. No bugs. And the infant mortality rate, the percentage of the servers that died during shipping, was very low, lower than with the traditional servers we used before,” he says. “That was actually a very boring day. The servers arrived. We turned them on. And we clapped. There wasn’t anything for us to do.” It was a testament, he says, to those five months of testing.

He stayed an extra day to make sure nothing went wrong. But nothing did. So he flew home.

Version 2.0 comes to life

Take My Server. Please

One of the reasons engineers like working for Facebook, Michael says, is that they get to actually talk about their work with people who don’t work for Facebook. With its data center and server work, the company takes this ethos to extremes. In April of last year, three months after Michael turned on those servers in Prineville, the company released its designs as part of what it calls the Open Compute Project. Anyone who wants them can have them.

And anyone can have the new ones too. A year after releasing version 1.0 of their Open Compute server designs, Michael and crew are on the verge of releasing version 2.0. And according to Synnex — which has created a new division, Hyve, to offer Open Compute servers and other custom-built machines to the rest of the world — several outfits have already placed orders for the systems, including one or two big internet names.

In sharing the designs, Facebook hopes to drive down the price of its machines, but it also wants to encourage others to help improve the designs. The company knows all too well that building a server is a collaborative process.

In building Facebook’s machines, Amir Michael worked with Power-One engineers in Italy and Delta engineers in Germany as well as any number of engineers in Taiwan and China. For version 2.0, he teamed up with a second Taiwanese motherboard manufacturer known as Wistron. As we stand and talk in his server lab just before the Christmas holiday, a Facebook lab technician, Peter Ha, and two unnamed men from outside the company walk in to scrutinize the new designs, even as Michael is working to move the lab to Facebook’s new headquarters at the old Sun Microsystems campus in Menlo Park, California.

They don’t appear to be speaking English. As it happens, we’ve just finished asking Michael if he and his engineers had to overcome a language barrier in working with other engineers from across the globe. “A lot of our engineers are actually fluent in Mandarin,” he says. “If there was ever any difficulty on the conference call, the conversation would switch to Mandarin.”

Like any self-professed Facebook hacker, Amir Michael and his team use whatever’s handy to accomplish the task at hand. And that includes the rest of the world.