Exclusive: a behind-the-scenes look at Facebook release engineering

An inside look at how changes are made to one of the world's largest websites.

Facebook is headquartered in Menlo Park, California, on a campus that used to belong to Sun Microsystems. A large sign bearing Facebook's distinctive "like" symbol, a hand making the thumbs-up gesture, marks the entrance. When I arrived recently, a small knot of teenagers had gathered there, snapping cell phone photos of one another in front of the sign.

Thanks to the film The Social Network, millions of people know the crazy story of Facebook's rise from dorm room project to second-largest website in the world. But few know the equally intriguing story of the engine humming beneath the social network's hood: the sophisticated technical infrastructure that delivers an interactive Web experience to hundreds of millions of users every day.

I recently had a unique opportunity to visit Facebook headquarters and see that story in action. Facebook gave me an exclusive behind-the-scenes look at the process it uses to deploy new functionality. I watched first-hand as the company's release engineers rolled out the new "timeline" feature for brand pages.

As I passed through the front entrance of the campus and onto the road that circles the buildings, I saw the name on a street sign: Hacker Way. As founder Mark Zuckerberg explained in an open letter to investors earlier this year, when Facebook filed for its initial public offering, "The Hacker Way" is also the name he gives to the company's management philosophy and development approach. During my two days at Facebook, I learned about the important role that release engineering has played in making The Hacker Way scale alongside the site's rapid growth in popularity.

The Menlo Park campus is a massive space, densely packed with buildings; it felt more like I was entering a tiny city than a corporate campus. Inside the buildings, tasteful graffiti-like murals and humorous posters decorate the walls. Instead of offices, Facebook developers work mostly in open spaces laid out like bullpens. Workstations are lined up along shared tables, with no barriers between individual workers.

Each building has meeting rooms where employees can have discussions without disturbing other workers. The meeting rooms in each building are named after particular themes. For example, in one building, meeting rooms were named after jokes from Monty Python movies. In another, I saw rooms named after television shows. As I was led through another building, I chuckled at the sight of a room called JavaScript: The Good Parts, obviously named after Doug Crockford's influential book.

I eventually reached the area where the release engineering team is headquartered. Like the rest of the development personnel, release engineering uses an open space at shared tables. But their space has a unique characteristic: a well-stocked bar.

The room originally had a partial wall between two vertical support pillars. When the release engineering team moved in, they turned that wall into a bar countertop and named it the "hotfix bar," a reference to critical software patches. They work at a table positioned alongside the bar.

That was where I met Chuck Rossi, the release engineering team's leader. Rossi, whose workstation is conveniently located within arm's reach of the hotfix bar's plentiful supply of booze, is a software industry veteran who previously worked at Google and IBM. I spent a fascinating afternoon with Rossi and his team learning how they roll out Facebook updates—and why it's important that they do so on a daily basis.

Chuck Rossi, the head of Facebook's release engineering team, sitting at the hotfix bar

Facebook's BitTorrent deployment system

The Facebook source code is largely written in the PHP programming language. PHP is conducive to rapid development, but it lacks the performance of lower-level languages and some more modern alternatives. In order to improve the scalability of its PHP-based infrastructure, Facebook developed a special transpiler called HipHop.

HipHop converts PHP into heavily optimized C++ code, which can then be compiled into an efficient native binary. When Facebook unveiled HipHop to the public in 2010 and began distributing it under an open source software license, the company's engineers reported that it reduced average CPU consumption on Facebook by roughly 50 percent.

Because Facebook's entire code base is compiled down to a single binary executable, the company's deployment process is quite different from what you'd normally expect in a PHP environment. Rossi told me that the binary, which represents the entire Facebook application, is approximately 1.5GB in size. When Facebook updates its code and generates a new build, the new binary has to be pushed to all of the company's servers.

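Copying a 1.5GB binary blob to every one of Facebook's many servers is a non-trivial technical challenge. After exploring several solutions, Facebook came up with the idea of using BitTorrent, the popular peer-to-peer filesharing protocol. BitTorrent is very good at propagating large files across a large number of servers, because every machine that has downloaded part of a file can immediately upload that part to others. Instead of a single source server becoming a bottleneck, the whole fleet shares the work of distribution.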
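A deliberately simplified model suggests why a peer-to-peer approach scales. The sketch below is an illustration, not Facebook's tooling: it assumes the binary moves as one unit and that, in each round, every server that already holds a copy uploads it to exactly one server that does not. The function names and the server counts are hypothetical.

```python
def rounds_single_source(num_servers: int) -> int:
    """Rounds needed if one origin server uploads to each target in turn."""
    return num_servers


def rounds_swarm(num_servers: int) -> int:
    """Rounds needed if every server holding a copy uploads to one new
    server per round (simplified model: whole file per round)."""
    holders = 1  # the origin server starts with the only copy
    targets_remaining = num_servers
    rounds = 0
    while targets_remaining > 0:
        # Each current holder serves at most one new target this round.
        served = min(holders, targets_remaining)
        targets_remaining -= served
        holders += served
        rounds += 1
    return rounds


if __name__ == "__main__":
    # Arbitrary illustrative fleet sizes, not Facebook's real server count.
    for n in (10, 1_000, 100_000):
        print(f"{n:>7} servers: single source {rounds_single_source(n):>7} "
              f"rounds, swarm {rounds_swarm(n):>2} rounds")
```

Under those assumptions, reaching 1,000 servers takes 10 rounds with swarming instead of 1,000 from a single source, because the number of copies doubles each round. Real BitTorrent moves many small pieces in parallel, so actual timings depend on bandwidth and network topology rather than on this round count.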

Rossi explained that Facebook created its own custom BitTorrent tracker, which is designed so that individual servers in Facebook's infrastructure will try to obtain slices from other servers that are on the same node or rack, thus reducing total latency.
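The tracker's internals weren't described in detail, so the following is only a hypothetical Python sketch of the locality idea: when a server asks for peers, the tracker ranks candidates on the same rack first, then those in the same cluster (an assumed extra tier), and everything else last. The `Peer` type, its `rack` and `cluster` fields, and `select_peers` are invented for illustration.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    host: str     # hypothetical server name
    rack: str     # hypothetical rack identifier
    cluster: str  # hypothetical cluster identifier (assumed tier)


def select_peers(requester: Peer, swarm: list[Peer], limit: int,
                 rng: random.Random | None = None) -> list[Peer]:
    """Return up to `limit` peers, preferring the requester's rack, then its
    cluster, then any other peer. Peers within a tier are shuffled so that
    requests are spread across nearby servers."""
    rng = rng or random.Random()
    same_rack: list[Peer] = []
    same_cluster: list[Peer] = []
    remote: list[Peer] = []
    for peer in swarm:
        if peer.host == requester.host:
            continue  # never offer a server itself as a peer
        if peer.rack == requester.rack:
            same_rack.append(peer)
        elif peer.cluster == requester.cluster:
            same_cluster.append(peer)
        else:
            remote.append(peer)
    ranked: list[Peer] = []
    for tier in (same_rack, same_cluster, remote):
        rng.shuffle(tier)  # randomize order only within a locality tier
        ranked.extend(tier)
    return ranked[:limit]


if __name__ == "__main__":
    me = Peer("web-001", rack="r1", cluster="c1")
    swarm = [
        Peer("web-002", "r1", "c1"),
        Peer("web-050", "r7", "c1"),
        Peer("web-900", "r3", "c9"),
        me,
    ]
    # prints ['web-002', 'web-050']
    print([p.host for p in select_peers(me, swarm, limit=2)])
```

Shuffling within each tier is a design choice in this sketch: it keeps every server on a rack from downloading slices from the same single neighbor.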

Rolling out a Facebook update takes an average of 30 minutes—15 minutes to generate the binary executable and another 15 minutes to push the executable to most of Facebook's servers via BitTorrent.

The binary executable is just one part of the Facebook application stack, of course. Many external resources are referenced from Facebook pages, including JavaScript, CSS, and graphical assets. Those files are hosted on geographically distributed content delivery networks (CDNs).

Facebook typically rolls out a minor update on every single business day. Major updates are issued once a week, generally on Tuesday afternoons. The release team is responsible for managing the deployment of those updates and ensuring that they are carried out successfully.

Frequent releases are an important part of Facebook's development philosophy. During the company's earliest days, the developers used rapid iteration and incremental engineering to continuously improve the website. That technical agility played a critical role in Facebook's evolution, allowing it to advance quickly.

When Facebook recruited Rossi to head the release engineering team, he was tasked with finding ways to make sure that the company's rapid development model would scale as the size and complexity of the Facebook website grew. Achieving that goal required some unconventional solutions, such as the BitTorrent deployment system.

During the time that I spent talking with Rossi, I got the impression that his approach to solving Facebook's deployment problems is a balance of pragmatism and precision. He sets a high standard for quality and robustness, but aims for solutions that are flexible enough to accommodate the unexpected.