An inside look at how changes are made to one of the world's largest websites.

Facebook is headquartered in Menlo Park, California, at a site that used to belong to Sun Microsystems. A large sign with Facebook's distinctive "like" symbol—a hand making the thumbs-up gesture—marks the entrance. When I arrived at the campus recently, a small knot of teenagers had congregated, snapping cell phone photos of one another in front of the sign.

Thanks to the film The Social Network, millions of people know the crazy story of Facebook's rise from dorm room project to second largest website in the world. But few know the equally intriguing story about the engine humming beneath the social network's hood: the sophisticated technical infrastructure that delivers an interactive Web experience to hundreds of millions of users every day.

I recently had a unique opportunity to visit Facebook headquarters and see that story in action. Facebook gave me an exclusive behind-the-scenes look at the process it uses to deploy new functionality. I watched first-hand as the company's release engineers rolled out the new "timeline" feature for brand pages.

As I passed through the front entrance of the campus and onto the road that circles the buildings, I saw the name on a street sign: Hacker Way. It's more than an address: as founder Mark Zuckerberg explained in an open letter to investors earlier this year, when Facebook filed for its initial public offering, "The Hacker Way" is also the name he gave to the company's management philosophy and development approach. During my two days at Facebook, I learned about the important role that release engineering has played in making The Hacker Way scale alongside the site's rapid growth in popularity.

The Menlo Park campus is a massive space, densely packed with buildings; it felt more like I was entering a tiny city than a corporate campus. Inside the buildings, tasteful graffiti-like murals and humorous posters decorate the walls. Instead of offices, Facebook developers work mostly in open spaces laid out like bullpens. Workstations are lined up along shared tables, with no barriers between individual workers.

Each building has meeting rooms where employees can hold discussions without disturbing other workers. The rooms in each building are named according to a theme. For example, in one building, meeting rooms were named after jokes from Monty Python movies. In another, I saw rooms named after television shows. As I was led through another building, I chuckled at the sight of a room called JavaScript: The Good Parts, obviously named after Doug Crockford's influential book.

I eventually reached the area where the release engineering team is headquartered. Like the rest of the development personnel, release engineering uses an open space at shared tables. But their space has a unique characteristic: a well-stocked bar.

The room initially had a partial wall between two vertical support pillars. When the release engineering team moved in, they converted the space into a bar with a countertop called the "hotfix bar," a reference to critical software patches. They work at a table positioned alongside the bar.

That was where I met Chuck Rossi, the release engineering team's leader. Rossi, whose workstation is conveniently located within arm's reach of the hotfix bar's plentiful supply of booze, is a software industry veteran who previously worked at Google and IBM. I spent a fascinating afternoon with Rossi and his team learning how they roll out Facebook updates—and why it's important that they do so on a daily basis.

Chuck Rossi, the head of Facebook's release engineering team, sitting at the hotfix bar

Facebook's BitTorrent deployment system

The Facebook source code is largely written in the PHP programming language. PHP is conducive to rapid development, but it lacks the performance of lower-level languages and some more modern alternatives. In order to improve the scalability of its PHP-based infrastructure, Facebook developed a special transpiler called HipHop.

HipHop converts PHP into heavily optimized C++ code, which can then be compiled into an efficient native binary. When Facebook unveiled HipHop to the public in 2010 and began distributing it under an open source software license, the company's engineers reported that it reduced average CPU consumption on Facebook by roughly 50 percent.

Because Facebook's entire code base is compiled down to a single binary executable, the company's deployment process is quite different from what you'd normally expect in a PHP environment. Rossi told me that the binary, which represents the entire Facebook application, is approximately 1.5GB in size. When Facebook updates its code and generates a new build, the new binary has to be pushed to all of the company's servers.

Moving a 1.5GB binary blob to countless servers is a non-trivial technical challenge. After exploring several solutions, Facebook came up with the idea of using BitTorrent, the popular peer-to-peer filesharing protocol. BitTorrent is very good at propagating large files over a large number of different servers.

Rossi explained that Facebook created its own custom BitTorrent tracker, which is designed so that individual servers in Facebook's infrastructure will try to obtain slices from other servers that are on the same node or rack, thus reducing total latency.

Rolling out a Facebook update takes an average of 30 minutes—15 minutes to generate the binary executable and another 15 minutes to push the executable to most of Facebook's servers via BitTorrent.

The binary executable is just one part of the Facebook application stack, of course. Many external resources are referenced from Facebook pages, including JavaScript, CSS, and graphical assets. Those files are hosted on geographically distributed content delivery networks (CDNs).

Facebook typically rolls out a minor update on every single business day. Major updates are issued once a week, generally on Tuesday afternoons. The release team is responsible for managing the deployment of those updates and ensuring that they are carried out successfully.

Frequent releases are an important part of Facebook's development philosophy. During the company's earliest days, the developers used rapid iteration and incremental engineering to continuously improve the website. That technical agility played a critical role in Facebook's evolution, allowing it to advance quickly.

When Facebook recruited Rossi to head the release engineering team, he was tasked with finding ways to make sure that the company's rapid development model would scale as the size and complexity of the Facebook website grew. Achieving that goal required some unconventional solutions, such as the BitTorrent deployment system.

During the time that I spent talking with Rossi, I got the impression that his approach to solving Facebook's deployment problems is a balance of pragmatism and precision. He sets a high standard for quality and robustness, but aims for solutions that are flexible enough to accommodate the unexpected.

This is what happens when you use the wrong tools from the outset. As interesting as this is, it's tantamount to the fantastic genius and rigamarole I see in my clients who have managed to mercilessly beat Excel into acting like a database.

Love this article. As a non-developer, but someone with curiosity about how immense-scale systems like Facebook work, this was great. More like this.

I worked for many years (directly, and still do indirectly) in the publishing industry, on putting massive projects to press. It's always interesting to me that while the tools and the nature of the work change dramatically, the foundations of process and reward (balancing the fundamentals of a production cycle against what creates the least disruption for a client/customer) never do.

This is what happens when you use the wrong tools from the outset. As interesting as this is, it's tantamount to the fantastic genius and rigamarole I see in my clients who have managed to mercilessly beat Excel into acting like a database.

Wrong tools?

I call 'scoreboard'.

The right tool is whatever gets the job done. Did you miss the part where the code runs with 50% less CPU utilization because they do it this way?

One of the major ongoing development efforts at Facebook is a project to replace the HipHop transpiler. Facebook's developers are creating their own bytecode format and custom runtime environment, called the HipHop virtual machine, to power the next generation of the Facebook platform. Once the project is finished, the company will be able to compile its PHP source into bytecode that will be executed by the virtual machine.

When people wonder why Facebook was so successful, this is one of their pillars. It's easy to use, it connects you socially easily, and it's always up, solid, and reliable. When you look at past attempts at social networking for the masses, they were missing at least one of those core ingredients.

"HipHop virtual machine, to power the next generation of the Facebook platform. With this project finished, the company will be able to compile its PHP source into bytecode that will be executed by the virtual machine."

That's HUGE. It would put PHP up there with Java. I wonder how many platforms the virtual machine will run on?

I think his gripe is over using PHP for a site of this size. PHP code doesn't run efficiently.

But who cares? CPU time is cheap, developer time is expensive. Anything that lets you substitute the former for the latter is a net economic gain.

The right tool is whatever gets the job done. Did you miss the part where the code runs with 50% less CPU utilization because they do it this way?

No, I saw the part where I read they had to develop a special PHP compiler that makes a seriously massive binary that has to be rolling deployed to all their servers using a customized BT-based workflow. A process which is completely at odds with the purpose of using PHP to begin with.

The development of HipHop is explicitly Facebook fighting an epic battle with their toolset.

Our updates consist of keeping track of each file modified manually, then uploading them w/ FileZilla. We don't even have a version control system because "checking in and out is too much of a hassle"... :/

Anyone have any advice or suggested reading for getting our setup a little more versatile?

No, I saw the part where I read they had to develop a special PHP compiler that makes a seriously massive binary that has to be rolling deployed to all their servers using a customized BT-based workflow. A process which is completely at odds with the purpose of using PHP to begin with.

The development of HipHop is explicitly Facebook fighting an epic battle with their toolset.

You're missing my point. They've been extremely successful doing it this way. One of the most successful companies ever. It kinda seems to be working out OK for them, you know? But I'm sure if you designed their toolset they'd be worth $120 billion instead of "only" $100 billion.

Our updates consist of keeping track of each file modified manually, then uploading them w/ FileZilla. We don't even have a version control system because "checking in and out is too much of a hassle"... :/

Anyone have any advice or suggested reading for getting our setup a little more versatile?

Your current method sounds like more of a hassle than any of the version control systems.

You have my vote; however, I cannot convince the rest of the department.

I think the biggest reasons we use it are:
1) Not wanting to use command line to checkin/out files
2) We don't run apache servers on our own PCs - instead we just have 1 development site we all share
3) "we've always done it this way" syndrome
4) What would we even do to push the latest version out to our live servers? rsync the 'current' codebase or something?

Are there any source control methods that play well w/ 1 and 2? If it was GUI-based checkin/out that might go over easier.

You're missing my point. They've been extremely successful doing it this way. One of the most successful companies ever. It kinda seems to be working out OK for them, you know? But I'm sure if you designed their toolset they'd be worth $120 billion instead of "only" $100 billion.

Just like my clients who have spent years using Excel as an ad-hoc database. I mean it's worked so far. Yeah, the book-keeping and inventory staff count is 5 times as big as it needs to be to manage it, but hey... it's working, so it must be awesome.

Like I said, I find it impressive what they've managed to do with PHP. The fact remains they've pissed away a significant number of man-hours and resources solving problems that are unique to their choice of using PHP in the first place.

Are there any source control methods that play well w/ 1 and 2? If it was GUI-based checkin/out that might go over easier.

Mercurial is a good answer; it has GUIs for most of the major platforms, and you can make hooks for it. SVN can also do this (and I assume Git can, but Windows has no good Git clients).

Basically, Mercurial works like this: you check out from a server (whichever one you like the most: stable/beta/dev), you can commit to any server, and that can trigger hooks there. So it has some setup time, but you save a lot in the long run.

Are there any source control methods that play well w/ 1 and 2? If it was GUI-based checkin/out that might go over easier.

What OS are you using for development? We have Windows with Subversion for version control. VisualSVN is what we use for setting up the server (very simple). Then we use TortoiseSVN to check in, which is GUI based and integrated into Windows Explorer.

Deployments are a mess where I work, though. Engineers build and check in, then the deployment team has to manually update software on our customers' servers. The different IT environments for each customer make it nearly impossible to simplify deployments.

But who cares? CPU time is cheap, developer time is expensive. Anything that lets you substitute the former for the latter is a net economic gain.

When you're the size of Facebook, CPU time is not cheap. More CPU time means more servers, which means bigger datacenters, which means higher cooling costs, which means higher power bills, among other things. Reducing CPU utilization as much as you can means all of these come down.

Our updates consist of keeping track of each file modified manually, then uploading them w/ FileZilla. We don't even have a version control system because "checking in and out is too much of a hassle"... :/

Anyone have any advice or suggested reading for getting our setup a little more versatile?

Yes. Version control using Mercurial or Git.

Or finding a new job. Places like that, which refuse to use the tools provided them, don't deserve to have competent developers keeping them in business.

I think the biggest reasons we use it are:
1) Not wanting to use command line to checkin/out files
2) We don't run apache servers on our own PCs - instead we just have 1 development site we all share
3) "we've always done it this way" syndrome
4) What would we even do to push the latest version out to our live servers? rsync the 'current' codebase or something?

Are there any source control methods that play well w/ 1 and 2? If it was GUI-based checkin/out that might go over easier.

1. If you're on Windows, there's TortoiseSVN, TortoiseHg, and TortoiseGit for SVN, Mercurial, and Git, respectively. They all work pretty well, especially for just checking out projects, branching, and other common tasks. If you're doing something uncommon or advanced, you might have to dip into the command line.

2. Why not? That would be a huge step toward making it easier to test. However, if you aren't going to do that, you could run Virtual Machines. How do you handle sharing it now? Odds are, you could share the same way.

3. Tell them to fuck themselves. Either that, or simulate some kind of catastrophe, which will prove your point. However, make sure it's only simulated, and that management knows what's going on, otherwise things could get ugly.

4. The server has a version control client, and "checks out" the latest version of code. Sometimes the server only has a copy of the "Trunk" or a "Production" branch checked out, and when you're ready, you push your changes into that branch, and then the server does an update. Of course, while you're starting, you can continue doing things the way you are now, and slowly introduce the better way of doing it. (A minimal sketch of this server-side step follows this comment.)

No matter what you do, if you are going to stay there (and given the sound of things, if they're not willing to change, I wouldn't), start using Git or Mercurial yourself on your computer. It gives you version control over the stuff you've done at least, and provides a bit of a backup.

Created an account with Ars purely to reply to you. I've seen another guy or two say it, but TortoiseSVN for Windows is very, very good. I was of the stubborn "pfft, version control, too much bureaucracy and pain in the ass" mindset in days gone by, but now I would not want to develop code without it. These things integrate with Windows Explorer seamlessly, even to the level of modifying the file icons so you know which ones have changed since your last "sync". Here's hoping you manage to convince the suits to get it set up.

I'll add a little about what we do: we have a machine (let's call it "dev server") that runs Linux (CentOS, I believe) and has a folder (your "working folder") for each of our developers. These are made available over the network, and each developer, running Windows or macOS, maps a network drive to their working folder on dev server. Any time a developer needs to start contributing to a specific project, they "check out" the existing project's SVN repository to a new subfolder in their working folder (as an aside, setting up project repositories can be a bit niggly). Dev server, naturally, also runs Apache, configured with virtual hosts, so anyone on the local network can see any developer's version of any project they're working on by visiting http://dev_server/developer_name/project_name/ and it All Just Works. Everyone's files are in one folder and can easily be backed up, yet each developer has their own distinct copy to develop on as needed, and it's all visible via the web server all the time.

Thanks for the comments guys. I should point out that our dept is small, only 3 people, so we don't have different teams. Also, I do like my job and I'm not going to quit over something silly like not liking how we do code updates.

s73v3r wrote:

... 2. Why not? That would be a huge step toward making it easier to test. However, if you aren't going to do that, you could run Virtual Machines. How do you handle sharing it now? Odds are, you could share the same way.

Why not run a separate apache instance on each developer machine? Because that would be a huge pain in the ass, that's why. For one, our servers are Linux/Apache/MySQL based, whereas our development PCs are running Windows 7. So we could either run Windows Apache and risk that not meshing up w/ our live server, or we put up w/ the hassle of a VM. If we get past that, we still have to worry about putting data in our dev database on each and every box/VM, and occasionally updating it (sometimes we push live data down to our dev box when the data gets stale). Then there's the issue that if we have to install a new Perl/PEAR module, we'd have to do it for every developer box.

Add on top of this, we frequently like to show off the features to management. They're not going to remember the IPs of individual machines to go in and test something. Having 1 development server makes showing off new features easier.

s73v3r wrote:

4. The server has a version control client, and "checks out" the latest version of code. Sometimes the server only has a copy of the "Trunk" or a "Production" branch checked out, and when you're ready, you push your changes into that branch, and then the server does an update. Of course, while you're starting, you can continue doing things the way you are now, and slowly introduce the better way of doing it.

That's an interesting approach. Any recommended reading for learning more about it? What about handling MySQL schema/data updates along w/ pushing out a new version? Any fancy way to do that?