Faster PHP fo shizzle—HipHop for PHP

Scaling Facebook

(This was given mostly by David Recordon, followed by Mike Schroepfer, and one other)

Facebook is big.

Really big.

They stated they are the #2 site on the internet with 350 billion monthly page views.

This works out to about 350 million active users, 4 trillion feed actions, and 1 million application developers. Or, put another way, they are the largest photo sharing site with 35 billion photos (x4 different resolutions stored for each photo) serving an average 1.2 million photos per second to their user base.

Let’s look at some optimizations they’ve done to the engineering architecture to reach these heights.

Haystack

In the case of photos, a system they’re very proud to showcase, you have a huge, complex problem. First, the photos are multi-homed (meaning copies must exist in both the West Coast and East Coast datacenters), and I’ve already outlined the traffic for those photos above.

If you look at their numbers, the sheer number and variety of images means you can’t depend on a CDN to solve your problem.

So, Facebook invented a new way of retrieving photos they call Haystack. To see why, consider this table

For reference, almost all websites you know just fit their images on regular disks. If they have scaling issues, most of them use NFS to scale those disks out. If they run into issues with that, as Tagged did, you buy a costly appliance known as a NetApp and then spend your time optimizing the file structure to get it going at top speed. In Facebook’s case, you write your own. Since the number of photos, not their size, is the bottleneck here, that second column basically represents the inverse upper bound of speed per unit hardware. It has the added benefit of being its own webserver, so no separate hardware needs to be allocated to actually serve the images.
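The published Haystack design keeps a small in-memory index mapping each photo ID to an (offset, size) pair inside one large append-only file, so a read costs a single seek instead of several filesystem metadata lookups. A toy sketch of that idea (in Python for brevity; all names are mine, not Facebook’s):

```python
import os
import tempfile

class NeedleStore:
    """Toy haystack: many photos packed into one large append-only file,
    with an in-memory index of photo_id -> (offset, size)."""

    def __init__(self, path):
        self.path = path
        self.index = {}               # photo_id -> (offset, size)
        open(path, "ab").close()      # create the store file if missing

    def put(self, photo_id, data):
        with open(self.path, "ab") as f:
            offset = f.tell()         # append mode: position is end-of-file
            f.write(data)
        self.index[photo_id] = (offset, len(data))

    def get(self, photo_id):
        offset, size = self.index[photo_id]
        with open(self.path, "rb") as f:
            f.seek(offset)            # one seek, no metadata lookups
            return f.read(size)

store = NeedleStore(os.path.join(tempfile.mkdtemp(), "haystack.dat"))
store.put(1, b"jpeg-bytes-1")
store.put(2, b"jpeg-bytes-two")
```

The real system adds replication, compaction, and checksums; the point here is just that the index lives in RAM, so the disk does one seek per photo.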

Newsfeed

“The multi-dimensional social graph.”

[Photo: Bon Jovi in Facebook — Facebook, Palo Alto, California]

Conference rooms at Facebook’s new HQ are named after 80’s bands. The "2D.03" naming convention stuff is strange. One bathroom I saw was labeled "2H.02." Since everything looks alike in their new building, I guess that’s how you tell rooms apart when you tweet where you last peed.

The whole place had the feel of a junior-high school—I wonder where the gymnasium is. Apparently it was an HP building. (The Googleplex used to be SGI.) Sounds about right—Silicon Valley builds on the bones of its predecessors.

Facebook is famous for the newsfeed: a scroll of data showing what you or your friends are up to. When you think about what this takes from an engineering perspective, you realize the difficulty of the problem: the data is interconnected. The estimate they gave is that if 1% of their users are active on the website at a given moment, then 90% of the user dataset needs to be available, because users are interconnected—think of displaying thumbnails and links to friends as users browse profiles.

They call that last bit the “multi-dimensional social graph” which is Facebook buzzword coinage to mean: in social networks, people interact with other people’s data (and not just their own, or each other).

This “multi-dimensional” problem is there for all social networks—it’s the nature of the business. At Tagged, I got around it with a simplified privacy model, minimizing this sort of computation, caching this computation, and through other shortcuts.

In order to solve this, Facebook runs a special set of Newsfeed servers that store an in-memory queue of recent events across a cluster of nodes called “newsfeed leaves.” When a newsfeed needs to be put together, a call is made to one of those nodes, which at that moment acts as a “newsfeed aggregator.” The aggregator makes calls to all the other leaves, then aggregates and organizes the data. The result is then decorated by PHP—turning a user ID into an image and a link to their profile, for example—for output. It has to do this every time, and uniquely, to handle both the timeliness of the data and Facebook’s complicated privacy model. The only caching that goes on is in the user’s browser, through Javascript.
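The leaf/aggregator split can be sketched like this (Python for brevity; all names and data are hypothetical): each leaf holds recent events for its shard of users, and the aggregator fans a request out to every leaf and merges the results by time before PHP decorates them.

```python
import heapq

# Each leaf holds time-sorted (timestamp, user_id, event) tuples for
# the slice of users sharded onto it.
leaves = [
    [(100, "alice", "posted a photo"), (103, "alice", "commented")],
    [(101, "bob", "changed status")],
    [(102, "carol", "added a friend")],
]

def query_leaf(leaf, friend_ids):
    """What a single leaf node returns for this viewer's friends."""
    return [e for e in leaf if e[1] in friend_ids]

def aggregate(friend_ids, limit=10):
    """The aggregator role: fan out to every leaf, merge by recency,
    and return the newest `limit` events for PHP to decorate."""
    per_leaf = [query_leaf(leaf, friend_ids) for leaf in leaves]
    return list(heapq.merge(*per_leaf))[-limit:]   # leaves are pre-sorted

feed = aggregate({"alice", "carol"})
```

Note that nothing is cached between requests: the merge runs fresh every time, which is how the timeliness and per-viewer privacy requirements are met.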

Memcache

“Nowadays, disk is the new tape.”

I’ve said this many times over the years: without memcache, we’d be dead. The analogy I like to use is to think of a modern website as a really big personal computer. In this architecture, PHP is your processor, gluing functionality together. Now look at your memory system. You have disk, which is the database, and you have the in-processor L1 cache (PHP process RAM) and L2 cache (APC user cache). In this model, your RAM is memcache.

Facebook uses a slightly different analogy: “Disk is the new tape.”

What this means is the same thing: your database is disk bound and slow. You only want to use it for archival purposes as much as possible. For active use, you need something else. For Facebook, the problem is especially difficult—social networks need a huge chunk of “RAM” active as mentioned above. Cache misses mean going to the database (disk in my analogy, tape in theirs) which is slow.

To give you an idea of how big the store is, in order to get a 98% hit rate on their memcache, they need 700 machines. This works out to 40 terabytes of RAM—probably the largest single memcache store around. For reference: a large social network like Tagged uses only half a terabyte of memcache, and many other startups can’t even afford that since the cloud-hosting they are using bills by the megabyte—nor is there a guaranteed minimal latency between slices.

This means their memcache has to perform more than 100 million operations per second.

To hit those numbers, Facebook has made a series of modifications to memcache and the stack beneath it:

– A more efficient serialization routine. (PHP’s native serializer is text-based, slow, and produces an inefficiently large payload for network traffic. Facebook’s version is bundled with the open-source Thrift—look for fb_thrift_unserialize. [Ed: a friend noted that fb_thrift_unserialize is Facebook’s internal name for the serializer. A similar serializer also forms the basis of the Thrift protocol library.])

– A multi-threaded version of memcache. (Before this was added to the memcache core, most sites would run as many instances as they had CPU cores, each with a slice of the RAM.)

– Improvements to the memcache protocol.

– New network drivers. (These are custom network drivers written for each network card and machine combination.)

– Compression.

– Network stack optimizations. (One famous example: the ethernet driver in the high-end Intel servers had a bug where only one core could do the networking.)

– UDP support in memcache. (TCP/IP limits the number of simultaneous connections to around 250,000. If you have multiple webservers accessing multiple memcaches, with multiple processes keeping connections alive, you can hit this number. This is often implemented incorrectly elsewhere as “fire-and-forget,” since the people writing memcache clients don’t understand why it was added. Tagged took shortcuts that handle 90% of cases, but Facebook’s is a full two-way UDP communication layer for all dataset sizes—packets need to be reordered on both sides.)
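To make the serialization point concrete, here is a rough illustration (Python; the text string is shaped like PHP’s serialize() output, and struct stands in for a Thrift-style fixed-width binary encoding—neither is Facebook’s actual wire format):

```python
import struct

user_id, friend_count = 123456789, 4201

# Text encoding, shaped like PHP's serialize() output for an array:
# field names, lengths, and values all spelled out as ASCII.
text = (f'a:2:{{s:7:"user_id";i:{user_id};'
        f's:12:"friend_count";i:{friend_count};}}')

# Binary encoding standing in for a Thrift-style struct:
# one 8-byte big-endian int64 plus one 4-byte int32.
binary = struct.pack("!qi", user_id, friend_count)

print(len(text), len(binary))   # the binary form is a fraction of the size
```

Multiply that size difference by 100 million operations per second and the motivation for a custom serializer is obvious.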

MySQL

Facebook chose MySQL because it is simple, fast, and reliable. It is their “tape backup,” which means they run MySQL across 6000 machines without data loss. For you database nerds, they are using MySQL 5.0.84 on a Percona build with some custom patches. [Ed: Per correction below, this is incorrect. You can check this website to see Facebook’s patches to MySQL.] The filesystem is XFS or ext3, depending on the machine (they’re migrating to XFS).

As a company, they really haven’t focused much on database optimizations until recently. But here is a list of what they’ve learnt:

Logical migration of data is very difficult. (This is a known weakness of MySQL, actually.)

Create a large number of logical dbs and load-balance them over a varying number of physical nodes. (This is db speak for the fact that they scale both horizontally and vertically.)

No JOINs in production: It is logically difficult because data is distributed randomly. (This means they are horizontally partitioning the data.)

Data-driven schemas make for happy programmers and difficult operations.

Don’t ever store non-static data in a central db. (In other words, if any data is updated and not put in a partitioned data store, it’s asking to break as you grow.)

Use services or memcache for global queries. (Don’t do any COUNT(*) or other queries that have to go across every node in a horizontal partition; they won’t finish in time on a busy site.)
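The “many logical dbs over fewer physical nodes” advice above can be sketched as a two-level lookup (Python; all names hypothetical): a key hashes to a permanent logical db, and a separate, rebalanceable map sends each logical db to a physical host.

```python
import hashlib

N_LOGICAL = 1024   # fixed forever, so keys never re-hash

# logical db -> physical node; moving one logical db between hosts
# rebalances its slice of data without changing any key's logical home
placement = {i: f"mysql-host-{i % 8}" for i in range(N_LOGICAL)}

def logical_db(key):
    """Stable hash of a row key onto one of the logical databases."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % N_LOGICAL

def node_for(key):
    """Two-level lookup: key -> logical db -> physical node."""
    return placement[logical_db(key)]

# Rebalance: move the logical db holding "user:42" to a new host.
# Only that slice of data migrates; the key -> logical mapping is untouched.
placement[logical_db("user:42")] = "mysql-host-99"
```

Because N_LOGICAL never changes, rebalancing means copying one logical db’s data to a new host and updating one map entry—no keys ever re-hash, which is why logical migration stays tractable.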

I think the real reason is legacy (Facebook was built in a dorm room): by the time anyone considered anything other than MySQL, it was too big and too late. Scaling may have been done on the database, but optimizations were done elsewhere to alleviate database dependence. For better or for worse, the whole history basically reeks of the PHP development style, where the database is a commodity.

Multiple Datacenters

You can see how the technologies mentioned above are put together when scaling across disparate data centers.

[Photo of a slide: Scaling across multiple datacenters — Facebook, Palo Alto, California]

This diagram, photographed from one of their slides, should make it easier to understand Facebook’s three datacenters: SC = Santa Clara, CA; SF = San Francisco, CA; and VA = Fairfax(?) Virginia.

Facebook started with a single datacenter in Santa Clara. When its power footprint filled up, they added a datacenter in San Francisco. Because these two places are physically close, latency is low, and a memcache proxy can be used to make sure dirty caches are updated simultaneously in both locations.

The issue arose when a second, physically distant datacenter was built, known as ECDC (East Coast Data Center). In that case, network latency is high enough that a local memcached must be installed, and it can be corrupted by race conditions if cleared via the proxy. The trick Facebook came up with was to add a new SQL command, MEMCACHE DIRTY, that takes a list of keys that are dirty. When an application developer updates the database, they include a dirty request along with all the keys that depend on that row. This request gets replicated, using native MySQL replication, across dark fibre to the other datacenter. There the dirty event is read by MySQL and passed to a local memcache proxy, which clears the keys if they exist on the local memcached—the West Coast memcacheds having already been updated by their own local proxies.

What this means is that Facebook cannot implement a write-through cache; instead it leverages the database (and cluster controls) as the arbiter of data consistency. When an object is updated, it is cleared from cache. The next request for that data asks the local database for the new version, and the structure is rebuilt in memcache. Thus they trade off some speed for consistency—the C in ACID.
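The invalidate-on-write flow from the previous two paragraphs can be sketched as follows (Python; dicts stand in for each datacenter’s memcached tier, a list stands in for the MySQL replication stream, and all names are hypothetical):

```python
# One dict per datacenter stands in for that datacenter's memcached tier.
west_cache = {"user:7:name": "Alice"}
east_cache = {"user:7:name": "Alice"}
replication_log = []   # stands in for MySQL replication over dark fibre

def write_row(row, dirty_keys):
    """West-coast write: update the master db, clear the local cache,
    and ship the dirty-key list down the replication stream."""
    for k in dirty_keys:
        west_cache.pop(k, None)              # invalidate, don't update
    replication_log.append((row, dirty_keys))

def replay_on_east():
    """The East-coast replica applies the row AND clears the same keys,
    so the stale cache entry dies only once the new row is local."""
    while replication_log:
        row, dirty_keys = replication_log.pop(0)
        for k in dirty_keys:
            east_cache.pop(k, None)

write_row({"user_id": 7, "name": "Bob"}, dirty_keys=["user:7:name"])
replay_on_east()
# both caches now miss on the key; the next read rebuilds it from MySQL
```

The ordering is the point: the East Coast cache entry is cleared only when the dirty list arrives with the replicated row, so a rebuild there reads data that is already local and current.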

My blog post isn’t finished, but I haven’t claimed “only if you’re big” (yet). My claim is that you need more than 2 machines and a bottleneck in the application server (which is rare).

To your point specifically, Harry, you are correct. If your latency gets decreased, then this is good. But take a real-world example: before I joined one startup, "Hello World" took 240msec; a rearchitecture (without anything as drastic as HipHop) dropped it to 15msec. I don’t think latency alone would be a win at that point. My guess is similar improvements can be found at other companies that are >90msec. However, that is not always the case: Rasmus feels this may be a win for frameworks—their bloat usually destroys response time.

In Facebook’s case, all the "big latency" hurdles were eliminated when they moved to lazy-loading APC; they are clearly thinking about eliminating even more with the ability to snapshot the core and restart from there (though that approach is highly complex). Their big issue is sheer cost, not latency. So to them total time matters, and latency is simply a matter of running the servers sparsely.

Unfortunately I cannot quote Facebook’s numbers on CPU time. You will have to ask them, or figure it out yourself by the copyright trick.

Maybe I came off wrong in my comment but it was more a reflection of the hype and not the technology.

I shouldn't have acted so quickly to say that WordPress used eval all over the place because, as you say, that isn't correct. I was trying to find the least complicated reason to give for why people using WordPress shouldn't bother.

My last resort comment was based on there not being much more you can do to speed up PHP than compiling it to binary. If there were then Facebook would have done that instead. That isn't a bad thing but it is a reflection of what people will be using the tool for.

The hype around HipHop makes it out to be something everyone that uses PHP will be using (that is why you get comments like Patric's about how great it will be for WordPress users). You need to be committed to using PHP compiled with HipHop. I'm sure the hype will die down but what is worrisome is that people who don't understand what to use it for will fail in their attempts and complain about it.

On another subject, does anyone know when it is going to be released? I want to actually use it so my comments are based on some reality.

@Carson. Don’t take it personally. Right now your comment falls into the unfinished part of my article. This contains assorted splurgings that lack both tact and facts. Hopefully it won’t sound as nasty when it’s integrated in the final article.

In any case, it depends on the benchmark, but if the benchmark is artificial enough (like many of those in the Alioth shootout), then the static analyzer can replace nearly everything with native C++ calls. At that point, you’re basically benchmarking Java vs. C++, not PHP.

If you look at the same tables you reference, you’ll note that C++ does better in CPU usage than Java. Both C++ and PHP (native) already do miles better than Java in total memory usage (because of Java’s automatic garbage collection).

In practice, I’d say it puts them in the same class in terms of CPU—mostly slightly slower, but a few times much faster. This should come as no surprise, because Java has a JIT while HipHop is a cross-compile to C++, which is a straight compile.

HipHop is interesting, but I'll definitely argue about it being PHP. By picking and choosing what language features they'll support, they're building a language kinda sorta like PHP but not quite PHP.

Considering how many OSS apps and frameworks use eval(), I also think it's disingenuous of them to characterize it as a rarely used feature. Now, maybe it's one that _should_ be rarely used, but that's a different argument.

I understand what you’re saying, but it’s a losing argument. The frameworks you mention that can’t run under HipHop almost all depend on dynamic scripting of template pages for performance. That would no longer be needed under HipHop.

I’m not saying you are wrong right now, I’m just saying that it’s a lot easier to port frameworks than you think. They simply have to add a flag to allow you to turn off any dependency on dynamic scripting components like Smarty. They shouldn’t be necessary to run the base framework.

Before HipHop, there was no reason not to do dynamic scripting and a whole host of reasons why performance improves when you do. Now HipHop changes that cost-benefit. To not expect framework developers (whose failing, I feel, is the alacrity with which they adopt anything new) to change because of that is short-sighted.

Oh, I think that much of the OSS world will adapt and quickly. Supporting HipHop will likely become a checklist feature and looking at the usage of eval() in some projects it would be trivial to remove. I mainly cited them as part of taking issue with their "rarely used feature" characterization.

My larger point is that instead of actually supporting the PHP language, they're moving the goal posts to a position more convenient for them and calling it PHP. For better or for worse, eval() is part of PHP.

I'm interested in what language features besides eval() are not supported. They give eval() as an example but imply there are others. Seems kind of important to be able to consider what will and will not be available before getting TOO excited…

I have a list of some which I’ll get to when I finish the article but here is a quick rundown off the top of my head.

– eval() not supported
– dynamic scripting is not allowed (that's where you use PHP to create a PHP file, like when you use Smarty to compile a template)
– create_function() is not supported
– preg_replace() with the /e modifier (execute PHP code on match) is not supported
– some functions are not implemented yet or were overlooked (one example was that phpversion() was not returning anything, which was crashing HPHPi when it was run against the WordPress codebase; these bugs should be reported and fixed, though)

…and there was something to do with ordering that works in PHP but won't when the static analyser hits it, meaning in some of your scripts you may have to move things around for it to work.

hiphop won’t change the fact that php is a language most people grow to hate. i don’t know anyone who likes it more after a year than they did on day 1. so hiphop doesn’t make the rewrite argument go away. it might delay it, but inevitably the pain of actually writing and maintaining php remains.

I didn’t advocate it when Friendster decided to switch from Java to PHP. I didn’t advocate it when Del.icio.us decided to switch from Perl/Mason to PHP/symfony. I don't advocate anyone switch to PHP because of HipHop for PHP. Why would I start arguing that a company leave PHP because, according to your limited experience, apparently nobody "likes it after more than one year"?

Architecture changes are hard because they are inherently waterfall. They are especially hard since the web development cycle is tight (if the company is any good). If you want to shoot yourself in the head, (or if you are a consultant, cause your clients to shoot themselves) be my guest.

? you don't even know who the hell i am. i single-handedly created the most popular news website in the world, which has been #1 for a decade. you can piss on that too or you can admit maybe you and your cabal are by no means the last word in who knows how to build websites.

I started using PHP for side projects about 8–9 years ago. I started using it as one of my primary responsibilities at work about 4 years ago. While I don't recall exactly how much I liked it 9 years ago, I'm quite fond of it now.

Poorly designed code is painful, regardless of its language, and "the pain of actually writing and maintaining [code]" is part of software. I don't see how that's unique to PHP.

It's a shame that so much time has been wasted creating a PHP to C++ cross compiler. Sure, it will help Facebook, and some other large websites in speeding up their systems, but it encourages more PHP usage, which is a downright awful language. PHP needs to die.

Also, your argument that PHP is a more universally supported language than some other scripting languages is archaic. Shared hosting is approaching the end of its lifetime, and anyone who wants to create a Python/Ruby/Scala/etc. website will be able to do so thanks to on-demand cloud computing.

Ahh, another anonymous comment with a blanket “PHP is bad” statement and no evidence to back it up. Did I get slashdotted and no one tell me?

Shared hosting may be EOL for those people doing Web 2.0 startups, but for the SME market it is not only alive and well, but thriving. The SME market is many orders of magnitude larger than Web 2.0 startups—talk to GoDaddy sometime before you make that claim. In fact, I’ve noticed that 3 of the top 3 open source CMSs (which pretty much own about 90% of the open-source CMS market) are written in PHP. Shared hosting was instrumental, and no amount of slicehosting will eliminate that, since slicehosting is not used by non web-based SMEs.

Are you sure about that? Phalanger is alive and well and being developed to a new version (3.0) by a UK software company and a team at Charles University. It's being deployed in the enterprise and in government.

I also think that many developers will not use it. As many people say, it only makes sense if you know what you are doing, if the problem is CPU/memory after profiling the code, and if you have at least 3 servers so you can perhaps save one of them.
I'm really interested in the source code, examples, "compatibility lists", translated extensions and so on, this will take a while until we are able to use it I think.

That is like the assertion that until 1994, COBOL was the language of choice, because it took that long for the new code to outstrip the legacy. And if you count ABAP/4 as COBOL, you could claim it came even later.

I too have a question about the 'only worthwhile if you're big' sentiment: wouldn't improved memory efficiency be very important to a site that's running on a tiny VPS? Or are the memory gains not really that substantial?

You are too pessimistic—HipHop PHP is going to change everything!!!! Be more excited and happy, my friend; this enables developers to streamline other development concerns by increasing performance.

For instance, our development cluster in-house is larger than our production because we do massive testing (i.e. download the entire site and analyze it). This will be a great win for many development teams if harnessed correctly.