Digg PHP's Scalability and Performance

by Brian Fioca

Several weeks ago there was a notable bit of controversy over some comments made by James Gosling, father of the Java programming language. He has since addressed the flame war that erupted, but the whole ordeal got me thinking seriously about PHP and its scalability and performance abilities compared to Java. I knew that several hugely popular Web 2.0 applications were written in scripting languages like PHP, so I contacted Owen Byrne, Senior Software Engineer at digg.com, to learn how he addressed any problems they encountered during their meteoric growth. This article addresses the all-too-common false assumptions about the cost of scalability and performance in PHP applications.

36 Comments

Paul Dixon
2006-04-10 13:33:54

3 webservers with *8* database slaves? Information on how this configuration was arrived at would be interesting, as well as the run-time management of this setup...

Geoffrey L. Wright
2006-04-10 14:03:34

Great article. Hopefully it helps dispel the "scalability" myth. I get to hear this one all the time. Of course it's not only aimed at PHP -- you also hear the same argument applied to Python, Ruby, Cold Fusion, Classic ASP, etc. And it usually comes from the Java / .NET crowd. (I don't know why -- maybe it's just in defense of all the extra code these guys need to write! No, just kidding... JOKE! IT'S JUST A JOKE!)

And it obviously isn't true. Unless you're blind you can see large and complex applications written in almost any popular scripting language: PHP (Flickr), Python (Google), Cold Fusion (Lockheed Martin E-STARS®), and Classic ASP (Dell until recent times). And of course you see plenty of great apps on the web written in both Java and .NET.

This isn't a Java or .NET bash. PangoMedia (my company) develops in Java, .NET, Python, Cold Fusion and even Visual Basic. We also happen to do a lot of PHP development using Brian's excellent WASP framework. And in seven years of working with all of these technologies we've never had a language-related performance bottleneck. That's not to say that we haven't had to work on performance tuning in some cases. But it's never been a big issue. And the better part of this sort of work tends to happen at the database layer anyway.

I do have one small bone to pick with this article, however. It suggests the common "MyISAM good / InnoDB bad" misconception about MySQL that I often see these days. MySQL is somewhat unusual in that it offers multiple storage engines. Many developers seem to get confused by this. These storage engines are optimized for different tasks. InnoDB is certainly a better general-purpose table type, but it doesn't allow for the use of FULLTEXT indexes. This feature is a critical component of MySQL scalability and it's really one of the best arguments for using MySQL in the first place. It's both easy to use and insanely fast.

Anyway, as I said before -- nice article, Brian. I just wanted to point out that one small error.

Brian Fioca
2006-04-10 14:32:24

"3 webservers with *8* database slaves? Information on how this configuration was arrived at would be interesting, as well as the run-time management of this setup..."

From what I found, the typical way to scale MySQL databases is to initially start with 1 master DB, and then replicate to a growing number of slaves. The master handles all writes, and replicates down to the slaves, which can be used to load balance reads. There are obvious shortcomings to this, the main one being there isn't a good way to load balance writes.
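The master/slave split described above can be sketched in a few lines. This is a minimal illustration, not Digg's actual setup: the `ReplicatedRouter` class, the server names, and the "anything that is not a SELECT is a write" rule are all assumptions made for the example.

```python
import itertools


class ReplicatedRouter:
    """Send writes to the master and spread reads across the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        # With no slaves configured, reads fall back to the master.
        self._read_cycle = itertools.cycle(slaves or [master])

    def server_for(self, sql):
        # Simplifying assumption: only plain SELECT statements count as reads.
        words = sql.split()
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT":
            # Round-robin over the slaves to balance read load.
            return next(self._read_cycle)
        # Every write goes to the single master, which is the bottleneck
        # the comment above points out.
        return self.master


# Hypothetical server names, for illustration only.
router = ReplicatedRouter("db-master", ["db-slave-1", "db-slave-2", "db-slave-3"])
```

Calling `router.server_for("SELECT ...")` repeatedly cycles through the slaves, while any `INSERT`, `UPDATE`, or `DELETE` is always routed to `db-master`. Adding slaves increases read capacity only, which is why write-heavy workloads need a different approach.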

What database platform are they moving to? Why are they moving from MySQL?

Jason Carreira
2006-04-11 10:36:32

You and I have different definitions of "large scale applications" obviously. I wouldn't consider any of those "large-scale". Furthermore, anything that can be "rapidly built and maintained on-the-cheap, by one or two developers" would not meet my definition of "large scale".

Maybe they handle a lot of traffic, but their business requirements and level of functionality are dead simple.

Brian Fioca
2006-04-11 10:44:16

"You and I have different definitions of "large scale applications" obviously. I wouldn't consider any of those "large-scale". Furthermore, anything that can be "rapidly built and maintained on-the-cheap, by one or two developers" would not meet my definition of "large scale".

Maybe they handle a lot of traffic, but their business requirements and level of functionality are dead simple."

Fair enough. Certainly Digg and Flickr don't represent large scale "enterprise" applications. However, it doesn't seem too much of a stretch to say that it would be feasible to write one in PHP, considering that it is in fact scalable.

Mark M
2006-04-11 17:07:41

Jason, I think you missed the point. Scalability != "large scale".

PostgreSQL
2006-04-12 00:18:33

It seems strange to me that PostgreSQL was not mentioned when talking about a scalable DBMS. Certainly for a big site like Digg it would be wiser to use it rather than using MySQL et al.

Of course ORACLE is a prime candidate, but with a very small improvement over PostgreSQL. Mostly in the feature department and less in the scalability department. Plus, the dough you have to produce just by looking at ORACLE.

sovandeulv
2006-04-12 01:28:28

Normally you would put up a PHP page that accesses MySQL and then hit it with a stress tester. A simple one that would do the job is ApacheBench (ab). Much easier and faster than interviewing digg.com. But of course then you would not be able to write an article about it and mention your own site a couple of hundred times in your own article.

Hareballa
2006-04-12 01:46:52

This must be the most obvious and rotten plug I have seen in a long time for a new web-site. Especially considering that you wrote WASP yourself. This is low man! And the article sucks.

RJ
2006-04-12 01:59:07

Interesting article :)
At www.last.fm we use Memcached, a distributed memory object caching system that has APIs for many languages, including PHP and Java. Worth a look if the bottleneck is the database. You can reduce your database reads by 90% easily, provided you have a clean API in the first place.
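The usual way to put a cache like this in front of a database is the cache-aside pattern: check the cache, and only query the database on a miss. The sketch below assumes a cache object with `get` and `set` methods; `DictCache` is an in-process stand-in written for this example, not the memcached client API, and `cached_read` is a hypothetical helper name.

```python
import time


class DictCache:
    """In-process stand-in for a cache client (get/set with a time-to-live)."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Expired entries are dropped and treated as a miss.
            del self._items[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, time.monotonic() + ttl_seconds)


def cached_read(cache, key, load_from_db, ttl_seconds=60):
    """Return the cached value, loading it from the database only on a miss."""
    value = cache.get(key)
    if value is None:
        # Cache miss: this is the only path that touches the database.
        # Note that a None result is never cached in this simple sketch.
        value = load_from_db()
        cache.set(key, value, ttl_seconds)
    return value
```

This is where the "clean API" point matters: if every read goes through one function like `cached_read`, caching can be added in one place instead of at every query site. How many reads it saves depends entirely on the application's access pattern.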

Roy
2006-04-12 02:35:02

I am a PHP developer who's been plagued by the 'scalability' question and constantly looked down on by arrogant Java "enterprise" developers.
Throughout the years, I have asked myself many times if PHP is up to par and if it really does have scalability issues.
After reading some documents and noticing that some huge sites are PHP based (e.g. riteaid.com, flickr.com), I felt much better and I was confident enough to brush the 'not scalable' argument aside.

If only I can find a way to handle those Java snobs.
:(

Jon Daley
2006-04-12 03:02:32

Does PHP run reliably with apache worker threads? I thought that wasn't supported?

web design london
2006-04-12 05:49:18

Totally great article. I thoroughly enjoyed it.
Keep up the great work all.

Sam
2006-04-12 08:34:57

Very interesting article. You might also like to explore what applications other Digg Tools are using.

Brian Fioca
2006-04-12 10:17:50

"Throughout the years, I have asked myself many times if PHP is up to par and if it really does have scalability issues.
After reading some documents and noticing that some huge sites are PHP based (e.g. riteaid.com, flickr.com), I felt much better and I was confident enough to brush the 'not scalable' argument aside.

If only I can find a way to handle those Java snobs."

Speaking as a former "Java snob", it's a shame that it has to work that way. Experience has made me much less likely to engage in a my-language-is-better-than-your-language debate. Frankly, such arguments are childish and miss the point. Languages are a tool for getting work done, and you should always pick the best tool for the job at hand (despite how painfully cliche that sounds).

If I were writing an application that needed to do intensive reporting, I'd probably use a language like Java. I might even use a PECL extension to have PHP call the Java code.

2006-04-12 11:00:08

Don't forget Yahoo! - they are in a PHP migration from Apache C modules and get far more hits than Digg.

mfa
2006-04-12 15:14:12

Hi. You mentioned Digg is moving to a new db platform. Anyone know what platform?

Brian Fioca
2006-04-12 15:18:14

"Hi. You mentioned Digg is moving to a new db platform. Anyone know what platform?"

When I interviewed Owen for the article, they were considering the idea. I have since emailed him to get clarification and he replied that they are now planning on staying with MySQL, but changing their architecture to use memcached and "shards".
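"Shards" usually refers to splitting one large dataset across several independent databases, with each row living on exactly one of them. The sketch below shows one common way to choose a shard, by hashing a key. It is an assumption about the general technique, not a description of Digg's scheme; `shard_for` and the shard names are made up for the example.

```python
import zlib


def shard_for(key, shards):
    """Pick a shard by hashing the key; the same key always maps to the same shard."""
    # crc32 is stable across processes and restarts, unlike Python's
    # built-in hash() for strings.
    index = zlib.crc32(str(key).encode("utf-8")) % len(shards)
    return shards[index]


# Hypothetical shard names, for illustration only.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]
```

With this scheme, `shard_for("user:42", SHARDS)` returns the same server on every call, so both reads and writes for that user go to one database and write load is spread across all of them. The trade-off of a plain modulo is that changing the number of shards remaps most keys, so data has to be moved when shards are added.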

I'm thinking about doing research for an article that goes into detail about what database architectures work best for these sorts of PHP applications.

A
2006-04-19 12:02:35

Jobby is a waste of space.

Santosh
2006-04-27 23:50:48

Thank you for putting effort into this. This is a great article.

(newbie question) I am also interested in another form of scaling. If I were writing a web application serving data in multiple presentation formats (for example HTML, WML) would PHP still be a suitable platform? Are there any other businesses using PHP in a similar situation?

Brian Fioca
2006-04-28 09:16:37

"(newbie question) I am also interested in another form of scaling. If I were writing a web application serving data in multiple presentation formats (for example HTML, WML) would PHP still be a suitable platform? Are there any other businesses using PHP in a similar situation?"

Yes, provided you use a solid MVC architecture. WASP, for example, uses a template manager to handle its UI, and the choice of which template to use can be made at runtime. You could have a set of HTML templates alongside a set of WML templates and choose which one to render based on the URL.
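The idea of choosing a template set from the URL can be sketched as follows. This is not WASP's API; the `TEMPLATES` table, `format_from_path`, and `render` are hypothetical names, and the rule that the format comes from the URL's file extension is an assumption made for the example.

```python
# One template per output format; the same model data feeds both.
TEMPLATES = {
    "html": "<html><body><h1>{title}</h1></body></html>",
    "wml": "<wml><card title=\"{title}\"/></wml>",
}


def format_from_path(path, default="html"):
    """Derive the output format from the last path segment's extension."""
    last_segment = path.rsplit("/", 1)[-1]
    if "." in last_segment:
        extension = last_segment.rsplit(".", 1)[1].lower()
        if extension in TEMPLATES:
            return extension
    # Unknown or missing extensions fall back to the default format.
    return default


def render(path, model):
    """Render the model with whichever template the URL selects."""
    template = TEMPLATES[format_from_path(path)]
    return template.format(**model)
```

Here `render("/stories/top.wml", {"title": "Top"})` produces the WML card, while `/stories/top` or `/stories/top.html` produces the HTML page. The controller and model code stay the same for both, which is the point of keeping presentation in a separate layer.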

Cary Harper
2006-05-01 09:47:25

This was a nice article and I enjoyed reading it, however, I would like to know how well you think PHP compares to Java when dealing with issues like:
1) Keeping multiple data centers in sync with each other, in real time
2) Dealing with complex access control systems
3) Dealing with complex ecommerce systems

I don't doubt that PHP is scalable or capable of performing well at the 2 million hits per month range on a simple web application, but I do have concerns when it comes down to the available tool sets to handle enterprise level problems in PHP versus J2EE.

Lee
2006-07-28 02:25:50

I'm very curious why they are moving away from MySQL? Any update is appreciated.

Txarly
2006-08-29 03:42:31

Sorry guys, what do you mean by 'shards'? I'm not that good at techie slang ;D

In my experience of C++, Delphi (Pascal) and VB, PHP, as a language, is a jolly awkward customer. It's got so many nooks and crannies that it's barely usable in my opinion. At least C++ makes a token nod at simplifying. Interestingly, MySQL is exactly the same: riddled with idiosyncratic nonsense. They are a perfect match for each other, even if the result is hell-like. Granted, they are accessible to beginners, unlike Java or many other languages. Perhaps PHP scales performance-wise, but I can't believe that it would scale language-wise (i.e. for big projects).

However the author of this article gives himself away when we find out that he didn't know how to optimise PHP in order to 'compile' only once. I can't trust his opinion. It's a good job he referred to someone with more authority, but he has no ability to evaluate what the guy has said and then to give us an interesting and informed opinion. Ah well.

In any case, it's actually all the fault of ducks. That's right. If you didn't know then you certainly should. Kill those ducks!

woyaokan
2007-01-21 09:27:26

http://www.javatag.com find php doc by javadoc styles

Luis Colorado
2007-02-26 07:58:50

I'm under the impression that discussions about which language is faster, better, or more scalable are pointless, because, as pointed out, the bottleneck is in the database (i.e., disk access).

As you can see at any pure processing speed benchmark (http://www.timestretch.com/FractalBenchmark.html), C is about 10 times faster than Java, and Java is about 30 times faster than PHP.

At this point, processors are so fast that it doesn't really matter anymore what kind of language you are using. You could even use assembler or C, and I suspect that you would not gain in performance/scalability, why? Because faster languages give you gains of microseconds, while your real bottleneck is in the database I/O.

That means that even "slow", interpreted languages like Ruby and PHP perform quite well doing database processing.

This article provides a good lesson, but the value of a language is related to its purpose. It seems to me that these days the main problem is not execution scalability but "development scalability". When the problem size and complexity increase, does the development time grow linearly or exponentially? Quite frankly, I don't know enough to give an answer, but it seems like this is the problem we need to worry about at this point.

Please consider either fixing or removing the links to Large Scale PHP and High performance PHP referred to in this article so they work in IE. There is no reason other than some obnoxious developer making things difficult for these links to work the way they do. Those of us who actually make our livings providing web-based applications to customers work in IE because, outside the developer community, 99% of users use IE.

Michael Hosting
2007-08-13 13:11:27

"Hosted Linux server" - does that mean the one that you can get from any dedicated hosting company? How do you move to your own multiple servers at a different datacenter without shutting down the site?
Resume tracking application with a bunch of searches for the information appears to be much more aggressive on PHP and DB resources than Digg with its get/update DB entry behavior. It would be interesting to know the further developments with Jobby.medvegonok

kvz
2007-10-06 09:00:25

A thorough article to increase the performance of apache & php:
http://kevin.vanzonneveld.net/techblog/article/survive_heavy_traffic_with_your_webserver
