Tuesday, February 2. 2010

To paraphrase Marco Tabini, if you work with PHP you must be doing so in a pretty deep cave to have not heard of HipHop for PHP and the fervor around it prior to its official announcement this morning by Facebook.

I had the good fortune to be part of the small group of PHP community people who were invited in January to take a peek at the technology prior to its official release. And I must admit it has been quite amusing to read some of the conjectures people were making about what it actually is, given how off the mark most of their guesses were.

So what is HipHop?

In the tersest of terms, HipHop is a tool that converts PHP code into C++ code. Combined with a PHP-compatible engine and an extension library (ports of some of the native PHP extensions Facebook uses), both also written in C++, that code can be compiled with GCC into a binary. This binary can then be run on the command line or as a web server daemon that utilizes libevent. According to Facebook this can speed up applications by up to 50%, which is a pretty impressive improvement.

It is not entirely surprising that the world's largest PHP deployment, Facebook, would look for a solution that would allow them to halve their not inconsiderable count of servers or double capacity. Releasing this solution as Open Source is, I think, a great idea, and big kudos to Facebook for doing so.

From a technical perspective, the approach of optimizing PHP by converting it into a compiled language is not completely new. The Roadsend compiler, a commercial product, has been around for a few years now and has been doing that with some degree of success. That said, it is not a trivial task, and from an engineering perspective it presents a fairly tricky development challenge, especially when you want regular, off-the-shelf scripts to work. Perhaps more importantly, HipHop is not a theoretical solution "for you to test"; it actually works, with most of Facebook's servers running it, and doing it well, on millions of lines of converted PHP code on a daily basis. Very impressive.

At this point you are probably thinking that if it is so great and it works, you'll deploy it on your servers as soon as you can get your hands on the source code. Well, unfortunately things are not quite so simple; there are a few technical and deployment challenges you need to overcome.

1] If you are using eval() and a few other PHP features, or want to use any of the PHP 5.3 fanciness, you are out of luck. HipHop does not support those, and while 5.3 support will probably happen sooner rather than later, eval() is unlikely to ever work. That said, eval() == EVIL, so it is not a big loss, although a few templating engines depend on it.

2] To make HipHop do its magic you need to run a script to convert all of your PHP code and all of the supporting libraries to C++, and then compile it against what is in effect a C++ implementation of the Engine and supported extensions. This means that any time you need to make even the most trivial of changes you will need to go through the convert > compile cycle. This, by the way, is not an instant process; compiling things, especially C++, does take a good deal of time.

3] HipHop can only use PHP extensions specifically converted to work with HipHop, which Facebook has done for the extensions they happen to use. If you need something else, you either need to convert the extension yourself, or HipHop is not for you. Since most PHP devs probably don't have the time, skill or inclination to convert native PHP extensions to C++ variants that work with HipHop, this can be somewhat of a deterrent.

4] The compiled binary can be run either on the command line or as its own web server instance based on libevent. This means that to use it you need to change how you serve content to your users, by putting a proxy or equivalent in place, so that PHP requests go to the "php server" and the rest are handled by your web server of choice. Since your compiled code is effectively built into the web server, any code change in your PHP scripts will require a web server restart as well.

5] HipHop does not work on Windows, since Facebook does not run its PHP on that platform, and with good reason. That could probably be addressed with some effort, but the PHP Win32 devs who could do it I can count on the fingers of one hand. So, unless MS decides to throw some resources that way, it is unlikely to happen.

6] As PHP continues to evolve, divergence between "native" and HipHop functionality will definitely take place and continue to grow. Remember, HipHop does not use the Zend Engine; it in effect has a C++ library that emulates the engine's behaviour as far as the majority of 5.2 functionality goes. This means there will always be things that work on PHP and not on HipHop, and perhaps vice versa.

7] HipHop is written in C++ and converts code into C++, whereas all of the PHP internals (extensions) and the Zend Engine are written in C, and the vast majority of PHP core developers are more at home with C than C++. This makes it challenging for the people best equipped to take HipHop further to do so, now that it is Open Source.

8] Security. While PHP perhaps does not have the best reputation for security, as Stefan Esser will be happy to tell you, it is not bad by any stretch of the imagination and it is progressively getting better. This is largely the result of many people looking at the source code, or just at what PHP does, finding bugs and getting PHP developers to fix them; in effect, a massively distributed code review effort. HipHop, while compatible with PHP, is really a separate implementation with its own set of bugs and security faults, and while I am sure the Facebook team has done many code reviews and security audits, that does not really compare to the man-years spent reviewing the "stock" PHP code.
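
On point 1, code that leans on eval() to call something chosen at runtime can often be restructured around a whitelist of named functions. The following is only a minimal sketch of that idea in PHP 5.2-style code; the function names (format_upper, format_title, apply_format) are hypothetical, and whether any particular dynamic-call feature is supported by HipHop is an assumption to verify rather than something stated above.

```php
<?php
// Hypothetical helpers for illustration only; none of these names come from HipHop.

function format_upper($value)
{
    return strtoupper($value);
}

function format_title($value)
{
    return ucwords(strtolower($value));
}

/**
 * Apply a formatter chosen by name, without building code for eval().
 */
function apply_format($name, $value)
{
    // Whitelist: maps the names a template may use to real functions.
    $formatters = array(
        'upper' => 'format_upper',
        'title' => 'format_title',
    );

    if (!isset($formatters[$name])) {
        throw new InvalidArgumentException("Unknown formatter: $name");
    }

    return call_user_func($formatters[$name], $value);
}

// Instead of something like: eval('$out = ' . $name . '($value);');
echo apply_format('title', 'hello world'), "\n";
```

Beyond sidestepping eval(), the whitelist also means an unexpected name fails loudly instead of executing arbitrary code.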

Does this mean HipHop is bad?

By no means. As I've said before, it is an incredibly impressive technical achievement that makes a lot of sense for Facebook's use case. It is just that 99.9% of PHP deployments out there do not share Facebook's challenges (a massive, continually evolving code base and equally massive traffic), where the delivered performance increase would justify HipHop's deployment and maintenance challenges.

The vast majority of PHP installs can gain the desired speed improvements through opcode caching via APC, judicious use of caching via tools like memcache, and good old-fashioned database optimization. When profiled, most PHP apps turn out to be hitting a DB bottleneck rather than being slowed down by PHP itself. In the rare instances where PHP is the bottleneck, it is typically easier to either optimize the code or, if that fails, rewrite it as a C extension, the latter being even faster than HipHop, since it is native rather than converted code.
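
To make the caching point concrete, here is a rough sketch of the usual check-the-cache-first pattern. It assumes nothing about your stack: ArrayCache is a hypothetical in-memory stand-in so the example runs on its own (a real deployment would use a memcache client, whose get() likewise returns false on a miss), and load_user_from_db() is a made-up placeholder for an expensive query.

```php
<?php
// Hypothetical in-memory stand-in for a memcache client, so the sketch is self-contained.
class ArrayCache
{
    private $items = array();

    public function get($key)
    {
        // Return false on a miss, as memcache clients conventionally do.
        return array_key_exists($key, $this->items) ? $this->items[$key] : false;
    }

    public function set($key, $value)
    {
        $this->items[$key] = $value;
        return true;
    }
}

// Placeholder for a slow database lookup; counts how often it is really called.
function load_user_from_db($id, &$queryCount)
{
    $queryCount++;
    return array('id' => $id, 'name' => 'user' . $id);
}

// Check the cache first and only fall back to the database on a miss.
function get_user($cache, $id, &$queryCount)
{
    $key = 'user:' . $id;
    $user = $cache->get($key);
    if ($user === false) {
        $user = load_user_from_db($id, $queryCount);
        $cache->set($key, $user);
    }
    return $user;
}

$cache = new ArrayCache();
$queries = 0;
get_user($cache, 42, $queries);
get_user($cache, 42, $queries); // served from the cache
echo "database queries: $queries\n"; // prints "database queries: 1"
```

The second lookup never reaches the database, which is the kind of saving that matters when the DB, not PHP, is the bottleneck.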

That said, I think HipHop is a very helpful development for the PHP community in general. For one, it demonstrates that PHP can be converted/compiled into native code, which in the end is the holy grail of performance for any scripting language. It is also Open Source, meaning people can learn from it, use some of its ideas or evolve the solution even further. Who is to say that HipHop may not one day evolve into a JIT compiler that can be installed in the form of an extension and deployed easily on ANY PHP installation? The released code also contains some other useful snippets, such as some Facebook-specific extensions and concepts that can be applied to the standard PHP installation, making the stock PHP (or some of its extensions) better.

From an immediate usability standpoint, HipHop can be of big interest to authors of complex command line or daemon tools, which are now starting to be written in PHP, and perhaps even GUI applications written with PHP-GTK. A lot of the deployment challenges are simply not there, and running a few scripts to convert and compile before execution would be a minimal inconvenience compared to the performance benefits obtained. Also, given that those scripts typically use a limited subset of PHP functionality, they are quite likely to work as is, without the need to convert any PHP extensions to HipHop equivalents or running into unsupported functionality.

Excellent summary Ilia, as with Marco's. But as you say, if you have a DB bottleneck - no amount of C++ is going to save you vs. native PHP scripted code.

The question around extensions - to me it would make sense that in time an interface or cross-compiler tool could be added to do conversions of PHP extensions to HipHop compatible C++ code. A bit like phpize for HipHop.

Converting PHP extensions to HipHop is very much not trivial and probably cannot be automated. While HipHop extension variants are not entirely different from the native versions, they are fairly distinct.

I agree with your points. I'd probably look into HipHop as the last resort in app optimization. eAccelerator does fine for small to medium-sized projects... a gain of 0.005 seconds on every page load doesn't make too much difference.

One thing I've been wondering is: if it can run as webserver or command line, it would be easy to turn it into a fcgi app, thus making it much easier to integrate into existing web stacks?

As an aside, I always welcome new implementations of PHP not based on the Zend runtime. Forks are not a major risk - just look at Java - but multiple implementations just help strengthen the definition of the language, as many more corner cases have to be documented, and turning the manual into a formal spec can only help forward/backward compatibility for the official implementation...

Most sites do not need HipHop. In most cases bad SQL queries and too many TCP connections are the bottleneck. Only huge sites like Facebook need HipHop, and they also have the capacity to convert the PHP extensions to HipHop.

In response to comment #5, if I can halve the time my users spend waiting for an HTML document, and I can automate the deployment process sufficiently, I think it's a worthwhile performance improvement.

The divergence issues are the greatest for me. If my application runs differently in development (PHP) than it does in production (HipHop), that's not much good. Increased testing, QA and development cycles will never be more expensive than the performance loss.

For those who run high-traffic sites and have done their homework in optimizing their PHP applications, HipHop would certainly be an interesting PHP implementation. Anyway, I'm excited to see HipHop's influence on the PHP world...

I believe that HipHop does use its equivalent to zval when variables change their type within their scope, so it may be necessary to refactor code a bit to ensure that variables are effectively strong-typed to take real advantage of this.
That's probably a good thing: weak typing might make PHP easy for new developers to learn because it is forgiving, but enforcing strong typing is good practice anyway, and can potentially provide additional benefits with the HipHop optimiser. To take full advantage of that, type hinting should probably be extended to cover basic data types (rather than simply classes) somewhere down the line.

PHP is already thread-safe; what is not thread-safe are some of the libraries that various PHP extensions may use, which makes the installs that utilize them not thread-safe. The engine of PHP and the "core" extensions are all thread-safe already.

> 2. ... This means anytime you need to make even most trivial of changes you will need to go through to convert > compile cycle...

So what about the interpreter FB mentioned in the announcement that gets around precisely this problem as I understand it, and that is currently being used by their engineers? Called something like HPHPi I seem to recall...