/dev/urandom

PHP Cloud Management: Orchestra vs. PHP Fog

This morning (Mar 22, 2011) my work inbox got an email announcing that my Orchestra invitation had finally been accepted. I had already tried PHP Fog last week, so only Orchestra was still pending on my @todo list.

I am evaluating these platforms to see whether either one can ease my sysadmin burden. So far, as with any cloud service promising the next best thing since sliced bread, I have mixed feelings. For the moment I have tested only the free tier of both platforms. While Orchestra uses a pay-as-you-go model, PHP Fog asks for $29/mo just to let you drag the number-of-servers slider. That's a bit much for a plain evaluation, and rather un-cloud-ish for a model that's supposed to require no upfront investment.

PHP cloud management is still in its infancy, so there's plenty of room for improvement. Now let the geeky talk begin.

Server Setup

Both platforms use Ubuntu Server. Both also run a slightly outdated kernel image; 2.6.32-314-ec2 is the latest published by Canonical. So far, this is the only common link between the two, besides AWS as the base cloud service. I would like to see better OS maintenance though, since Canonical is pretty active in this 'cloud' business.

Orchestra may use nginx or Amazon's Elastic Load Balancer. The load balancer is not a choice, but a matter of application type. If Orchestra does the management right, ELB shouldn't have significant issues. My own experience with ELB is still mixed due to its 'undocumented behavior', and Amazon's premium support has been utterly useless. While I still use ELB, the whole experience isn't at the level advertised by Amazon.

PHP Fog uses nginx as the sole load balancer.

Cache layer

Orchestra uses nginx as a proxy cache. Don't confuse it with the load balancing layer. While nginx is a pretty fast server, it lacks HTTP 1.1 support for the proxy backend, and revalidating content with If-None-Match and If-Modified-Since isn't possible over HTTP 1.0. In plain English: static content is cached by Orchestra for eight hours, and a static object in the cache layer cannot be invalidated. You have to use a versioning scheme to invalidate a static object.
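Since objects can't be purged, one way to 'invalidate' them is to change the URL whenever the file changes. A minimal sketch in PHP, using the file's mtime as the version tag; the helper name and its arguments are my own invention, not part of either platform:

```php
<?php
// Hypothetical helper: append the file's mtime as a version tag, so a
// changed file gets a new URL and the 8-hour edge cache is bypassed.
function asset_url($docroot, $path) {
    $file = rtrim($docroot, '/') . '/' . ltrim($path, '/');
    $version = is_file($file) ? filemtime($file) : 0;
    return '/' . ltrim($path, '/') . '?v=' . $version;
}

// Usage in a template, e.g.:
// echo '<link rel="stylesheet" href="'
//     . asset_url($_SERVER['DOCUMENT_ROOT'], 'css/site.css') . '">';
```

Whenever the file is redeployed, its mtime changes, the URL changes, and the proxy cache treats it as a brand new object.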

PHP Fog on the other hand uses Varnish which, as a dedicated proxy cache server, does a far better job at cache invalidation. Varnish has a single disadvantage: the empty-cache experience when loading large objects. That may induce some latency, as Varnish doesn't yet provide unbuffered (streaming) transfer from the origin server, though this is planned for future releases. Since about 90% of web traffic is small objects, this shouldn't be an issue most of the time.
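For comparison, explicit invalidation in Varnish 2.x can be as simple as handling a PURGE request in VCL. This is a generic sketch of the technique, not PHP Fog's actual configuration, and the ACL is an assumption:

```vcl
# allow cache purging only from trusted hosts (illustrative ACL)
acl purgers {
    "127.0.0.1";
}

sub vcl_recv {
    if (req.request == "PURGE") {
        if (!client.ip ~ purgers) {
            error 405 "Not allowed";
        }
        # drop every cached variant matching this URL
        purge("req.url == " req.url);
        error 200 "Purged";
    }
}
```

The application can then invalidate a stale object with a single HTTP PURGE request to the cache, something the nginx proxy cache setup simply can't offer.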

Application Server

Orchestra uses a setup that is remarkably close to the preferred production setup: nginx + PHP-FPM. They maintain their own up-to-date PHP build (5.3.6). This is a really good thing, since maintaining PHP yourself, short of choosing a plain Zend Server CE installation, can be a world-class pain in the butt given an appropriate number of servers to manage.

The security setup is almost flawless. While they use open_basedir restrictions, i.e. the application is confined to a single directory, allow_url_fopen is on, which works around the open_basedir restriction. Suhosin isn't installed. As a sysadmin, I tend to sleep a little better with Suhosin around and allow_url_fopen off. All the process creation functions are disabled, and the dl() function is off. The whole setup gave me the impression that it was done to my taste. Well, almost.

The Orchestra free tier has severe limitations, but it gave me the big picture of their production setup. Their KB specifies that the basic and elastic tiers won't display errors, just like the free tier. The free tier comes as a single-process PHP-FPM pool, so concurrency is very limited: I got around 4 requests/second for a simple application over an Internet connection with 150 ms latency. The free tier also lacks any opcode caching, while the basic and elastic tiers have APC. They quote 'security reasons'. I won't argue: PHP-FPM uses a shared opcode cache across all process pools, so they may be right, but I haven't identified any issues with PHP-FPM in virtual hosting setups.

I also found it pretty strange that they use nginx 0.9.6, i.e. the development branch, not the stable one. While Igor writes a decent web server, I would usually take a more conservative approach when launching a production setup, i.e. use stable, not development, versions of a product. However, nginx does have a pretty good security record.
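The hardening knobs discussed above boil down to a handful of php.ini directives. A sketch with illustrative values, not Orchestra's actual configuration:

```ini
; confine the application to a single directory tree
open_basedir = /var/www/app

; remote URL includes work around open_basedir, so turn them off
allow_url_fopen = Off

; no runtime extension loading, no process creation
enable_dl = Off
disable_functions = exec,passthru,shell_exec,system,popen,proc_open,proc_nice,pcntl_exec
```

With allow_url_fopen off as well, the open_basedir jail actually holds, which is the part Orchestra currently leaves open.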

PHP Fog uses a plain old Apache 2.2 + mod_php5 setup. Even on the time-limited free tier, you get a full t1.micro instance all to yourself and your application. Well, you used to, since PHP Fog is currently down for maintenance due to recent severe security issues. Unfortunately, they use the PHP version shipped with Ubuntu Server 10.04 LTS, i.e. 5.3.2, which might give you a couple of headaches from time to time, as Ubuntu's PHP packages are pretty badly maintained. Unlike Orchestra, the setup isn't as well thought out configuration-wise, but it does support more legacy apps by default. Code optimization / opcode caching is done by the good old eAccelerator. They also provide seamless integration with New Relic, which I think is a neat addition to the stack.

Application Design

Orchestra is oriented towards shared-nothing architectures. Reading their KB felt almost like reading my own brain dump. I've had severe scalability issues with shared architectures and learned a lot from them; it seems these guys got the whole point of scalable web architectures as well. I would add one thing, though: ditch the sessions if possible. If you need state over a stateless protocol (HTTP), use plain old cookies. You don't need large datasets in the session superglobal. Orchestra's distributed sessions setup is, for the moment, pretty crude, i.e. you add a couple of lines of code for memcached-backed sessions. I guess they will eventually add an option to the slick 'Addons' panel which enhances the platform's functionality.

You do have to follow the cloud-ish recommendations for application design, while legacy apps or existing CMSes should use something like S3 uploads for static objects, if a plug-in is available. Funny thing though: while the upload size in php.ini says 50M, the maximum POST size is 20M. Go figure. I wonder what maximum POST size nginx accepts.

I tried a plain WordPress installation as well as a Kohana application with small bootstrap hacks to enable the Kohana class path cache. They work flawlessly. However, since it's nginx and there are no .htaccess files or Apache rewrite rules, any request that doesn't hit a file on the machine's hard drive becomes input for the 'Index file'. In other words: a plain old front controller for the application. You cannot use 'pretty URLs' with applications that don't follow the front controller pattern. Then again, most of the time you shouldn't use anything besides this pattern if you want to make everyone's life easier. I've had to administer applications bundling over 200 rewrite rules. Or, even 'funnier', applications with a front controller design that still use a bunch of rewrite rules instead of handling routing inside the application. The web server folks don't love the people who write that junk. Seriously. You do have to handle 404 errors and the like yourself, though; the front controller pattern comes with some limitations by default. Anyway, YMMV.
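The nginx side of this front-controller routing typically looks like the sketch below. Paths and the PHP-FPM socket are assumptions, not Orchestra's actual configuration:

```nginx
server {
    root /var/www/app;
    index index.php;

    # anything that isn't a real file on disk goes to the front controller
    location / {
        try_files $uri $uri/ /index.php?$args;
    }

    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/var/run/php-fpm.sock;
    }
}
```

That single try_files line replaces the whole pile of per-application rewrite rules: routing decisions live in index.php, where they belong.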

PHP Fog on the other hand enables the whole stack of PHP web apps to be hosted on its platform thanks to a more traditional approach. Since Varnish caches the static objects, the performance hit of serving them through Apache is reduced. However, a shared disk architecture is required for some usage models, and the PHP Fog guys state that they can provide it. Well, it isn't rocket science, and the cache layer helps a lot, but don't expect it to scale the way a proper SAN does, especially with frequent requests to the block storage. Caching doesn't solve performance issues if the traffic pattern yields a low cache hit ratio over frequent access. So no, don't think you can build the next Facebook using a shared disk architecture and t1.micro instances. I've had some experience with a NAS setup on Amazon EC2 (GlusterFS based), but platform limitations made us use S3 as a more appropriate store for static objects. Hint about the cause: EC2 instance network I/O limits apply to both plain network operations and EBS requests. It's pretty easy to fill a network pipe when you're sharing the hardware node, and buying higher-spec machines just to get more network I/O is suboptimal. As usual, YMMV.

Besides plain MySQL, both platforms support additional NoSQL approaches. So far, there are more PHP-side client plug-ins than available services.

Deployment

Orchestra doesn't provide any code hosting, but it can deploy your app from an external Git or SVN repository. I had issues using a private SVN repo, so I ended up using a public GitHub repository for the whole testing phase. They use the pull deployment method, polling the repository for changes, officially once per minute. While it may simplify things a little, this isn't the most efficient solution. I got pretty annoyed sitting around doing nothing while Orchestra pulled my new revision. Boy, the great event-vs-poll debate all over again.

PHP Fog on the other hand provides Git hosting. It enables Heroku-like deployments, 'as simple as a git push'. They also have a Git deployment tool (which currently does not work). Overall, I like this (evented) approach more.
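The 'git push to deploy' flow is easy to reproduce yourself with a bare repository and a post-receive hook. A minimal local sketch of the idea (the /tmp paths are illustrative, and this is the generic Git mechanism, not PHP Fog's actual implementation):

```shell
set -e
base=/tmp/deploydemo
rm -rf "$base" && mkdir -p "$base/www" "$base/src"

# the "server" side: a bare repo with a hook that checks out on every push
git init -q --bare "$base/app.git"
cat > "$base/app.git/hooks/post-receive" <<EOF
#!/bin/sh
GIT_WORK_TREE=$base/www git checkout -f master
EOF
chmod +x "$base/app.git/hooks/post-receive"

# the "developer" side: commit and push; the hook deploys immediately
cd "$base/src"
git init -q .
git checkout -q -b master
echo "<?php echo 'hello';" > index.php
git add index.php
git -c user.email=dev@example.com -c user.name=dev commit -qm 'initial'
git push -q "$base/app.git" master

cat "$base/www/index.php"   # the pushed file is now live in the work tree
```

The deploy happens the instant the push lands, which is exactly what makes the evented model nicer than polling once per minute.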

Conclusions

Both platforms still require a lot of polishing. The recent PHP Fog security issues can be a turn-off for a whole bunch of organizations, as some claims seem more inflated than the hard truth. While PHP Fog seems closer to a complete product, Orchestra is more security and scalability oriented, closer to my own mindset. That comes with administration and application design overhead, though. However, people should finally get the idea that systems administration is a job for professionals: if you outsource this part, it had better be done properly, since skiddies won't take it as an excuse for NOT taking your application off the Internet. I would still recommend PHP Fog for legacy applications, if they manage to patch their code, machines, and reputation. However, scaling across one to five instances may not be enough for big applications.

I would also recommend developing shared-nothing applications if you actually intend to scale, in which case Orchestra would be the obvious choice when DIY solutions aren't applicable. I took a peek at the configuration files of the PHP Fog stack, but Orchestra provides a far more security-oriented setup, and the PHP stack Orchestra provides is actually maintained, not outsourced to Ubuntu's 'universe' repository. I'll reiterate the same idea as others do: scalable software doesn't just mean fast software; it should also be able to run on a distributed grid. Most of today's PHP apps are architecturally challenged when it comes down to this. If you don't like getting your hands dirty, these platforms may eventually help. The current state of affairs makes the whole experience rough around the edges, but then again, we're talking about a couple of private beta products.

For example, instead of a session, one could use an authentication cookie to identify the user. The logic behind user calls is light: unless an actual database write is required, the information is fetched from a hot cache instead of a session, which might be difficult to scale. The information might also be unreliable with a memcache backend once the LRU eviction policy kicks in; with our setup, such a request falls back to the master database.
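A stateless authentication cookie can be as simple as an HMAC-signed value. A sketch of the idea; the secret, the payload format, and the function names are illustrative assumptions, not our production code:

```php
<?php
// Hypothetical sketch: identify the user via a signed cookie, no $_SESSION.
define('AUTH_SECRET', 'replace-with-a-long-random-key');

// issue "userId|expires|signature", to be handed to setcookie()
function issue_auth_cookie($userId, $ttl = 3600) {
    $expires = time() + $ttl;
    $payload = $userId . '|' . $expires;
    return $payload . '|' . hash_hmac('sha256', $payload, AUTH_SECRET);
}

// verify: returns the user id, or false on tampering/expiry
function verify_auth_cookie($value) {
    $parts = explode('|', $value);
    if (count($parts) !== 3) {
        return false;
    }
    list($userId, $expires, $sig) = $parts;
    $expected = hash_hmac('sha256', $userId . '|' . $expires, AUTH_SECRET);
    if ($sig !== $expected || time() > (int) $expires) {
        return false;
    }
    return $userId;
}
```

Any web node can verify the cookie on its own, so there's no shared session store to scale or to lose to LRU eviction.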

For a project like ours, where the data changes seldom and per-user page customization is minimal, we use full page caching plus multiple layers of caching. The user information is loaded by a lightweight AJAX call, so the same full page cache can serve authenticated users.

The caching layers: memcache (with a local failover cache in case the memcache cluster is down), nginx's FastCGI cache, and the CDN, which also caches some of the dynamic output cached by nginx. The nginx/CDN caching of dynamic output is controlled via cache control headers. The CDN enables low latency delivery for popular objects. The nginx FastCGI cache alone brought a 40% drop in CPU usage. Even before that drop, the infrastructure was network I/O bound: two machines (c1.medium EC2 instances) are enough to cover the CPU requirements, but they can't sustain the network I/O.
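The split between the cacheable full page and the per-user AJAX endpoint comes down to which Cache-Control header each response emits. A sketch with illustrative TTLs; the helper is a made-up convenience, not our actual code:

```php
<?php
// Hypothetical helper: pick the Cache-Control header per response type.
// Full pages are cacheable by nginx/CDN; the user-info endpoint never is.
function cache_control_header($isUserEndpoint, $ttl = 300) {
    return $isUserEndpoint
        ? 'Cache-Control: private, no-store'
        : 'Cache-Control: public, max-age=' . $ttl;
}

// e.g. header(cache_control_header(false)); // full page, edge-cacheable
//      header(cache_control_header(true));  // the AJAX user-info call
```

nginx and the CDN honor these headers, so the same code path decides both what the FastCGI cache keeps and what the CDN is allowed to serve.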

User-uploaded data doesn't go straight to disk. We had a really hard time scaling shared disk architectures on EC2; the biggest threat is network I/O plus latency. We had to buy bigger machines (up to m1.xlarge) to sustain the network I/O required by GlusterFS, and believe me, I spent three months trying every speed hack in the GlusterFS book. Therefore, the user data goes to S3.
