There are two ways to develop software: the wrong way and the fun way.

Tuesday, August 02, 2011

Interesting notes on a Rails 3.1 performance issue on an EC2 micro instance

For my web app, I recently switched to an Ajax-intensive model where most of the HTML manipulation happens on the client side in CoffeeScript. This significantly increases the number of HTTP requests hitting the server, and it caused a really serious performance problem on my EC2 micro instance. The app responded quickly for the first page but then would literally freeze for 1-2 minutes. It turned out to be caused by a combination of how Ajax apps, Passenger and EC2 micro instances work.

With the default configuration, Passenger tends to keep a small number of instances in the pool and spawns more when it needs to handle a higher volume of requests. It also shuts down instances that have sat unused for a while (5 minutes by default). This works perfectly for the traditional model where one request takes care of everything, but an Ajax app fires many more calls, and they usually arrive in spikes. Passenger spawns more instances on the assumption that these requests come from multiple users and that it needs to guarantee a reasonable response time for all of them. For an Ajax app this is a waste: because Ajax calls are asynchronous, the app functions reasonably well without a super-fast server response to every request. When serving only a few users, spawning extra instances is pure overhead. The typical pattern for an Ajax application is that Passenger sees a burst of requests, spawns a bunch of instances, and then recycles most of them as soon as the user stops using the app for a bit. And spawning app instances is a CPU-intensive operation.
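To make the spawn-and-recycle pattern concrete, here is a toy Python model. It is a sketch under stated assumptions, not Passenger's actual spawning algorithm: the class `SimplePool`, the `simulate` helper and the traffic pattern are hypothetical. The maximum pool size of 6 and the 300-second idle timeout mirror the defaults mentioned in this post; a minimum of 1 instance is assumed. It counts how many CPU-expensive spawns happen when bursts of 6 simultaneous Ajax calls arrive 400 seconds apart (just after idle instances have been reaped), compared with a pool pinned at 2 instances.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimplePool:
    """Toy model of an app-server process pool (hypothetical, not Passenger's code)."""
    min_instances: int
    max_pool_size: int
    idle_timeout: float  # seconds; 0 means "never reap idle instances" (assumption)
    last_used: List[float] = field(default_factory=list)  # one timestamp per live instance
    spawns: int = 0  # total number of (CPU-expensive) spawn operations so far

    def _reap(self, now: float) -> None:
        """Shut down instances idle longer than idle_timeout, keeping min_instances alive."""
        if self.idle_timeout <= 0:
            return
        # Most recently used first, so the instances we keep are the freshest ones.
        ordered = sorted(self.last_used, reverse=True)
        keep = [t for t in ordered if now - t <= self.idle_timeout]
        if len(keep) < self.min_instances:
            keep = ordered[: self.min_instances]
        self.last_used = keep

    def handle_burst(self, now: float, concurrent: int) -> None:
        """Serve `concurrent` simultaneous requests arriving at time `now`."""
        self._reap(now)
        wanted = min(concurrent, self.max_pool_size)
        missing = wanted - len(self.last_used)
        if missing > 0:
            # Not enough live instances for the burst: spawn more (expensive).
            self.spawns += missing
            self.last_used.extend([now] * missing)
        # Mark the instances that served this burst as freshly used.
        self.last_used.sort(reverse=True)
        for i in range(wanted):
            self.last_used[i] = now


def simulate(pool: SimplePool, bursts: int, gap: float, size: int) -> int:
    """Feed `bursts` bursts of `size` requests, `gap` seconds apart; return total spawns."""
    for k in range(bursts):
        pool.handle_burst(now=k * gap, concurrent=size)
    return pool.spawns


dynamic = SimplePool(min_instances=1, max_pool_size=6, idle_timeout=300)
fixed = SimplePool(min_instances=2, max_pool_size=2, idle_timeout=0)
print(simulate(dynamic, bursts=5, gap=400, size=6))  # 26
print(simulate(fixed, bursts=5, gap=400, size=6))    # 2
```

In this model the dynamic pool performs 26 spawns over five bursts (6 at first, then 5 more after each idle period), while the fixed pool spawns only twice. If the bursts arrived within the idle timeout, the dynamic pool would stop spawning after the first burst; the cost comes from the burst, idle, burst rhythm typical of a single user driving an Ajax app.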

On an EC2 micro instance this became disastrous, because micro instances throttle your CPU after a usage spike to prevent you from using 100% of the CPU all the time (the hardware is shared). The consequence was that every time Passenger spawned 6 instances (the default maximum pool size), EC2 'helped' by throttling my CPU to only 2.7% of its normal computing power...

My solution was to fix the number of instances in the Passenger pool at a lower level. This way, Ajax calls might get queued a bit from time to time, but 1) this is hardly noticeable on the client side, because most of the Ajax calls are proactive data prefetching, and 2) there are no more CPU usage spikes.
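The price of a fixed pool is some queueing, and it can be estimated with a small FIFO model. This Python sketch assumes a fixed number of instances (`workers`), requests served strictly in arrival order, and a constant per-request service time; the function name `queue_waits` and the 0.2 s service time are hypothetical, not measured values.

```python
import heapq
from typing import List


def queue_waits(arrivals: List[float], service_time: float, workers: int) -> List[float]:
    """Return how long each request waits in a FIFO queue served by a fixed pool."""
    free_at = [0.0] * workers  # time at which each instance becomes free
    heapq.heapify(free_at)
    waits = []
    for t in sorted(arrivals):
        earliest = heapq.heappop(free_at)  # the instance that frees up first
        start = max(t, earliest)           # wait only if every instance is busy
        waits.append(start - t)
        heapq.heappush(free_at, start + service_time)
    return waits


# A burst of 6 Ajax calls arriving together, 0.2 s each (assumed), 2 fixed instances.
print(queue_waits([0.0] * 6, service_time=0.2, workers=2))
```

Under these assumptions the last two calls in the burst wait 0.4 s before being served. For prefetch requests that is barely noticeable, and it is a far better trade than a 1-2 minute freeze while the CPU is throttled.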

The lesson I learned from this is that switching to an Ajax-driven model requires a lot of rethinking in many aspects of a web app.