I expected CPU usage to shoot up as the number of users grew. Instead, CPU usage stays between 20% and 30% (which is why the new instance never fires up), and the running instance starts throwing timeout errors once it goes beyond 100 users.

I am at a loss to understand why CPU usage is so low when the website is in fact timing out.

2 Answers

This could be a problem with the ELB. The ELB does not scale very quickly; it needs a sustained amount of traffic before Amazon gives you a bigger one. Hitting it really hard all at once does not help it scale, so the ELB itself could be having problems handling all the connections.

Is this SSL? If so, are you terminating SSL on the ELB? That would add further overhead to an underscaled ELB.

I would honestly recommend not using ELB at all. haproxy is a much better product and much faster in most cases. I can elaborate if needed, but just look at how Amazon handles the CNAME compared with what you can do with haproxy...

Can you please elaborate more on how haproxy can be used with EC2 instances? All my reading about AWS auto-scaling has been based on ELB only...
– Jimmy, Feb 2 '12 at 9:05

And no, this is not SSL, but I was thinking of adding SSL in the future.
– Jimmy, Feb 2 '12 at 9:07

Well, just check out the haproxy page. It is a really great program. You can use CloudWatch to set alarms on your app servers behind the haproxy servers, and auto scale just the same as with an ELB.
– chantheman, Feb 3 '12 at 17:47

The setup would be something like this: haproxy in front, with Apache or nginx or something similar serving static content straight back to the user and sending PHP requests to an app server behind the haproxy server, which in turn talks to a database server, memcache server, or whatever. The point is you can still use autoscaling, because all it does is monitor CPU on a given server and boot up a new instance from an AMI. You would have to add the new instance's IP to the haproxy config file somehow... I think you could script that as well, but that is more complicated.
– chantheman, Feb 3 '12 at 18:04
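
To make the "script that as well" idea a bit more concrete, here is a minimal sketch of the config-generation half only. It assumes you already have the list of app server IPs from somewhere (how you discover them and how you reload haproxy afterwards are left out), and the names `render_backend`, `app_servers`, and the `10.0.0.x` addresses are made up for illustration.

```python
def render_backend(name, servers, port=80):
    """Return an haproxy 'backend' section with one 'server' line per IP.

    'name', 'servers', and 'port' are illustrative parameters; the IPs are
    assumed to come from whatever instance discovery you already have.
    """
    lines = [f"backend {name}", "    balance roundrobin"]
    for i, ip in enumerate(servers, start=1):
        # 'check' enables haproxy's health checks for this server
        lines.append(f"    server app{i} {ip}:{port} check")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical private IPs of two app servers
    print(render_backend("app_servers", ["10.0.0.11", "10.0.0.12"]))
```

You would write the result into the haproxy config file and reload haproxy whenever the instance list changes, which is the more complicated part mentioned above.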

It sounds like you are testing Auto Scaling (AS) to ensure it will work for your needs. As a first pass, to simply see whether AS will launch a new instance, try lowering your CPU scale-up threshold so that it triggers at 25%. I realize this is a lot lower than you are hoping to use moving forward, but it will help validate that your initial configuration is working.

As a second step, you should take a look at your application and see whether CPU is the best metric for AS to monitor for scaling. It is possible that you have a bottleneck somewhere else in your app that is not CPU related (web server tuning, memory, databases, storage, etc.). You didn't mention what type of content you're serving; is it static or generated by an interpreter (like PHP or something else)? You could also send your own custom metric data into CloudWatch and use that metric to trigger the scaling.
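
As a rough sketch of the custom-metric idea, assuming Python with the third-party boto3 library and AWS credentials already configured: the namespace `MyApp`, the metric name `ResponseTime`, and the group name `web-asg` are placeholders, not anything from your setup. The payload builder is separate from the publish call so it can be checked without touching AWS.

```python
def build_metric(name, value, unit="Milliseconds", group=None):
    """Build one CloudWatch metric datum as a plain dict.

    'group' is an optional Auto Scaling group name (hypothetical here),
    attached as a dimension so the metric can be aggregated per group.
    """
    datum = {"MetricName": name, "Value": float(value), "Unit": unit}
    if group:
        datum["Dimensions"] = [{"Name": "AutoScalingGroupName", "Value": group}]
    return datum


def publish(namespace, datum):
    """Send one datum to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # third-party; imported here so build_metric works without it

    client = boto3.client("cloudwatch")
    client.put_metric_data(Namespace=namespace, MetricData=[datum])


if __name__ == "__main__":
    # Example: report an 850 ms average response time for a made-up group
    print(build_metric("ResponseTime", 850, group="web-asg"))
    # publish("MyApp", build_metric("ResponseTime", 850, group="web-asg"))
```

A CloudWatch alarm on that metric can then drive the scaling policy in place of the CPU alarm.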

You may also want to time how long it takes for an instance to be ready to serve traffic from a cold start. If it takes longer than 60 seconds, you may want to adjust your monitoring threshold time accordingly (or set cooldown periods). As chantheman pointed out, it can take some time for the ELB to register the instance as well (and longer still if the new instance is in a different AZ).
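
One simple way to time the cold start is to poll the new instance until it answers, starting the clock when you launch it. This is only a sketch using the Python standard library; the URL is a placeholder for whatever page on your instance indicates it is really ready, and the function names are my own.

```python
import time
import urllib.request


def http_ready(url, timeout=2.0):
    """Return True if 'url' answers with HTTP 200, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        # Connection refused, timeout, HTTP error, etc.: not ready yet
        return False


def seconds_until_ready(probe, poll_interval=1.0, limit=600.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Call probe() repeatedly; return elapsed seconds once it is True.

    Returns None if 'limit' seconds pass without a successful probe.
    """
    start = clock()
    while clock() - start < limit:
        if probe():
            return clock() - start
        sleep(poll_interval)
    return None


if __name__ == "__main__":
    # Placeholder address; start this right after launching the instance
    url = "http://10.0.0.11/"
    print(seconds_until_ready(lambda: http_ready(url)))
```

If the number you get is well over a minute, that is a good hint for how long the cooldown should be.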