We've long known that one of the virtues of the cloud is, through the magic of services and automation, that systems can be shut down or scaled down when not in use. What may be surprising is how much money can be saved.

20% of their systems are shut down after hours in response to traffic loads

Reserved instances are used for standard traffic

On-demand and spot instances are used to handle the elastic load throughout the day. When more servers are needed for an auto-scaled service, spot requests are opened and on-demand instances are started at the same time. Most services are targeted to run at about 50% on-demand and 50% spot.
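The ~50/50 on-demand/spot target can be expressed as a small capacity-splitting helper. This is a minimal sketch (the function name and rounding policy are assumptions, not from the article); rounding the odd instance toward on-demand keeps the guaranteed half slightly ahead:

```python
def split_capacity(desired: int, spot_fraction: float = 0.5) -> tuple[int, int]:
    """Split a desired instance count into (on_demand, spot) targets.

    The odd instance goes to on-demand, so the guaranteed capacity is
    never the smaller half. spot_fraction=0.5 matches the ~50/50 target.
    """
    spot = int(desired * spot_fraction)   # round the spot share down
    on_demand = desired - spot            # on-demand absorbs the remainder
    return on_demand, spot
```

For example, `split_capacity(9)` yields `(5, 4)`: five on-demand, four spot.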

Watchdog processes continually check what's running. More instances are launched when needed and terminated when not needed. If spot prices spike and spot instances are shut down, on-demand replacement instances are launched. Spot instances will be relaunched when the price goes back down. Spot capacity issues are rare and rarely are apparent to users.
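The watchdog's per-pass logic can be sketched as a pure function that compares targets with what is actually running. Everything here (the names, the price threshold) is illustrative; a real system would feed in live spot prices and instance inventories:

```python
def reconcile(desired: int, spot_price: float, max_spot_price: float,
              running_spot: int, running_od: int) -> dict:
    """Decide launches/terminations for one watchdog pass.

    Targets ~50% spot while the spot price is acceptable; falls back to
    all on-demand when the price spikes, and shifts back automatically
    once it drops (hypothetical threshold, not from the article).
    """
    target_spot = desired // 2 if spot_price <= max_spot_price else 0
    target_od = desired - target_spot
    return {
        "launch_spot":    max(0, target_spot - running_spot),
        "terminate_spot": max(0, running_spot - target_spot),
        "launch_od":      max(0, target_od - running_od),
        "terminate_od":   max(0, running_od - target_od),
    }
```

When the spot price spikes past the threshold, the plan replaces spot with on-demand; when it drops back, the next pass swaps spot instances back in.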

Using this approach, costs have gone from $54 per hour to $20 per hour.
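Those two figures imply roughly a 63% cost reduction; assuming 24x7 operation, the annualized savings work out as follows (simple arithmetic on the article's numbers, not figures from the article itself):

```python
hourly_before = 54.0
hourly_after = 20.0

savings_per_hour = hourly_before - hourly_after   # $34/hour saved
reduction = savings_per_hour / hourly_before      # ~0.63, i.e. ~63%
annual_savings = savings_per_hour * 24 * 365      # dollars per year

print(f"{reduction:.0%} reduction, ${annual_savings:,.0f}/year")
```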

Only two weeks of engineering were required to build the system, and with very little maintenance needed it delivers substantial ongoing savings.

Reader Comments (5)

It's a really effective technique. At a previous job we were catering exclusively to subscription customers in Ireland, the UK and Germany, which enabled us to be very specific about the times our service would be used.

Since it was for business use, we could run at full capacity for just 10 hours a day and drop down to running a handful of database systems with little compute capacity outside of this. Fantastic savings compared to running 24x7, which really matches typical IaaS pricing models.
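A schedule like that — full capacity for a 10-hour business window, a handful of nodes otherwise — can be as simple as a clock-driven capacity function. A sketch with made-up numbers (the 08:00–18:00 window and fleet sizes are assumptions, not from the comment):

```python
FULL_CAPACITY = 40   # hypothetical business-hours fleet size
IDLE_CAPACITY = 3    # handful of database nodes kept up overnight

def desired_capacity(hour: int, weekday: bool = True) -> int:
    """Return the target instance count for a given local hour (0-23)."""
    business_hours = weekday and 8 <= hour < 18   # 10-hour window
    return FULL_CAPACITY if business_hours else IDLE_CAPACITY
```

A scheduler (cron, or a cloud provider's scheduled scaling actions) would call this periodically and resize the fleet to match.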

Amazon is starting to stress these concepts in the AWS Architecture classes that they offer. They certainly aren't opposed to you paying for allocated capacity you aren't using, but the more their users start architecting solutions like this, the less expensive it is for Amazon to scale their infrastructure to handle their customers' loads.

Energy waste is also a hot button for me. Implementing a similar solution on 600 servers at an animation studio, we found that 56% of their power usage went to idle servers — a significant opportunity for financial savings. There were some challenges along the way, especially when it came to interfacing with some older IBM chassis infrastructure, but we were able to work through them. Visibility into incoming workload and server performance was especially important to keep them comfortable with powering machines down. Here is a link to the case study if you are interested: http://tsologic.com/toronto-animation-studio-ready-to-save-up-to-56-with-tso-logic/