Amazon Cloud Computing Platform Goes Down, Takes Major Social Websites With It

Amazon’s Elastic Compute Cloud (EC2) is a popular hosting platform for social websites because it provides a cheap, scalable, generally reliable place for them to live online without having to purchase a floor full of their own servers. But a major EC2 outage today highlights the potential perils of outsourcing one’s hosting needs to the cloud: When the cloud goes down, there’s not all that much you can do.

Reddit, Foursquare, Quora, Heroku, and Hootsuite were among the services hit with major downtime as a result of the EC2 failure, at least part of which, according to Amazon’s status page, appears to be localized at Amazon’s Northern Virginia data center. (As of posting, Foursquare and Heroku are back up, and the other three are still down.) While deploying one’s infrastructure across multiple Amazon Web Services availability zones can sometimes safeguard against local failures, multiple availability zones have been affected this time; Reddit admin Jeremy Edberg says that Reddit was deployed across three availability zones.

Amazon advertises an uptime of 99.95% for EC2; this means that an outage of roughly four hours and twenty-two minutes (0.05% of a year) would eat up a whole year’s downtime allowance, and this problem has been going on for at least four hours. Amazon does offer service credits to customers whose sites have been down for more than 0.05% of the time, so it’s possible that some of the social sites mentioned will at least be eligible for a discount of up to 10% for their pains.
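
For readers who want to check the arithmetic behind that downtime allowance, here is a minimal sketch in Python. It assumes a 365-day year and treats the 99.95% figure as an annual target; the function names are illustrative only and do not come from Amazon.

```python
from fractions import Fraction

# Assumption: a 365-day (non-leap) year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_allowance_seconds(uptime_percent, period_seconds=SECONDS_PER_YEAR):
    """Return the downtime (in whole seconds) permitted by an uptime target.

    uptime_percent is passed as a string such as "99.95" so that Fraction
    can represent it exactly, avoiding floating-point rounding.
    """
    downtime_fraction = 1 - Fraction(uptime_percent) / 100
    return int(downtime_fraction * period_seconds)


def format_hms(total_seconds):
    """Format a number of seconds as hours, minutes, and seconds."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}h {minutes}m {seconds}s"


if __name__ == "__main__":
    allowance = downtime_allowance_seconds("99.95")
    print(format_hms(allowance))  # prints "4h 22m 48s"
```

Under those assumptions, 0.05% of a year comes to just under four hours and twenty-three minutes, in line with the figure quoted above.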