Thursday, April 21, 2011

EC2 Outage Takes Out Multiple Sites, Including Foursquare

Foursquare and other web sites were affected by problems at Amazon's Elastic Compute Cloud data center. Amazon said additional capacity was added to support EC2's "affected availability zone" in Virginia. The outage is likely to add to the debate about reliability and security when using a vendor data center like Amazon's EC2.

Amazon's cloud-based platform suffered outages Thursday. The company said the problems involved latency and other errors, and they brought down the web sites of Foursquare, Quora, HootSuite, Reddit and other companies.

The problems hit the part of Amazon Elastic Compute Cloud (EC2) that supports start-ups. In a statement, Amazon said it is "now seeing significantly reduced failures and latency," and it continues to recover. It added that additional capacity has been brought online to support "the affected availability zone."

'Amazing Data-Center Hosts'

Foursquare said on its web site Thursday morning that "our usually amazing data-center hosts, Amazon EC2, are having a few hiccups this morning, which affected us and a bunch of other services that use them." The notice added that everything was "looking to be getting back to normal now." EC2 provides pay-as-you-go computing capacity in the cloud.

As of Thursday afternoon, Quora's web site said it was "currently having an unexpected outage," and efforts were under way "to get the site back up as soon as possible."

Foursquare is a location-focused social site that allows users to broadcast their locations to friends, and Quora is a user-created question-and-answer site. Reddit is a news aggregator, and HootSuite is a popular Twitter client.

The trouble was first reported about 5 a.m., with connectivity issues affecting the Amazon Relational Database Service that covers several areas in the eastern U.S. Problems also developed in EC2, and there were issues with the Elastic Block Store, or EBS, which provides storage for EC2.

'Beginning To Recover'

Amazon said "a networking event early this morning triggered a large amount of remirroring of EBS volumes in US-EAST-1," which led to a shortage of capacity in one of the eastern availability zones, which, in turn, affected EBS volume creation and recovery.

In a posting on the Amazon Web Services' dashboard, the company said that, as of 10:26 a.m. PDT Thursday, it had made "significant progress" in stabilizing the EC2 situation in Northern Virginia.

It added that "additional capacity has been brought online in the affected availability zone," and affected volumes "are beginning to recover." The company added that it cannot yet estimate when these volumes will be completely recovered, but "we will provide an estimate as soon as we have sufficient data to estimate the recovery."

A posting half an hour later said that, in regard to an "estimated time of arrival" or ETA for when the service would be fully recovered, the "high-level ballpark right now is that the ETA is a few hours."

The notice added that "we can assure you that all hands are on deck to recover as quickly as possible."

As cloud-based platforms increase in popularity, questions continue to arise about reliability, security and the wisdom of trusting your company's fate to a remotely located provider, even one as large as Amazon.