More outages hit Amazon's S3 storage service

Amazon's S3 cloud storage service suffered eight hours of downtime and elevated error rates in Europe and the United States on Sunday.

Jon Brodkin
July 22, 2008


The outage lasted several hours longer than a similar problem that hit the service in February, and it disrupted websites that rely on the Simple Storage Service. The social networking site Twitter was affected by both outages.

"Today has been a bad day for many websites and start-ups across the Internet," a Geek Zone blogger wrote during this week's outage. "The reason? An Amazon S3 outage. ... One of the most high-profile victims of the current S3 outage is Twitter: Images, such as avatars of users, are currently not being served, because they are all stored on S3."

Amazon described its attempts to fix the problem Sunday on a "service health dashboard" page the company uses to keep the public up to date on the status of its web services. In addition to S3, service interruptions were reported with Amazon's Simple Queue Service (SQS), a tool that helps developers move data among distributed components of applications.

Amazon's online storage service is often used with its Elastic Compute Cloud, which gives customers access to processing power over the Web. The Elastic Compute Cloud itself did not suffer any downtime on Sunday. However, Amazon said the S3 problems prevented new virtual machines from being registered on the Compute Cloud and stopped some virtual machines from launching. Instances already running on the Compute Cloud were not affected.

Amazon reported "elevated error rates" with S3 beginning at 9:05am PDT (5:05pm BST) on Sunday, and later described the problem as "an issue with the communication between several Amazon S3 internal components." Amazon reported making "incremental progress" at 1:17pm PDT (9:17pm BST), and two hours later said "no data has been lost during this incident."

By 3:23pm PDT, Amazon said service in Europe had been fully restored, but that recovery in the United States would take longer because the US region contains more storage systems.

At 5:12pm PDT, Amazon said service in the United States had been fully restored. "We will provide more detail on this event once we have completed a full investigation," Amazon said.

Amazon has said the February outage was caused by an unusually high volume of authentication requests. In response, the company said, it added "significant" capacity to its authentication service and improved the system that monitors the proportion of requests that are authenticated. Amazon said no data was lost in that incident either, because the company stores multiple copies of every object in multiple locations.