Thursday, June 14, 2012

Technoracle has published many articles on cloud computing, a technology built on virtualizing computing functionality. Virtualization occurs when a network's or system's physical topology no longer aligns with its logical topology.

Today an event happened that should once again serve as a reminder that cloud computing might not always be the best solution. About an hour ago (Pacific Daylight Time), reports started surging in that Neo4j instances in the cloud were failing. These reports initially focused on Heroku, a generally very reliable cloud provider, but it quickly became apparent that the outages were also hitting Amazon Web Services EC2 instances as well as other cloud providers. Comments on Y Combinator's Hacker News reveal how groups became aware of these outages:

michaelfairley (37 minutes ago):
The circle is green with a little "note" on it.
"8:50 PM PDT We are investigating degraded performance for some volumes in a single AZ in the us-east-1 region."

DigitalSea (29 minutes ago):
Wouldn't that only affect a small subset of visitors. For example why would I be seeing any issues if I'd be hitting an Asia-pacific volume instead of a us-east region one? Seems like it goes deeper than that.

mechanical_fish (4 minutes ago):
One problem which we've seen before is: If a large percentage of AWS infrastructure goes down, the customers don't just quietly suffer. Instead they scramble to try and launch infrastructure in other zones or regions, which creates a cascading series of load spikes throughout the AWS system.
AWS is a fascinating science experiment. Pity about the websites, though.
-----

michaelfairley (12 minutes ago):
It's now yellow with this: "9:27 PM PDT We continue to investigate this issue. We can confirm that there is both impact to volumes and instances in a single AZ in US-EAST-1 Region. We are also experiencing increased error rates and latencies on the EC2 APIs in the US-EAST-1 Region."
AWS has been historically bad at reporting the severity of their outages promptly.

The rapid spread of this outage from one cloud provider to another shows how fragile interconnected systems are and how susceptible they are to these types of events. Hopefully, over time, the lessons learned from events like this will help us all build better systems.

