Latest outage raises more questions about Amazon cloud

Massive thunderstorms notwithstanding, the fact that Amazon’s U.S. East data center went down again Friday night while other cloud services hosted in the same area kept running raises anew questions about whether Amazon is suffering architectural glitches that go beyond acts of God. While most Amazon services were back up Saturday morning, the company was still working through the provisioning backlog for its ELB load balancers as of 5:31 p.m. Eastern time, according to the AWS dashboard.

This outage, the second this month, took down Netflix, Instagram, Pinterest, and Heroku, as Om previously reported. The storm was undoubtedly huge, leaving 1.3 million in the Washington, D.C., area without power as of Saturday afternoon, but Joyent, an Amazon rival, also hosts cloud services from an Ashburn, Va., data center and experienced no outage, something its marketing people were quick to point out.

The implication is that Amazon, with all its talk about redundancy and availability, shouldn’t be having these issues if others are not.

Steve Zivanic, VP of marketing for Nirvanix, another rival cloud provider, said customers should simply stop defaulting to Amazon’s cloud. “It’s becoming rather clear that the answer for [Amazon’s] customers is not to try to master the AWS cloud and learn how to leverage multiple availability zones in an attempt to avoid the next outage but rather to look into a multi-vendor cloud strategy to ensure continuous business operations,” Zivanic said via email. “You can spend days, months and years trying to master AWS or you could simply do what large-scale IT organizations have been doing for decades — rely on more than one vendor.”

Another #AWS outage -> Another #Heroku outage. You'd think #Heroku would be scrambling to architect around that as quickly as possible.

The fact that Amazon, like any other data center-dependent business, is not bulletproof also raises questions about why its customers don’t pursue a multi-cloud strategy or, if they’re going to rely solely on Amazon, why they put so much of their workload in one geography, a practice Amazon itself advises against. Of course, it isn’t good practice for any vendor to blame snafus on its customers.

Presumably the tech folks at Instagram, Heroku, et al. know better. Earlier this month, I asked Byron Sebastian, the Salesforce.com VP of platforms who runs the Heroku business, if Heroku was actively seeking other cloud platform partners. He said the company is always evaluating its options.

Twitter was awash in comments. Many wondered why Amazon’s data center did not cut over to generator power, while others, like Gartner analyst Lydia Leong, preferred to wait to see what part Amazon’s data center operator played in this mess.

@cloudpundit Breakdown: grid power failed twice in June and AWS crashed twice same way. My dc in Ohio lost grid Fri, stays up on gen power.

Reached for comment Saturday afternoon, an Amazon spokeswoman reiterated that the storm caused Amazon to lose primary and backup generator power to an availability zone in its east region overnight and that service had been mostly restored. She said the company would share more details in the coming days.