Amazon Cloud Outage Aftermath: Questions, Concerns Linger

Amazon's cloud outage shook the cloud computing industry to its core last month, and at Interop Las Vegas 2011 a panel of cloud experts examined the fallout and its lasting impact.

Amazon's cloud services went down for several hours, and in some cases days, on April 21 after its Elastic Block Store (EBS) service got stuck in a "re-mirroring storm" in its Northern Virginia data center. The hiccup knocked several Amazon cloud users offline, some for more than 24 hours. More than a week after the initial downtime, Amazon apologized for the cloud outage and offered users a cloud credit.

The Amazon cloud outage raised questions about the reliability of the cloud and whether it can be trusted for mission-critical applications and data. The outage also highlighted the need for transparency from cloud vendors, as Amazon took heat for its lack of communication during and after the incident. Additionally, it brought to the forefront the need for cloud users to examine their contracts, pay close attention to SLAs and learn what happens during an outage, all before signing on the dotted line.

But according to Simon Crosby, CTO of the data center and cloud division for Citrix Systems, while Amazon's cloud outage was impactful, it did not show that the cloud is an unreliable IT delivery model.

"It's like an airliner crash," Crosby said. "It's bad news because a lot of people get hit. But you're generally safer in a plane than when you're driving to work."

Randy Roland, Terremark senior vice president of product development, said Amazon's outage shone a light on the need to build redundancy into cloud infrastructure. But some of the onus falls on customers, many of whom are uneducated about how cloud environments actually work.

"There are different approaches and there are people that are misusing cloud providers," he said. "There's a problem with cloud that it's somehow magic."

Roland said that while cloud is often cheaper, more agile and more reliable than on-premises systems, cloud computing isn't an invitation to throw IT best practices to the wind. Cloud users must still look under the hood and examine what they're getting from their cloud provider. Due diligence is a must.

"I hope this is a wakeup call," he said. "I think there is this disconnect."

Amazon's outage also underscored that users can spread workloads across two different clouds, so that if one has an issue, workloads can be moved to the other. That model need not incur unnecessary costs, because cloud computing is pay-per-use.

"The beauty of cloud is when you're not using it you're not paying for it," Roland said, adding that more compute power can be bought on the fly to handle higher traffic.

Meanwhile, Roland added, "when you build internally, you have to build to peak."

The cloud panel also chided Amazon for its slow response to customer concerns about the outage and for its lack of communication about the cause and impact in the days that followed.

Andy Schroepfer, vice president of enterprise strategy at Rackspace, said the lack of communication from Amazon draws attention to the need for a personal relationship between cloud vendors and users. It's also necessary to work out SLAs and other guarantees up front, before there is an outage. Schroepfer said that when cloud capacity is bought with a credit card and an e-mail address, users can't always expect a rapid response when there's a problem.

Schroepfer said it's necessary to focus on building the relationship and establishing trust.

"If there's an outage and you don't know who to call, you didn't have trust in the first place," he said.