Netflix disruption highlights challenges of cloud computing

Netflix headquarters in Los Gatos, California. Problems with Amazon’s cloud computing service caused Netflix to go down for much of Christmas Eve. Photo: AP

For some on Christmas Eve, “White Christmas” was a blackout on Netflix.

That’s because problems with Amazon’s cloud computing service, which provides storage and computing power for all kinds of websites and services, caused Netflix to go down for much of the day.

In updates on a website that reports on the status of its online services, Amazon traced the trouble to Elastic Load Balancing, a part of its service that helps spread heavy traffic among multiple servers to prevent overload. The company gave few details about the problems in its data centre in Northern Virginia beyond this and did not offer an official statement or explanation.

Social networks filled with complaints. Some customers also complained that Amazon’s own streaming service, Amazon Prime, was down. Amazon said it had fixed the problem completely by the afternoon of Christmas Day, and Netflix said it had restored its services to most of the affected consumers by late Christmas Eve. But the episode highlighted how dependent consumers have become on “the cloud”.

As more everyday devices, appliances and even automobiles rely on services connected to the Internet, consumers expect those services to be available at all times. Yet all sorts of disruptions—harsh weather conditions or an apparent overload—can knock a service out for hours.

Last month, problems with the same Amazon data centre in Virginia took down Reddit, Foursquare and Heroku. The incident was described on the status website as “degraded performance” in some parts of Amazon’s storage service. In June, a lightning storm hit the Virginia data centre, taking Netflix as well as Pinterest, Instagram and other sites offline for hours. That time, too, customers were offered little insight into what had happened.

In April 2011, an Amazon failure took down many smaller sites that had rented cloud storage space from the Internet giant. That time, the companies that were most affected were startups that were less likely to pay for so-called redundancies, or backup systems that kick in when a service fails. Netflix was not affected then, and said at the time it was because it had taken advantage of the redundancies that Amazon offers.

Netflix has said that it has built several redundancies into its cloud-based system. For instance, it stores its data across multiple “zones,” so if there is a failure in one zone, it can retry in another. It says it also spends money on more capacity than it needs, so that if there are large spikes in customer activity, the service is less likely to go down.

Joris Evers, a Netflix spokesman, declined to elaborate on why Netflix went down despite these safeguards. He said the company was investigating the cause and would do what it could to prevent the interruption from recurring.

“We are happy that people opening gifts of Netflix or Netflix-capable devices on Christmas morning could watch TV shows and movies, and apologize for any inconvenience caused Christmas Eve,” Evers said.

Tera Randall, an Amazon spokeswoman, said that the company has been “heads down” to ensure services are running smoothly and that a full summary of the incident would be published in a few days. Amazon is one of the biggest players in online services, hosting data storage and computation for hundreds of companies. Once a sideline Amazon set up six years ago, the cloud service has since exploded into a business that is expected to bring in about $1 billion to the company this year.

Other companies offer similar services, notably Google, which introduced its competing service in June. Microsoft is also in the business with Windows Azure. Although the service disruptions may annoy some companies and their customers, it’s unlikely many businesses will end their partnerships with Amazon in light of this latest Netflix failure, said James McQuivey, an analyst for Forrester Research. He added that it was unlikely that a temporary service failure for Netflix was going to cause many to cancel subscriptions.

He said companies can pay extra to Amazon to add safeguards that increase reliability of their online services, but they typically choose to save costs and take the risk of their services going down temporarily. He said that Amazon has been especially popular among businesses because it has been gradually improving its services and lowering its costs. Businesses “of course, are going to say, ‘Gee, Amazon, what’s going on?’” McQuivey said. “But in reality they’re all getting such a great deal. I don’t see them getting that upset about it.”