Technology: Don’t let the cloud services Grinch steal your Christmas

On Christmas Eve 2012, Netflix, a leading provider of Internet-streamed video content, received the electronic equivalent of coal in its online stocking.

Beginning on the afternoon of Dec. 24 and continuing well into Christmas Day, many Netflix customers experienced difficulties in accessing portions of Netflix’s services. Instead of settling by the fire to watch a movie, these unlucky customers were forced into spending more time with their family and friends.

As it happens, Netflix had outsourced some of its system operations to a cloud computing service maintained by Amazon. The Netflix Christmas Eve outage was attributable to a failure in Amazon’s Northern Virginia data center. Apparently, an Amazon developer had mistakenly deleted data during maintenance operations, which resulted in the outage. Ironically, Amazon’s competing Prime Instant Video service experienced no holiday downtime.

Given the growing movement by corporate IT departments to rely on outsourced cloud services, the recent Netflix outage offers lessons in how cloud customers can take adequate steps to protect themselves in their cloud service contracts. Below are some examples of the approaches that cloud customers may consider using.

Adequate Service Levels

In the Netflix Christmas Eve outage, services were down for hours before they resumed. Cloud customers should insist that their cloud vendor’s contract include robust service levels, with financial incentives to ensure that service interruptions are few, that any interruption is short and that service is quickly restored.
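
To make the idea of a service level with financial consequences concrete, the following is a minimal sketch of how an availability commitment translates into permitted downtime and how a tiered service credit might be calculated. The 30-day month, the fee, and the credit tiers are hypothetical figures chosen for illustration; none of them comes from the Netflix or Amazon arrangements.

```python
# Illustrative only: every figure below is an assumption, not a term
# from any actual cloud service contract.
HOURS_PER_MONTH = 30 * 24  # assume a 30-day billing month


def allowed_downtime_hours(availability_pct):
    """Hours of downtime per month permitted by an availability commitment."""
    return HOURS_PER_MONTH * (1 - availability_pct / 100)


def measured_availability_pct(outage_hours):
    """Availability actually delivered in a month with the given outage time."""
    return 100 * (1 - outage_hours / HOURS_PER_MONTH)


# Hypothetical credit schedule: (minimum availability %, credit as a
# fraction of the monthly fee), ordered from highest threshold to lowest.
EXAMPLE_TIERS = [(99.9, 0.00), (99.0, 0.10), (0.0, 0.25)]


def service_credit(monthly_fee, measured_pct, tiers):
    """Return the credit owed for the first tier whose threshold is met."""
    for threshold_pct, credit_fraction in tiers:
        if measured_pct >= threshold_pct:
            return monthly_fee * credit_fraction
    raise ValueError("tiers must end with a threshold of 0.0")
```

Under these made-up tiers, a 99.9% commitment permits well under one hour of downtime in a 30-day month. A 12-hour outage works out to roughly 98.3% availability, which falls into the lowest tier and yields a credit of 25% of the monthly fee.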

Use of Multiple Sites

As noted earlier, published reports relating to the Netflix outage indicated that the cause of the problem occurred at a single data center. Corporate cloud customers should always undertake adequate due diligence to ensure that potential vendors have the capability to perform the cloud services from more than one site or data center. The use of multiple sites will lessen the likelihood of a prolonged outage attributable to a single site failure.
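
A rough piece of arithmetic shows why multiple sites matter. The sketch below assumes that sites fail independently of one another and that failover between them is immediate. Both assumptions are optimistic, since sites can share software, staff or procedures. The 99% per-site figure is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24


def combined_availability(site_availability, sites):
    """Probability that at least one site is up, assuming independent failures."""
    return 1 - (1 - site_availability) ** sites


def expected_downtime_hours(availability):
    """Expected hours per year during which no site is available."""
    return HOURS_PER_YEAR * (1 - availability)
```

With a hypothetical 99% availability per site, a single site implies about 87.6 hours of expected downtime a year, while two independent sites imply less than one hour. Real-world results will be worse to the extent that the sites share a common point of failure, which is why due diligence on how the sites are separated is as important as their number.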

Business Continuity Provisions & Avoidance of Force Majeure Clauses

Related to the above point, cloud customers should demand that their service providers have strong business continuity and disaster recovery plans, so that in the event of any service interruption (regardless of the cause) the vendor has back-up sites at which crucial services can continue to be performed until the vendor is able to restore its primary sites. Customers should also be alert to the vendor’s use of force majeure provisions in its contracts (i.e., provisions that excuse performance in the event of disasters, wars, strikes, etc.). Such clauses should be resisted if possible; at a minimum, the vendor must first be required to fully comply with approved business continuity and disaster recovery plans before it can excuse performance based on a force majeure event.

Separation of Development, Testing & Production Environments

Published reports indicated that the Netflix outage resulted from a developer introducing changes into the production environment without fully realizing the impact of those changes. Such an event illustrates that it may make sense to include provisions in the cloud service contract that limit developer access to production systems and require adequate testing of any system change in a test environment before the change is introduced into a production environment.

Be Prepared for Problems

Clearly, both Amazon and Netflix are technologically sophisticated companies. Indeed, it is quite likely that Netflix had addressed most (if not all) of these issues in its cloud service contract with Amazon. Even the best systems and back-up plans fail at times. Cloud customers must anticipate that problems will likely occur, and they should have in place reasonable contractual provisions and back-up plans to address those likely failures.

Most importantly, corporate cloud customers should also anticipate how their own customers may be affected by a cloud vendor’s failure. If appropriate, the cloud customer’s contracts with its own customers should attempt to address the impact of any such cloud vendor failure. The cloud customer should also, if appropriate, promptly communicate the nature and resolution of the outage to those customers.