A few years ago, the city of Los Angeles experienced a heat wave that taxed power supplies and pushed air conditioning systems to the limit. As temperatures continued to rise in a data center that housed critical servers, the city faced a difficult decision.

“The only choice seemed to be to risk damaging the servers or shut them down and take our key applications offline,” said Joyce Edson, deputy CIO at the city of Los Angeles Information Technology Agency (ITA).

Fortunately, the department had recently been given permission to contract for cloud services. Edson and her team quickly spun up resources on Amazon Web Services (AWS) and executed a failover so that key applications kept running while the on-premises servers were shut down.

Currently, Los Angeles has multiple infrastructures set up for a range of situations, said Syed Towhid, a programmer and analyst with the city. “We have a data center on premises and a data center in the cloud,” he said. “Anytime we have a problem on premises, we have the ability to perform a failover to the cloud.”
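The article doesn't describe the city's tooling, but on AWS a common way to automate this kind of on-premises-to-cloud failover is DNS failover routing in Route 53. A minimal sketch, in which the hosted zone ID, domain name, and IP addresses are hypothetical placeholders:

```shell
# Create a health check that watches the on-premises endpoint.
aws route53 create-health-check \
  --caller-reference onprem-dr-check-001 \
  --health-check-config '{
    "IPAddress": "203.0.113.10",
    "Port": 443,
    "Type": "HTTPS",
    "ResourcePath": "/health",
    "RequestInterval": 30,
    "FailureThreshold": 3
  }'

# PRIMARY record points at the on-premises data center; a matching
# SECONDARY record (not shown) would point at the cloud replica.
# Route 53 shifts traffic to SECONDARY when the health check fails.
aws route53 change-resource-record-sets \
  --hosted-zone-id Z0000000EXAMPLE \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "app.example.com",
        "Type": "A",
        "SetIdentifier": "onprem-primary",
        "Failover": "PRIMARY",
        "TTL": 60,
        "HealthCheckId": "<health-check-id-from-above>",
        "ResourceRecords": [{"Value": "203.0.113.10"}]
      }
    }]
  }'
```

With this setup, failover is automatic; teams that prefer a manual, deliberate cutover (as in a planned shutdown during a heat wave) can instead UPSERT the record to point directly at the cloud address.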

Build a cloud-based disaster recovery plan

Initially, many businesses only put low-priority workloads in the cloud, but that has changed. And with more mission-critical workloads hosted off premises than ever before, a strong cloud-based disaster recovery plan is a must.


“Many organizations have placed their critical workloads into the cloud, so [disaster recovery] planning and cloud disruptions are much more important,” said Tom McAndrew, COO at Coalfire, a cybersecurity consultancy located in Westminster, Colo.

There are three general options enterprises face when they start to build a cloud-based disaster recovery plan, according to McAndrew:

Single-cloud architecture: With this model, enterprises can still use multiple locations. For example, with AWS, companies often share their workloads among different regions, such as U.S. East, U.S. West and GovCloud.

Multiple clouds: In a multicloud architecture, enterprises share workloads between multiple public cloud providers, such as AWS and Azure. This architecture helps minimize single points of failure.

Hybrid cloud: This approach combines on-premises systems with a public cloud platform, and enterprises share workloads between the two.
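Whichever model an enterprise picks, failing over comes down to detecting that a location is unhealthy and redirecting traffic to the next one. A minimal sketch of that decision logic, assuming hypothetical endpoint URLs and a simple HTTP health probe:

```python
import urllib.request

# Hypothetical endpoints for the same application in two locations
# (two regions, two providers, or on premises plus cloud).
ENDPOINTS = [
    "https://app.primary.example.com/health",    # preferred location
    "https://app.secondary.example.com/health",  # recovery location
]

def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Probe a health endpoint; any error or non-200 counts as down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def choose_endpoint(endpoints, probe=is_healthy):
    """Return the first healthy endpoint in priority order, or None."""
    for url in endpoints:
        if probe(url):
            return url
    return None
```

In practice this logic usually lives in a managed service (a load balancer or DNS failover policy) rather than application code, but the priority-ordered health check is the core of all three models.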

Before they choose a model, enterprises should evaluate the expertise they have, and the applications they currently use. For example, an IT shop with Microsoft-savvy engineers might be most comfortable with Azure, McAndrew said.

Furthermore, since cloud providers continuously improve their replication services, organizations should use available cloud-native technologies to get the best results, said Chris Pasternak, managing director at Accenture in Green Bay, Wis.

“We are seeing more cloud providers finding ways to extend the on-premises data center to the cloud,” he said. This comes as more enterprises implement an architecture that seamlessly integrates cloud with the on-premises data center, so that the disaster recovery environment becomes an extension of production operations.

Test a cloud-based disaster recovery plan

In general, organizations can apply the same best practices they use to test on-premises disaster recovery plans to a cloud-based plan. They should test their plan completely and on a regular schedule, at a minimum once a year.

“Paper-based tests are fine for working through the macro level, but true, hands-on execution is the only way to know that everything is working as it should,” Pasternak said.

Many organizations have one or more apps that they are reluctant to fully test because they can’t afford downtime. But, in most cases, cloud affords the capability to execute thorough tests without a full production system outage, he continued.
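One way to exercise a recovery copy without touching production is to run scripted smoke tests against the standby environment. A minimal sketch of such a drill runner; the individual checks here are stand-ins for real DNS, HTTP, and replication probes:

```python
def run_drill(checks):
    """Run named smoke tests against a standby environment.

    checks: list of (name, callable) pairs; each callable returns
    True on pass. Returns a summary of what failed.
    """
    failures = [name for name, check in checks if not check()]
    return {
        "passed": len(checks) - len(failures),
        "failed": failures,
        "ok": not failures,
    }

# Hypothetical checks for a drill against a cloud standby.
example_checks = [
    ("dns_resolves",  lambda: True),   # stand-in for a DNS lookup
    ("app_responds",  lambda: True),   # stand-in for an HTTP probe
    ("db_replicated", lambda: False),  # stand-in for a replication-lag check
]
```

Running `run_drill(example_checks)` would flag `db_replicated` as the failing step, which is exactly the kind of surprise a paper-based test cannot surface.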

A cloud-based disaster recovery plan is only as good as its last test — and that is a best-case scenario, said Naveen Chhabra, senior analyst at Forrester Research.

“The worst case is that, if you just changed something an hour ago, it could cause a bottleneck in recovery,” Chhabra said. “I have interviewed clients where it is an annual ritual, but in a rapidly changing environment, that is too infrequent.”

Instead, he recommends testing at least once a quarter, a practice that’s now common for banks and telcos. “You don’t want surprises,” Chhabra said. “You want to plan your failover and test it out because success is not about tooling, it is about fundamentals.”

Quarterly tests are also now a best practice at the city of Los Angeles ITA, where Edson said her team learned from experience that this is essential.