Disaster Recovery: 7 Common Mistakes

Mistakes are how we grow, in business as in life. If a marketing campaign fails, or a prototype has some unanticipated bugs, we can study it, learn what we did wrong, and do better next time.

But disaster recovery is one area where you can’t afford to make mistakes. If poor planning sabotages your DR program when you need it, there’s no redo. At the very least, you’ll lose a great deal of revenue and productivity as a result of unanticipated downtime. At worst, you could suffer permanent data loss, lawsuits, compliance actions and lasting damage to your business.

7 Common Disaster Recovery Planning Mistakes

Failing to Adequately Vet Your Disaster Recovery Provider

For system performance, SLAs are meaningful. If your provider’s performance isn’t up to snuff, the penalties will help defray the cost in productivity, and you’ll be able to move to another provider.

But for business continuity and disaster recovery, SLAs alone mean almost nothing. Your provider may have great guarantees on paper, but if they fail to deliver at the critical moment, the penalty won’t matter — a little fine isn’t going to make up for long delays or permanent data loss.

You need to carefully vet your disaster recovery provider. Ask where your data is stored, how often it is backed up, how the provider verifies that backups succeeded and how the data is protected (including the security of their data center footprint). Perhaps most importantly, you need to find out what will actually happen in a disaster.

Many traditional and cloud disaster recovery providers just route you to a helpline, and may not even have 24-hour emergency staff available. This is unacceptable. Look for a DR provider offering around-the-clock access to an experienced team of permanent staff who know how to work together when the worst happens.

Insufficient Disaster Recovery Testing

Disaster recovery test exercises are central to ensuring successful DR failover. Yet many companies treat testing as a minor precaution or a box to check. They assume the provider’s internal controls will ensure the technical end works as intended, and they’ll be able to just follow the instructions when disaster strikes.

There are two major problems with this. On the technical end, all kinds of things can go wrong. There can be unknown dependencies that prevent mission critical systems from successfully failing over. There can be undetected hardware faults or software configuration issues. But the biggest problems are often in the failover process itself. When push comes to shove, people may not even know how to get production running again, much less how to get everyone back to work and the business running.

Regular testing will ensure your workers know how to put a disaster recovery plan into action in a worst-case scenario. There will be less panic and confusion, and the technology will be properly configured to work like it’s supposed to.

You and your provider should test everything at least once every year, as well as any time you make a major change to the system. Testing should be done against a set of performance goals, and the results should be carefully evaluated. Go over the test and, if necessary, iterate to improve your disaster recovery plan. If a system fails to come online or a process takes too long, your team may need to make some tweaks and perform another test.
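As a rough illustration of evaluating a test against performance goals, the sketch below flags any system that failed to come online or took longer than its goal. The system names, timings and the `DrillResult` structure are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class DrillResult:
    system: str                # hypothetical system name
    came_online: bool          # did the system fail over successfully?
    minutes_to_recover: float  # measured during the test
    goal_minutes: float        # performance goal set before the test


def systems_needing_retest(results):
    """Return systems that failed to come online or missed their goal."""
    return [
        r.system
        for r in results
        if not r.came_online or r.minutes_to_recover > r.goal_minutes
    ]


# Hypothetical results from one annual test.
results = [
    DrillResult("erp-db", True, 45, 60),
    DrillResult("file-share", True, 190, 120),
    DrillResult("email", False, 0, 240),
]

print(systems_needing_retest(results))  # ['file-share', 'email']
```

Anything on the returned list is a candidate for a tweak to the plan and another test.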

Failing to Understand the Difference Between RTO, RPO and MTD

Recovery Time Objective (RTO) and Recovery Point Objective (RPO) are the two most cited disaster recovery parameters. RTO is the amount of time from the disaster until the system is back online and ready to use, while RPO is the maximum span of time, counted back to the last backup or replication, over which data could be lost.

Maximum Tolerable Downtime (MTD) isn’t talked about as much, but in many cases, it’s the most important parameter. While RTO refers to the point at which the system is up and running, MTD refers to the amount of time until the business as a whole is up and running, which accounts for the time it takes for staff to be able to use the systems to perform regular operations.

You see, just getting production online doesn’t necessarily mean you can get back to work. Your workers might not have a site to work at, you may not be able to connect to your customers or satellite offices, or you may have non-mission-critical systems that aren’t part of your disaster recovery failover but still need to be rebuilt.
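To make the distinction concrete, here is a minimal sketch that checks one recovery timeline against all three parameters. The function name, the timestamps and the target values are assumptions made up for the example, not figures from any real plan.

```python
from datetime import datetime, timedelta


def assess_recovery(last_backup, disaster, systems_online,
                    business_operational, rto, rpo, mtd):
    """Compare one recovery timeline against RPO, RTO and MTD targets."""
    return {
        # Data written after the last backup or replication is lost.
        "rpo_met": disaster - last_backup <= rpo,
        # RTO covers the time until the systems are back online.
        "rto_met": systems_online - disaster <= rto,
        # MTD covers the time until the business is operating again.
        "mtd_met": business_operational - disaster <= mtd,
    }


# Hypothetical timeline: systems return in 4 hours, but staff cannot
# resume regular operations until 24 hours after the disaster.
disaster = datetime(2024, 1, 10, 9, 0)
report = assess_recovery(
    last_backup=disaster - timedelta(minutes=30),
    disaster=disaster,
    systems_online=disaster + timedelta(hours=4),
    business_operational=disaster + timedelta(hours=24),
    rto=timedelta(hours=6),
    rpo=timedelta(hours=1),
    mtd=timedelta(hours=12),
)
print(report)  # {'rpo_met': True, 'rto_met': True, 'mtd_met': False}
```

In this made-up case the RTO and RPO targets are both met, yet the business still exceeds its MTD, which is the gap the paragraph above describes.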

Fail to understand RTO, RPO and MTD and your disaster recovery plan probably won’t fit your needs, with results that can range from costly to tragic. If a hospital, a law enforcement organization, a vital transportation hub or another piece of critical infrastructure gets RTO confused with MTD, it could actually be a matter of life or death. Even opting for too long an RPO can dramatically increase the cost of the disaster, both financially and in human terms.

It’s up to you to find a Cloud DR provider who can help you evaluate your business as a whole, and come up with disaster recovery parameters that meet your needs. By considering the costs and consequences of downtime in various systems, you can come up with a plan that controls costs while minimizing risks.

Failing to Include All Critical Systems in Your Disaster Recovery Plan

Usually, all critical systems are included when the disaster recovery plan is set up. When they’re not, thorough testing can most often spot the mistake. In the long run, however, it can get a lot harder to coordinate your DR system with production.

Organizations are dynamic — they change and adapt, adding new systems and technology as they go. This can lead to critical pieces of your infrastructure being left out of the DR plan. Change your cloud and IT infrastructure by migrating a core system to the cloud, adding new hardware or upgrading a legacy system, and your backup may not support all critical systems anymore.

Unfortunately, too many businesses subscribe to a “set it and forget it” mentality, and don’t incorporate the DR plan into their change control process (or don’t have a good change control process). It’s an easy mistake to make if you don’t have the right controls in place. When you’re upgrading to a new ERP landscape, you’re more likely to think about things like strategic planning than about what to do if a flood, power outage or earthquake strikes.

Lower-level operations can be particularly challenging to track and mirror in disaster recovery planning. You’ll probably think to update your disaster recovery plan if your whole company migrates to SAP Business Suite on HANA or S/4HANA, but will you notice if one department adds a new app to increase productivity? What happens if you acquire a company with its own DR practices, or merge two offices? What if you hire consultants to tweak your SAP Basis environment?

All of these scenarios can pose complex change-control challenges to disaster recovery. Many companies just don’t have the governance skills to keep track of all of their changes, register when it’s time to tweak the disaster recovery plan and follow through with testing.
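One simple way to catch this kind of drift is to compare the current production inventory with the list of systems the DR plan covers. The sketch below assumes both lists are available as plain system names; the names and the `find_dr_gaps` helper are hypothetical.

```python
def find_dr_gaps(production_systems, dr_covered_systems):
    """Compare the production inventory with the systems in the DR plan."""
    production = set(production_systems)
    covered = set(dr_covered_systems)
    return {
        # Running in production but missing from the DR plan.
        "unprotected": sorted(production - covered),
        # Still in the DR plan but no longer in production.
        "stale": sorted(covered - production),
    }


# Hypothetical inventories: a department added an app, and a legacy
# system was retired without the DR plan being updated.
production = ["erp", "crm", "email", "dept-scheduling-app"]
dr_plan = ["erp", "crm", "email", "legacy-payroll"]

gaps = find_dr_gaps(production, dr_plan)
print(gaps)
# {'unprotected': ['dept-scheduling-app'], 'stale': ['legacy-payroll']}
```

Running a comparison like this as part of each change request is one way to tie the DR plan into change control, although the hard part in practice is keeping the two inventories accurate.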

If you have trouble passing audits, or often have initiatives that peter out along the way, it may be a sign that you fit into this group. You may get the best results from working with a partner with multiple competencies, such as regulatory compliance and IT project management in addition to disaster recovery.

Not Including a Communication Plan in Your Disaster Recovery Plan

Organizations often conceive of business continuity and disaster recovery as the process of bringing their landscape online. In-depth planning and testing often only encompasses the early stages, from disaster declaration through failover. Within that domain, many companies restrict their testing to IT infrastructure alone.

But true business continuity needs to include communicating with users to get them productive again. If the systems are online but your team can’t use them productively, what’s the point? You need to plan how to communicate with users in all departments and at all levels of your organization, and make sure you’re prepared to resolve any technical issues once the systems are online.

For example, your users may have to use different tools or web addresses to connect to disaster recovery systems. Your plan will need to include the exact steps they need to take to get online, and how you’re going to help them do it. You’ll need a method to get the information to users and a support mechanism to help those who run into technical difficulties.

Make sure you have a really good support team lined up for disaster mode operation. You know how confused your users get when you roll out a new tool or tweak a business process? Imagine how much more confused they’ll be going into work at an alternate site in a state of emergency. This is another benefit of partnering with a DR provider offering high-touch ERP services — you’ll be able to lean on your partner to stay calm and get your users sorted out if things don’t go as planned.

But more common disasters are far less sensational in nature. A recent Ponemon Institute study showed that failing Uninterruptible Power Supplies (UPS) are the leading cause of disasters, accounting for one in four. Human error accounted for 22%, but that number doesn’t quite do it justice, since it contributes to other types of disasters.

For example, 22% of disasters are the result of cyber crimes, but perhaps 95% of those have human error as a contributing factor. Similarly, many UPS and other infrastructure failures result from human error. And, human error can also exacerbate disasters caused by other factors, for example by delaying recovery. All in all, it’s probably a factor in a significant majority of disasters.

By contrast, weather-related disasters accounted for a mere 10% — not insignificant, but by no means a leading cause of outages.

Your disaster recovery plan needs to be based around risk analysis and mitigation, not Hollywood portrayals. That doesn’t mean you shouldn’t be prepared for natural disasters, but you should keep them in perspective, and invest in proportion to their actual risk.
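As a very rough sketch of what risk-based prioritization can look like, the example below weights each cause by its share of incidents and by a cost per incident. The shares are the percentages cited above; the cost figures are purely hypothetical placeholders that you would replace with your own impact estimates.

```python
# Share of disasters by cause, in percent, as cited in the article.
CAUSE_SHARE_PCT = {
    "ups_failure": 25,
    "human_error": 22,
    "cyber_crime": 22,
    "weather": 10,
}

# HYPOTHETICAL cost per incident, for illustration only.
HYPOTHETICAL_COST = {
    "ups_failure": 80_000,
    "human_error": 50_000,
    "cyber_crime": 200_000,
    "weather": 150_000,
}


def rank_by_weighted_impact(share_pct, cost_per_incident):
    """Order causes by likelihood share multiplied by cost per incident."""
    scores = {
        cause: share_pct[cause] * cost_per_incident[cause]
        for cause in share_pct
    }
    return sorted(scores, key=scores.get, reverse=True)


ranking = rank_by_weighted_impact(CAUSE_SHARE_PCT, HYPOTHETICAL_COST)
print(ranking)
# ['cyber_crime', 'ups_failure', 'weather', 'human_error']
```

With these placeholder costs, weather still ranks above human error despite being less frequent, which shows why both likelihood and impact belong in the analysis. A real risk analysis would also account for overlapping causes, such as the human error that contributes to other categories.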

Additionally, you’ll need to account for factors outside the domain of traditional DR and HA, such as cyber security vulnerabilities. A cyber attack can be every bit as damaging to your business as a natural disaster, and in many ways it’s much more difficult to prepare for. Without a cyber security services team ensuring your landscape is secure and your DR provider hasn’t been compromised, you could end up with no clean copy to restore.

Not Understanding How to Budget for Adequate Disaster Recovery

Disaster recovery traditionally relied on tape backups. Organizations would typically run a backup at night, and either store tapes onsite, or ship them to a cold site for storage.

This system was affordable and fairly simple to implement, but that’s about all it had going for it. In a real disaster, sourcing and installing servers and infrastructure on the remote site would be incredibly costly and time-consuming, and getting everything back online could take weeks.

Things have changed dramatically in the last few years. Cloud-based Disaster Recovery as a Service (DRaaS) dramatically increases ROI on DR. Basic cloud disaster recovery can have your system up in days for less than the cost of a tape backup that takes weeks to restore. For high performance needs, cloud DR can have RPO, RTO and MTD approaching the level of onsite HA, at a lower price point and with the security of offsite backup.

Disaster Recovery That Works When You Need It

When it comes to disaster recovery, there’s no margin for error. You need a provider with the technology, know-how and diligence to execute a flawless failover under the toughest conditions. Symmetry works with Zerto’s Virtual Replication software to provide Enterprise Disaster Recovery that covers your whole organization. From a simple, single-vendor SAP cloud to the most complex hybrid environments, we can make sure you’re always prepared for anything the world can throw at your company.

As the Senior Cloud Architect at Symmetry, Randy brings over 14 years of experience in Information Technology with focus in Virtualization, Public, Private and Hybrid Clouds, System Design and Implementation, Data Center Operations, and Desktop/Server Engineering. He also has extensive experience with VMware solutions at the enterprise level across multiple industries including Managed Hosting, Cloud Service Providers, Global Utilities, and Healthcare.