Qualifying and Quantifying the Cost of Disaster Recovery (DR) / BCP

The Dollars Behind the 9’s! (First in a Series)

It’s a safe bet to say that most IT professionals at every level are concerned about high availability as it relates to IT systems being accessible to all stakeholders, internal and external. Communicating this outside of IT can be challenging, especially when it comes to budgeting for the cost of Disaster Recovery (DR) and failed or inadequate business continuity planning (BCP).

The CIO or technology leader strives to reach five nines of availability. The five 9’s stand for 99.999% availability, 24/7/365. The remaining 0.001% equates to 5.26 minutes of unscheduled downtime a year, or 25.9 seconds a month.
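To make the arithmetic behind the nines easy to check, here is a minimal Python sketch. The function names are illustrative, and it assumes a 365-day year and a 30-day month, which is one common convention for these figures.

```python
MINUTES_PER_YEAR = 365 * 24 * 60            # assumes a non-leap year
SECONDS_PER_30_DAY_MONTH = 30 * 24 * 3600   # assumes a 30-day month


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes of downtime per year allowed at the given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


def downtime_seconds_per_month(availability_percent: float) -> float:
    """Seconds of downtime per 30-day month allowed at the given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return SECONDS_PER_30_DAY_MONTH * unavailable_fraction


if __name__ == "__main__":
    for label, pct in [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]:
        print(
            f"{label} ({pct}%): "
            f"{downtime_minutes_per_year(pct):.2f} min/year, "
            f"{downtime_seconds_per_month(pct):.1f} s/month"
        )
```

For five nines this works out to roughly 5.26 minutes a year and about 25.9 seconds in a 30-day month.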

While less than thirty seconds a month may seem inconsequential on the surface, the hidden costs behind it are not. Demonstrating the cost of neglecting due diligence, not only on the Disaster Recovery side but also on the Business Continuity Planning side (it’s not just about IT!), can greatly help in:

creating a unified understanding of DR / BCP;

obtaining budget for resilient IT systems;

demonstrating that it is not only an IT problem; and

raising awareness and appreciation for the complexity of IT.

A 2013 study on data center downtime, reported by Jason Verge, put the average cost at $7,900 per minute, up 41% from 2010. The study also revealed that even more significant costs are incurred by companies whose revenue models depend directly on the ability to deliver IT and networking services to customers. The highest cost of a systems failure in the study was more than $1.7 million.

Unfortunately, no company is average. The $7,900 per minute price tag may not be relevant to your organization. There are excellent ways to present downtime costs to senior management, especially before submitting your technology budget for blade servers, virtualization, or a redundant data center in the event of a disaster.

The following factors can be applied to your own organization, using costs and numbers from departments outside IT to arrive at a downtime cost per minute.

1. Employee Cost – Direct, Incidental, and Recovery

Factor the total hourly cost for employees who would be impacted (by location and/or application). If your organization has 100 employees, the average cost per hour (including benefits) is $80, and you have an hour of downtime, the initial reaction is that the downtime costs $8,000. This is the direct cost.

Incidental cost is the idle time during which the employee is not working due to an incident (again, it doesn’t have to be a technology failure). This cost may be difficult to validate with senior management, as the response I have found is, “They can do something else.” The truth lies somewhere in the middle. By offering a factor of 50% effectiveness, you arrive at an incidental cost of $4,000.

Lastly, there is the employee recovery time. This is the time required to catch up after the incident (email, voicemail, deadlines, etc.). Recovery can, and usually does, require overtime. Again, this cost may be difficult to justify, and using a percentage of direct cost (75%, or $6,000 here) may be more acceptable.

So far, this brings the total cost of the one-hour outage during normal business hours to $18,000. This is a conservative number and should be communicated as such. A more detailed breakdown of “employee true costs,” written by Diane Gilson of the Sleeter Group, can be found here.
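The employee cost arithmetic can be sketched in a few lines of Python so you can plug in your own headcount and rates. The class and function names are illustrative. The 50% and 75% factors are the ones suggested in this article, and they should be treated as negotiable assumptions, not fixed rules.

```python
from dataclasses import dataclass


@dataclass
class EmployeeDowntimeCost:
    direct: float       # fully loaded wages paid during the outage
    incidental: float   # idle time, discounted because staff can do some other work
    recovery: float     # catch-up time after the outage, often overtime

    @property
    def total(self) -> float:
        return self.direct + self.incidental + self.recovery


def employee_downtime_cost(
    employees: int,
    hourly_cost: float,               # average cost per hour, including benefits
    outage_hours: float,
    incidental_factor: float = 0.50,  # assumption: 50% of direct cost
    recovery_factor: float = 0.75,    # assumption: 75% of direct cost
) -> EmployeeDowntimeCost:
    direct = employees * hourly_cost * outage_hours
    return EmployeeDowntimeCost(
        direct=direct,
        incidental=direct * incidental_factor,
        recovery=direct * recovery_factor,
    )


if __name__ == "__main__":
    cost = employee_downtime_cost(employees=100, hourly_cost=80.0, outage_hours=1.0)
    print(f"Direct:     ${cost.direct:,.0f}")
    print(f"Incidental: ${cost.incidental:,.0f}")
    print(f"Recovery:   ${cost.recovery:,.0f}")
    print(f"Total:      ${cost.total:,.0f}")
    print(f"Per minute: ${cost.total / 60:,.0f}")
```

With the example figures (100 employees at $80 per hour, one hour down) the total is $18,000, or $300 per minute of employee cost alone.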

2. Loss of Business Costs

I recently completed a DR/BCP implementation for a medical company. In my presentation on the Business Impact Analysis (it’s not just IT, remember?), the ARPM, or Average Revenue Per Minute, was $3,500.

On the surface, this seems like an easy calculation: gross revenue divided by the total minutes in a work year (2,080 hours x 60). Unfortunately, very few companies and employees shut down at 5 pm. Web sales and marketing, email, voicemail, and social media never stop, making this calculation difficult.
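One way to show management how much the choice of denominator matters is to compute ARPM under both assumptions. This is a rough sketch: the function name is illustrative and the annual revenue figure is purely hypothetical, so substitute the number Finance gives you.

```python
WORK_YEAR_MINUTES = 2080 * 60         # 40 hours x 52 weeks, in minutes
ALWAYS_ON_MINUTES = 365 * 24 * 60     # revenue earned around the clock


def average_revenue_per_minute(annual_gross_revenue: float, revenue_minutes: int) -> float:
    """ARPM: gross revenue spread over the minutes in which it is assumed to be earned."""
    return annual_gross_revenue / revenue_minutes


if __name__ == "__main__":
    annual_gross_revenue = 50_000_000.0  # hypothetical figure, for illustration only

    business_hours = average_revenue_per_minute(annual_gross_revenue, WORK_YEAR_MINUTES)
    always_on = average_revenue_per_minute(annual_gross_revenue, ALWAYS_ON_MINUTES)

    print(f"ARPM, business hours only: ${business_hours:,.2f}")
    print(f"ARPM, 24/7/365:            ${always_on:,.2f}")
```

The real number for most organizations sits somewhere between the two results, depending on how much revenue arrives outside office hours. Presenting both as a range is usually easier to defend than a single figure.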

This is almost impossible to quantify with a direct cost unless it can be related to a similar incident. Your Marketing department, and then Finance, may be able to help put together an estimate.

4. Technology Recovery

This cost is easier to show AFTER an incident, and should be estimated. Typical technology recovery costs include overtime, out-of-warranty acquisition costs, and outside vendor and consulting costs. In short, these are any and all costs associated with a system restoration.

One of the data centers I was responsible for experienced a power outage. Normally this would be no more than a distant memory, were it not for the simple fact that Facilities did not top off the generator after its weekly tests. In the rush to refuel, air bubbles formed in the diesel fuel line, causing the generator to die. This occurred off hours, of course. When the UPS system failed, everything crashed. The technology recovery cost included overtime/comp time for the IT staff, and about 20% of the drives in the two storage area networks (SANs) needed to be replaced. Also included were the emergency fuel and service costs for the generator.

Quantifying for the entire organization the cost of disaster recovery (DR) and of failed or inadequate BCP, covering not only an IT systems failure but also the inability to continue business, usually helps to garner support and build unity between the business units and IT. And those of us in IT can always use positive press and support.