If a disaster or IT outage hit your business RIGHT NOW, how confident would you be in your DRP? In today’s networked economy, downtime is not an option.

DRP and BCP strategy is governed by the CEO and CIO, who are responsible for defining the business impact, requirements and investment for the solution. IT has to provide guidance to develop a BCP and DR strategy and get buy-in from the business.

BCP vs DR

BCP – Planning to continue your business operations in case of a disaster.

DRP – Planning to recover from disaster situations: how IT (information technology) should recover in case of a disaster.

Other Important Definitions

Replication – Replication solutions can be either synchronous or asynchronous, meaning transfer of data to a remote copy is achieved either immediately or with a short time delay. Both methods create a secondary copy of data identical to the primary copy, with synchronous solutions achieving this in real time. This means that any data corruption or user file deletion is immediately (or very quickly) replicated to the secondary copy, which makes replication ineffective as a backup method.

Copy-on-write snapshot – Most snapshot implementations use a technique called copy-on-write, which makes an initial snapshot then further updates as data is changed. Restoration to a specific point in time is possible as long as all iterations of the data have been kept. For that reason, snapshots can protect against data corruption, unlike replication.
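To make the copy-on-write idea concrete, here is a minimal Python sketch. It is a toy model, not how any particular storage array implements snapshots; the `Volume` class and its method names are made up for illustration. Each snapshot starts empty and only receives the original contents of a block the first time that block is overwritten.

```python
class Volume:
    """Toy block device with copy-on-write snapshots (illustrative only)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        # One dict per snapshot: {block_index: original_value}
        self.snapshots = []

    def snapshot(self):
        # Nothing is copied up front; the snapshot is an empty table that
        # fills in as blocks are overwritten later.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, value):
        # Copy-on-write: on the first overwrite of a block since the latest
        # snapshot, preserve the original value in that snapshot.
        if self.snapshots and index not in self.snapshots[-1]:
            self.snapshots[-1][index] = self.blocks[index]
        self.blocks[index] = value

    def read_snapshot(self, snap_id):
        # Start from live data and undo changes, newest snapshot first,
        # back to the requested one. This only works if every snapshot
        # taken since snap_id has been kept.
        view = list(self.blocks)
        for saved in reversed(self.snapshots[snap_id:]):
            for index, original in saved.items():
                view[index] = original
        return view


vol = Volume(["a", "b", "c"])
s0 = vol.snapshot()
vol.write(1, "X")          # a normal update
s1 = vol.snapshot()
vol.write(1, "CORRUPT")    # simulated corruption
vol.write(2, "CORRUPT")
```

In this sketch `vol.read_snapshot(s1)` returns the data as it was just before the corruption, and `vol.read_snapshot(s0)` returns the original contents, which is the point-in-time protection that plain replication does not give you.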

Clone/split-mirror snapshot – Another common snapshot variant is the clone or split-mirror, where reference pointers are made to the entire contents of a mirrored set of drives, file system or LUN every time a snapshot is made. Clones take longer to create than copy-on-write snapshots because all data is physically copied when the clone is created. There is also the risk of some impact on production performance when the clone is created, because the copy process has to access primary data at the same time as the host.

Continuous data protection (CDP) – CDP is a method of snapshotting that tracks and stores all updates to data as they occur. Theoretically, this means CDP solutions can roll back to any point in time, down to the smallest granularity of update. But there is a price to pay with CDP in terms of the cost of storage needed to keep every changed block copy and the performance impact of storing the data. As a result, some vendors implement what they call near-CDP, taking snapshots of changed data at set times and consolidating changes over a longer time period. This means heavily updated data doesn’t overwhelm the capacity of the CDP system. In virtual environments, APIs such as vSphere’s VADP enable CDP solutions to be implemented by third-party software vendors.
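The "roll back to any point in time" property can be sketched as a simple write journal. This is a simplified Python illustration under my own assumptions (a key/value store, numeric timestamps, a hypothetical `CdpJournal` class), not a description of any vendor's CDP product.

```python
class CdpJournal:
    """Toy CDP model: keep a base image plus every write, in time order."""

    def __init__(self, initial):
        self.base = dict(initial)
        self.journal = []  # list of (timestamp, key, value)

    def write(self, timestamp, key, value):
        # Assumption: writes arrive in non-decreasing timestamp order.
        if self.journal and timestamp < self.journal[-1][0]:
            raise ValueError("writes must be journaled in time order")
        self.journal.append((timestamp, key, value))

    def state_at(self, timestamp):
        # Replay every journaled write up to and including the timestamp.
        state = dict(self.base)
        for ts, key, value in self.journal:
            if ts > timestamp:
                break
            state[key] = value
        return state


cdp = CdpJournal({"report.doc": "v0"})
cdp.write(10, "report.doc", "v1")
cdp.write(20, "report.doc", "corrupted")
```

Here `cdp.state_at(15)` gives back the `v1` contents from just before the corruption at time 20. The sketch also shows the cost side: the journal grows with every single write, which is why near-CDP consolidates changes at set intervals instead.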

Clustering and Availability

Fault Tolerant

Highly Available

Metro/GeoClusters

Clustering for Performance / Load Balancing (Scale-out)

Backup – Backup is the process of making a secondary copy of data that can be restored to use if the primary copy becomes lost or unusable. Backups usually comprise a point-in-time copy of primary data taken on a repeated cycle – daily, weekly or monthly.

Archival – Storing copies of all versions of data for long retention periods: 7 years or more, or, under legal hold requirements, for the lifetime of the organisation.

BUSINESS CONTINUITY Regulatory Requirements

ISO – ISO 22301:2012, “Societal security — Business continuity management systems — Requirements”, specifies a management system to manage an organization’s business continuity arrangements. It is formal in style in order to facilitate compliance auditing and certification.

United Kingdom – British Standard BS 25999 was a two-part business continuity management standard. “BS 25999-1:2006 Business Continuity Management. Code of Practice” offered pragmatic implementation guidance, but was withdrawn in 2012 when ISO 22313 effectively superseded it. “BS 25999-2:2007 Specification for Business Continuity Management” formally specified a set of requirements for a business continuity management system. It too was withdrawn in 2012 when it was (in effect) replaced by ISO 22301.

North America – Published by the National Fire Protection Association: NFPA 1600, Standard on Disaster/Emergency Management and Business Continuity Programs.

North America – ASIS/BSI BCM.01:2010, published Dec 2010.

North America – The ANSI/ASIS SPC.1-2009 Organizational Resilience: Security, Preparedness, and Continuity Management Systems – Requirements with Guidance for Use American National Standard is under consideration for inclusion in the DHS PS-Prep, a voluntary program designed to enhance national resilience in an all hazards environment by improving private sector preparedness.

Australia – Published by Standards Australia: HB 292-2006, A practitioners guide to business continuity management, and HB 293-2006, Executive guide to business continuity management. In 2010, Standards Australia introduced their Standard AS/NZS 5050, which connects far more closely with traditional risk management practices. This interpretation is designed to be used in conjunction with AS/NZS 31000 covering risk management.

APRA

DR RUN BOOKS ARE THE WRONG WAY

The complexity of maintaining DR Run Books, complex DR technology and expensive solutions means that most of the time it is a wasted investment that fails. It is better to develop a Continuous Availability technology solution.

Usually DR Run Books are updated at the time of the DR test, not during BAU. Changes to the environment also affect the DR Run Books, and Change Management usually neglects to update them because of the human factor.

Continuous Availability can eliminate the most time-consuming, error-prone areas and maintain DR posture by optimising or even eliminating Stages 11 and 12.

Continuous Availability is achieved simply by virtualising the network and storage layers. This reduces the overall complexity of DR Run Books and the operational cost of testing and maintaining them in a separate environment, and it puts otherwise cold or passive datacenters to use. The initial investment in Continuous Availability is recovered quickly, because you maintain a 100% DR posture; compare that with the cost of maintaining DR, isolated testing, exposure and failed compliance. How much does it cost you to maintain and test DR Run Books? All of these soft opex costs could instead be invested in building a Continuous Availability solution. Continuous Availability will lower both cost and complexity.

In this business case you will need to allocate an investment to this DR design:

Identify Application Tiers and Uptime

You will also need to identify and classify key applications and place them into Tiers of importance. A good reference for this classification is the Uptime Institute. The classification should also contain uptime requirements for each application Tier.

WORKLOAD CATEGORIES

BUSINESS CRITICAL – Your high priority workloads with prioritised failover

COMPLIANCE – Your compliant workloads that must meet regulations

GENERAL PURPOSE – Your non-critical workloads with a restart
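One way to turn an uptime requirement per Tier into something the business can relate to is to convert it into allowed downtime per year. The Python sketch below does that. The availability percentages are placeholder values I have made up for illustration; they are not taken from the Uptime Institute or from any regulation, so substitute the targets agreed with your business.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(availability_percent):
    """Convert an availability target (e.g. 99.9) into hours of downtime per year."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Hypothetical targets per workload category (assumptions, not recommendations).
TIER_AVAILABILITY = {
    "BUSINESS CRITICAL": 99.99,
    "COMPLIANCE": 99.9,
    "GENERAL PURPOSE": 99.0,
}

downtime_budget = {
    tier: allowed_downtime_hours(target)
    for tier, target in TIER_AVAILABILITY.items()
}

for tier, hours in downtime_budget.items():
    print(f"{tier}: about {hours:.2f} hours of downtime allowed per year")
```

With these placeholder targets, the gap between Tiers is large: each extra "nine" cuts the yearly downtime budget by a factor of ten, which is usually where the cost discussion starts.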

BUSINESS IMPACT ANALYSIS

…………………..

Define Budget

The following formula can be used to highlight the revenue lost due to an outage:

Loss of revenue due to outage = Annual revenue / 365 days * (RTO + RPO), with RTO and RPO expressed in days

The business case will always have to identify trade-offs between price, performance and cost. You might not be able to achieve all of them, so you need to be realistic.
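As a worked example of the formula, here is a small Python sketch. The function name and the choice to take RTO and RPO in hours (converted to days internally) are my own assumptions; the figures are invented purely for the arithmetic.

```python
def lost_revenue(annual_revenue, rto_hours, rpo_hours):
    """Estimate revenue lost to one outage: daily revenue * (RTO + RPO) in days."""
    if annual_revenue < 0 or rto_hours < 0 or rpo_hours < 0:
        raise ValueError("inputs must be non-negative")
    daily_revenue = annual_revenue / 365
    # RTO and RPO are often quoted in hours, so convert them to days first.
    outage_days = (rto_hours + rpo_hours) / 24
    return daily_revenue * outage_days


# Hypothetical business: $36.5M annual revenue, 4 hour RTO, 2 hour RPO.
example_loss = lost_revenue(36_500_000, rto_hours=4, rpo_hours=2)
print(f"Estimated loss per outage: ${example_loss:,.0f}")  # prints $25,000
```

This is a deliberately rough estimate: it assumes revenue is spread evenly across the year and ignores reputational damage, penalties and recovery costs, so treat the result as a floor rather than a full impact figure.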

Define Success Criteria

I would class developing a DR solution as a spaghetti problem, and it requires a method suited to solving this type of problem.

In ‘spaghetti situations’, in which everything is connected to everything and everything influences everything, it is far from obvious what the best solution is. All the people involved have a different idea about what the problem is. And if you ask them, they all have different ideas about what the solution could be. If you, as an engineer, consultant, manager or analyst, are in a situation like this, then what do you do?

To solve complex problems, you need to define an overall strategy and method that will guide the development of the solution and follow a set critical path, budget and timeframe.

To design and accelerate the implementation of a solution, customers must commit to design decisions and to any acceptable risks. Multi-component solutions add to the complexity, so thorough validation and support from vendors is an absolute requirement, and the vendors must be invested in the success of the solution.

Define Plan

Planning your availability transformation

Analyzing your current state

Assessing your continuous availability readiness

Identifying infrastructure requirements

Designing the solution architecture

Performing a cost/benefit analysis

DR Design Options

I wanted to explore a number of options for DR Design

Active/Active

Active/Hot DR

Active/Warm Passive DR (Standby)

Active/Cold DR Recovery from Disk

DR to Cloud

Cloud Only

Disaster Recovery vs Disaster Avoidance

Disaster Recovery Technology Options

Network

Standard IP LAN

Load Balancer

WAN Optimisation/QoS

Storage

Synchronous / Asynchronous

SnapMirror

SRDF

FlashCopy

Application

SQL Replication/Mirror

OS

Veritas

Microsoft Always On Clustering

Hypervisor

HA

FT

Zerto

EverRun

Manual

SRM

Data protection

Brick and Storage Level Backups

CommVault

Avamar/Data Domain

Veeam

Symantec

Tivoli Storage Manager

Compute

IBM SystemP PowerHA

Stratus

HP Serviceguard

Switch

Legacy

Continuous Availability and Disaster Avoidance technology options

Network Virtualisation

Stretched VLAN – OTV / VPLS – Overlay Transport Virtualization (OTV) can be used for L2 extension between the customer’s data center and the cloud. L2 connectivity allows customers to keep the same IP addresses from the enterprise network in the cloud, so nothing has to change to access workloads in the cloud after recovery.

Storage Virtualisation