Ensuring Business Continuity

Wing Lam

Will your business keep running if disaster strikes? Following the eight steps of the business continuity planning cycle can help you be prepared.

Last year's terrorist attacks in the US have forced many organizations to critically reevaluate the adequacy of their existing business continuity plans and disaster recovery arrangements. The tragedy highlighted how important it is for organizations to remain commercially operational under even the most exceptional circumstances. E-business, which relies heavily on IT, is particularly vulnerable, because IT failures directly limit the capability to generate revenue.

The thoroughgoing approach to business continuity planning (BCP) that I present here, called the BCP cycle, can help you avoid common BCP pitfalls. The BCP cycle is generic enough to have practical value in a wide range of IT-related organizations, and it is process-oriented, ensuring well-guided BCP efforts and tangible results.

BUSINESS CONTINUITY PLANNING CYCLE

BCP is a cyclical process; an organization should review its business continuity plan whenever it introduces changes to the business or alters its business priorities. I see the BCP process as a cycle of eight core steps, as depicted in Figure 1, which shows two concentric rings. The inner ring describes the core BCP process. Inseparable from BCP is the concept of business recovery planning (BRP). Even when an organization can ensure business continuity, typically with backup resources, at some point it must also recover its previous, fully functional state. The outer ring depicts the BRP process. As an organization works through each core BCP step, it must, at the same time, address BRP.

Central to the BCP cycle is the business continuity policy, which defines the organization's holistic approach to business continuity.
The key areas covered in a good business continuity policy include contact points: who to contact during office hours, outside office hours, and in an emergency; roles and responsibilities: a well-defined organizational structure for the business continuity and recovery teams; risk levels: a categorization of risks and the level of risk the organization deems acceptable; service levels: how much time is acceptable for responding to threats, implementing business continuity plans, and recovering from failure scenarios; reviews: how and when the organization reviews business continuity plans; processes: business continuity processes and procedures that inform staff how to react to and handle particular failure scenarios; incident reporting and documentation: methods of recording and documenting incidents and responses to them; testing: acceptance criteria and testing requirements for the business continuity plan; and training: training requirements for staff involved in business continuity and disaster recovery processes.

An organization can gradually compile its business continuity policy as it works through the BCP cycle's eight core steps. However, the policy should remain a living document that you maintain during each cycle.

IT Pro, May/June 2002, p. 19

Figure 1. BCP cycle. The inner ring (business continuity planning) shows the eight core steps: 1. Initiate BCP project; 2. Identify threats; 3. Conduct risk analysis; 4. Establish team; 5. Design plan; 6. Define processes; 7. Test; 8. Review. The outer ring (business recovery planning) mirrors these steps, and the business continuity policy sits at the center.

EIGHT STEPS

Now let's look at the BCP cycle's eight core steps in more detail.

Step 1: Initiate the BCP project

1.1 Obtain and confirm support from senior management.
1.2 Identify key business and technical stakeholders.
1.3 Form a business continuity working group.
1.4 Define objectives and constraints.
1.5 Establish strategic milestones and draw up a road map.
1.6 Begin a draft version of the business continuity policy.

Senior management support and buy-in are essential before starting the BCP effort. To kick off the project, establish a business continuity working group and give it specific objectives to work toward, such as defining the strategic milestones for BCP. Empower the group by including key business and technical stakeholders who have the decision-making authority to make it happen. Consult within the group and with senior management on strategic milestones and constraints. (For example, when do you need a full business continuity plan? Can it be phased in? Are there any specific regulatory requirements? What is the approximate budget?) Draw a road map to guide the organization. Get the working group to start an early draft of the business continuity policy; even though the document will be incomplete, it will help steer the group toward the issues that need attention.

Step 2: Identify threats

2.1 Identify the community of business and technical stakeholders.
2.2 Conduct threat identification workshops.
2.3 Delineate and document threats.

IT-based organizations generally rely on three types of resources: technology, information, and people. Consider what threats can render these resources either unavailable or inaccessible.
Technology threats include natural disaster (such as flooding), fire, power failure, systems and network failure, systems and network flooding (when attackers try to overwhelm a network with traffic), virus attack, denial-of-service attack, theft, vandalism, and sabotage. Information threats come from hacking, theft, fraud, fabrication, alteration, misuse, natural disaster, fire, and the degradation of the ink on paper records. People threats include illness, recruitment shortfalls, resignation, compassionate leave, pregnancy, weather, and unavailability of transportation or office access.

In particular, use workshops to tease out the nonobvious threats and those specific to your particular industry sector. For example, the financial services and banking sectors work under strict regulations concerning financial auditing and the retention of audit logs. Delineate and document the threats.

Step 3: Conduct a risk analysis

3.1 Conduct risk analysis workshops.
3.2 Assess the likelihood and impact of threat occurrence.
3.3 Categorize and prioritize threats according to risk level.
3.4 Review outputs of risk analysis with management.
3.5 Ascertain level of risk acceptable to the organization.
3.6 Document outputs in business continuity policy.

Risk is a factor that considers the likelihood of a threat actually occurring and its consequences in terms of financial loss, loss of customer confidence and partnerships, and damage to reputation. A threat's likelihood of occurring can be classified as almost certain, likely, moderate, unlikely, or rare. Consequences can be catastrophic, major, moderate, or minor. Most organizations find it adequate to categorize and prioritize risks as high, medium, or low. Your high risks are threats that are almost certain or likely to occur and would result in catastrophic or major consequences. (Historically, terrorist attacks have been considered low risk, because their likelihood was considered rare even though their consequences could be catastrophic. Given recent events, however, upgrading their risk rating is reasonable.) Seek out the level of risk acceptable to your organization: which threats are you prepared to ignore because either their consequences or their likelihood are too low or insignificant?

Step 4: Establish the business continuity team

4.1 Identify key business, technical, and customer services stakeholders.
4.2 Form and empower the business continuity team.
4.3 Clarify and agree on team objectives and working mode.
4.4 Define roles and responsibilities; produce a work plan.
4.5 Identify incident engagement and response processes.
4.6 Update business continuity policy.

When an incident does occur, a business continuity team must be ready to engage and manage the incident and to enact whatever business continuity plans are in place. Form the team and establish clear objectives. Define roles and responsibilities, and assign roles to specific individuals. Ideally, compose the team of individuals who hold existing roles of responsibility; they will be most familiar with existing business and IT practices.

Several roles are typical of business continuity teams. A business continuity manager is the first point of contact, manages the incident, initiates the business continuity plan, mobilizes the business continuity team, and presents key decisions to business owners when appropriate. The business owner makes key decisions about how the business handles incidents.
The technical services manager manages disruptions to technical services, such as IT infrastructure and applications; initiates business continuity arrangements; and interacts with third-party service providers. An estate manager manages disruptions relating to buildings, offices, and the surrounding environment; initiates business continuity arrangements; and interacts with third-party service providers. The operations and customer services manager manages disruptions to operations and customer services; keeps customers informed if there is a noticeable impact on customer service levels; initiates business continuity arrangements; and interacts with third-party service providers. Business recovery (or resumption) teams are technical, estate, or customer services teams that execute the business recovery plans. A business recovery manager guides the business's recovery to normal operations.

With a well-formed business continuity team, the working group can take more of a steering role. Get the team to think about the processes it must follow to engage, respond to, and manage incidents. The team should also decide how it will coordinate activities between team members.

Step 5: Design the business continuity plan

5.1 Identify critical and noncritical services.
5.2 Establish preferred service levels and profiles for business continuity and recovery.
5.3 Reaffirm key constraints (such as time and cost).
5.4 For each threat, identify possible business continuity strategies and evaluate them in terms of time, cost, and benefits.
5.5 Identify and engage potential business continuity partners.
5.6 Draft a set of business continuity plans and work toward an agreed set of plans with senior management.
5.7 Produce and execute an implementation plan.

If you haven't already considered this, now is the time to determine your desired service levels. Two key metrics are resumption response time, the time taken before your organization can continue with business after an incident or failure scenario; and recovery time, the time taken for an organization to fully recover its original state after an incident or failure scenario.
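As a rough illustration of the two metrics, the sketch below derives both from incident timestamps. The function names and the sample timestamps are hypothetical and do not come from the article; they only show how the definitions translate into a measurement.

```python
from datetime import datetime, timedelta


def _elapsed(incident_at: datetime, milestone_at: datetime) -> timedelta:
    """Time from the incident to a later milestone; rejects out-of-order input."""
    if milestone_at < incident_at:
        raise ValueError("milestone precedes the incident")
    return milestone_at - incident_at


def resumption_response_time(incident_at: datetime, resumed_at: datetime) -> timedelta:
    """Time taken before the organization can continue with business."""
    return _elapsed(incident_at, resumed_at)


def recovery_time(incident_at: datetime, recovered_at: datetime) -> timedelta:
    """Time taken to fully recover the original state."""
    return _elapsed(incident_at, recovered_at)


# Hypothetical incident: servers fail at 09:00, a standby takes over at 11:30,
# and the original environment is fully restored two days later.
incident = datetime(2002, 5, 1, 9, 0)
resumed = datetime(2002, 5, 1, 11, 30)
recovered = datetime(2002, 5, 3, 9, 0)

print(resumption_response_time(incident, resumed))  # 2:30:00
print(recovery_time(incident, recovered))           # 2 days, 0:00:00
```

Measuring both values from the same incident start makes it easy to compare them against whatever service levels the organization has agreed on.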
You can sensibly apply these metrics at the services level. For example, let's say services 1 and 2 are critical; they therefore need shorter resumption response and recovery times than noncritical services 3 and 4.

Table 1. Typical to worst-case scenario analysis.
(Columns: resource type; business threat; failure scenario; business continuity strategy; evaluation.)

Resource type: Technology. Business threat: Systems failure.
  Typical: Failure affects some servers; repair time is short (hours or 1 to 2 days).
    A1: Have a third-party maintenance and support agreement.
        Cons: Suffer consequences while servers are not in operation.
    A2: Have an emergency third-party support agreement with guaranteed on-site response time.
        Pros: Faster repair time.
        Cons: Suffer consequences while servers are not in operation.
    A3: Have redundant servers on cold, warm, or hot standby.
        Pros: Minimal server down time.
        Cons: Purchase and maintenance costs for additional hardware and software.
    A4: Combine options A1 to A3.
  Worst case: Failure affects all servers, and repair time is lengthy (many days or weeks).
    B1: Have a secondary or disaster recovery site for redirected Internet traffic.
        Cons: Expensive, requires alternative site and replication of infrastructure and environments.

Resource type: Information. Business threat: Hacking.
  Typical: Attackers compromise a server, disrupting or terminating applications and processes.
    C1: Have support arrangement to conduct system cleansing; restart or restore application and processes on the server.
        Pros: Relatively cheap.
    C2: Have redundant servers on cold, warm, or hot standby.
        Pros: Minimizes disruption.
        Cons: More expensive, requires purchase and maintenance of additional hardware and software.
  Worst case: Attackers compromise a server, removing or altering sensitive information.
    D1: Restore information from the last database backup.
        Cons: Suffer consequences during restoration.
    D2: Write all data to a second database; restore from that database.
        Pros: Lost or altered information can be quickly restored.
        Cons: More costly, requires additional data source management.
    D3: Restore information from audit trails.
        Cons: Can be tedious and time-consuming.
    D4: Combine D1 to D3.
Continuity and recovery profiles indicate how soon a particular business must resume and recover certain services; in short, these profiles define the requirements that the business continuity plan must meet. To identify failure scenarios and possible business continuity strategies for each type of threat, you can use a typical and worst-case scenario analysis. Table 1 shows this kind of analysis for an e-business system.
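One way to picture a continuity and recovery profile is as a pair of time limits per service that candidate strategies must satisfy. The sketch below is a minimal illustration under that assumption; the class names, the hour figures, and the time estimates attached to strategies A1 and A3 are all hypothetical and are not given in Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceProfile:
    """Target service levels for one service (illustrative values only)."""
    name: str
    critical: bool
    max_resumption_hours: float  # resumption response time target
    max_recovery_hours: float    # recovery time target


@dataclass(frozen=True)
class Strategy:
    """A candidate strategy with assumed time estimates."""
    label: str
    est_resumption_hours: float
    est_recovery_hours: float


def meets_profile(strategy: Strategy, profile: ServiceProfile) -> bool:
    """True when both estimated times fall within the profile's limits."""
    return (strategy.est_resumption_hours <= profile.max_resumption_hours
            and strategy.est_recovery_hours <= profile.max_recovery_hours)


def candidates(strategies, profile):
    """Labels of the strategies that satisfy the given profile."""
    return [s.label for s in strategies if meets_profile(s, profile)]


# Hypothetical profiles: the critical service gets much tighter limits.
PROFILES = [
    ServiceProfile("service 1", critical=True, max_resumption_hours=1, max_recovery_hours=24),
    ServiceProfile("service 3", critical=False, max_resumption_hours=48, max_recovery_hours=168),
]

# Hypothetical estimates for two of the strategy types listed in Table 1.
STRATEGIES = [
    Strategy("A1 maintenance and support agreement", 48, 72),
    Strategy("A3 redundant servers on hot standby", 0.5, 24),
]

for profile in PROFILES:
    print(profile.name, "->", candidates(STRATEGIES, profile))
```

With these made-up numbers, only the standby option satisfies the critical service, while both options satisfy the noncritical one, which mirrors the point that critical services need shorter resumption response and recovery times.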

Use scenario analysis to help understand the relative pros and cons of individual strategies. Common strategies are available for each of the three resource types.

Technology. Redundancy (of hardware and network, for example), maintenance and support agreements, and backup and restore capabilities are common defensive strategies.

Information. Recover information by using data mirroring, backup and restore, auditing, and off-site or secondary data storage.

People. To temporarily shore up people-related resources, use contract staff, rotas (workloads that a company can change in response to demand or personnel shortfalls), call-out arrangements (having certain staff in standby mode to be called to work as necessary), rental offices and sites, manual procedures, and service-forwarding agreements (such as with specialist call centers).

The choice of strategies often affects a system's overall design, which is a strong argument for considering BCP as an integral part of your IT development process. In evaluating individual strategies, include the following criteria: costs for acquisition, deployment, testing, training, and associated management overhead; level of protection; resumption response time; and time to implement, including time for acquiring, deploying, and testing the strategy and for conducting relevant and necessary training.

Examine the tradeoffs between strategies. Less-expensive strategies typically have greater limitations and often can't handle worst-case scenarios. Explore potential partnerships. For example, several companies specialize in off-site data storage; a growing number provide complete business continuity and recovery services. Work toward an agreed set of business continuity plans with senior management. Produce and execute an implementation plan for putting the business continuity plans and strategies in place.

Step 6: Define your business continuity processes

6.1 Identify, define, and document processes.
6.2 Review and verify processes with relevant stakeholders.
6.3 Identify training requirements.
6.4 Develop training exercises, role-playing scripts, and simulation case studies.
6.5 Initiate training and awareness programs.

Business continuity processes include handling specific failure events, such as fire and network failures; backup and restoration of systems and data; virus management; incident reporting; problem escalation hierarchies; customer and staff communication; and contact procedures for third-party support providers. Step 6 is about fully documenting these processes to prepare your organization for any kind of incident. Communicate these processes to the relevant parties. Ensure that the business continuity team has identified training requirements and has followed up with a training program for relevant personnel.

Step 7: Test your business continuity plans

7.1 Define acceptance criteria.
7.2 Formulate the test plan.
7.3 Identify major testing milestones.
7.4 Devise the testing schedule.
7.5 Execute tests via simulation and rehearsal; document test results.
7.6 Assess overall effectiveness of business continuity plans; pinpoint areas of weakness and improvement.
7.7 Iterate tests until the plan meets acceptance criteria.
7.8 Check, complete, and distribute business continuity policy.

There are at least four important reasons for testing your business continuity plan. You want to validate the plan's effectiveness in meeting your stated service levels; identify, at an early stage, any shortcomings in the plan; assess whether your service levels are realistic and achievable given your budgetary and time constraints; and give senior management and other parties (such as regulatory bodies) confidence in the plan.
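Checking rehearsal results against acceptance criteria can be sketched as a simple comparison. The example below assumes, purely for illustration, that each criterion is a maximum resumption response time per failure scenario; the class names, scenario names, and hour values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceptanceCriterion:
    """Maximum acceptable resumption response time for one failure scenario."""
    scenario: str
    max_resumption_hours: float


@dataclass(frozen=True)
class RehearsalResult:
    """Resumption response time measured during a simulation or rehearsal."""
    scenario: str
    measured_resumption_hours: float


def shortfalls(criteria, results):
    """Return scenarios that missed their criterion or were never rehearsed."""
    measured = {r.scenario: r.measured_resumption_hours for r in results}
    failed = []
    for criterion in criteria:
        value = measured.get(criterion.scenario)
        if value is None or value > criterion.max_resumption_hours:
            failed.append(criterion.scenario)
    return failed


criteria = [
    AcceptanceCriterion("server failure", 4),
    AcceptanceCriterion("database corruption", 8),
]

# First round: one scenario is too slow and the other was not rehearsed.
round_1 = [RehearsalResult("server failure", 6)]
print(shortfalls(criteria, round_1))

# Second round, after revising the plan: both scenarios meet their limits.
round_2 = [
    RehearsalResult("server failure", 3),
    RehearsalResult("database corruption", 8),
]
print(shortfalls(criteria, round_2))
```

An empty list from `shortfalls` corresponds to the point at which iteration can stop; a non-empty list names the scenarios to revisit in the next test round.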

Table 2. Common pitfalls in the business continuity planning process. Plans can be...

Incomplete: The BCP process is not complete. Outputs such as the business continuity plan and policy either do not exist or exist in incomplete form.
Inadequate: The plan and strategies can't deal with the level of risk that the organization deems acceptable.
Impractical: The plan is not practical or achievable within the organization's constraints (manpower, time, and budget, for example).
Overkill: The plan is overly elaborate or costly with respect to the overall level of risk that the organization is willing to take.
Uncommunicated: The business continuity team has not communicated the plan to all the right people. Staff, both management and technical, remain unaware of business continuity issues.
Lacking a defined process: Business continuity processes remain ill defined. Staffers are unsure of how to react in a failure scenario, or they discover too late that their existing processes fall short.
Untested: The organization hasn't tested its plan, or hasn't tested it thoroughly enough to provide a high level of confidence in its soundness.
Uncoordinated: The business continuity effort lacks organization and coordination. The organization has either not established a business continuity team, or the team lacks individuals who can effectively drive the effort to completion.
Out of date: The plan hasn't been reviewed or revised in light of changes in the organization, its business, or technology.
Lacking in recovery thinking: The organization doesn't adequately address how it intends to recover to a fully operational state after executing its business continuity plans.

Formulate a test plan that identifies individual tests. A typical to worst-case scenario analysis can serve as a starting point in identifying the failure scenarios you should simulate and test. Bear in mind that testing can take considerable time to complete: one to two months is not uncommon for a large-scale e-business doing this for the first time. When devising the test schedule, consider the disruption and time out that the testing will cause other activities.
Review your test results to validate the plan and pinpoint any shortfalls either in the plan itself or in its execution. Repeat the testing until your plan meets acceptance criteria.

Step 8: Review your business continuity plans

8.1 Develop a review schedule for different types of review.
8.2 Arrange a review meeting or workshop.
8.3 Update the business continuity documentation.
8.4 Kick off another BCP cycle if necessary.

The last core step in the BCP cycle highlights the fact that the organization must review its business continuity plans whenever any of the following occurs: significant changes to the business, for example, the launch of new e-business operations; changes in business priorities; shifts in the legal or regulatory landscape; significant world events (wars or terrorist attacks); changes to the IT budget; physical relocation of IT systems and operations; outsourcing of IT systems and operations; developments in IT infrastructure; and significant changes in the labor market.

Reviews take many different forms. For major reviews, independent experts, typically from specialist firms, can perform a thorough and detailed examination of your business continuity plans. For less substantial reviews, an internal process might suffice, particularly when participants have prior business continuity experience.
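The review triggers listed above can be captured as a small checklist. The sketch below is only an illustration: the enum names paraphrase the list, and the choice of which triggers call for a major, independently led review is an assumption made for the example, since the article does not assign triggers to review types.

```python
from enum import Enum, auto


class Trigger(Enum):
    """Events that should prompt a review of the business continuity plans."""
    BUSINESS_CHANGE = auto()
    PRIORITY_CHANGE = auto()
    LEGAL_OR_REGULATORY_SHIFT = auto()
    WORLD_EVENT = auto()
    IT_BUDGET_CHANGE = auto()
    RELOCATION = auto()
    OUTSOURCING = auto()
    INFRASTRUCTURE_DEVELOPMENT = auto()
    LABOR_MARKET_CHANGE = auto()


# Assumption for illustration only: triggers treated as warranting a major review.
ASSUMED_MAJOR = frozenset({
    Trigger.BUSINESS_CHANGE,
    Trigger.WORLD_EVENT,
    Trigger.RELOCATION,
})


def review_kind(observed, major=ASSUMED_MAJOR):
    """Return 'none', 'internal', or 'major' for the observed triggers."""
    observed = set(observed)
    if not observed:
        return "none"
    if observed & major:
        return "major"
    return "internal"


print(review_kind([]))
print(review_kind([Trigger.IT_BUDGET_CHANGE]))
print(review_kind([Trigger.IT_BUDGET_CHANGE, Trigger.WORLD_EVENT]))
```

An organization would substitute its own mapping for `ASSUMED_MAJOR`; the point is simply that any observed trigger leads to some form of review.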

If a review suggests that the current plan needs significant revisions, it's time to kick off another cycle of BCP.

COMMON BCP PITFALLS

Unfortunately, organizations without a systematic approach to BCP are more likely to end up with plans that are inadequate, incomplete, or impractical. Table 2 describes some of the typical ways these plans go awry.

MAINTAINING BCP AWARENESS

Contrary to the way businesses often treat it, BCP is not an on/off event. BCP is an ongoing concern that should be a high priority for every organization, but especially those running 24-hour e-business operations 365 days a year. You can do several things to keep BCP on the management agenda:

Explicitly identify business continuity requirements up front. Actively manage and track their fulfillment as part of the business design and systems design processes.
Emphasize business continuity as one of the core principles in designing e-commerce and IT solutions.
Frequently refer to high-profile incidents as reminders of what can happen without a business continuity plan.
Create a specific role for a business continuity manager within the organization or project structure.
Hold business continuity awareness workshops. For maximum and lasting impact, get an external expert to facilitate such workshops.
Hold regular, must-attend BCP review meetings.
Include business continuity in formal training for both technical staff and management.

Having a good business continuity plan is like having insurance: you hope you don't have to use it, but you reap the rewards when you do. Can your organization afford not to have one?

Wing Lam has worked for several large consulting firms on large-scale systems design, project management, and IT strategy. His most recent projects include a financial services Internet portal, an Internet bank, a B2C shopping mall, and a B2B exchange. He is an associate at the Institute of Systems Science, Singapore. Contact him at yahoo.com.

Business Continuity Plan October 2007 Agenda Business continuity plan definition Evolution of the business continuity plan Business continuity plan life cycle FFIEC & Business continuity plan Questions

The Define/Align/Approve Reference Series NEEDS BASED PLANNING FOR IT DISASTER RECOVERY Disaster recovery planning is essential; it's also expensive. That's why every step taken and dollar spent must be

This quick reference guide provides an introductory overview of the key principles and issues involved in IT related disaster recovery planning, including needs evaluation, goals, objectives and related

CENTRAL BANK OF KENYA (CBK) PRUDENTIAL GUIDELINE ON BUSINESS CONTINUITY MANAGEMENT (BCM) FOR INSTITUTIONS LICENSED UNDER THE BANKING ACT JANUARY 2008 GUIDELINE ON BUSINESS CONTINUITY GUIDELINE CBK/PG/14

Disaster Recovery Plan The Business Imperatives Table of Contents Disaster Recovery Plan The Business Imperatives... 3 Introduction... 3 A Disaster Recovery Program The Need of the Hour... 3 Approach to

Building a strong business continuity plan Protect your clients and firm with a well-planned business continuity plan A solid business continuity plan (BCP) is about more than simply staying in compliance.

whitepaper Why Should Companies Take a Closer Look at Business Continuity Planning? How Datalink's business continuity and disaster recovery solutions can help organizations lessen the impact of disasters

EMERGENCY PREPAREDNESS PLAN Business Continuity Plan GIS Bankers Insurance Group Powered by DISASTER PREPAREDNESS Implementation Small Business Guide to Business Continuity Planning Surviving a Catastrophic

SCHEDULE 25 Business Continuity 1. Scope 1.1 This schedule covers TfL's requirements in respect of: any circumstance or event which renders, or which TfL considers likely to render, it necessary or desirable

Disaster Recovery Planning Process By Geoffrey H. Wold Part I of III This is the first of a three-part series that describes the planning process related to disaster recovery. Based on the various considerations

Preface Computer systems are the core tool of today's business and are vital to every business, from the smallest to giant organizations. Money transactions and customer service are just simple examples. Despite

AN INTRODUCTION TO BUSINESS CONTINUITY PLANNING AND SOLUTIONS FOR IT AND TELECOM DECISION MAKERS Executive Summary Today's businesses rely heavily on voice communication systems and data networks to such

BUSINESS CONTINUITY AND DISASTER RECOVERY The purpose of this Guidance Note The main points it covers To assist participants to understand the disaster recovery and business continuity arrangements they

Company Management System Business Continuity in SIA Document code: Classification: Company Project/Service Year Document No. Version Public INDEX 1. INTRODUCTION... 3 2. SIA S BUSINESS CONTINUITY MANAGEMENT

Business Continuity and Disaster Survival Strategies for the Small and Mid Size Business www.integrit-network.com Business Continuity & Disaster Survival Strategies for the Small & Mid Size Business AGENDA:

5 STEPS TO AN EFFECTIVE BUSINESS CONTINUITY PLAN Introduction The Snowpocalypse of 2015 brought one winter storm after another, paralyzing the eastern half of the United States. It knocked out power for

Emergency Response and Business Continuity Management Policy Owner: John Duffy, Registrar & Secretary Last updated: September 2012 Version: 04 Document control Date Version Author Changes To be populated

Circular No. 033/B/2009-DSB/AMCM (Date: 14/8/2009) Guideline on Business Continuity Management The Monetary Authority of Macao (AMCM), under the powers conferred by Article 9 of the Charter approved by

PROFESSIONALADVANTAGE IT Disaster Recovery...It's Just the Tip of the Business Continuity Iceberg The importance of a holistic approach to Business Continuity and the art of making decisions when everyone's

Business Continuity Planning We believe all organisations recognise the importance of having a Business Continuity Plan, however we understand that it can be difficult to know where to start. That's why

How to write a DISASTER RECOVERY PLAN To print to A4, print at 75%. TABLE OF CONTENTS SUMMARY SUMMARY WHAT IS A DRP AND HOW CAN IT HELP MY COMPANY? CHAPTER PREPARING TO WRITE YOUR DISASTER RECOVERY PLAN

Disaster Recovery Review FREE Promotional Offer Our Colorado region is offering a FREE Disaster Recovery Review promotional through June 30, 2009! This review is designed to help the small business better

White Paper LIVEVAULT Top 10 Reasons for Using Online Server Backup and Recovery Introduction Backup of vital company information is critical to a company's survival, no matter what size the company. Recent

Emergency Management Business Continuity Template The Regional Municipality of Wood Buffalo would like to give credit to the Calgary Emergency Management Agency (CEMA) and the Calgary Chamber of Commerce

Operational Risk Management Policy Operational Risk Definition A bank, including a development bank, is influenced by the developments of the external environment in which it is called to operate, as well

Business Continuity Planning (BCP) / Disaster Recovery (DR) Introduction Interruptions to business functions can result from major natural disasters such as earthquakes, floods, and fires, or from man-made

How to Design and Implement a Successful Disaster Recovery Plan Feb. 21 ASA Office-Administrative Section is Sponsored by Today s ASAPro Webinar is Brought to You by the How to Ask a Question Questions

BUSINESS CONTINUITY PLANNING GUIDELINES Washington University in St. Louis The purpose of this guide is to serve as a tool to all departments, divisions, and labs across the University in building a Business

ADVISORY Top 10 Reasons for Using Disk-based Online Server Backup and Recovery INTRODUCTION Backup of vital company information is critical to a company's survival, no matter what size the company. Recent

The Office of the Auditor General has conducted a procedural review of the State Data Center (Data Center), a part of the Arizona Strategic Enterprise Technology (ASET) Division within the Arizona Department

AUDITING A BCP PLAN Thomas Bronack Auditing a BCP Plan presentation Page: 1 What are the Objectives of a Good BCP Plan Protect employees Restore critical business processes or functions to minimize the

Managing business risk What senior managers need to know about business continuity bell.ca/businesscontinuity Information and Communications Technology (ICT) has become more vital than ever to the success

NAVIGATING THROUGH A CATASTROPHIC DISASTER: The five most common mistakes in business continuity planning As we continue to send our thoughts and prayers to the Japanese people, many of us are also reflecting

1. What is the most common planned performance duration for a continuity of operations plan (COOP)? A. 30 days B. 60 days C. 90 days D. It depends on the severity of a disaster. 2. What is the business

Disaster Recovery and Business Continuity What Every Executive Needs to Know Bruce Campbell & Sandra Evans Contents Why you need DR and BC What constitutes a Disaster? The difference between disaster recovery

BEST PRACTICE GUIDE TO SMALL BUSINESS PROTECTION: BACKUP YOUR SMALL BUSINESS INFORMATION ENTER YOUR BUSINESS depends on electronic customer lists, confidential information and business records. Protecting

A Custom Technology Adoption Profile Commissioned By EMC Corporation How Organizations Are Improving Business Resiliency With Continuous IT Availability February 2013 Introduction: Business Stakeholders

The 9 Ugliest Mistakes Made with Data Backup and How to Avoid Them If your data is important to your business and you cannot afford to have your operations halted for days even weeks due to data loss or