Data protection over WANs: Recovery and restoration, Part 3

Posted on February 01, 2006

Note: The first two parts of this article series appeared in the November 2005 and January 2006 issues of InfoStor.

By George Hall

Four attributes need to be considered with respect to data recovery over wide area networks: speed, accuracy, security, and convenience.

Several conditions can trigger the need for data recovery, including compliance reporting; audits that include materials not available on local computers; the need to recover lost or damaged files or file sets; and litigation support, which can require the production of vast amounts of archival data.

In this article we will presume that the integrity of the data center’s physical infrastructure is not at issue so that we can focus on the design of digital network-based systems that satisfy policy requirements for iterative MTTR (mean time to repair/recover) for circumstances other than total loss of a data center.

Most people believe that the user interface is not directly related to archival network performance. However, it is, and we will see how by examining user interactions with the archival system. Let’s first focus on the issue of user convenience, an important issue since this is where the interaction with the disaster recovery (DR) and archiving systems should occur.

It is customary in an online system for a recovery/restoration request to be made online and sometimes without specific information available on the data set being requested. For this reason, a network-accessible, Web-based utility for searching and browsing an archive index, and then organizing and downloading archival records in the selected format and without third-party intervention, is important. A system that dynamically updates and makes available (via browser) an index of available archival files is the key to a successful implementation.

Here is a minimal list of the fields that should be included in any archival indexing system:

File name

File size

File type

File attributes

File creation date

File last modified date

File author

File owner

Archived date

Archival status (online, near-line, tape, local, remote)

Encryption (yes/no)

Compression (yes/no)

Access authority (multi-level)

Search and sort

Grouping
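As an illustration only, the field list above could be modeled as a single index record with simple search and sort helpers. This is a minimal sketch, not a description of any particular product: the class name, field names, the integer `access_level`, and the `group` label are all assumptions introduced here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ArchiveIndexRecord:
    """One entry in a hypothetical archival index (field names are assumptions)."""
    file_name: str
    file_size_bytes: int
    file_type: str
    file_attributes: str   # e.g. "read-only"
    created: date
    last_modified: date
    author: str
    owner: str
    archived: date
    archival_status: str   # "online", "near-line", "tape", "local", or "remote"
    encrypted: bool
    compressed: bool
    access_level: int      # assumed numeric scheme for multi-level access authority
    group: str = ""        # assumed free-form grouping label


def search(records, **criteria):
    """Return the records whose fields equal every supplied criterion."""
    return [
        r for r in records
        if all(getattr(r, field) == value for field, value in criteria.items())
    ]


def sort_by(records, field):
    """Return the records ordered by the named field."""
    return sorted(records, key=lambda r: getattr(r, field))
```

A browser front end could call `search(records, archival_status="tape")` to warn a user that the requested files are offline before a download is attempted.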

Users are interested in one thing: how long it will take to find and retrieve the file(s) they need. Assuming the files are found in the online index of the archival system, the biggest impediments to accomplishing this are file attributes, archival status, encryption, compression, and authority. Horror stories abound of IT departments that, unprepared, are asked to find and deliver reams of paper printouts of requested documents. A properly implemented online archival-and-recovery system would have eliminated all of these nightmares.

File attributes include, for example, the ability to access, copy, modify, or execute recovered files. In the compliance world this ability may be unacceptable and could lead to bigger problems where forensic issues are concerned. The archival system must be able to accommodate any legal challenge to the chain of custody of any requested document. This means that the recovered archival data must be demonstrably unchanged from the original. This is not as easy as it sounds. In the world of magnetic media, virtually any file can be declared to be forensically altered if its position on the magnetic media ever changes by virtue of an operating system action. This is true even if the action is totally unrelated to the content of the file itself. For this reason consideration should be given to methods of storing archival data that prevent attended or unattended alteration of the location as well as the content of data. These systems exist today, but are not yet widely used in archival operations.
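One common way to show that the content of a recovered file is unchanged is to record a cryptographic digest at archive time and compare it at recovery time. The article does not name a method, so the use of SHA-256 here is an assumption, and the function names are hypothetical. Note that a digest covers content only; it does not address the separate concern about a file's physical position on the media.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_digest):
    """True if the file's current digest matches the one recorded at archive time."""
    return sha256_of(path) == recorded_digest
```

In this sketch the digest recorded at archive time would need to be stored where the archived file's custodians cannot alter it, for example alongside the index entry.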

Encryption and compression of archival data are usually, and correctly, perceived as impediments to speedy recovery of data sets because they can slow the actual recovery or restoration of data significantly. They do, however, enhance the process and performance of network-centric recovery systems. This is because a substantial portion of business archival data is text-based and therefore benefits significantly from compression for storage and network-based recovery.
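The benefit of compression for text-heavy archives can be checked on a sample before committing to a policy. The sketch below uses Python's standard `zlib` module as a stand-in for whatever compressor an archival system actually uses; the sample text is invented and deliberately repetitive, so the ratio it produces is not representative of real data.

```python
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return original size divided by compressed size for the given bytes."""
    compressed = zlib.compress(data, level)
    return len(data) / len(compressed)


# Invented sample: repetitive business text, which compresses unusually well.
sample = ("Quarterly compliance report for the finance department. " * 200).encode("utf-8")
print(f"compression ratio: {compression_ratio(sample):.1f}x")
```

Running the same function over a representative set of real archival files gives a rough idea of how much less data would have to cross the WAN during a recovery.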

Other issues associated with encryption and compression need to be understood. Let us assume that the archival system contains files that hold personnel, financial, or legal data. If the files are well-encrypted, then they are reasonably safe. If they are compressed, then they are unreadable until decompressed. Even so, a Web-based archival interface could make the existence of these files apparent to unauthorized personnel. For this reason, archive owners need adequate controls over access to the indexes as well as to copies of the actual files.

The simplest system for protection of these files is through the use of multi-level passwords. A second level of protection is afforded by releasing decryption and/or decompression keys only to authorized personnel, and by segmenting the archival records and encryption/decryption keys accordingly. Applied on a case-by-case basis, these two systems can provide a continuous level of security for files that should normally have a limited audience. In the case of compressed files, in particular, recovery is network-friendly and faster overall, and sometimes substantially faster.
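The idea of controlling access to the index itself, not just to the files, can be sketched as a filter applied before any listing is shown. The numeric levels, the `required_level` key, and the sample file names below are assumptions made for illustration; a real system would tie this to its own authentication scheme.

```python
def visible_entries(index_entries, user_level):
    """Return only the index entries the user is cleared to see.

    Assumes higher numbers mean broader authority (a hypothetical scheme).
    """
    return [e for e in index_entries if e["required_level"] <= user_level]


# Invented sample index: one general file and one restricted file.
index_entries = [
    {"file_name": "newsletter.txt", "required_level": 1},
    {"file_name": "payroll_2005.dat", "required_level": 3},
]

for entry in visible_entries(index_entries, user_level=1):
    print(entry["file_name"])
```

With this filter in place, a level-1 user browsing the index never learns that the restricted file exists.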

The Web-based indexing system should be able to assemble disparate collections of compressed and/or encrypted files and provide them seamlessly to requesting end users.

The largest impediment to the speedy recovery of digital archival data sets revolves around the archival storage hierarchy. Here, the concepts of online, nearline, and offline (tape) come into play. Traditionally this concept has been known as hierarchical storage management (HSM), or more recently information lifecycle management (ILM). The underlying premise is that not all data is equally valuable all the time.

These concepts apply equally well to in-house storage and external archives. In its simplest form, users make determinations about the relative time value of archived information using a simple formula that can be represented by the following chart:

[Chart: relative time value of archived information]

Once a time value of the stored information is determined, its place in the hierarchy and migration over time is set. Policies are developed and the archival data sets are migrated to different archival storage types, as shown in the following chart:

Online and nearline archival support implies disk-based backups. Online disks are higher-end RAID array configurations. Nearline storage is bulk storage typically in a JBOD configuration. Offline implies tape media. In some cases this can be an automated tape library, which enhances the potential for user-independence with respect to file recovery. The good news is that as the cost of disk drives continues to drop, users have the ability to extend nearline storage further into the offline storage space. This in turn reduces the operational overhead (time/money/personnel) associated with large offline storage (tape-based) systems.

[Chart: migration of archival data sets across online, nearline, and offline storage]
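A migration policy of the kind described above can be reduced to a rule that maps the age of an archived data set to a storage tier. The sketch below is one possible form; the 90-day and 365-day thresholds are placeholders chosen for illustration, not values from the article, and a real policy would also weigh factors such as compliance queries.

```python
from datetime import date


def storage_tier(archived_on, today, online_days=90, nearline_days=365):
    """Pick a tier from the age of the data set (thresholds are assumptions)."""
    age_days = (today - archived_on).days
    if age_days <= online_days:
        return "online"      # higher-end RAID
    if age_days <= nearline_days:
        return "nearline"    # bulk disk such as JBOD
    return "offline"         # tape


print(storage_tier(date(2005, 6, 1), today=date(2006, 2, 1)))
```

Extending `nearline_days` models the trend noted above: as disk gets cheaper, data can stay on nearline storage longer before moving to tape.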

One important component needs to be added to the hierarchical mix: the burgeoning impact of compliance reporting and audits. Today, state, local, and federal mandates in the US account for well over 15,000 separate types of compliance reports. What’s more, the time value of information can be severely skewed by the ad-hoc queries for offline data that accompany these reports. An effective design for network-based retrieval of digital archives will take this into consideration in the design of the archival hierarchy since the impact to IT operations could be overwhelming.

The best way to build an effective network-based digital data recovery system is to make it as transparent as possible to users, while at the same time being able to properly set expectations with respect to speed. A well-designed browser interface to the system will be able to tell the authorized user how large the files are and the estimated time to assemble them and get them across the network. Clearly, archival data that resides in remote online or nearline storage is easily accessible over the network. That said, the only remaining issues are file sizes, network capacity, and security. The variables associated with these three elements range considerably, making it difficult to quantify the amount of time it will take for each transaction. As separate components, though, we can build some level of expectation.

Types of calculations

Following is an example of the types of calculations that recovery planning will require:

As mentioned in the previous articles in this series (see November 2005, p. 32, and January 2006, p. 36), remote storage can traverse public Internet connections or private line connections. Private connections are more deterministic and faster in terms of throughput than public Internet connections. As such, how much digital data can be recovered across a private line network that connects a user’s company to a remote DR/archival storage system? Let’s use a DS3 (Digital Service 3) network connection as an example.

A DS3 circuit operates at a clock speed of 45Mbps, which translates to about 5.5MBps in storage terms. However, 45Mbps is a useless number for the purposes of determining network capacity. The speed of the circuit is determined by clocking and framing (OSI Layers 1 and 2) and is set by “the telephone company.” If you are using TCP/IP as your communications protocol between users and the remote digital archive, then you will add to the overhead and reduce the available capacity of the network.

Empty packets are often added to fill out telco frames and eliminate the asymmetrical nature of transmissions. All of this adds to overhead. Moving large files using a file movement utility such as FTP increases overhead. None of this can be avoided. The point is that what users will see in terms of throughput on the desktop will likely average one-half to three-quarters of the network’s stated capacity of 5.5MBps.
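The rule of thumb above can be written as a small calculation. This is a sketch of the arithmetic only: the function name is hypothetical, and the single `efficiency` factor is an assumed simplification that lumps together framing, TCP/IP, and file-transfer overhead.

```python
def effective_mbytes_per_sec(line_rate_mbps, efficiency):
    """Convert a line rate in megabits/s to usable megabytes/s.

    `efficiency` is the assumed fraction of capacity left after overhead.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    return line_rate_mbps / 8 * efficiency


low = effective_mbytes_per_sec(45, 0.50)
high = effective_mbytes_per_sec(45, 0.75)
print(f"DS3 usable throughput: {low:.1f} to {high:.1f} MBps")
```

Dividing 45Mbps by 8 gives roughly 5.6MBps, in line with the article's figure of about 5.5MBps, so the half to three-quarters range works out to roughly 2.8MBps to 4.2MBps.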

Let’s overlay an optimistic throughput number for a DS3 connection of 4MBps with our digital data recovery requirements. In one minute a user can receive across the WAN a compressed and/or encrypted file that contains 240MB of requested data. In 60 minutes, across the same connection, the same user can request and receive slightly more than 14GB of digital data. This is a large amount of data and will more than satisfy 90% of all ad-hoc digital data-recovery requests for a single user. With an Internet connection of 45Mbps users could expect to see one-tenth of this amount of data in the same time frames.
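The figures in the preceding paragraph follow directly from the assumed 4MBps throughput, and the same arithmetic gives the time estimate a browser interface could show before a download starts. The function names below are hypothetical, and sizes are in decimal megabytes.

```python
def megabytes_in(seconds, throughput_mbytes_per_sec=4.0):
    """Volume of data (MB) that can cross the link in the given time."""
    return seconds * throughput_mbytes_per_sec


def seconds_to_transfer(size_mb, throughput_mbytes_per_sec=4.0):
    """Estimated time (s) to move a file of the given size across the link."""
    return size_mb / throughput_mbytes_per_sec


print(megabytes_in(60))            # one minute
print(megabytes_in(3600) / 1000)   # one hour, in GB
print(seconds_to_transfer(500))    # a 500MB request
```

One minute yields 240MB and one hour yields 14,400MB, or about 14.4GB, which matches the article's "slightly more than 14GB." The estimate ignores the time needed to assemble, decrypt, or decompress the files.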

There are many other speeds of data circuits and framing protocols, such as ATM, but evaluating each of them requires a detailed analysis of the client’s overall network-based DR/archival and recovery plan. The math is fairly straightforward once all the client DR/archival and recovery needs are defined.

Summary

The productivity increases available to businesses that want to employ WANs for DR/archiving and recovery are large and quantifiable. This is true in terms of both the time to respond to a storage or recovery request and the productivity in accomplishing these tasks.

An automated, network-based digital data archival-and-recovery system can significantly reduce the number of people involved in the logistics of archiving and retrieving records. The more automation that is available through the use of networks, the less likely records are to be misplaced.

Current technology for security and security controls for digitally archived or recovered records is robust enough to withstand most challenges. The impact of security and encryption on network throughput is negligible. The greater the automation, the lower the potential for misplaced or lost records and the greater the accountability for materials presented for archiving or materials requested for restoration, which can result in significant improvements in response to litigation, audit, or compliance reporting. Encryption and compression of digitally archived data reduce the potential for loss of sensitive information. A well-designed, network-based DR/archival and recovery system can reduce response times to archival document movement by orders of magnitude.

George Hall is a member of Ridge Partners LLC and can be contacted at ghall@ridgellc.com.
