Protecting your library's data

Computers in Libraries
[February 2003]

Breeding, Marshall

Copyright (c) 2003 Information Today

Abstract: As long as Breeding has been managing technology in libraries, one of his main concerns has always been ensuring that libraries manage their data well. In his experience, the only effective way to manage library data involves using network servers.

It was a stormy night. Lightning strikes were reported all over. The next morning, when you come into work at the library, you find that while most computer systems seem to be OK, a few computers just don't start up, including your director's. A closer inspection reveals a total failure of the hard drive; it's practically melted. You know that there's no chance of recovering anything from it. As you examine the charred remnants of her computer, your boss, the director, mentions that she'd just finished the library's budget for the next year, the culmination of many weeks of work. "It's still there, right?" she asks.

Though this worst-case scenario is fictional, if it happened in your place today, would you be sweating buckets, or would you simply be able to say, "No problem. It's safe and sound. Just give me a few minutes to replace your hard drive"?

As long as I've been managing technology in libraries, one of my main concerns has always been ensuring that libraries manage their data well. Libraries just don't have the luxury of being able to recreate data lost to any of the many vagaries, events, and incidents that plague computer equipment, such as hardware failures, human error, and malicious viruses or worms. It's a cruel world sometimes. While today's generation of computer hardware has proven to be more reliable than ever, the other vulnerabilities seem worse than ever. I especially worry about each successive generation of viruses and worms as they become more difficult to detect and stop, and as they demonstrate increased ability to destroy data.

In my experience, the only effective way to manage library data involves using network servers. Local hard drives, while exceedingly capacious these days, lack the levels of security, redundancy, and protection offered by a healthy and well-managed network server.

My basic argument against using local hard drives for data storage is simple: computer users don't back up their hard drives. They say they do, but they don't. Even though you may have a Zip disk, CD-R drive, or other suitable equipment, workers just don't have the time and discipline to manually back up their files every day. It's also not a good use of their time. If you expect each library employee to spend 20 minutes each day performing backup and other housekeeping tasks on his or her computer, the cumulative cost to the organization can be enormous. This is an activity best performed behind the scenes by an automated procedure that doesn't forget and doesn't make excuses.

One possible strategy would be to allow staff members to save data to their local hard drives and design a process that backs up every hard drive each night. While that's technically possible, this approach generally doesn't work well. The computers have to be left on, and the complexity of such a backup process across dozens or hundreds of computers far outweighs the logical alternative: using a network file server for all critical data.

I believe that every library-or for that matter any organization that relies on computers-should have a network-based data storage strategy. You can't guarantee the safety of the organization's data without the right hardware, software, security, and procedures. A viable data storage environment in a library would include this equipment:

Network File Server-A network file server configured for secure and convenient access for the whole staff should lie at the heart of the organization's storage strategy. A well-implemented file server works almost invisibly to its users. Access to the server does not require a separate login, and its resources blend into the computer's familiar interface: the volumes and directories of the remote file server appear just like regular drives and folders. The key to success is making it just as easy for computer users to save their files into the proper folder on a remote server as it is to keep them on a local drive.

Private and Shared Folders-An important aspect of setting up a network file server involves providing the right resources for the individuals and groups throughout the organization. Each person must have a folder for his or her exclusive use. These personal folders should be considered every bit as private as the hard drives of local computers; no other staff member can access their contents. But a network folder is better than a local hard drive, since its owner can reach it from any computer in the organization. Whether staff members log in at the computer in their cubicles or at the reference desk, they can get to their files.

File servers should also offer shared folders that correspond to the teams, departments, committees, and other structures of the organization. Having the ability to share data facilitates collaborative work and helps avoid duplication of effort as groups jointly author reports, presentations, and other documents.
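To make the folder scheme concrete, here is a minimal sketch, in Python, of how private and shared folders might be provisioned on a POSIX-style file server. The function names and the `home`/`shared` layout are illustrative assumptions, not a prescription; a production setup would also assign ownership of each folder to the right user and group accounts.

```python
import os
import stat

def make_private_folder(root, user):
    """Create a per-user folder readable and writable only by its owner.
    (Illustrative layout: private folders live under <root>/home/.)"""
    path = os.path.join(root, "home", user)
    os.makedirs(path, exist_ok=True)
    os.chmod(path, stat.S_IRWXU)  # mode 0o700: owner only
    return path

def make_shared_folder(root, group_name):
    """Create a shared folder that all members of a group can use.
    (Illustrative layout: shared folders live under <root>/shared/.)"""
    path = os.path.join(root, "shared", group_name)
    os.makedirs(path, exist_ok=True)
    os.chmod(path, stat.S_IRWXU | stat.S_IRWXG)  # mode 0o770: owner and group
    return path
```

The two modes capture the policy in the text: `0o700` keeps a personal folder private to one staff member, while `0o770` opens a departmental folder to everyone in its group but to no one else.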

Network Operating Systems-File servers come in various flavors: Windows 2000, Novell NetWare, and various versions of UNIX. Which one you use depends largely on your organization's overall network architecture, but any of these major network environments is perfectly capable of providing solid file server capabilities.

Fault-Tolerant Hardware-Regardless of the operating system that powers the file server, it should reside on industrial-strength hardware. This doesn't necessarily involve great expense, but it does mean using a server-class computer with redundant components protected by an Uninterruptible Power Supply (UPS). The UPS keeps the server going even if the power goes out and protects it from the types of power spikes that cooked the hard drive in our fictional disaster. The storage system of a network file server usually relies on RAID (Redundant Array of Independent Disks) architecture, which stripes data among multiple drives in such a way that one or more of them can fail completely without any noticeable impact to users, except maybe a slowdown in performance. I've seen RAID storage systems in action many times. It's amazing to set up a file system on a server, pull out a disk drive while the server is running, and see it just keep running with all the files intact. Just pop a new drive in, and it rebuilds the data onto the replacement automatically and returns to optimal performance levels. While RAID is a little more expensive than the JBOD (just a bunch of disks) alternative, it offers greater peace of mind to network administrators and pays for itself through increased reliability.
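The fault tolerance described above rests on parity. Real RAID controllers work at the block-device level in hardware or in the operating system, so the following is only a toy sketch of the underlying idea: XOR parity lets the array rebuild one failed drive's data from the surviving drives.

```python
from functools import reduce

def parity_block(data_blocks):
    """Compute the XOR parity of equal-length data blocks, byte by byte,
    the way a parity-based RAID level protects a stripe."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*data_blocks))

def reconstruct(surviving_blocks, parity):
    """Rebuild the single missing block: XOR-ing the survivors with the
    parity block cancels them out, leaving the lost block's bytes."""
    return parity_block(surviving_blocks + [parity])
```

Because XOR is its own inverse, the same routine that computes parity also recovers a lost block, which is why a RAID array can keep serving files while one drive is dead and then rebuild onto the replacement.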

Backup and Storage Management Systems-Even the most reliable file servers need multiple levels of data protection. Just in case something goes wrong, you need to be able to restore any individual file, or even rebuild a server from scratch, from your backup tapes. In my years as a network administrator, we were much more likely to turn to our backup tapes to restore files accidentally deleted by a user than files lost through a hardware failure. The point is to be prepared for any contingency.

Any network file server should be equipped with the hardware and software necessary to automatically make backup copies of all its data regularly. The standard approach involves performing a full backup of all data files once a week and backing up all files changed or added since the last full backup at least once a day. Rotate your backup media to preserve the full data for several weeks. Most organizations keep monthly or annual backups for the occasional need to retrieve files deleted long ago. The more a backup system works automatically, with minimal need for human intervention, the more likely it is to protect the organization's data successfully. A typical backup system relies on a network operator to change the tape in a drive once a day, while the software automatically performs the backup tasks at night when few users are likely to be using the system. Robotic tape systems, though quite expensive, can take humans out of the picture altogether.
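The weekly-full, daily-incremental rotation can be sketched as follows. This is an illustrative Python fragment, not a real backup tool; the function names are invented for the example, and a production system would rely on dedicated backup software that actually copies the selected files to tape.

```python
import os

def files_changed_since(root, since_epoch):
    """Return paths of files modified after the given timestamp; an
    incremental backup would copy only these files to tape."""
    changed = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > since_epoch:
                changed.append(path)
    return changed

def backup_type_for(day_of_week, full_backup_day=6):
    """Pick the backup level for a day (0=Monday ... 6=Sunday):
    one full backup per week, incrementals on the other days."""
    return "full" if day_of_week == full_backup_day else "incremental"
```

Choosing the full-backup day (Sunday here, as an assumption) is a matter of scheduling it when the fewest users are on the system; the incremental runs stay small because they only sweep up what changed since the last full pass.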

Procedures, Training, and Practices-Having all the best equipment and software in place will be in vain unless the individuals in the organization follow the right procedures. Many times I have seen organizations implement a great network storage environment yet fail to train their staff to make correct use of it. People continue to save files only on their local drives; either they don't trust the network system or they don't know that it's available to them. The decision about where to store data isn't a personal choice; it's an institutional one. A staff member's personal preference for the letter C: shouldn't override the institution's need to have all its data well-organized and secure.

If It's Easy, It Will Work

Librarians will store their files on the network if they understand the benefits to the organization and if it's easy. One detail that often gets in the way involves the default configuration of software applications. If the network administrator, for example, pre-configures the word processor for all staff so that by default it saves files to the user's home directory on the network, then it becomes easy and automatic to do the right thing. If the user has to wander through a series of drives and directories to find the file server, it's far more likely that he or she will just plant the file on the local drive.

A data storage strategy that places all user data on network servers essentially makes all desktop computers disposable. If something goes wrong, you can take the old one away, put in a new one with all the standard applications pre-installed and let the staff member get back to work with minimal interruption. Through the use of disk replication applications such as Norton Ghost from Symantec Corp., technical staff can rapidly configure computers for deployment.

I hope that I'm preaching to the choir. But as I do consulting in libraries, more often than not I observe a haphazard approach to data storage. It makes me worry. Just put yourself in a scenario like the one I described at the beginning and ask yourself whether you're prepared for even the most critical computers in your library to quit without notice. No sweat, right?