Digital Archiving: The Impossible Dream? - Page 2

Rotation Strategy
One digital archiving option is to move old archives onto new media. Copy those old tapes, diskettes, and hard drives onto modern media while they are still readable. This is how I manage my personal files; every year I spend a day sorting and recording onto new media. As a business strategy, however, the difficulties are obvious: staff turn over, management changes, companies merge, and so on. It's time-consuming and expensive, and someone has to keep track and make it happen. Errors can also be introduced in the copying. Making checksums of everything can help catch errors, but that requires even more time and effort. Another potential issue is that the more the data are handled, the greater the risk of damaging something.
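
As a rough illustration of the checksum idea, the sketch below records a SHA-256 digest for every file on the old media and compares it with the same listing taken from the new copy. The function names (file_sha256, build_manifest, compare_manifests) and the directory layout are assumptions made for this example, not part of any particular backup tool.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(original, copy):
    """Return (missing, changed): files absent from the copy, and files whose digest differs."""
    missing = sorted(name for name in original if name not in copy)
    changed = sorted(
        name for name in original
        if name in copy and original[name] != copy[name]
    )
    return missing, changed
```

Running build_manifest on the old media and again on the fresh copy, then passing both results to compare_manifests, lists anything that was dropped or altered in transit. Keeping the manifest alongside the archive also lets next year's rotation detect media that degraded in storage.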

But short of inventing a miraculous new gadget that's guaranteed to work forever, a rotation strategy may be the most practical method. Ed Sawicki reports, "I now archive to optical media, and I've put two new CD-ROM drives in storage for the future. My archives tend to be cumulative -- I'm backing up the same old data along with the new data simultaneously, so the ages of the media are not as critical."

The difficulty in this scheme is migrating files in old formats to newer programs. For example, what's to be done with Quicken files from ten years ago? Or Word documents or Excel spreadsheets? Importing them into newer versions is yet more work, and the imports don't always result in identical copies of the files, as formatting is often lost and errors can be introduced. What about keeping copies of the original programs and the necessary hardware to run them? A good idea, perhaps, but finding parts for older machines can be difficult.

Another idea floating about is to record, on good ole paper, all the technical specs of the stored archives: file encoding, hardware specs, and whatever else is needed to re-create the means to access the data. While persuading the owners of closed, proprietary file formats to go along with such a scheme is probably more difficult than actually implementing it, there are a number of open formats that can be considered when planning for the long term.
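
One way to picture such a paper record is a small, printable spec sheet per archive. The sketch below is only an assumption about what might be worth writing down; the ArchiveSpecSheet class, its field names, and the sample values are invented for illustration and do not follow any established standard.

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveSpecSheet:
    """Hypothetical description of one archive, meant to be printed and filed."""
    label: str
    media_type: str
    file_format: str
    text_encoding: str
    hardware_needed: list = field(default_factory=list)
    notes: str = ""

    def to_text(self):
        """Render the sheet as plain text suitable for printing."""
        lines = [
            f"Archive label:   {self.label}",
            f"Media type:      {self.media_type}",
            f"File format:     {self.file_format}",
            f"Text encoding:   {self.text_encoding}",
            "Hardware needed: " + (", ".join(self.hardware_needed) or "none listed"),
        ]
        if self.notes:
            lines.append(f"Notes:           {self.notes}")
        return "\n".join(lines)


# Illustrative values only.
sheet = ArchiveSpecSheet(
    label="Accounts 2001",
    media_type="CD-R",
    file_format="plain text, tab-separated",
    text_encoding="UTF-8",
    hardware_needed=["CD-ROM drive", "IDE controller"],
)
print(sheet.to_text())
```

The point of printing plain text rather than saving the sheet in yet another file format is that the description itself must not depend on the technology it describes.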

Non-Text Stuff
What about preserving movies, music, or software? For small programs it may be practical to print out the source code, but a one-million-line program would fill about 20,000 printed pages. The Linux 2.4 kernel, for example, is about 3 million lines of code, or about 60,000 pages. Large databases are not practical to keep as hard copies, either. The bottom line is that we are well past the point of being able to reduce everything to paper.
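
The page estimates above imply roughly 50 lines of code per printed page. A minimal sketch of that arithmetic follows; the 50-line figure is inferred from the numbers in the text, not a stated standard, and the function name pages_needed is made up for the example.

```python
import math

# Assumption: about 50 lines per printed page, inferred from
# "one million lines fills about 20,000 pages".
LINES_PER_PAGE = 50


def pages_needed(lines, lines_per_page=LINES_PER_PAGE):
    """Return the number of printed pages needed, rounding up partial pages."""
    return math.ceil(lines / lines_per_page)


print(pages_needed(1_000_000))  # 20000
```

Changing lines_per_page to match a denser or sparser print layout shifts the totals, but not enough to make paper practical for large code bases.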

Huzzah for Clerks and Librarians
It is said that being a sysadmin is the most varied job a person can take on -- technician, strategist, diplomat, and now librarian. In other words, it's a great job for versatile and creative thinkers like ourselves.