This is one of seven attachments to the Library's Request for Quotes in a limited competition to
select a vendor to develop a prototype digital repository. The prototype will test potential
approaches for the preservation of recorded sound and moving image collections. The
attachments provide a sketch of the Library's concepts, circumstances, and proposed actions as
of July 1999, and prospective vendors were asked to discuss these ideas in their proposals.

The classic paradigm for library preservation features media that will endure for long time spans.
Libraries have sought to persuade publishers to print on acid-free papers so that physical books
will last for centuries. Library preservation programs have copied materials onto microfilm of a
particular physical type in order to create a human-readable copy that aging tests suggest will last
for more than 100 years. In the face of this paradigm, it is no wonder that the discussion of
electronic reformatting has produced concern within the library preservation community.

Electronic reformatting has been an inescapable part of audio and video preservation programs for
several decades and, in this arena, there is no such thing as a human-readable copy. Deteriorating
or endangered sound and moving image recordings have been copied from one tape to another in
order to keep the underlying content alive through time. In the face of the seemingly short life
spans of tape media, preservation professionals have examined various optical disk media with
interest, only to discover that the longer life span of the media may not be matched by the life
span of the hardware and systems required to recover the stored content. The experience of
reformatting in this general context has led specialists in the field to seek alternate models for
reformatting and preservation copying.

The Library assessed the preservation of video materials throughout the nation and published its
findings in A Study of the Current State of American Television and Video Preservation (Library
of Congress: Washington, 1997; ISBN: 0-8444-0946-4). In another Library of Congress study,
the consultant William D. Storm outlined ways in which the migration of audio and video
materials into a computer-data environment could address the problems associated with
conventional reformatting of these original formats. Storm's report is titled Unified Strategy for
the Preservation of Audio and Video Materials (Preservation Research and Testing Series No.
9806; Aug. 1997; rev. Nov. 1998), available from:

At this time, the Library of Congress is not seeking digital solutions for all of its preservation
work. Interest is high, however, in promising areas like printed matter reformatting and--relevant
here--magnetically recorded audio and video. Many tape media and formats traditionally used
for audio and video reformatting activities are no longer manufactured. The condition of tape
recordings from the 1960s and 1970s has reached a near-crisis state. At the same time, the
development of digital technologies for a variety of activities, including broadcasting, suggests
that the current environment is conducive to the adoption of computer-based digital reformatting.
Thus the Library wishes to explore the computer-data preservation paradigm as it may apply to
the preservation of the digital content that emerges in these areas and as may also emerge from
projects whose goal is to broaden access to collections. Can this paradigm be refined to a point
that inspires confidence in the library community?

It is worth noting that the Library anticipates carrying out a three-pronged approach to audio and
video preservation, at least until there is a high level of confidence in computer-style
preservation. This triple approach is that (1) the original items will be retained, properly housed,
and stored in a suitable environment, (2) conventional "tape-to-tape" copies will continue to be
produced, often in analog form, and (3) new computer-digital copies will be made in the manner
outlined in this document.

Recording and presenting special types of information needed by those who manage the
Library's preservation programs. (Section 4.4.4)

4.4.1 Persistent Archive Design

The term persistent archive is taken from a talk delivered at the Library of Congress on June 3,
1999, by Reagan Moore from the San Diego Supercomputer Center and the National Partnership
for Advanced Computational Infrastructure (NPACI) at the University of California, San Diego.
The acronym DICE stands for Data Intensive Computing Environments.

Moore said that his team faced the challenge of finding ways to maintain digital data for hundreds
of years in the face of system changes, articulating the challenge in these words: "the technology
to instantiate data changes every three years, the technology for data presentation changes every
four years, and the technology to archive the collection changes every five years." A set of slides
(also in PDF format) that illustrated Moore's talk has been made accessible on the WWW.

Most computer backup systems copy what is on disk to tape and sometimes to other media.
Generally speaking, what is on disk takes the form of files. In the current CNRI repository (see
Attachment 2), the system relies on typical backup systems to make copies of the files it contains
(both digital objects and repository software). These copies are made from disk to tape media.

Thus the repository saves its "state" on appropriate media. If the repository were catastrophically
lost, one could restore from the media and have the state as of the last backup, just as with most
software systems. This would include both the repository software and the digital objects that are
stored in it. The restoration would take place in the same environment as
before, using the same repository software. Backup data in this typical scheme is critically
important but is not data that can easily be moved or migrated to a different environment or
system.

4.4.3 Archiving Digital Content

To archive digital content is to produce a copy that is capable of being migrated to a new system
or environment, as well as a copy that is capable of being refreshed, e.g., as one nears the end of
the life-span of the media upon which the archival copy is recorded. With large digital stores, the
rate of data transfer may be an issue: if an obsolete system is failing, is there enough time to
"re-archive" all of its content to a new system? What is needed is an approach to the life-cycle
management of digital information.
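
The transfer-rate question above can be made concrete with simple arithmetic. The following is a minimal sketch, not a planning tool: the function name, the 10-terabyte store size, and the 10-megabyte-per-second sustained rate are all hypothetical figures chosen for illustration and do not describe any Library system.

```python
def hours_to_rearchive(total_bytes: int, bytes_per_second: float) -> float:
    """Return the hours needed to copy total_bytes at a sustained transfer rate."""
    if bytes_per_second <= 0:
        raise ValueError("transfer rate must be positive")
    seconds = total_bytes / bytes_per_second
    return seconds / 3600


# Hypothetical figures: a 10 TB store copied at a sustained 10 MB/s.
store_bytes = 10 * 10**12
rate_bytes_per_second = 10 * 10**6

hours = hours_to_rearchive(store_bytes, rate_bytes_per_second)
print(f"{hours:.1f} hours ({hours / 24:.1f} days)")  # prints "277.8 hours (11.6 days)"
```

Under these assumed figures the copy alone takes more than eleven days of uninterrupted transfer, before any verification of the new copies, which suggests why migration may need to be scheduled well before a system begins to fail.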

There have been a variety of informal statements within the digital library community concerning
the archiving of digital content. For example, staff at the University of California, Berkeley, have
used archival repository as a contrasting term to access repository. The former is designed to
preserve the objects it contains while the latter is structured to facilitate access to or the
presentation of the objects it contains. Other specialists have referred to archival digital objects
in contradistinction to digital objects, with the same intent as the distinction between archival and
access repositories.

The use of the word preservation in the name Universal Preservation Format suggests the UPF
group's interest in archiving. In their document titled Universal Preservation Format: Part 1: User
Requirements, Thom Shepard and Dave MacCarn offer a very rich description of an archival
object:

[The UPF] framework incorporates metadata that identifies its contents within a
registry of standard data types and serves as the source code for mapping or translating binary
composition into accessible or usable forms. The UPF is designed to be independent of the
computer applications used to create content, independent of the operating system from which
these applications originated, and independent of the physical media upon which the content is
stored. The UPF is characterized as "self-described" because it includes within the metadata all
the technical specifications required to build and rebuild appropriate media browsers to access
contained materials throughout time. Objects within the UPF are branded with a unique identifier
that travels with the object through time. Any modification made to the content of the object
must be reflected in its identifier. (page 3)

The Prototyping Project provides the Library with an opportunity to construct an archival object,
albeit probably not one as rich as the UPF object. The options include at least the following:

Archival object structured and archived as a set of raw files, capable of being re-deposited
(re-loaded) in a repository. Associated metadata in a communications format, e.g., XML
markup, and containing the metadata handed off when the object was first deposited
(loaded) and also containing such new metadata as may have been generated by the
repository system itself, e.g., "date deposited."

Archival object wrapped or encapsulated in a manner similar to that envisioned by the
UPF planners.
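
As an illustration of the first option, the sketch below builds an XML metadata record to accompany a set of raw files. It is offered only as one possible shape under stated assumptions: the element and attribute names (archivalObject, depositMetadata, repositoryMetadata, dateDeposited), the function name, and the sample values are hypothetical, and the per-file SHA-256 checksum is an addition of this sketch rather than a requirement stated in this attachment.

```python
import hashlib
import xml.etree.ElementTree as ET
from datetime import date


def build_metadata(object_id, files, deposit_metadata, date_deposited):
    """Build an XML metadata record for an archival object stored as raw files.

    files: dict mapping file name -> bytes content
    deposit_metadata: dict of fields handed off when the object was deposited
    date_deposited: datetime.date generated by the repository at deposit
    """
    root = ET.Element("archivalObject", {"id": object_id})

    # Metadata handed off when the object was first deposited (loaded).
    deposited = ET.SubElement(root, "depositMetadata")
    for name, value in sorted(deposit_metadata.items()):
        ET.SubElement(deposited, "field", {"name": name}).text = value

    # Metadata generated by the repository system itself.
    generated = ET.SubElement(root, "repositoryMetadata")
    ET.SubElement(generated, "dateDeposited").text = date_deposited.isoformat()

    # Inventory of the raw files (size and checksum are assumptions of this sketch).
    file_list = ET.SubElement(root, "files")
    for name, content in sorted(files.items()):
        ET.SubElement(file_list, "file", {
            "name": name,
            "bytes": str(len(content)),
            "sha256": hashlib.sha256(content).hexdigest(),
        })

    return ET.tostring(root, encoding="unicode")


# Hypothetical usage with placeholder content.
record = build_metadata(
    "demo-0001",
    {"side_a.wav": b"placeholder audio bytes"},
    {"title": "Sample recording"},
    date(1999, 7, 1),
)
print(record)
```

Because the record is plain XML stored alongside the raw files, a later system could in principle re-deposit the object by reading the record and the files together, without depending on the original repository software.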

Although media in this context does not have the same vital significance as in the classic
preservation paradigm--it need not last for decades--it is still important. One might consider the
media used for digital archiving as a "holding" medium that must have a reliable life greater than
that of the systems that read it. There can be no firm statement of this duration but one might
safely plan in terms of, say, a decade, on the assumption that obsolescence will overtake any
computer system in less than a decade.

4.4.3.1 Archival Objects as an Option for Repository Interoperability or Exchange

A well designed archival digital object may also function as an exchange object or as the
communications form of the digital object. This idea is noted here not because the Prototyping
Project will undertake to exchange objects with other repositories but because the idea offers an
additional slant on the potential definition for an archival digital object.

Nationally and internationally, the library community has high interest in the interoperation of
repositories, such as those under development within the Digital Library Federation (DLF).
The most interesting form of interoperation is interoperation for access. The ideal expression of
this form of interoperation would empower a user to discover and access digital resources held by
a variety of institutions. A related activity, however, is the exchange of digital objects, in which
one organization makes a copy of an object available to a second organization to deposit (load)
in its repository.

4.4.4 Preservation Program Metadata

The metadata associated with the digital object should also record the special information needed
by those who manage a library's preservation programs. Examples of these types of data are listed
in documentation of metadata captured in the Library's
Coolidge-Consumerism Experiment. This text references documentation pertaining to
preservation program information available from the Research Libraries Group, Cornell
University, and the University of California, Berkeley.

4.4.4.1 Preservation Programs and Metadata Traditionally Captured

Preservation programs in libraries and archives oversee the institution's policies and practices
regarding all forms of preservation. The mission statement of the Preservation Directorate at the
Library of Congress states that the office will "assure long-term, uninterrupted access to the
intellectual content of the Library's collections . . . . directly through the provision of
conservation, binding and repair, reformatting, materials testing, and staff and user education; and
indirectly through coordinating and overseeing all Library-wide activities relating to the
preservation and physical protection of Library material."

The Library of Congress Preservation Directorate participates in the development of national and
international standards and guidelines, e.g., for the practices used when microfilming. When
materials are treated at the Library, e.g., reformatted by microfilming, the staff ensures that there
is appropriate record keeping and the communication of information to other libraries and
archives about the actions taken. For example, a preservation microfilm must contain
information--typically in the form of text pages and targets--that describes the materials
represented and offers technical references, like resolution targets. In addition, bibliographic or
other descriptive information is updated or created to communicate what has been done. The
digital-object metadata described in the preceding section is comparable to the information
traditionally recorded in the course of "analog" preservation.