Metadata for Archival Collections: Challenges and Opportunities

Metadata is one of the significant costs of digitization. Although archival items can be digitized without cataloging, a digital collection cannot be created and delivered without metadata.

Providing sufficient metadata promptly for the abundance of digital resources can create a bottleneck in a workflow. Creating and maintaining metadata about objects, and about digital information objects in particular, is time-consuming and costly. Metadata creators must provide enough information to be useful but cannot afford to be exhaustive.

The Four Metadata Types

There are four types of metadata: administrative, descriptive, preservation, and technical.

Administrative metadata captures the context necessary to understand information resources. It documents the life cycle of an electronic resource, including data about ordering, acquisition, maintenance, licensing, rights, ownership, and provenance. It is essential that the provenance of a digital image object be recorded, where possible, from the time of its creation through all successive changes in custody or ownership. Users and curators must be given a sound basis for confidence that a digital image is what it is purported to be. There should be an audit trail of all changes.

Descriptive metadata attempts to capture the intellectual attributes of the images, enabling users to locate and select suitable assets based on their subjects.

Preservation metadata is the information about an item used to protect it from deterioration or destruction.

Technical metadata ensures that the information content of a digital file can be recovered even if the viewing applications associated with the file have vanished.
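
The four types can be pictured as sections of a single record for each image. The following is a minimal sketch in Python, not a reference to any particular standard: the class names, field names, and sample values are all hypothetical, chosen only to show how an append-only audit trail might sit alongside the four groups of metadata.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceEvent:
    """One audit-trail entry: a change in custody, ownership, or the file itself."""
    timestamp: str
    agent: str
    action: str


@dataclass
class ImageRecord:
    """Hypothetical record grouping the four metadata types for one image."""
    identifier: str
    administrative: dict = field(default_factory=dict)  # rights, ownership, licensing
    descriptive: dict = field(default_factory=dict)     # title, subjects
    preservation: dict = field(default_factory=dict)    # storage and condition notes
    technical: dict = field(default_factory=dict)       # file format, dimensions
    audit_trail: list = field(default_factory=list)     # provenance log, oldest first

    def log_event(self, agent: str, action: str) -> None:
        """Append a provenance event; existing entries are left untouched."""
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_trail.append(ProvenanceEvent(stamp, agent, action))


# Sample values are invented for illustration.
record = ImageRecord(
    identifier="img-0001",
    administrative={"rights": "In copyright", "owner": "Example Archive"},
    descriptive={"title": "Harbor at dawn", "subjects": ["harbors", "ships"]},
    preservation={"storage": "offline master copy"},
    technical={"format": "image/tiff", "width_px": 4000, "height_px": 3000},
)
record.log_event("Example Archive", "digitized from original negative")
record.log_event("Example Archive", "custody transferred to digital repository")
```

Keeping the audit trail as a list that is only ever appended to is one simple way to preserve the full history of changes; a production system would likely also need to protect that log from later edits.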

Embedded or Linked

Metadata can be embedded in digital images or stored separately. Embedding metadata within the image it describes ensures the metadata will not be lost, obviates problems of linking between data and metadata, and helps ensure that the metadata and image will be updated together. Storing metadata separately can simplify the management of the metadata itself and facilitate search and retrieval. Metadata is usually stored in a database system and linked to the items described.
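
As a rough illustration of the linked approach, the sketch below keeps metadata in a small database table keyed by an item identifier and searches it without touching the image files. It uses Python's built-in sqlite3 module; the table layout, identifiers, and sample values are assumptions made for this example only.

```python
import sqlite3

# In-memory database standing in for a metadata store kept apart from the images.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (item_id TEXT, element TEXT, value TEXT)")

# Each row links one metadata statement to an image through its identifier.
rows = [
    ("img-0001", "title", "Harbor at dawn"),
    ("img-0001", "subject", "harbors"),
    ("img-0002", "subject", "harbors"),
    ("img-0002", "subject", "bridges"),
    ("img-0003", "subject", "bridges"),
]
conn.executemany("INSERT INTO metadata VALUES (?, ?, ?)", rows)


def find_items(connection, element, value):
    """Return identifiers of items whose metadata has the given element and value."""
    cursor = connection.execute(
        "SELECT DISTINCT item_id FROM metadata "
        "WHERE element = ? AND value = ? ORDER BY item_id",
        (element, value),
    )
    return [row[0] for row in cursor.fetchall()]


matches = find_items(conn, "subject", "harbors")
```

The trade-off described above shows up directly: search is a single query, but the link holds only as long as the identifier in the table and the name of the image file stay in step.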

Useful, Not Exhaustive

The biggest challenge is balancing the ideal of comprehensive description against the more practical goal of “good enough” description. The deciding factors are the limited staff, time, and funding available for digitization.

In my experience, cataloging and indexing can account for nearly a third of the overall costs of projects. These costs present considerable challenges to the economics of traditional library cataloging, which creates metadata records characterized by precision, detail, and professional intervention. This high price is impractical given the growth of networked resources, and less expensive alternatives are needed.

Metadata creation requires both organizational and subject expertise to describe images effectively. Organizational expertise refers to the ability to apply the correct structure, syntax, and use of metadata elements, while subject expertise refers to the ability to generate a meaningful description of the material for users. High-quality metadata that draws on both kinds of expertise is an integral part of effective searching, retrieval, use, and preservation of digital resources.

Promoting Interoperability

Describing images with metadata allows them to be understood by both humans and machines in ways that promote interoperability. Interoperability is the ability of many systems with different hardware and software platforms, levels of granularity, controlled vocabularies, data types, and interfaces to exchange data with minimal loss of content and functionality. Archival assets across the network can be searched more seamlessly using defined metadata schemes and shared transfer protocols.

Metadata Crosswalks

Metadata crosswalks—mappings of the elements, semantics, and syntax from one metadata scheme to another—further facilitate the exchange of metadata. The degree to which the crosswalk is successful at the item level depends on the similarity of the schemes, the granularity of the elements in the target scheme compared to that of the source, and the compatibility of the content rules used to fill the elements of each scheme.
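
A crosswalk can be sketched as a simple lookup table from source elements to target elements. The example below is illustrative only: the source element names are invented, and the target names are loosely modeled on Dublin Core elements (title, creator, date, coverage) rather than taken from any published crosswalk.

```python
# Hypothetical crosswalk: source element name -> target element name.
CROSSWALK = {
    "title": "title",
    "photographer": "creator",
    "date_taken": "date",
    "city": "coverage",
    "country": "coverage",
}


def apply_crosswalk(source_record, crosswalk):
    """Translate a source record into the target scheme.

    Returns the translated record plus a list of source elements that have
    no counterpart in the target scheme and were therefore dropped.
    """
    target = {}
    unmapped = []
    for element, value in source_record.items():
        target_element = crosswalk.get(element)
        if target_element is None:
            unmapped.append(element)
            continue
        # Several source elements may land in the same target element.
        target.setdefault(target_element, []).append(value)
    return target, unmapped


# Sample source record; all values are invented.
source = {
    "title": "Harbor at dawn",
    "photographer": "A. Example",
    "date_taken": "1932",
    "city": "Lisbon",
    "country": "Portugal",
    "lens": "50mm",
}
target, unmapped = apply_crosswalk(source, CROSSWALK)
```

Here the city and country both collapse into a single coverage element, and the lens has nowhere to go at all. Once the record is in the target scheme, nothing marks "Lisbon" as a city rather than a country, which is the kind of granularity mismatch that limits how well a crosswalk can work at the item level.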

Crosswalks are essential for collections where resources are drawn from a number of sources and are expected to act as a whole, perhaps with a single search engine applied. While crosswalks are critical, they are also labor-intensive to develop and maintain. Mapping a scheme with fewer elements (less granularity) to one with more elements (more granularity) is especially problematic, because a single coarse value cannot be split reliably across several finer elements. These problems have frustrated users who want consistent metadata interoperability across digital imaging products and services. Manufacturers of digital imaging hardware, software, and services spend substantial resources dealing with them. Until these complexities are resolved, the problems will continue to cost users and archivists time and resources.

Translating Metadata Across Collections

Because of these issues, archivists and other information professionals are working toward better crosswalk models that unite the crosswalk, the source metadata standard, and the target metadata standard. With each effort, we improve machine-readable encoding and human-readable description while bringing together all of the information required to access and interpret the resources described. In time, we will become better at describing complex objects for our users.