Getting Out of the Book and Into the Digital

One of the things we’ve covered quite a bit in my Digital Scholarly Editing class is metadata and the various standards and encoding practices used within Digital Humanities. These standards are especially useful when creating a digital project (whether that project be a simple digital edition, a scholarly archive, or a large digital library). The most commonly used standard seems to be the Text Encoding Initiative (TEI), but it is certainly not the only one that exists. Other encoding standards serve purposes beyond that of TEI and are geared more towards the creation of larger digital projects such as libraries. Attempting to cover them all here would require a series of posts, so, instead, I thought I’d focus on one that is often leveraged when creating a large digital archive or digital library: the Metadata Encoding and Transmission Standard, otherwise known as METS.

A Brief Introduction to Metadata & Standards

Before diving too deep into METS, I should first take a step back to ensure you know what exactly metadata is. In short, metadata is “data about data” (“What is Metadata”). In the case of Digital Humanities, metadata exists to give further context to a digital object. It is somewhat subjective in that the person authoring the metadata (or compiling the object) makes a decision about what data he or she feels is important. Metadata can be anything – from descriptive to bibliographical to contextual. But how does one collect and present this data in a way that other people or software programmes can interpret? After all, if it’s subjective, then how will it make sense to someone not intimately familiar with the object itself?

Enter metadata standards. I’ll save you the long history lesson on how they all came about, but a quick overview is necessary to provide some context. Metadata standards started back in the Wild West days of computing, before the arrival of the web, when something called “SGML” (Standard Generalized Markup Language) was introduced in the mid-1980s. SGML’s history actually dates back to the late 1960s, when its precursor, GML, was first established by Charles Goldfarb, Edward Mosher, and Raymond Lorie (“SGML Users’ Group History”), and SGML eventually evolved out of it. Many technologies that we are more familiar with today, such as XML and HTML, are based on the standards put forth by SGML. I could go on for pages about what exactly SGML is, but others more qualified have already done so. For a great list of resources about SGML, check out Robin Cover’s SGML/XML Website.

What is METS?

So how does METS play into all of this? With the creation of the World Wide Web in the early 1990s and its quick proliferation, many libraries began looking at ways to make their collections digital. To build a digital library, these organisations needed a way to standardise how they created their digital objects, not only so that software programmes could process them consistently but, more importantly, so that institutions could easily share digital objects with one another (Amaral). METS addresses that need. The standard allows for not just descriptive metadata but also rights management, technical, and preservation metadata (McCallum).

METS is broken down into seven different sections. The Library of Congress (which currently houses METS and its documentation) has published an article entitled “METS: An Overview and Tutorial” that defines these sections as follows:

METS Header (<metsHdr>) – standard header data used to describe the METS document. This includes information about the various roles played in the creation of the document (who was the editor, the creator, the archivist, etc.), along with various other miscellany such as creation date, last modification date, and record status.

Descriptive Metadata (<dmdSec>) – contains metadata used to describe each object within the document. This can be internal metadata that describes various standard attributes such as title, author, and publication information. In addition, it can act as a pointer to an external metadata record written in a supported format (as of this writing, these supported formats are: MARC, MODS, EAD, VRA Core, Dublin Core, NISO Technical Metadata for Digital Still Images, Library of Congress Audiovisual Metadata, TEI, DDI, and FGDC). Each METS document can contain one or more <dmdSec> elements.

Administrative Metadata (<amdSec>) – contains metadata related to the technical creation of the object, any intellectual rights management data, data related to the original analog source of the object, and “digital provenance data” or information related to the relationship between files, transformations, and migrations of a digitised object. Like descriptive metadata, each METS document can contain one or more types of administrative metadata.

File Section (<fileSec>) – describes the files related to the digital object. The file section is further defined by <fileGrp> elements which are used to define all of the files that comprise a single version of a particular digital object.

Structural Map (<structMap>) – defines the hierarchy of all of the various pieces of the digital object and is used primarily to help define how the end user would navigate the object. The Library of Congress calls this “the heart of a METS document” (“METS: An Overview and Tutorial”).

Structural Links (<structLink>) – this section is used solely for indicating hyperlinks between nodes on the structural map. Each link is recorded in a repeatable <smLink> element.

Behavior (<behaviorSec>) – this section defines individual behaviours (<behavior>) that allow for executable content within the digital object. Each behaviour has an interface definition (<interfaceDef>) which defines the behaviour as well as a mechanism (<mechanism>) which acts as a pointer to a piece of executable code that runs the behaviour.
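
To make the seven sections a little more concrete, here is a minimal sketch in Python that builds an empty METS skeleton using the standard library’s xml.etree.ElementTree. The element names and the METS namespace come from the standard itself, but the attribute values (the IDs, the date, “Example Library”) and the helper names q, build_skeleton, and section_names are my own placeholders. The result is only a structural outline and is not meant to validate against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the METS namespace (hypothetical helper)."""
    return f"{{{METS_NS}}}{tag}"


def build_skeleton() -> ET.Element:
    """Build a bare-bones METS outline with one element per section.

    All attribute values are placeholders chosen for illustration.
    """
    root = ET.Element(q("mets"))

    # 1. Header: who created the METS document, and when.
    hdr = ET.SubElement(root, q("metsHdr"), {"CREATEDATE": "2015-01-01T00:00:00"})
    agent = ET.SubElement(hdr, q("agent"), {"ROLE": "CREATOR"})
    ET.SubElement(agent, q("name")).text = "Example Library"

    # 2. Descriptive metadata (title, author, and so on would go here).
    ET.SubElement(root, q("dmdSec"), {"ID": "DMD1"})

    # 3. Administrative metadata (technical, rights, source, provenance).
    ET.SubElement(root, q("amdSec"), {"ID": "AMD1"})

    # 4. File section: one <fileGrp> per version of the object.
    file_sec = ET.SubElement(root, q("fileSec"))
    ET.SubElement(file_sec, q("fileGrp"), {"USE": "master"})

    # 5. Structural map: the navigable hierarchy of the object.
    struct_map = ET.SubElement(root, q("structMap"))
    ET.SubElement(struct_map, q("div"), {"TYPE": "book", "DMDID": "DMD1"})

    # 6. Structural links and 7. behaviours (left empty in this outline).
    ET.SubElement(root, q("structLink"))
    ET.SubElement(root, q("behaviorSec"))
    return root


def section_names(root: ET.Element) -> list:
    """Return the local (un-namespaced) names of the top-level sections."""
    return [child.tag.split("}", 1)[1] for child in root]


if __name__ == "__main__":
    skeleton = build_skeleton()
    print(section_names(skeleton))
    print(ET.tostring(skeleton, encoding="unicode"))
```

Note how the DMDID attribute on the <div> points back at the ID of the <dmdSec>: even in a skeleton, the sections refer to one another by identifier.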

Why Use METS?

Now that I’ve described what exactly METS is, you might be wondering “Why use it?” It’s certainly much more complicated than something like TEI (which consists of a header and a body). The biggest thing METS brings to the table is its ability to link objects (Cover). It also provides for technical metadata (such as the digital object’s format and utilisation characteristics) and allows you to define how the digital object can interact with other services (“Metadata Encoding and Transmission Standard: Primer and Reference Manual” 15). While TEI is great for encoding a manuscript (and in my opinion is especially wonderful at providing contextual information regarding the text via the use of annotations and notes), METS is more geared towards structuring a digital library or archive. The standard is therefore more about providing the data needed to build the repository than about supplying the contextual detail that TEI handles so well.
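
The linking ability is easier to appreciate with a small example. The sketch below, again in Python with the standard library, parses a tiny hand-written METS fragment and follows each page in the structural map to its file location: a <div> holds an <fptr> whose FILEID matches the ID of a <file>, and that file’s <FLocat> carries the address. The sample document, the example.org URLs, and the function name resolve_pages are all invented for illustration, and a real METS record would carry far more detail.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

# A hand-written sample for illustration only; the URLs are placeholders.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                       xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="master">
      <mets:file ID="F1" MIMETYPE="image/tiff">
        <mets:FLocat LOCTYPE="URL" xlink:href="https://example.org/page1.tif"/>
      </mets:file>
      <mets:file ID="F2" MIMETYPE="image/tiff">
        <mets:FLocat LOCTYPE="URL" xlink:href="https://example.org/page2.tif"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap>
    <mets:div TYPE="book" LABEL="Sample Book">
      <mets:div TYPE="page" LABEL="Page 1"><mets:fptr FILEID="F1"/></mets:div>
      <mets:div TYPE="page" LABEL="Page 2"><mets:fptr FILEID="F2"/></mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def resolve_pages(xml_text: str) -> list:
    """Return (page label, file URL) pairs by following fptr -> file links."""
    root = ET.fromstring(xml_text)
    href_attr = f"{{{NS['xlink']}}}href"

    # Index every file in the file section by its ID attribute.
    locations = {}
    for file_el in root.iterfind(".//mets:file", NS):
        flocat = file_el.find("mets:FLocat", NS)
        if flocat is not None:
            locations[file_el.get("ID")] = flocat.get(href_attr)

    # Walk the page-level divs in document order and resolve each pointer.
    pages = []
    for div in root.iterfind(".//mets:structMap//mets:div[@TYPE='page']", NS):
        for fptr in div.iterfind("mets:fptr", NS):
            pages.append((div.get("LABEL"), locations.get(fptr.get("FILEID"))))
    return pages


if __name__ == "__main__":
    for label, url in resolve_pages(SAMPLE):
        print(label, url)
```

Because the structural map only stores identifiers, the same page could point at several versions of a file (a master TIFF and a web-friendly JPEG, say) without the hierarchy itself changing.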

Conclusion

METS is one standard among many. There is no one standard that trumps all situations — after all, each was created to solve a particular problem. But it’s important to understand these various metadata standards in order to make the proper decision regarding which standard(s) to leverage when working on your own digital projects. Take the time to research what is out there and be sure to understand the strengths and weaknesses behind each. The more diligent you are in your initial research, the smoother the implementation of the project will be.