Conceptualizing Metadata

During the Bibliographic Control Working Group public meeting last week I had lunch with Karen Coyle, and we chatted about conceptualizing metadata – even if we didn’t pose the question to ourselves as such, I’d say we were both wondering how approaches to description and standards can create an environment which ensures cross-domain interoperability while also retaining local flexibility.

As timing would have it, an article which outlines my thinking about this issue was published in FirstMonday today. I co-wrote it with my friend and colleague Mary Elings from the Bancroft Library: we teach a class in the School of Information Studies at Syracuse University each summer, and found that our students needed a primer on description in libraries, archives and museums, as well as a method to organize the acronym madness and make sense of all the standards in use across the board. Since we could never find a good article to serve this particular educational need, we took a stab at it ourselves (after a much appreciated kick in the pants from the VRA Bulletin, where the article appears in print form as part of a special issue on Shareable Metadata and CCO).

Karen and her friends have also been busy, and she e-mailed me with a link to an emerging article on her futurelib Wiki. What struck me about this piece was the difference in terminology we use to describe what I believe are essentially the same concepts. While Mary and I talk about data structure, data content and data format, Karen et al. use the terms Schema (= data structure), Guidance (= data content) and Encoding (= data format). Of course I like our terms better (since I’m used to them!), but the effect of making the familiar look unfamiliar actually opens up the possibility of looking at what’s proposed with fresh eyes – and maybe that was the intended effect. And, let me just say for the record, maybe I’m over-eager in my mapping of terms, and there are important differences between our categories.

As I read it, the draft argues that all of our metadata needs should be articulated at the level of an overarching reference model, which governs the creation of an extensible and flexible Schema, which gets populated with the help of community-specific Guidance, which in turn gets rolled up into an Encoding that transports it all. So far it’s a familiar story. Here’s the more interesting part: if I extrapolate correctly, the document proposes that whoever considers themselves part of the cultural heritage community (libraries, archives, museums, et al.) could be united under the big tent of the Framework and the Schema, yet they would retain the flexibility to adapt the description to their needs at the level of Guidance. While I’m intrigued by this vision, I believe as a model for rethinking description it would probably only scale to unite the different interests and areas of specialization within the library community, which is a feat in and of itself.

What I’m proposing as a solution to interoperability issues is much less ambitious: if we could recognize that we are a community of common interest across domains, then libraries, archives and museums could start treating the same type of materials with the same type of description. For example, if we could agree to describe the rare and unique items housed in our collections using the same suite of standards, we could build up a compelling aggregate of digital objects in much the same way that bibliographic materials now flow together with relative ease. To my mind, integration of different types of materials (books, artifacts, papers) rests on the foundation of figuring out how to get the same stuff to play nicely together. The key to finding interoperability at that level resides in the data content rules, and that’s where I see the need to come to more agreement. We know how to map data structures, and we know how to automate transformations. However, we don’t know how to “map” between AACR2 and CCO rules, for example, or bring relative homogeneity to collections described using different data content standards, controlled vocabularies and thesauri.
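
To make the distinction between mapping structures and reconciling content a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the field names, the crosswalk, the helper map_structure and the sample values are invented, and none of them is taken from AACR2, CCO or any real schema. It only shows that renaming fields is mechanical, while values written under different content rules still disagree after the mapping.

```python
# Hypothetical crosswalk between two invented record structures.
# Keys are field names in a made-up library record, values are the
# corresponding field names in a made-up museum record.
CROSSWALK = {
    "main_entry": "creator",
    "title_proper": "title",
    "date_of_publication": "creation_date",
}


def map_structure(record, crosswalk):
    """Rename fields according to the crosswalk; values pass through untouched."""
    return {
        crosswalk[field]: value
        for field, value in record.items()
        if field in crosswalk
    }


# Invented sample values: the same item described under two different
# (imaginary) sets of data content rules.
library_record = {
    "main_entry": "Doe, Jane, 1900-1980",
    "title_proper": "View of the harbor",
    "date_of_publication": "c1950",
}
museum_record = {
    "creator": "Jane Doe (1900-1980)",
    "title": "View of the harbor",
    "creation_date": "ca. 1950",
}

mapped = map_structure(library_record, CROSSWALK)

# The data structures now line up field for field...
same_fields = mapped.keys() == museum_record.keys()

# ...but the values still differ wherever the content rules differ.
mismatched = sorted(f for f in mapped if mapped[f] != museum_record[f])

print(same_fields)
print(mismatched)
```

In this toy case the field names match after the crosswalk, yet the creator and date values remain different strings, which is the part no structural mapping resolves on its own.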

I’ve gone on for too long. Karen, let’s chat some more soon! And dear reader, if you’re still with me on this one, you probably should order this T-Shirt (courtesy of Inherent Vice).