Using Data Dictionary Creation as the Teaching Moment for Metadata

Metadata is a love note to the future…except when it’s not understood. While attending RDAP 2015 last month, we noted a recurring question from the field of research data management (RDM) about working with partners outside of the library. We noticed that the definition of metadata shifts as people move between disciplines. Depending on discipline, researchers might use the terms annotation, lab notes, or codebook to describe data documentation. Within library and information science, the word metadata can mean data description used to provide access within search interfaces, preservation and maintenance guidelines, or even notes on ownership and copyright. Like Lizzy Rolando and Jake Carlson in the tweet above, we were struggling when common terms of our field were not understood. We were fortunate to accidentally find a solution in the task of writing and scoping a data dictionary. In this post, we’ll talk about the activity of creating a data dictionary and how it became the metadata teaching moment for our interdisciplinary RDM advisory group.

First, a little background to provide some context: At Montana State University, we have a Data Management Task Force (DM-TF—a clunky acronym, but one we’ve learned to live with!) that considers issues related to data storage, preservation, and discovery. DM-TF consists of librarians, IT staff, and a bioinformatician. The interdisciplinary nature of our group makes for vibrant conversations and a breadth of understanding of data issues across campus. But our varied backgrounds sometimes produce communication difficulties; we each have different workflows, different priorities, and different jargon.

Translation, please!

As we begin to include datasets in our institutional repository, DM-TF has been discussing description standards that will assist with discovery. As the metadata specialists on the task force, we suggested creating a data dictionary to outline the minimum viable description fields that we’ll require in our institutional repository. Our inner metadata geeks grinned and pushed up their glasses as we drafted the dictionary, which includes required metadata fields, along with explanations for each field, input guidelines, and crosswalks to Dublin Core, MARC, and schema.org. For this exercise, we were interested in a set of minimum viable fields of storage and discovery metadata for datasets. If you are interested, the data dictionary [as a work in progress] is available here: https://docs.google.com/document/d/1xt4cHpKPCdEkdCSgxGmSOjbfTALWHO1bSCusng3ISBw/edit?usp=sharing. We don’t pretend to be perfect, but it might provide a starting place for others. (You should also check out Kristin Briney’s post on data dictionaries for an excellent overview of the practice of scoping and documenting your data.)
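For readers who think in code, the shape of a dictionary like the one described above (a required flag, an explanation, an input guideline, and a crosswalk per field) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the contents of the actual dictionary: the class and function names are hypothetical, the explanations and guidelines are placeholder wording, and the crosswalk targets are plausible guesses (Dublin Core `identifier`, schema.org `identifier` and `version`, MARC 024 and 250) rather than the mappings the task force chose.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FieldDefinition:
    """One entry in a hypothetical data dictionary."""
    name: str
    required: bool
    explanation: str
    input_guideline: str
    # Maps a standard's name to the matching element, or None if no
    # direct equivalent is assumed. All targets here are illustrative.
    crosswalk: Dict[str, Optional[str]] = field(default_factory=dict)


# Two illustrative entries; the wording and mappings are assumptions.
DATA_DICTIONARY: List[FieldDefinition] = [
    FieldDefinition(
        name="identifier",
        required=True,
        explanation="A unique, persistent address for the dataset.",
        input_guideline="Enter a DOI or other persistent identifier.",
        crosswalk={"dublin_core": "identifier",
                   "schema_org": "identifier",
                   "marc": "024"},
    ),
    FieldDefinition(
        name="version",
        required=True,
        explanation="Which release of the dataset this record describes.",
        input_guideline="Enter a version label such as 1.0.",
        crosswalk={"dublin_core": None,
                   "schema_org": "version",
                   "marc": "250"},
    ),
]


def missing_required(record: Dict[str, str],
                     dictionary: List[FieldDefinition]) -> List[str]:
    """Return names of required fields that are absent or empty."""
    return [f.name for f in dictionary
            if f.required and not record.get(f.name)]


def to_standard(record: Dict[str, str],
                dictionary: List[FieldDefinition],
                standard: str) -> Dict[str, str]:
    """Rename a record's fields using the crosswalk for one standard.

    Fields with no mapped element in that standard are skipped.
    """
    out: Dict[str, str] = {}
    for f in dictionary:
        target = f.crosswalk.get(standard)
        if target is not None and f.name in record:
            out[target] = record[f.name]
    return out
```

Calling `missing_required` on a draft record gives a quick checklist of what still needs to be filled in, and `to_standard(record, DATA_DICTIONARY, "schema_org")` shows what a crosswalk does in practice: the same values, relabeled in another community's vocabulary.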

So with our initial draft of the dictionary in place, we were ready to take it to our advisory group and were certain we would be lauded for our beautiful work. This is not quite what happened. When we presented the dictionary to DM-TF, we were met with confusion from the non-librarians in the group. They hadn’t quite known what to expect from a data dictionary, and were similarly perplexed by the term metadata. Several months into the existence of our task force, we realized that we’d been operating under the assumption that we all understood each other. In actuality, we faced what amounted to a kind of language barrier. As we talked with our interdisciplinary colleagues to define metadata and data dictionary, we realized that other common library terms like cataloging and data lifecycle were also causing communication breakdowns.

The teaching moment

As the group talked through our discipline-specific definitions, gathering around laptops to show examples, we felt an inspiring moment of synthesis. The draft became the means for us to define metadata for the group. Terms like crosswalk started to make sense as the bioinformatician looked at how he might map the fields we had outlined into PubMed metadata fields. IT staff intuitively understood the reasons behind our “version” and “identifier” metadata fields, because every day they struggle to identify disparate bits of data and need unique addresses for those bits. We worked together to finalize the draft and things started to fall into place.

As the library field—and all of academia—becomes increasingly interdisciplinary, this exercise of building understanding will become even more important. We can’t assume that everyone around us communicates using the same terminology. As a result of that day, DM-TF has become more effective, because our thinking as a group is more aligned. And we’ve opened channels of communication that we hope will stay open as new topics arise, new terms are introduced, and our work becomes more complex and intertwined. Our suggestion is to bring metadata documentation into the activity of your working group. Draft a data dictionary on the edges with your metadata folks, but make sure your whole group has the chance to discuss and revise it. The shared understanding derived from this activity will move your group forward and make interdisciplinary RDM partnerships even stronger.

Sara Mannheimer is data management librarian at Montana State University (saramannheimer.com; @sara_mannh). Jason A. Clark is head of library informatics & computing at Montana State University (www.jasonclark.info; @jaclark). Both participated in the DLF E-Research Peer Network Mentoring group last year; Jason serves on the DLF E-Research Network faculty this year.