The most interesting use of our data will not be what we think it is

In Bloom

an uncut newsbook showing a quarto imposition

It’s safe to say the bloom is off the rose. Online collections just aren’t as sexy as they once were. Increasingly ubiquitous plans to put digital images online excite an ever smaller crowd. And projects that rely on new “Turning the Page” applications are likely to draw more ire than praise from a growing cohort (while beautiful, they pose problems for scholarly work and digital preservation). Instead, it is fashionable, especially within some corners of the Digital Humanities community, to point out the shortcomings of digitization, the bits that get lost in translation, the bytes that are left behind. Reproductions as surrogates are necessarily incomplete, and people are more than eager to point out their imperfections.

At the DigiPal conference earlier this year, Elaine Treharne offered this bit of advice for project managers: Stop! Stop all digitization efforts for a year or more, and carefully reassess what you are digitizing, for what use, for which audience. Time and money are too limited and too valuable to be wasted on something that will have to be redone when technology improves, or when the focus of scholarship shifts. Her concerns are valid, and not unfamiliar. My colleagues Nadia Seiler and Jim Kuhn encountered similar concerns at this fall’s Digital Library Federation forum, where coders pressed librarians to specify “use cases” for open access to bibliographic data and digital images. Honestly, this is something we often struggle with as we develop specifications for Folger projects.

A rebuttal to Elaine Treharne’s position came later that week from Rob Sanderson (Los Alamos National Laboratory), who said, “The most interesting use of your data will come from someone you don’t know.” I have come to agree with that sentiment, though I might modify it slightly, to “The most interesting use of your data will not be what you think it is.” We cannot anticipate the tools that will be available 5, 10, 20 years from now. We cannot imagine all the ways our data will be merged with other projects. But if we don’t put our resources out there, we won’t be a part of that future.

The Imposition, what a show…

a screenshot of Impositor's layout of the inner forme of the B gathering for Titus Andronicus

Impos[i]tor started out as a modest project, but I find it interesting in a number of ways. Quite unexpectedly, it relieved many of my own anxieties about the utility of digitized collections. When we built the (ever-growing) Luna database, we certainly didn’t imagine a tool like this. But Impos[i]tor could not exist without Luna. And, although it exposes some of the weaknesses of our digital collection, as an educational and scholarly application, it still works. We did not have to anticipate Impos[i]tor to create a useful and usable Luna database. On the other hand, the ways in which Impos[i]tor is limited (by the images in Luna and the inconsistent metadata that accompanies them) have allowed us to reconsider some of our digitization strategies as we move forward.

To wit, the more intriguing features of Impos[i]tor are as follows:

It is a second-generation tool. That is to say, it’s an application that is possible only because an accessible digital collection already exists.

It is a creative solution to a complex problem. The mandate: create a way to virtually take a book apart and put it back together. While a Flash animation (see, for example, the “Printer for a Day” activity at the Manifold Greatness site) might have been able to accomplish this, it would be more complicated to program, and, probably, less satisfying to use. Impos[i]tor was easy to design, easy to use.

It demonstrates some unintended ways collections can be used and, thus, makes concrete some of the criticisms by DH scholars. For example, it reveals the importance of completeness: if you don’t digitize blank pages, you can’t recreate the book.

It highlights the importance of good metadata. Is this a quarto or an octavo? Is this the first page of a signature? The metadata in Luna is sufficient for Impos[i]tor in its current form, but it is not complete enough to generate an entire book at the push of a button. This highlights an area the Folger is beginning to work actively in: how to provide appropriate descriptive metadata for digital surrogates when (in many cases) we lack full bibliographic descriptions for the original.

It shows that digitization doesn’t have to be a one-way process of pushing everything into the computer. While many projects display and/or manipulate virtual objects, Impos[i]tor is designed to reproduce physical objects. Of course, the critiques of the hidden incompleteness of digital images apply to physical surrogates as well, a fact which is acknowledged in the name of Impos[i]tor.
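To make the points about completeness and metadata concrete, here is a minimal sketch in Python. It is not how Impos[i]tor itself works. The function names, dictionary keys, and image filenames are all hypothetical, and it assumes a regular gathering printed from a single sheet. Given a format and whatever page images exist for one gathering, it sorts the page positions into the outer and inner formes and reports any position that has no image.

```python
# Pages printed on one sheet for some common formats (both sides counted).
PAGES_PER_SHEET = {"folio": 4, "quarto": 8, "octavo": 16}


def forme_of(position):
    """Return the forme for a 1-based page position within a gathering.

    In a regular single-sheet gathering, positions 1, 4, 5, 8, ... fall on
    the outer forme and positions 2, 3, 6, 7, ... on the inner forme.
    """
    return "outer" if position % 4 in (0, 1) else "inner"


def plan_gathering(book_format, images):
    """Sort page positions into formes and list positions with no image.

    `images` is a hypothetical mapping from page position (1-based, within
    the gathering) to an image filename.
    """
    total = PAGES_PER_SHEET[book_format]
    plan = {"outer": [], "inner": [], "missing": []}
    for position in range(1, total + 1):
        plan[forme_of(position)].append(position)
        if position not in images:
            plan["missing"].append(position)
    return plan


# A quarto gathering whose final page (a blank verso) was never digitized.
digitized = {n: f"page_{n}.jpg" for n in range(1, 8)}
plan = plan_gathering("quarto", digitized)
print(plan["outer"])    # [1, 4, 5, 8]
print(plan["inner"])    # [2, 3, 6, 7]
print(plan["missing"])  # [8]
```

Even this toy version cannot start without knowing the format and where the gathering begins, and a single undigitized blank leaves a hole in the outer forme.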

No matter how we choose to design our digitization projects, we will get it wrong. There will be uses we did not anticipate. There will be technological improvements that threaten the longevity of our work. This is always true, and we will never catch up to the future’s horizon. Still, smartly designed projects will get enough right that they will be useful now and into the future. Projects that rely on standards, that offer bibliographic citations, and that fully acknowledge the existence of missing or incomplete items provide real value now. Perhaps more importantly, they also provide the source material for future projects, projects that cannot exist without our work today, projects that may, ultimately, prolong the life and utility of our work today.

MICHAEL POSTON is Database Applications Associate at the Folger. He has an MLS and an MA in English. Digital projects include the Union First Line Index of English Verse, the PLRE, and the Finding Aids Database. His play, The King's Tragedy, had its debut at the American Shakespeare Center's 2011 Blackfriars Conference.