Tuesday, February 7, 2012

The Curation of Collaboration: Experiments in Mobilizing Museum Archives

This guest post is by Gaurav Vaidya, Andrea Thomer, Rob Guralnick and David Bloom. Gaurav, a graduate student at CU Boulder, has been editing Wikipedia since 2002. Rob is a biodiversity informatician, museum curator and collaborative coffee consumer who sometimes inhabits Boulder, Colorado. Andrea is a graduate student in library and information science, and a former excavator of Pleistocene megafauna. Dave coordinates VertNet from his secret lair in the Museum of Vertebrate Zoology at the University of California, Berkeley.

We live in a world that is increasingly digital. While museums are gradually adapting to this new reality, it is crucial that we complete ongoing digitization projects with minimal resources and a maximum of community engagement. Traditional ways of doing this are not going to be enough; museums need to be bold in their efforts to harness the power of readily available, but previously untested, resources, tools and techniques.

One technique we believe will become increasingly important in keeping costs down and public engagement high is “crowdsourcing”—using interested members of the public to contribute directly to cataloging, transcription and annotation activities on museum collections. A perfect example of crowdsourcing is Wikipedia, built from scratch over the last decade by millions of volunteers into the largest and most popular general reference work on the Internet. As an experiment, we decided to try to use Wikipedia’s own resources on a museum project to unlock valuable data about Colorado’s biodiversity in the first half of the 20th century.

Junius Henderson was appointed the first curator of the newly created University of Colorado Museum of Natural History (CU Museum) in 1902. He kept field notebooks containing handwritten daily accounts of his expeditions across the Rocky Mountains over a 26-year period. Henderson’s notebooks paint a vivid picture of a changing Colorado, as horses and buggies give way to cars, cities grow, and wild landscapes retreat. Although their primary value is to biologists and geologists, his notes will also interest historians, geographers, and anthropologists studying this period of Colorado’s history.

Fast forward 50 years, when Professor Peter Robinson, himself a CU Museum Director and now Emeritus Curator, transcribed all 14 notebooks into Word files. The notebooks themselves were eventually scanned by the National Snow and Ice Data Center (NSIDC). As an experiment, we decided to publish them to Wikisource, an extension of Wikipedia founded in 2003 with the goal of crowdsourcing the transcription of public domain texts for permanent record. Although primarily focused on literature (from The Wind in the Willows to A Study in Scarlet), Wikisource already has a large number of historical texts, from George Washington’s First State of the Union Address to President Obama’s State of the Union Address last month.

We began with Henderson’s first notebook, covering the period from 1905 to 1907. We uploaded Henderson’s notebook scans to Wikisource, then used its built-in software to create an Index page for this notebook, which provides page-by-page access to the notebook (Wikisource’s software also allows each notebook to be displayed on a single page). In less than three weeks, we had copied all of Robinson’s transcript onto Wikisource, making both the scans and text of Henderson’s first notebook viewable side-by-side and publicly accessible. Success!

Having scanned and transcribed notebooks was fantastic, but we wanted something more. In recording his observations of the species around him, Henderson had recorded a baseline against which we could compare the species distributions we see today: are birds once spotted by Henderson near the town of Florissant, Colo., still found there today? Or have encroaching human settlements and climate change forced them into higher, colder and more distant locales? Each of his field notebooks contains hundreds of species observations from the early 20th century, long before organized data collection became the norm for ecologists. We began annotating Notebook 1 with journal dates, locations and species names in mid-December and, with the help of some anonymous contributors, had completely annotated all 112 pages a mere month later. You can see these annotated notes on Wikisource.

We’re pleased with what we’ve achieved in a very short period of time: transcribed, annotated notes available side-by-side online, and a connection to a community of existing users interested in trying to read scrawly handwriting scribbled during field trips to inhospitable climes. Now, we’d like to reach out to you: we’ve uploaded Notebook 2 and Notebook 3, and we’d love your help in transcribing and annotating them. We’d also love to see you upload your museum’s field notes to Wikisource, and to try out its infrastructure to build your own transcription communities and to annotate your own collections.

Most importantly, we’d love you to be bold, to experiment with new technologies, to trust your data to untrained strangers and to get involved in opening museum research to new communities of online visitors and citizen scientists. We’re looking forward to your feedback, suggestions and reports as comments here, on Twitter, or through blog posts.

Use the comments section below to lob questions to the authors about the project (logistics, challenges, outcomes, resources needed, etc.), or to tell us about crowdsourced collections projects of your own.

For updates on the Henderson Field Notes and broader issues related to museums and digitization, check out Rob and Andrea’s blog, So You Think You Can Digitize, where “screwball comedy meets serious thoughts on digitization.”

We'd love to have you work on this project with us! You can do as little or as much as you'd like.

To get started, click over to the project main page, http://en.wikisource.org/wiki/Field_Notes_of_Junius_Henderson. There, you can look through the two journals that have been "completed" to see if you find anything that might be transcribed or tagged incorrectly, or missed altogether. If you find errors or omissions, please, by all means, correct them.

If you want to get involved in the third journal, we've just begun the process of adding the transcription for each page (actually an intrepid volunteer like yourself is doing it this time). Most of these pages still require tagging of things like the common and scientific names of species observed, locations visited, dates, and so on. In any case, all of the pages from all three journals are in need of proofreading.

You can register on Wikisource so we'll know who did the work, or you can work anonymously. That's up to you.

For more specific information about what we need and what to tag, you can read more on the So You Think You Can Digitize blog, beginning with this post: http://soyouthinkyoucandigitize.wordpress.com/2011/11/28/an-ode-to-founders-and-a-field-notes-challenge-part-1/. Several additional posts follow this one with more information about how the project works.

After all of that, if you still have questions or are uncertain about how to proceed, or if you've mastered Wikisource and would like to start on the fourth journal, give us a shout on any of our respective Twitter accounts linked in our brief bios, or contact me directly at dbloom ~at~ vertnet.org, and I'll help get you started.