How to Prepare Your College for an Uncertain Digital Future

Paolo Mangiafico's job at Duke U. involves trying to imagine the ways people will use data in the future.

By Marc Parry

Paolo U. Mangiafico does a job that is not easy to describe. Duke University calls him director of digital information strategy. But the work isn't just information technology, or scholarly communication, or library services. It's all of them, a big-picture portfolio that hinges on worrying about this question: How can a university organize and preserve the deluge of digital data before it washes away—and preserve it for uses that have not been imagined yet?

The data could be anything from student-produced course work to raw research results to informal material like blogs and wikis. Mr. Mangiafico, who reports to the provost, started his job late last year as part of a new project partly paid for by the Andrew W. Mellon Foundation. Duke's boss of bits spoke with The Chronicle about the reasons the university created his position, and why preserving data is crucial to the future of scholarship.

Q. Can you explain in 60 seconds what a digital strategist is?

A. I'll start with an anecdote. About six or seven years ago I heard a talk by the architect who designed the new Nasher Museum of Art here at Duke University. Someone asked him what the role of an architect was, and he said, "To interpret and to inspire." I really liked that. A lot of what I do in this role is trying to get a better understanding of what the changing needs are, and methods of scholarship in a digital age, and how we produce information, and how we manage it, how we share it, how we preserve it. And to inspire the technology planners to adopt approaches that are holistic and have a long-term view.

Q. Why did Duke create this position?

A. The directors of the Duke library and the library at Dartmouth College joined together and got a grant from the Mellon Foundation to look at the issue of, How do we deal with all this digital data that's being created but isn't being collected and preserved and published and shared in the way that libraries and universities are used to in the print world? At the end, the report recommended that this was not so much a library issue, or a technology issue, it was really a university issue. And that there should be a position at the university level.

Q. Why does preserving all this stuff matter? Who cares?

A. Scholarship is built on the work that's been done before us. Isaac Newton said if we see further, it's because we're standing on the shoulders of giants. We have the shoulders of giants to stand on now because libraries and publishers and so on have had these methods of capturing and preserving and making accessible all these materials over time. Will future scholars be able to stand on the shoulders of giants, of the work that's being done now?

Some of the data that was collected from the Apollo missions in the 70s was lost for many years, or climate data or things like that, data that was collected some time ago that would be really useful to have now.

Q. But how can you plan for uses and formats that don't yet exist?

A. Part of it is looking at the work that's already going on and what people are doing. There's this project that's just getting started called the Digging Into Data Challenge, where people are looking at different data sets and trying to do interesting data mining on data sets that were created not necessarily for that purpose. So I think we'll be able to learn from some of these experimental projects that are going on now.

Also just planning for doing things in an open way and documenting that process and using open standards such that one can have flexibility in how one uses the data later. In the mid-90s and late 90s, some of the projects that I was involved in here at Duke Libraries were digitizing materials from the special-collections library: sheet music from the 1920s, or historical advertisements, papyri. We made sure that we encoded our metadata and the images that were being scanned in open standards and in standards that we knew would be usable in multiple different tool kits in the future. And so, for example, some of those collections that we created in the mid-90s are now available through an iPhone app.

Q. How do you maintain the data in a usable way and make sure the world knows it exists?

A. I think that's a big future role for libraries. Libraries typically have been about providing access to things that have been published somewhere else. It's kind of an access point and a collection point. And I think one of the big new roles for libraries in the future is going to be helping our local communities to publish and make accessible materials that they're creating locally here in ways that can be consumed by people out there in the world.

Q. Isn't it true that researchers often don't want to share their data?

A. I think that is true. That's part of the cultural piece. Certainly there are cases where it's not appropriate to share it for privacy reasons or while something's in progress. At the same time, I think that there's a lot of data that's not being shared now simply because it's not easy to do so, or it's a burden. And I think if we can put in place services that will make it not be a burden, and show the benefit of doing this, and have ways of tracking how it's being used. … I think we'd find that more people would be willing to share.

Comments

1.jflahiff - December 15, 2009 at 04:53 am

I remember memorizing the (5??) steps of the scientific method in 6th grade back in '66. I was deeply impressed with the last one..something along the lines of sharing results. It meshed well with the values taught at the parochial school I attended. This piece meshes well with what I sense...libraries seem to be evolving into laboratories of synergistic learning through "trends" as information commons. My hope is that the creators of data sets get their just due (to be honest, I don't have a good handle on what just due is, just sense it is right) as they share their findings because they understand they are part of a community transcending borders on many levels. Yes, I am a librarian, working part time at the health science campus of an urban university..(I guess I am stuck on mesh...because my work these days seems to be constantly getting the word out on comprehensive literature searching through Medical Subject Headings (MeSH in PubMed...the US National Library of Medicine's very comprehensive "index" to biomedical literature)).

2.johntoradze - December 16, 2009 at 01:43 pm

I think that quote, or something like it, originated with Bernard of Chartres 500 years before Isaac Newton. (Not terribly important, but ...)

I think that making datasets available as soon as possible is crucial for modern science, particularly bioscience, to survive with respect. I have seen too many instances of manufacturing of data, images, or suppression of negative results, by very well respected investigators. The best way to deal with that is by making science an environment in which all the datasets are available. Then readers have a chance to see when, for instance, there is no blood drawn for a flow sample that appears.

3.angtughpak - December 17, 2009 at 02:06 pm

johntoradze--You are correct about the quote. One place the reference to Bernard of Chartres and that quote appears is in The Medieval World View (Cook and Herzman, 1983); it is a great visual of the process by which we add to the world's knowledge.

More to the point, Newton's use of the quote only serves to illustrate the value of the idea.