Data Sharing: Panacea or Can of Worms?

Author’s note: My interests within the LIS field are data curation and e-science librarianship. This is a hot topic that is growing every day, and skilled e-science librarians are needed to fill the gap. If you’re interested in learning more about data curation librarianship as a future career, leave a comment here, and I’ll follow up with more information.

Back in the fall, Micah wrote a post about Open Access Week. In it he discussed open journals, open data, and the ALA Code of Ethics. Open data is what today’s post is about. An important ongoing question in the world of data curation is how to get scientists to share their data by placing it in a data repository. Many scientists are unaware that their data has value to anyone but them and their research team. On the other hand, there are scientists who are very possessive of their data and don’t want to release it for fear that they will lose control of it and not be credited for its creation. There are also those who want to squeeze every drop of publishing potential out of a data set before releasing it to anyone else.

Last November, the White House Office of Science and Technology Policy put out two requests for information (here and here). One asked if peer-reviewed journal articles resulting from federally funded research should be accessible to the public. The other asked if data from federally funded research should be accessible to the public. OSTP has released the comments from that RFI here. I have not read all the responses, but the ones I have read seem to indicate that support for open access is high among those not affiliated with a publisher and cautious, at best, among those affiliated with one. The questions, concerns, and issues I see raised generally deal with how journals can remain profitable for the value they add and how researchers can receive due credit for their efforts.

But let’s set aside the questions of whether scientists and researchers should be required to share their data and articles or even if it’s a good idea that they do it. I think an even larger issue here is whether or not our current crop of scientists and researchers has the data management skills necessary to make the research data usable to anyone but themselves and their immediate research group. Data management practices of researchers are not exactly stellar. Infrequent or nonexistent backups, inadequate metadata on variables and research background, and loose standards all contribute to a set of data that is basically useless to anyone not involved with the project from the beginning.

Do you think that the data generators know how to manage their data properly? What can be done to improve the situation? How can librarians help?

16 thoughts on “Data Sharing: Panacea or Can of Worms?”

A timely topic, Chris. I’ll also add that many funding agencies, like the National Science Foundation, are beginning to require data management plans as part of the grant submission process, putting this issue at the forefront for many scientists and social scientists who have never thought beyond an external hard drive.

From what I’ve seen, researchers often have an idea that they should manage their data for preservation, and are perhaps interested in doing so. The issue is they don’t see the library as the place where that can/should happen. The perception of the library is growing on many campuses (see University of Michigan folding the Press into the library or Purdue’s work in data curation profiles), but there’s still work to be done. I’m fortunate to work with a great e-science librarian here at FSU, and together we are slowly working to change the culture.

To answer your questions – librarians can help by getting the facts right (and having a practiced pitch down to a “t”), advocating for what the new role of the library may be on campus, and working to build partnerships and relationships with Offices of Research, Provosts and Research Institutes and Centers. At least, that’s what I think should be done. Once the campus understands the library as having interests beyond giving books and coffee to undergrads, the collaborative work will come.

Yes, Micah, the data management plans are making scientists and researchers think more about data management than they ever have before. I would be interested in striking up a conversation with the e-science librarian you mentioned if you would send me (via private email) his or her email address. I’d like to know the kinds of things FSU is doing in this area. Thanks!

As with Open Access, librarians need to do a better job of leading by example here; much of our own research doesn’t come with related data sets. For example (and I *do not* mean to pick on this article, it’s great research!) I recently read this College & Research Libraries article: http://crl.acrl.org/content/73/1/33.full.pdf+html

Early on, it provides a breakdown of responses by institution type. Later, the actual research data is presented, but not cross-referenced by institution. As a community college librarian, I have a specific interest in how their results apply to my type of library. Their data clearly is capable of communicating this; their article does not. Again, that’s not a weakness of the article (no one would want to read an article that goes in-depth about every possible pivot table you can create from the data), it’s due to the lack of published data.

We need a centralized LIS repository much like Data.gov (which, by the way, open-sourced its infrastructure: https://github.com/opengovplatform/opengovplatform) so research can be repurposed. On top of that, standards and training in data management would be wonderful. Personally, I wish every data set was available as a version-controlled database, not just a solitary Excel spreadsheet.

If I may jump in and respond to The LIS Queen – the science in Library Science relates to the history of the field as a social science, which is a strange and interesting history. (Google “Wayne Wiegand”).

The e-science thing is a little different: it’s referring to traditional science (Physics, Chemistry, Biology, etc.) starting to interact more with online, digital, big data, technology infrastructure, etc. E-Science librarians explore that kind of stuff alongside faculty in the sciences.

Great post! Having the opportunity to intern this semester at a digital repository, I have the following to add:

1. Researchers need to know their rights! Copyrights, that is. In my experience, many researchers aren’t aware of their publishing rights when signing their contracts with journal publishers. This is where librarians can be instrumental: as Micah mentioned, librarians can help get the word out about sharing data and articles “by getting the facts right (and having a practiced pitch down to a ‘t’).” There’s a wealth of information available online to researchers. One resource, as Micah already mentioned, is SPARC (http://www.arl.org/sparc/author/index.shtml); its site even has an addendum that researchers can print out and add to their publishing contracts. Librarians should let researchers know that sharing their data, findings, articles, etc. gives them control over how their information is made available and shared.

2. Outreach: To cultivate the idea of Open Access, librarians and supporting staff should commit heavily to outreach. This might include attending departmental meetings to inform faculty about article and data sharing, as well as holding demonstrations with graduate students to instill open access in the faculty of tomorrow.

I am interested in e-science, not just sharing and open access, but also how scientists can benefit from IT and how IT changes the way research works in the academic or business environment.

For example, I was reading Kent Anderson on the topic, who said, “The most important trend for scholarly publishers is the integration of information into displays utilized at a point much closer to where the action is — in medicine, it’s the bedside or ward; in science, the lab or bench; in education, the classroom or virtual classroom.” I am interested in how IT can create computational environments that let researchers work together on the Web.

I am very interested in working in this area and would love to hear from you–I intend to graduate with an MLS from Simmons in May and will be looking for a job in or around Boston after that.

Did you take any classes like digital libraries, databases, digital preservation, or digital humanities? Since you’re graduating soon, you won’t have the opportunity to take any more classes, but you can read up on a lot of this. I recommend the Digital Curation Centre from the UK (http://www.dcc.ac.uk/) and the e-Science Portal (http://esciencelibrary.umassmed.edu/). You can find a wealth of knowledge on those two sites. If I can be of more help, let me know. Good luck!

Speaking as a former scientist and current LIS student, I don’t think that most scientists are ready to share their data because they don’t have the tools and skills to organize and manage their data in the first place. The problem is that these skills are not being taught to science students and, more importantly, science grad students (who are the ones going into the world to run research groups). I personally think that this is a key area where we should be focusing (information literacy, anyone?) because it’s much easier to share and preserve data that is already well-organized and annotated.