AERA-NSF Workshop: Data Sharing and Research Transparency (Part I)

2017-07-25

It is an honor to be invited to attend the AERA-NSF Workshop on Data Sharing and Research Transparency at the Article Publishing Stage in Washington, D.C., on July 25-27, 2017. I am attending as an early career scholar from the International Society of the Learning Sciences, my academic home.

We’re only half a day into the workshop, but I’m already amazed by the many great thoughts from a full room of journal editors, program officers from funding agencies, and early career peers. Below I share some general (not necessarily representative) themes from the ideas discussed so far.

Data Sharing: Where Are We?

In preparation for the workshop, Dr. Felice J. Levine, the workshop convener from AERA, shared the classic National Academies Report on Sharing Research Data, released in 1985. I read it on my flight into D.C. and it was a worthwhile read. What struck me was that many thoughts I had compiled in preparation for the workshop were already elaborated in this 32-year-old report. The value of data sharing is not newly recognized. Many challenges have persisted.

Before reading this report, I happened to read a much newer article titled “Administrative social science data: The challenge of reproducible research,” published in Big Data & Society, a journal I have reviewed for. The social science communities (and scholarly communities in general) have made some clear advances since 1985. For instance, we now have various research tools and platforms that facilitate data management, sharing, and publishing. Git, a version control system highly recommended by this article, did not exist when the National Academies Report came out; neither did platforms and initiatives such as the Open Science Framework, Harvard Dataverse Network, and Figshare. However, when juxtaposing the challenges discussed in both pieces, what struck me, again, was how slow the shift in academic cultures toward data sharing has been. Indeed, developing tools is easier, whereas changing cultures at many levels (e.g., in research labs, departments and colleges, institutions, associations, and funding agencies) is much more difficult.

Discussion Themes

Perspectives from these two pieces were echoed in a round of self-introductions at the workshop this afternoon. Below are some keywords I noted, which, again, reflect my own bias.

Data. One important point was made about what we mean by data. This is not only about the traditional dichotomy of qualitative vs. quantitative data, but also about the purposes of data, power relations in data and data systems (e.g., who is served by data), the shapes of data, etc. I recently wrote in an invited think piece in Chinese about how the term data (数据) originally meant punishing people in ancient Chinese. In the same vein, one colleague commented that the word research is a dirty one to Indigenous peoples. These views apply to educational research data as well, and therefore to data sharing.

Contexts of research data are also extremely important. Data shared without sufficient information about their context are useless. This fact places a burden on both the initial investigators who share data (because they need to explain the context in preparation for data sharing) and the secondary analysts who need to understand the context (often based on insufficient documentation). More work is probably needed to further unpack data, and to articulate which data are shareable, the situations in which sharing data is meaningful, and the constituents of shared data.

Culture and practices. I believe this topic has been discussed the most so far. Subtopics include incentives and disincentives related to data sharing, which are explored in depth in the National Academies Report. For an assistant professor like me, for example, the current tenure process does not credit my investment in data sharing or data publishing. Even if I publish a dataset with a DOI attached to it, it still doesn’t carry the same weight as an empirical piece in a top journal. An editor mentioned that it’s also considered “suboptimal” for someone’s tenure case to publish a study based on secondary data. Credit for data sharing practices is currently scarce, while disincentives abound, as the National Academies Report also elaborates. Therefore, data sharing needs not only a few mavericks but also systematic investments to incentivize and sustain culture change.

Ethics. Another major concern was the ethical challenges related to data sharing. It was refreshing to hear one senior colleague share that patients whose data were collected for medical research were eager to have their data shared because they wished to contribute to the quest for a cure, even though laws and ethics guidelines prohibit sharing personal health data. As an early career scholar, I tend to make “the minimal ask” when preparing my IRB protocols, only because the IRB work with an IRB office and a school district is already so daunting for someone who’s still navigating the systems. I worry that being ambitious in data sharing would undermine the likelihood of getting research protocols approved. In addition to putting a sentence on a journal’s homepage saying “you should share your data”, we need additional resources to help scholars deal with ethical challenges even before a study is conducted.

Infrastructure. Numerous initiatives have emerged to support data sharing, and open science in general. I look forward to hearing from experts at the forefront of these matters as they share their work. For example, colleagues from the Center for Open Science, which has developed systems like the Open Science Framework and OSF Preprints, will present tomorrow. As an advocate of open science, I am also intrigued by recent work on writing and preprint platforms like Authorea and code sharing platforms like Code Ocean, which allow not only the sharing of Research Objects but also more interactive forms of scholarly discourse and content consumption (e.g., interactive visualizations linked with shared data and directly embedded in publications). So, technically we are (almost) there.

Value. At the dinner table, it was refreshing to hear senior scholars describe how they had to mail out their manuscripts when submitting to a journal and how they needed to fax their reviews to a journal or conference. At that time, the manuscript was almost the only thing others could see of someone’s work. We talked about how we can now assign value to so many “middle-products” or “byproducts” of academic research in addition to an article as an end-product. This is linked to a point made in my last blog post about the need to expand valuation in education to include artifacts generated in learning processes. After reading emerging draft guidelines that encourage and, in some cases, enforce citations of shared data, I’m increasingly optimistic about new cultures of research and publishing that are yet to come.