Research Data Management, journals and supplementary materials

My journeys amongst the sector’s RDM specialists continue with visits to Bristol (Zosia Beckles) and Leeds (Nick Sheppard, Rachel Proudfoot, Graham Blyth, Brenda Phillips and Gemma Hemming). I’m aware that so far I’ve mostly spoken to staff at very research-intensive universities, which has been great, but I’m keen to speak to all kinds of institutions, so please do get in touch if you have experiences of journal or publisher research data policies you want to share – david.kernohan@jisc.ac.uk.

In an ideal world, Research Data Management should be proactive – researchers should be building in good data management practice as a central component of good research practice. At Bristol, for example, research teams are regularly visited by RDM specialists to encourage and support their data management and publication planning as an ongoing part of project work. At Leeds, support is provided as early as possible during the research lifecycle. However, what constitutes good research practice varies considerably from discipline to discipline.

For instance, in some engineering sub-disciplines, presentations at conferences are seen as far more important than journal publications – an artefact of the more commercial nature of some research – and conferences tend to have very poor guidelines around sharing the data underpinning presented findings. Conversely, in computer science, code is shared freely and openly, with the GitHub-to-Zenodo route firmly established, but the language of “data” as a sharable artefact is not widely recognised.

These differences extend to the way in which data is represented within published articles. Some physicists, for instance, expect to see data availability statements within the “acknowledgement” section of a paper, while other disciplines would expect these to be supplementary material – with some expecting the actual data itself to be published as supplementary material.

Supplementary Material and Research Data?

The existence of the “supplementary material” system feels like a precursor to full data sharing – materials here are often just posted online by the journal, and may encompass anything from images to graphs to data tables to (apparently) author lists.

Classically, these are materials that cannot (for reasons of space, likely interest or format) be included in the print version of a journal and are instead made available online or (initially) via microfiche. Supplementary material generally cannot be referred to separately from the paper, and is unlikely to have a stable, long-term identifier. Though such materials (which would very often constitute “research data” in the modern, wider sense) may be essential to ascertain the validity of the conclusions of the paper, the publication method makes it very easy to “lose” supplementary material. It is neither entirely part of the paper nor entirely separate.

There are no hard and fast rules regarding what constitutes supplementary material (for publication alongside an article, hosted by a publisher) and what is research data (linked to from an article, hosted by a repository). Some journals (e.g. Cell) place a file size or length limit on supplementary data files.

My initial reading around the issues presented by supplementary material yielded an excellent paper (Pop and Salzberg 2015) offering further critiques of the lack of consistency, transparency, discoverability and review status. Supplements to journal articles are an orthogonal issue to research data, and it seems clear to me that the two should be addressed in parallel.

There have been a number of previous attempts to address the issues posed by supplementary materials. For example, a 2011 report by the EU-funded ODE project noted that “Few standards exist between journals on how to indicate the presence of supplementary files or where to find them. Only in few cases will the supplementary material be provided with a separate DOI or other persistent identifier to enable linking independent from the main article.” This concern is echoed by the 2013 NISO Recommended Practices for Online Supplemental Journal Article Materials, but it is notable that the latter does not draw links between RDM and supplementary material (though its recommendation that such materials should themselves have a DOI is welcome).

Institutional RDM teams don’t tend to know about supplementary data, so they will not be on hand to advise on preservation or licensing issues, or on whether funder requirements are being met – which is very unlikely for data published only as supplementary material. The more I talk to institutions, the more I learn about unexpected issues like this.