Access to Raw Behavioral Data From Preclinical Research Papers

Editor’s note: Rosemary Morland completed postdoctoral work with Andrew Rice at Imperial College London, UK, and is now a freelance medical writer and editor. Morland submitted the following comment describing efforts to make scientific publishing more open, using a recent article she co-authored with Rice and colleagues as an example of how this could benefit the pain field. We invite readers to submit their own comments below on this important sea change in thinking about the sharing of scientific data.

There is a revolution happening in scientific publishing, spurred by the ability to include full sets of raw data alongside original research articles. This level of data transparency is changing the way we both view and use publications. Articles are no longer static, but can be updated as new findings become available. They can also act as data sources in their own right, not only for systematic review and meta-analysis, but also by providing complete raw data sets for re-analysis by other researchers, who may discover novel findings, for instance, by applying analysis paradigms that were not available to the original authors at the time of publication.

Clinical trials and genetics studies already require aspects of original data to be stored in an accessible repository. In clinical research, this was driven by a need for accountability, acknowledging the importance of the results by making them accessible to all, whereas in genetics, in common with other “big data” disciplines such as physics, the sheer volume of data generated and collaborative nature of the field have led to a heightened awareness of the benefits of data sharing. The breadth of a typical genomewide association study (GWAS) illustrates how the data could be used to answer questions other than the one originally posed, particularly as only a small portion of the entire data set may be examined in the initial analysis.

Since the 1970s, technological advances have been used to further preclinical research. From high-throughput molecular techniques to something as comparatively simple as the recording of behavioral videos, researchers are now able to do their analyses more quickly and thoroughly than ever before. The key benefit of these techniques is that the data they generate, be they numerical, video, or image, can be captured to facilitate analysis at a later date. Advances in digital technology and electronic publishing have now made it possible to provide true raw data (as opposed to spreadsheets of data) in some areas, particularly where the original data were collected using digital technologies. Digital visual recordings of rodent behaviors are an example of data amenable to open access.

In what may be a first for in vivo preclinical pain research, a group from Imperial College London has recently partnered with the online publisher F1000Research to make all the original video files captured in an experiment available for retrospective scrutiny and re-analysis (Morland et al., 2015). This study examined the behavior of rats subjected to visceral inflammation in the open field paradigm. The primary outcome measure was the ethologically relevant predator avoidance behavior of thigmotaxis (a preference for movement along the sides of an arena), and the authors also considered locomotion and rearing as secondary outcomes. The entire 15-minute video for each animal is included with the publication. In order to maintain analyzer blinding, the video files are accessible in a masked format. The masking codes have been deposited with F1000Research and will be made available to researchers once they are ready to complete their statistical analysis.

F1000Research has indicated its intention to offer those completing new analyses the opportunity to publish their results and a short discussion as an “add-on” to the original paper. With this opportunity, a number of different approaches could be taken. Behavior could be examined along the time course, to see whether differences between groups are more prominent at the beginning, middle, or end of the trial. The authors used EthoVision XT video capture software with a single point of tracking, but other methods (e.g., three-point or whole body) are also available and could highlight subtle differences. Nor did they analyze the videos for other behaviors that could be investigated, such as grooming, facial grimacing, or sniffing. Furthermore, it is likely that novel analysis paradigms will become available, and the videos can be used for these purposes.
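As an illustration of the time-course idea, the following is a minimal Python sketch, not the method used in the original study. It assumes that single-point tracking has already been exported as (time, x, y) samples for a square arena with its origin at one corner; the function name, the arena size, the width of the wall zone, and the bin length are all hypothetical choices made for the example.

```python
from typing import List, Sequence, Tuple


def thigmotaxis_by_bin(
    track: Sequence[Tuple[float, float, float]],  # (time_s, x_cm, y_cm) samples
    arena_size_cm: float,  # side length of the (assumed square) arena
    wall_zone_cm: float,   # hypothetical width of the zone counted as "near a wall"
    bin_s: float,          # length of each time bin in seconds
    trial_s: float,        # total trial length in seconds
) -> List[float]:
    """Return, for each time bin, the fraction of samples inside the wall zone."""
    n_bins = int(trial_s // bin_s)
    near = [0] * n_bins
    total = [0] * n_bins
    for t, x, y in track:
        b = int(t // bin_s)
        if not 0 <= b < n_bins:
            continue  # ignore samples outside the trial window
        total[b] += 1
        # Distance from the tracked point to the closest of the four walls.
        dist = min(x, y, arena_size_cm - x, arena_size_cm - y)
        if dist <= wall_zone_cm:
            near[b] += 1
    # Bins with no samples are reported as NaN rather than zero.
    return [n / c if c else float("nan") for n, c in zip(near, total)]
```

For a 15-minute trial, calling the function with `bin_s=300` and `trial_s=900` would give one value each for the beginning, middle, and end of the trial, which could then be compared between groups once the masking codes are released.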

There is understandable reluctance to share data in some quarters: fear of being "scooped"; fear of being targeted by anti-vivisection lobbyists and other organizations that disagree with your work; and, of course, anxiety over your hard-won data being trawled through with a fine-tooth comb by your competitors and critics. But these are not reasons to keep data locked away. The problem of publication bias, whereby "positive" (or hypothesis-confirming) results are preferentially published over so-called negative data, is very real. It threatens to undermine scientific integrity, to say nothing of the time and expense wasted repeating studies that have already been conducted but whose results remain locked away in the laboratory drawer because they were not interesting enough.

Sharing data, in whatever form, can only be a good thing. It can be used to inform study design and direction, develop new analysis techniques, and provide a training ground for those new to the field. Re-analysis of published data can also yield new results, and allows studies to be revisited many years after they were performed and re-examined in light of current technology and knowledge.

A further benefit is enhanced animal welfare and application of the three R's (reduction, refinement, and replacement) of animals in research. It is the responsibility of all those who work with animals to ensure that each study they conduct is not only necessary but relevant, and the use of pre-existing data can help. For example, researchers may hypothesize a novel analysis paradigm; it is more ethical to test a new analysis method on existing videos from similar studies than to perform a new set of studies using a new batch of animals, identical in all but analysis method.

A number of state-of-the-art journals that already promote or require inclusion of data, such as F1000Research, encourage those who conduct re-analysis to publish their findings as an add-on, associated with the original article but credited to the individual or group who conducted the secondary analysis. Working in this way also has the potential to facilitate collaboration—seeing how others interpret your data can give a fresh perspective to your work and open up new avenues of investigation, as well as putting you in contact with others in your field with a similar interest, which potentially can generate new and unexpected collaborations.

This is an exciting time for science. There are doubtless hurdles to overcome, particularly in terms of sharing and accessing large data sets; however, the main one is one of attitude. The desire to be transparent and open with research can broaden horizons and accelerate the advancement of knowledge. After all, breakthroughs can often be serendipitous, so who knows what discoveries could be made if all the information were available?

Comments

This is an important topic, and one in which medical sciences lag far behind other fields. I had not directly encountered the issue until we recently were asked by PLoS One to make the data for a study available (or explain why we weren't making it available). After looking into the matter further, I firmly believe that all data should be made available as part of the publication process. But making data available is pointless if the necessary supporting documentation is not supplied. For instance, the material made available should include: 1) the raw data, 2) the tidy dataset (post-cleaning), 3) the codebook, and 4) the data analysis scripts (which should include how the data was cleaned). The data provided should allow others to reproduce the authors' analyses, and extend the analyses if appropriate.
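As a rough illustration of those four components, here is a minimal Python sketch that checks whether a shared project folder contains each of them. The folder and file names (`data/raw`, `data/tidy`, `codebook.md`, `scripts`) are hypothetical; they are one possible layout assumed for the example, not a standard and not the layout of any particular repository.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical layout: component label -> path relative to the project root.
EXPECTED_COMPONENTS: Dict[str, str] = {
    "raw data": "data/raw",
    "tidy dataset": "data/tidy",
    "codebook": "codebook.md",
    "analysis scripts": "scripts",
}


def missing_components(
    root: Path, expected: Dict[str, str] = EXPECTED_COMPONENTS
) -> List[str]:
    """Return the labels of expected components that do not exist under root."""
    return [label for label, rel in expected.items() if not (root / rel).exists()]
```

Running `missing_components(Path("my_study"))` before depositing the material would list anything still to be added; an empty list means all four components are present under the assumed names.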

There are numerous online resources available to host the material. I use the GitHub.com platform, which is used mainly by programmers and data scientists. It's free, it is designed to facilitate collaboration and sharing, it allows version control, licenses can be assigned, private repositories are available while you are working on the data, it can generate simple webpages for each project, and you can also assign DOI numbers through other free services such as Zenodo.com. As examples, I have provided links to my GitHub account, and the simple website for the study I mentioned earlier: https://github.com/kamermanpr; http://kamermanpr.github.io/Amitriptyline.HIVSN/

There are also free, easy-to-learn tools to make supplementary data more dynamic and accessible to other scientists and the public (e.g., plotly.com, Google Charts, Shiny for R).

Data drives innovation, and I believe that progress in the field would be greatly enhanced if we were afforded the opportunity to directly interact with others' data, and combine and directly compare it to our data or data from other sources.