Data, Code, Science Communication

Month: March 2018

A few weeks ago I described the results of a project investigating the data management practices of neuroimaging researchers. The main goal of this work is to help inform efforts to address rigor and reproducibility in both the brain imaging (neuroimaging) and academic library communities. But, as we were developing our materials, a second goal emerged: practice what we preach and actually apply the open science methods and tools we in the library community have been recommending to researchers.

Wait, what? Open science methods and tools?

Before jumping into my experience of using open science tools as part of a project that involves investigating open science practices, it’s probably worth taking a step back and defining what the term actually means. It turns out this isn’t exactly easy. Even researchers working in the same field understand and apply open science in different ways. To make things simpler for ourselves when developing our materials, we used “open science” broadly to refer to the application of methods and tools that make the processes and products of the research enterprise available for examination, evaluation, use, and re-purposing by others. This definition doesn’t address the (admittedly fuzzy) distinctions between related movements such as open access, open data, open peer review, and open source, but we couldn’t exactly tackle all of that in a 75-question survey.

From programming languages used for data analysis like Python and R to collaboration platforms like GitHub and the Open Science Framework (OSF) to writing tools like LaTeX and Zotero to data sharing tools like Dash, figshare, and Zenodo, there are A LOT of different methods and tools that fall under the category of open science. Some of them worked for our project, some of them didn’t.

Data Analysis Tools

As both an undergraduate and graduate student, all of my research methods and statistics courses involved analyzing data with SPSS. Even putting aside the considerable (and recurrent) cost of an SPSS licence, I wanted to go a different direction in order to get some first-hand experience with the breadth of analysis tools that have been developed and popularized over the last few years.

I thought about trying my hand at a Jupyter notebook, which would have allowed us to share all of our data and analyses in one go. However, I also didn’t want to delay things as I taught myself how to work within a new analysis environment. From there, I tried a few “SPSS-like” applications like PSPP and Jamovi and would recommend both to anyone who has a background like mine and isn’t quite ready to start writing code. I ultimately settled on JASP because, after taking a cursory look through our data using Excel (I know, I know), it was actually being used by the participants in our sample. It turns out that’s probably because it’s really intuitive and easy to use. Now that I’m not in the middle of analyzing data, I’m going to spend some time learning other tools. But, while I do that, I’m going to keep using and recommending JASP.

From the very beginning, we planned on making our data open. Though I wasn’t necessarily thinking about it at the time, this turned out to be another good reason to try something other than SPSS. Though there are workarounds, .sav is not exactly an open file format. But our plan to make the data open not only affected the choice of analysis tools, it also affected how I felt while running the various statistical tests. On one hand, knowing that other researchers would be able to dive deep into our data amplified my normal anxiety about checking and re-checking (and statcheck-ing) the analyses. On the other hand, it also greatly reduced my anxiety about inadvertently relegating an interesting finding to the proverbial file-drawer.

Collaboration Tools

When we first started, it seemed sensible to create a repository on the Open Science Framework in order to keep our various files and tools organized. However, since our collaboration is between just two people and there really aren’t that many files and tools involved, it became easier to just use services that were already incorporated in our day-to-day work, namely e-mail, Skype, Google Drive, and Box. Though I see how the OSF could be useful for a project with more moving parts, for our purposes it mostly just added an unnecessary extra step.

Writing Tools

This is where I restrain myself from complaining too much about LaTeX. Personally, I find it a less than awesome platform for doing any kind of collaborative writing. Since we weren’t writing chunks of code, I also couldn’t find an excuse to write the paper in R Markdown. Almost all of the collaborative writing I’ve done since graduate school has been in Google Docs and this project was no exception. It’s not exactly the best when it comes to formatting text or integrating with tables and figures, but I haven’t found a better tool for working on a text with other people.

We used a Mendeley folder to share papers and keep our citations organized. Zotero has the same functionality, but I personally find Mendeley slightly easier to use. In retrospect, we could also have used something like the F1000 Workspace that has a more direct integration with Google Docs.

This project is actually the first time I’ve published a preprint. Like making our data open, this was the plan all along. The formatting was done in Overleaf, mostly because it was a (relatively) user-friendly way to use LaTeX and I was worried our tables and figures would break the various MS Word bioRxiv templates that are floating around. Similar to making our data open, planning to publish a preprint had an impact on the writing process. I’ve since noticed a typo or two, but knowing that people would be reading our preprint only days after its submission made me especially anxious to check the spelling, grammar, and the general flow of our paper. On the other hand, it was a relief to know that the community would be able to read the results of a project that started at the very beginning of my postdoc before its conclusion.

Data Sharing Tools

Our survey and data are both available via figshare. More specifically, we submitted our materials to Kilthub, Carnegie Mellon’s instance of figshare for institutions. For those of you out there currently raising an eyebrow, we didn’t submit to Dash, UC3’s data publication platform, because of an agreement outlined when we were going through the IRB process. Overall, the submission was relatively straightforward, though the curation process definitely made me consider how difficult it is to balance adding proper metadata and documentation to a project with the desire (or need) to just get material out there quickly.

A few more thoughts on working openly

More than once over the course of this project I joked to myself, my collaborator, or really to anyone who would listen that “This would probably be easier or quicker if we could just do it the old way.” However, now that we’re at a point where we’ve submitted our paper (to an open access journal, of course), it’s been useful to look back on what it has been like to use these different open science methods and tools. My main takeaways are that there are a lot of ways to work openly and that what works for one researcher may not necessarily work for another. Most of the work I’ve done as a postdoc has been about meeting researchers where they are and this process has reinforced my desire to do so when talking about open science, even when the researcher in question is myself.

Like our study participants, who largely reported that their data management practices are motivated and limited by immediate practical concerns, a lot of our decisions about which open science methods and tools to apply were heavily influenced by the need to keep our project moving forward. As much as I may have wanted to, I couldn’t pause everything to completely change how I analyze data or write papers. We committed ourselves to working openly, but we also wanted to make sure we had something to show for ourselves.