22 January 2016

How I’ve parasitised research

The aerial view of the concept of data sharing is beautiful. ... However... There is concern among some front-line researchers that the system will be taken over by what some researchers have characterized as “research parasites.”

Oh boy.

As a biologist, I know that parasitism is one of the most successful strategies for living in the world, and plays an integral role in ecosystems. But I’m not so much a biologist that I fail to recognize that those who re-use other people’s data – like me – are being disparaged.

Following Longo and Drazen, I guess creating GenBank was totally the wrong thing to do.

I became convinced of the usefulness of data sharing when I started collaborating with Paty Feria, modelling the distribution of crayfish species. About the same time, I was starting to work on the ecology of sand crabs. Both projects required using other people’s published geographic data. I spent a long time pulling out distribution records from published papers.

Without that geographic data, we couldn’t have created the new predictive models for distribution (Feria and Faulkes 2011, Faulkes et al. 2012). Those models were considered in this risk assessment for marbled crayfish, which demonstrates that, at some level, people found those new analyses useful.

While not critical in analyses, geographic data was critical in creating maps that allowed me to show the context of a range extension (Faulkes 2014). I couldn’t really prove it was an extension without that.

Because of my experiences in creating those papers, I’ve put effort into archiving my own data, usually on Figshare. My record isn’t perfect, but I hope it might be useful to someone else.

There are a few (very few) defenses of the Longo and Drazen piece. First, they are trying to show an example of good collaboration, where everyone was happy. That could be useful, if they had stripped out the potshots about “parasites.”

Second, they are talking about medical research, where patient consent and privacy are ongoing, real concerns that shouldn’t be swept under the table. Remember issues around sequencing the DNA of HeLa cells, and people then going, “Hey, the woman those cells came from still has immediate family, and posting those cell DNA sequences could violate their medical privacy.”

But Longo and Drazen don’t frame it that way. Instead, they frame the problem as one in which researchers could suffer embarrassment or career impediments because someone else used their data.

The first concern is that someone not involved in the generation and collection of the data may not understand the choices made in defining the parameters.

Someone might misunderstand what I was doing and I could be embarrassed.

(S)tealing from the research productivity planned by the data gatherers...

Someone could publish before me.

(E)ven use the data to try to disprove what the original investigators had posited.

Someone might show I was wrong and I could be embarrassed.

I understand wanting to protect your reputation and advance your career. But if your reputation and career can’t stand up to someone else using your data, it’s not a very strong career to start with.

25 January 2016: Co-author Drazen wouldn’t comment on the use of the word “parasite” when asked about it by a journalist. But Drazen has penned a response.