Next-Generation Ecologists and Open Data

published by scottchamberlain on Tue, 08/30/2011 - 03:21

The next generation of ecologists has the opportunity to effect a shift in the culture of ecology. The shift we need is to make ecology open. Why? In short, other fields have seen significant benefits from open science, which suggests that ecology can benefit from the shift too.

We live in an increasingly connected world, which means that ecologists can make this shift to open ecology relatively quickly, especially through the use of social media such as Facebook and Twitter. Below I lay out what I think are the advantages of open ecology, some challenges to it, some solutions, and some promising trends.

Why should ecologists share data?

Here are two reasons to share data: 1) sharing data should increase the pace of ecological discovery; and 2) sharing data may increase citations of your own research (Piwowar et al. 2007).

Open data: challenges and solutions

"Although it is challenging to develop new technological solutions to data sharing in ecology, the social and cultural barriers may be even more onerous." - Reichman et al. 2011, Science

Social and cultural challenges

Tradition! Ecologists are not used to sharing data. The predominant way to retrieve original data likely involves emailing the author, and if you personally request data from an ecologist, you are not likely to get it! A recent study (Savage & Vickers 2009) asked 10 scientists who had published in PLoS Journals for their raw data, citing PLoS's explicit requirement that authors share data, and only one of the 10 authors shared their data - 10%! This is admittedly a very small sample size, but if it is at all representative of scientists, unwillingness to share data may be the biggest challenge to open data.
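To put a rough number on how little a sample of 10 tells us, here is a small Python sketch that computes an approximate 95% Wilson score interval for the sharing rate. The choice of the Wilson interval and the function name are my own assumptions for illustration; Savage & Vickers did not report this calculation.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate Wilson score interval for a proportion (z=1.96 ~ 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    # Center of the interval is pulled toward 0.5 for small n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# 1 of 10 authors shared data (Savage & Vickers 2009)
low, high = wilson_interval(1, 10)
print(f"Plausible sharing rate: {low:.1%} to {high:.1%}")
```

Under these assumptions the interval runs from roughly 2% to 40%, so the 10% figure is best read as "probably well under half", not as a precise estimate.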

Solution:

Easy - wait for the pool of ecologists to turn over to a cohort that is more open to data sharing. Just kidding, sort of. Large systemic changes probably take time. US NSF is at least requiring that grant recipients have a data management plan, and projects are appearing that make it easy to create data management plans (e.g., DMP Tool), but unfortunately NSF does not require grant recipients to make their data open.

Ecological data can be cheap, but time consuming, to collect. Given how cheap ecology data can be to collect (admittedly, costs can be great if studies occur across sites, countries, etc.), many ecologists work alone or just with their graduate students or a few colleagues. This leads to the data dispersion problem discussed below, and makes data acquisition difficult.

Although sometimes cheap, collection of ecology data can be time consuming. This is partly a reflection of the nature of ecology groups in which only a small number of people are collecting a data set, leading to more time needed to collect data than if large teams were involved. In addition, ecologists strive to gain generality, and uncover temporal trends, if any. Thus, we like to collect data over many generations/years of our study organism/location. Given the sometimes long periods of time data collection requires, ecologists may be understandably protective of their efforts.

Solutions:

Data cost: It is great that ecology data is relatively cheap to collect, and that is not likely to change, at least for field ecology data. The solution must be to make data sharing the norm among ecologists.

Time: An increase in funding for ecology would decrease time of data collection within seasons, but the likelihood of increased funding is not high given the state of the global economy. However, we still strive to collect data over many years to seek generality, etc., so this remains a constraint on open data in ecology.

Lack of a reward system for data sharing. Ecologists generally cannot get credit for sharing their data sets.

Solution:

One promising example is Dryad, which provides a citation for each data set deposited on its system. Do ecologists list these on their CVs? Do hiring/tenure committees value these data sets?

Another interesting solution for rewards was brought up by Kueffer et al. (in press) in a recent Trends in Ecology and Evolution commentary. They noted that when meta-analyses are published, the papers used in the analyses are often not indexed by ISI and other indexing services. If journals allowed more generous space for citations, the papers included in a meta-analysis could be listed in the references section. They would then be indexed in ISI and count towards the citations that hiring/tenure committees notice.

Technological challenges

Data dispersion. Ecology data is dispersed among thousands of researchers around the world, making it hard to get at. However, efforts such as Dryad for data sets and TreeBASE for phylogenetic trees are making the data dispersion problem less of a stumbling block.

Solution:

Many web services serve as repositories for different types of data (e.g., phylogenetic trees: TreeBASE; specimens: Global Biodiversity Information Facility), but these are still fragmented sources and do not "talk" to each other. DataONE is a recent initiative that seeks to link data sets, and should go a long way towards solving data dispersion (provided ecologists come around to allowing access to their data). Also, see the Ecological Data project for a way to find ecological data sets and get them onto your machine.

Data heterogeneity. Unlike molecular data, which is written in a shared alphabet (e.g., DNA, RNA, amino acids), ecology data has no common alphabet. This makes it logistically difficult to make data available in one location and in easily usable formats. However, we can overcome data heterogeneity in ecology by clever use of metadata (think KNB’s Morpho software).

Solution:

This impediment to open data is a hard one to fight because ecological data will always be very diverse, which probably makes a single coherent data source such as GenBank impossible. However, some projects like Dryad are archiving diverse data formats. Creating good metadata to go along with data sets is really important, and can make ecological data much easier to use. Tools like the Morpho software can help you create metadata.
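As a rough illustration of what "good metadata" can mean in practice, the Python sketch below describes a data set's columns and units and can write that description to a small JSON file stored next to the data. The field names, the file naming scheme, and the example data set are all hypothetical; this is not the format Morpho produces, just a minimal stand-in to show the idea.

```python
import json
from pathlib import Path

def build_metadata(title, creator, columns):
    """Return a plain dict describing a data set.

    `columns` maps each column name to a (description, unit) pair.
    All field names here are illustrative, not a metadata standard.
    """
    return {
        "title": title,
        "creator": creator,
        "columns": [
            {"name": name, "description": desc, "unit": unit}
            for name, (desc, unit) in columns.items()
        ],
    }

def write_sidecar(metadata, data_path):
    """Write metadata next to the data file as <name>.metadata.json."""
    data_path = Path(data_path)
    sidecar = data_path.with_name(data_path.stem + ".metadata.json")
    sidecar.write_text(json.dumps(metadata, indent=2))
    return sidecar

# Hypothetical example data set
metadata = build_metadata(
    title="Plot-level plant counts",
    creator="A. Ecologist",
    columns={
        "plot_id": ("Unique plot identifier", None),
        "date": ("Sampling date, YYYY-MM-DD", None),
        "stem_count": ("Number of stems per plot", "count"),
    },
)
```

Calling write_sidecar(metadata, "plant_counts.csv") would then write plant_counts.metadata.json alongside the data file, so that anyone who downloads the data also gets a description of what each column means.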

Data acquisition. Ecological data is not very open, and is hard to access. In fact, Reichman and colleagues estimated that only 1% of ecological data collected is accessible after publication (Reichman et al. 2011)! For the ecological data that is accessible, there aren’t great tools for data exploration (but see below for some promising examples). The lack of a unifying alphabet for ecological data has resulted in each branch of the field developing its own indexing and archival practices.

Solution:

In addition to data repositories, there are a few efforts to make open data easy for scientists to access. One awesome project is DataONE, which seeks to provide a single node that connects to many data-providing nodes, making it easier to find ecological data. Another project, rOpenSci, seeks to develop R-based tools for facilitating open science. R is an open-source environment for statistics, modeling, and data visualization. Using this open-source environment, rOpenSci will integrate open data sets from across scientific disciplines, allowing reproducible research on open data, phylogenetic trees, citations, and even full-text literature through open-access journals such as PLoS Journals and BioMed Journals.

See Reichman et al. (2011) for more discussion on open data and open science.

Promising signs

Although sharing may not be a habit for ecologists yet, we are making progress towards open data in ecology. Many projects come to mind, several of them mentioned above (e.g., Dryad, TreeBASE, DataONE, and rOpenSci).

In addition, a number of ecology and evolution journals require raw data to be archived in various locations (e.g., Dryad), including:

American Naturalist

Evolution

Evolutionary Applications

Heredity

Molecular Ecology

Molecular Phylogenetics & Evolution

PLoS Journals

Systematic Biology

and more...

The way forward

The next generation of ecologists needs to pave the way for a new kind of ecology that is open and transparent, that facilitates collaboration, and that accelerates the pace of ecological discovery (Piwowar et al. 2007, Costello 2009).

Open and transparent ecology will also make our science less vulnerable to criticism, especially in controversial areas such as climate change. See these stories (GreenHughes.com, British Ecological Society blog post) for examples of calls to increase openness in science as a way to increase public trust in science. Giving any interested party access to the same data we ecologists have can only increase trust among the public and policy makers.

The most immediate task we have to undertake is to make all of our data open. Post your data sets on Dryad or KNB, your phylogenetic trees on TreeBASE, etc.

Finally, what do you think? What are the major hurdles to open ecology, and what are the solutions? Please let us know with a comment below, on Twitter (@INNGEcologist), or on Facebook.