Paleoecology Blogs

EarthCube webinars and the challenges of cross-disciplinary Big Data.

EarthCube is a moon shot. It’s an effort to bring communities broadly supported through the NSF Geosciences Directorate and the Division of Advanced Cyberinfrastructure together to create a framework that will allow us to understand our planet (and solar system) in space and in time using data and models generated across a spectrum of disciplines, and spanning scales of space and time. A lofty goal, and a particularly complex one given the fragmentation of many disciplines, and the breadth of researchers who might be interested in participating in the overall project.

To help support and foster a sense of community around EarthCube the directorate has been sponsoring a series of webinars as part of a Research Coordination Network called “Collaboration and Cyberinfrastructure for Paleogeosciences“, or, more simply C4P. These webinars have been held every other Tuesday from 4 – 5pm Eastern, but are archived on the webinar website (here).

The Neotoma Paleoecological Database was featured as part of the first webinar. Anders Noren talked about the cyber infrastructure required to support LacCore‘s operations, and Shanan Peters talks about an incredible text mining initiative (GeoDeepDive) in one of the later webinars.

Fig 1. The flagship for the Society for American Pedologists has run aground and it is now a sitting duck for the battle machine controlled by the Canadian Association for Palynologists in the third war of Data Semantics.

It’s been interesting to watch these talks and think about both how unique each of these paleo-cyberinfrastructure projects is, but also how much overlap there is in data structure, use, and tool development. Much of the struggle for EarthCube is going to be developing a data interoperability structure and acceptable standards across disciplines. In continuing to develop the neotoma package for R I’ve been struggling to understand how to make the data objects we pull from the Neotoma API interact well with standard R functions, and existing R packages for paleoecological data. One of the key questions is how far do we go in developing our own tools before that tool development creates a closed ecosystem that cuts off outside development? If I’m struggling with this question in one tiny disciplinary nook, imagine the struggle that is going to occur when geophysicists and paleobotanists get together with geochonologists and pedologists!

Interoperability of these databases needs to be a key goal. Imagine the possibilities if we could link modern biodiversity databases with Pleistocene databases such as Neotoma, and then to deep time databases like the Paleobiology Database in a seamless manner. Big data has clearly arrived in some disciplines, but the challenges of creating big data across disciplines is just starting.