LinkedEarth 2020

Despite EarthCube support for the project having ended, LinkedEarth is alive and kicking, and actually had a pretty peachy year, thank you for asking! That was in no small part thanks to the tireless Deborah Khider.

Now, impressive though it was to get over 100 scientists to organize and come to an agreement about anything, we’re not out of the woods yet. The current version of the standard, PaCTS 1.0, is aspirational at best: even on a dataset that she had generated herself, Deborah was unable to adhere to the strict reporting guidelines laid out by this initial vote, with completeness ranging between 20 and 80% in some categories. The reason is that when we asked our colleagues what (meta)data they considered “essential” to re-use a dataset, many interpreted this to mean “very important”, or “I’d like to have this”, not “I cannot use this dataset in any way unless I have this information”, which is what we had intended. The fault is ours: asking questions that are not precise enough, one rarely gets the most helpful answers.
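To make "completeness" concrete, here is a minimal sketch of how such a score could be tallied per category. The category names, property names, and the `completeness` helper are all hypothetical stand-ins chosen for illustration; they are not taken from PaCTS itself, nor from how Deborah actually scored her dataset.

```python
# Hypothetical checklist: these categories and property names are
# illustrative assumptions, not the actual PaCTS 1.0 property list.
CHECKLIST = {
    "geographic": ["latitude", "longitude", "elevation", "site_name", "country"],
    "chronology": ["age_model", "dating_method", "age_uncertainty",
                   "tie_points", "reference_datum"],
}


def completeness(metadata, checklist):
    """Return, per category, the fraction of listed properties that are filled in."""
    scores = {}
    for category, fields in checklist.items():
        # A property counts as present if it exists and is neither None nor empty.
        present = sum(1 for f in fields if metadata.get(f) not in (None, ""))
        scores[category] = present / len(fields)
    return scores


# A made-up record: well documented geographically, poorly chronologically.
record = {
    "latitude": 12.5, "longitude": -45.0, "elevation": -3200, "site_name": "Core A",
    "dating_method": "radiocarbon",
}
scores = completeness(record, CHECKLIST)
for category, score in scores.items():
    print(f"{category}: {score:.0%}")
```

A tally like this is only as meaningful as the checklist behind it, which is exactly why the definition of "essential" matters so much.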

The other issue is adoption, which we discuss in our paper:

There are essentially two levers to activate. The first is funding agencies. In the United States, for instance, the National Science Foundation funds the vast majority of paleoclimate research. While the agency now requires a data management plan to be submitted for each proposal, its reporting guidelines are very broad. They could be made more specific and point paleoclimate researchers to the latest version of PaCTS. The European Research Council similarly supports Open Science, but with far less specific guidelines than PaCTS v1.0. […] We therefore call on funding agencies to either endorse this standard or propose a meaningful alternative.

The second lever is publishers and editors: while each publishing house encourages digital data archiving to varying degrees, the decision of what (meta)data to include is ultimately up to the author and often fails to consider the long‐term value proposition of the data set. Publishers could help ensure that the present standard is, at the very least, encouraged […] In particular, the American Geophysical Union and Copernicus publishers recently endorsed requirements to make data FAIR. Affiliated journals could use their leverage to promote more stringent reporting standards.

Again showing exemplary leadership, the editorial board of Paleoceanography and Paleoclimatology asked Deborah, Nick and me to present PaCTS at their annual AGU meeting, and to discuss possible avenues for adoption by the journal, as well as greater community involvement. The incoming editor-in-chief, Matthew Huber, expressed interest in more focused articles on archive-specific versions of the new PaCTS. More details to come.

We now describe our vision for how this might take place, and hope to get you (yes, YOU!) involved.

Looking to the future

Moving forward, there are three directions in which LinkedEarth intends to grow this year — and we’d love for you to be a part of it: (1) a more workable version of PaCTS, representative of broader engagement; (2) closer integration of data standards with science-enabling code; (3) collaborations on grand challenges. Hear us out.

We convened an informal “Town Hall” meeting at AGU to touch base with some of the PaCTS paper co-authors, or other community members interested in joining the effort (see picture). Four action items stood out:

The need to re-invigorate archive-specific working groups (e.g., trees, marine sediments, MARPA, etc.). While the LinkedEarth wiki provided a platform for tracking discussion and votes, it was not the preferred mode of communication for many scientists. We decided that it is far more beneficial to let individual groups self-organize and pick the communication tools that best serve them, as long as they come up with a written document summarizing the standard (drawing inspiration from the PaCTS paper if necessary).

The need to distinguish five categories: minimal, essential, recommended, desired and superfluous. “Minimal” will follow the definition used by the schema.org initiative, and we will emphasize that “essential” is not “desired”.

The need for each working group to come up with one exemplar dataset per archive. Stored in the LiPD format, such a dataset will showcase what a maximally complete dataset embodying best practices would look like, serving as a goalpost for others.

The need to integrate PaCTS into the creation of Linked Paleo Data (LiPD) files. As we have described elsewhere, LiPD is the emerging standard container for paleo(climate) data, and we will work in the coming year to integrate PaCTS v1.0 into the LiPD playground, which provides a web interface and quite a bit of hand-holding to create LiPD files.

Science-Enabling Codes

At LinkedEarth, we’ve never thought of data in isolation. True, in a data-driven science like ours, data are paramount. But on their own, data don’t do much. LinkedEarth is all about squeezing as much information as possible from existing data, so we have, since the very beginning, always had our eye on what EarthCube calls “science-enabling” capabilities. To us, that means building user-friendly, open source codes that can ingest LiPD-formatted data and facilitate routine and advanced tasks:

Pyleoclim, the Python arm of our operations, is already in beta, and some of it was used to great effect in this PNAS paper. Deborah also previously shared how its methods could be used to test the link (or lack thereof?) between solar activity and climate. Pyleoclim v1.0 will be released by the end of March and will cover basic timeseries analysis, including spectral and wavelet analysis with missing and time-uncertain data, correlation and causality analysis, mapping and plotting, and a query function to directly link to the LinkedEarth wiki.

GeoChronR, its R evil twin, has been available on GitHub for a while, and will be released as a proper R package in the coming months. This spring we are tidying up the GeoChronR package, simplifying installation, and finalizing a manuscript describing the core functionality of the package. This includes creating or loading age ensembles, age-uncertain correlation, regression (calibration), EOF and spectral analysis, along with publication-quality plotting and visualization functions.

There is substantial overlap between the two packages, giving our users a plethora of choices, and harnessing the most powerful capabilities of both languages. We hope that this will convince you to invest the time to put your data into LiPD (observing PaCTS guidelines, if that makes sense to you), because doing so will give you instant access to a smorgasbord of analytical tools.
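To give a flavor of one task on that menu, here is a rough, pure-Python sketch of the idea behind age-uncertain correlation: perturb the ages of one record many times, recompute the correlation each time, and look at the spread. This is not the Pyleoclim or GeoChronR API; the function names are made up for illustration, and independent Gaussian jitter of the ages is a crude assumption standing in for a real age model.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def interp(t, times, values):
    """Linearly interpolate (times, values) at t; times must be increasing.
    Outside the sampled range, the nearest end value is returned."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    for i in range(1, len(times)):
        if t <= times[i]:
            w = (t - times[i - 1]) / (times[i] - times[i - 1])
            return values[i - 1] + w * (values[i] - values[i - 1])


def age_uncertain_correlation(t_ref, x_ref, t_unc, y_unc, sigma,
                              n_members=200, seed=0):
    """Sorted correlations between a well-dated record (t_ref, x_ref) and an
    ensemble of age-perturbed versions of a second record (t_unc, y_unc)."""
    rng = random.Random(seed)
    correlations = []
    for _ in range(n_members):
        # One ensemble member: jitter the ages (assumed Gaussian errors),
        # then sort so that the perturbed ages stay increasing.
        ages = sorted(a + rng.gauss(0.0, sigma) for a in t_unc)
        # Put the perturbed record on the reference time axis.
        y_on_ref = [interp(t, ages, y_unc) for t in t_ref]
        correlations.append(pearson(x_ref, y_on_ref))
    return sorted(correlations)


# Synthetic demo: the same 20-year cycle, with and without age errors.
t = list(range(100))
x = [math.sin(2 * math.pi * ti / 20) for ti in t]
exact = age_uncertain_correlation(t, x, t, x, sigma=0.0, n_members=5)
blurred = age_uncertain_correlation(t, x, t, x, sigma=3.0, n_members=50)
print("no age error, median r:", round(exact[len(exact) // 2], 3))
print("3-yr age error, r range:", round(blurred[0], 3), "to", round(blurred[-1], 3))
```

The point of the exercise is that a single correlation number hides the age uncertainty; an ensemble turns it into a range you can actually report.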

Collaborations

Tools, however, are only as interesting as the uses to which they are put. This spring, we will respond to the latest EarthCube solicitation with a collaborative project, where we will put these tools at the service of some big paleoclimate questions. If you are game and want to play with us, let’s talk! You know where to find us.