Bridging across bridges: engaging the geoscience research community in standards development (Part 2)

This is Part 2 of a multipart series. Click here for Part
1 or Part 3.

First, I’d like to mention some comments on Part 1, in which I posed a question: “A small but committed number of academic researchers are helping develop OGC standards, but the vast majority are not. Why
do some get into it, and why don’t more?”

One strong point from comments (some off-blog) was that
domain scientists (i.e., non-computer scientists) should not have to engage in
information technology (IT) or data management issues, much less in data
standards development. Rather, they should be involved in an advisory capacity,
leaving the informatics to informaticians. I completely agree. Nevertheless, I’ve come across a few rare
domain scientists who stand in both the science and IT worlds, and do it well
enough that they make things happen on a global scale. These are outliers, and should not be taken as
“typical scientists”. Further, this should not be taken as a criticism of less
eclectic scientists. But I’m curious whether there are ways to nudge the system that
would create more opportunities and rewards for domain scientists to work with IT and standards folks.

The thing is, “people
tend to do what they really want to do”, as a wise supervisor once told me,
when I was trying to explain why I wasn’t getting done the things that were his
priorities. He also recognized that he
got the best work from employees who were tasked to do what they really wanted
to do. I completely agree with this philosophy,
so I’m not advocating that domain scientists try to become good at
something they don’t want to do. There’s really an ecosystem of science and
technology tasks and people, and we depend on different people wanting to do
different tasks.

What I do want is for good data management and IT practices to become so easy and natural for geoscientists to follow that they can focus on their science rather than on the technology. In other words, I want to look at ways to improve the technology of science without distracting scientists with technology. Then scientists and researchers will get to do more of what they really want to do.

Examples of where this is starting to happen are the integrated
tools and data sets used by the climate and meteorological research communities: data comes in netCDF format, is processed using CDO or similar tools, and is
visualized using Ferret or
other packages. The same is true of the Esri and HydroDesktop environments used by
many environmental science researchers. Such environments handle the
standards-based side of things, allowing users to conduct data search and analysis
in a harmonized way. You use what you get, with greatly simplified format and
coordinate conversions. But you also don’t explore beyond this horizon, most
often because you don’t know what is, or would be, possible. This approach could be
taken further, for example by incorporating the collection and validation of provenance and other metadata earlier, and in more context-sensitive ways, in users’ workflows.
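To make the “simplified coordinate conversions” point concrete, here is a minimal sketch of one small chore such environments routinely hide from the user: harmonizing longitude conventions between the 0–360° grids common in global model output and the −180–180° convention common in GIS tools. The function name and approach are my own illustration, not taken from CDO, Ferret, or any of the tools named above.

```python
def to_signed_longitude(lon_degrees):
    """Convert a longitude from the 0..360 convention (common in
    global climate model output) to the -180..180 convention
    (common in GIS tools), wrapping out-of-range values."""
    lon = lon_degrees % 360.0          # normalize into [0, 360)
    return lon - 360.0 if lon > 180.0 else lon

# A row of hypothetical model grid longitudes, harmonized for a
# GIS-style consumer.
model_lons = [0.0, 90.0, 180.0, 270.0, 359.5]
harmonized = [to_signed_longitude(l) for l in model_lons]
print(harmonized)  # [0.0, 90.0, 180.0, -90.0, -0.5]
```

The point is not the three lines of arithmetic, but that a researcher using an integrated environment never has to write (or debug) them at all.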

But integrated environments also don’t cover all use cases. Note
that in order to publish the results of scientific research, the underlying
data for the research must often be cited if not included in the publication;
standardized data management can assist with standardized data citation. This
has been an active discussion area in ESIP
and RDA but
not in the OGC.
(I won’t get into the debate over distinctions between data, database, data
sets, and data products; see Joe
Hourcle’s excellent and humorous talk on this at Ignite@AGU a couple of years
back.) This topic addresses an important part of science: reproducibility of
data for subsequent verification and reanalysis. The science community would
like for citations to enable linking to the cited data set, whatever that
takes. New discussions are taking place about citation of highly dynamic data
sets. Another area is semantics, which cuts across all communities.
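As a small sketch of how standardized metadata can support standardized citation, the following assembles a citation string loosely following DataCite’s recommended form (Creator (PublicationYear): Title. Publisher. Identifier). The field names, function, and sample record are illustrative assumptions of mine, not any repository’s actual API.

```python
def format_data_citation(meta):
    """Assemble a human-readable data citation from standardized
    metadata fields, loosely following the DataCite recommended form:
    Creator (PublicationYear): Title. Publisher. Identifier."""
    creators = "; ".join(meta["creators"])
    return (f'{creators} ({meta["year"]}): {meta["title"]}. '
            f'{meta["publisher"]}. https://doi.org/{meta["doi"]}')

# Hypothetical record; in practice this would come from a data
# repository's metadata service.
record = {
    "creators": ["Smith, J.", "Garcia, M."],
    "year": 2014,
    "title": "Gridded surface temperature anomalies, v2",
    "publisher": "Example Geoscience Data Center",
    "doi": "10.9999/example.1234",
}
print(format_data_citation(record))
```

Because the identifier is a resolvable DOI, a citation built this way gives readers exactly the link back to the cited data set that the science community is asking for.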

I also want to emphasize that science, technology, and standards development are interdependent and
continually evolving. Case 1:
I’ve sat with clients while prototyping a user interface or web page design for
them, and gotten this reaction: “You can do that?? Hmm, can you do this [insert wish-list item] too?” Often
the answer would be “why not?” Standards
for data exchange, model integration and visualization make that much more
frequent and productive. Case 2: Every now and then something like the
Internet or even “just” the iPhone comes along and shakes up the whole fabric
of society, technology & science. Various standards have to catch up, and
new standards and even whole communities appear. Case 3: As science and
technology for satellite-based Earth observation and analysis improve, the
complexity and volumes of data increase exponentially, requiring continual
evolution in the standards and tools needed to support them.

So if science, technology and standards are all
interdependent, and standards tend to catch up as needed, what’s the problem? A
big problem is that data standards development generally is not supported or
rewarded in academia as it is in industry and government agencies. And that
means that academic use of standards is limited to what already works for them,
with little input to influence the standards’ evolution. So the standards community is actually missing out on a huge contribution
that could conceivably come from academia, and academia is missing out on the
rewards of influencing standards to help them do their work more efficiently
and transparently.

I’m not saying academia is completely missing from the OGC;
there are over a hundred universities with one or more professors, researchers or students registered on the OGC portal.
The majority of these universities are in Europe. The US has only about 30 universities with
OGC membership, and very few of these are active in standards development. I
would contend that most US university members of OGC are there to learn and
master the OGC standards, rather than to help construct and advance the
standards to support geoscience research. We’re also not teaching OGC standards
widely in academia.

But NSF could help here, and EarthCube might be the key.

Enter EarthCube: The US National Science Foundation (NSF) EarthCube program is a long-term initiative to identify
and nurture opportunities to make geoscience data, models, workflows,
visualization, and decision support available and usable across all geoscience
domains. This is an ambitious
undertaking. Lucky for me, Anna Kelbert
just published an excellent
overview of the motivation and emerging structure for EarthCube, so I don’t
have to repeat all that here. I’ll just say there are now about 30 funded
projects in varying stages of completion, and more on the way. These fall into three categories: Research Coordination Networks
(outreach to potential users), Conceptual Designs
(core architecture frameworks), and Building Blocks
(technology components). EarthCube is intended to be community driven and community
governed.

How it could
happen: In the next segment, I’ll propose a way to leverage EarthCube to loosely couple the NSF research agenda with the key IT standards development agendas.

Thanks to Joe Hourcle, Ingo Simonis, Scott Simmons and Carl Reed for contributions to this segment.

The thoughts and opinions expressed here are those of the contributor alone, and do not necessarily reflect the views of EarthCube's governance elements, funding agency, or staff.