Last week I attended the workshop in Exeter to lay out the groundwork for building a new surface temperature record. My head is still buzzing with all the ideas we kicked around, and it was a steep learning curve for me because I wasn’t familiar with many of the details (and difficulties) of research in this area. In many ways it epitomizes what Paul Edwards terms “Data Friction” – the sheer complexity of moving data around in the global observing system means there are many points where it needs to be transformed from one form to another, each of which requires people’s energy and time, and, just like real friction, generates waste and slows down the system. (Oh, and some of these data transformations seem to generate a lot of heat too, which rather excites the atoms of the blogosphere).

Which brings us to the reasons the workshop existed in the first place. In many ways, it’s a necessary reaction to the media frenzy over the last year or so around alleged scandals in climate science, in which scientists are supposed to be hiding or fabricating data, which has allowed the ignoranti to pretend that the whole of climate science is discredited. However, while the nature and pace of the surface temperatures initiative has clearly been given a shot in the arm by this media frenzy, the roots of the workshop go back several years, and have a strong scientific foundation. Quite simply, scientists have recognized for years that we need a more complete and consistent surface temperature record with a much higher temporal resolution than currently exists. Current long term climatological records are mainly based on monthly summary data. Which is inadequate to meet the needs of current climate assessment, particularly the need for better understanding of the impact of climate change on extreme weather. Most weather extremes don’t show up in the monthly data, because they are shorter term – lasting for a few days or even just a few hours. This is not always true of course; Albert Klein Tank pointed out in his talk that this summer’s heatwave in Moscow occured mainly in a single calendar month, and hence shows up strongly in the monthly record. But in general, that is unusual, and so the worry is that monthly records tend to mask the occurrence of extremes (and hence may conceal trends in extremes).

The opening talks at the workshop also pointed out that the intense public scrutiny puts us in a whole new world, and one that many of the workshop attendees are clearly still struggling to come to terms with. Now, it’s clear that any new temperature record needs to be entirely open and transparent, so that every piece of research based on it could (in principle) be traced all the way back to basic observational records, and to echo the way John Christy put it at the workshop – every step of the research now has to be available as admissible evidence that could stand up in a court of law, because that’s the kind of scrutiny we’re being subjected to. Of course, the problem is that not only isn’t science ready for this (no field of science is anywhere near that transparent), it’s also not currently feasible, given the huge array of data sources being drawn on, the complexities of ownership and access rights, the expectations that much of the data will have high commercial value.

I’ll attempt a summary, but it will be rather long, as I don’t have time to make it any shorter. The slides from the workshop are now all available, and the outcomes from the workshop will be posted soon. The main goals were summarized in Peter Thorne’s opening talk: to create a (longish) list of principles, a roadmap for how to proceed, an identification of any overlapping initiatives so that synergies can be exploited, an agree method to engage with broader audiences (including the general public), and an initial governance model.

Did we achieve that? Well, you can skip to the end and see the summary slides, and judge for yourself. Personally, I thought the results were mixed. One obvious problem is that there is no funding on the table for this initiative, and it’s being launched at a time when everyone is cutting budgets, especially in the UK. Which meant that occasionally it felt like we were putting together a Heath Robinson device (Rube Goldberg to you Americans) – cobbling it together out of whatever we could find lying around. Which is ironic really given that the major international bodies (e.g. WMO) seem to fully appreciate the importance of this. And of course, the fact that it will be a vital part of our ability to assess the impacts of climate change over the next few decades.

Another problem is that the workshop attendees struggled to reach consensus on some of the most important principles. For example, should the databank be entirely open, or does it need a restricted section? The argument for the latter is that large parts of the source data are not currently open, as the various national weather services that collect it charge a fee on a cost recovery basis, and wish to restrict access to non-commercial uses as commercial applications are (in some cases) a significant portion of their operating budgets. The problem is that while the monthly data has been shared freely with international partners for many years, the daily and sub-daily records have not, because these are the basis for commercial weather forecasting services. So an insistence on full openness might mean a very incomplete dataset, which then defeats the purpose, as researchers will continue to use other (private) sources for more complete records.

And what about an appropriate licensing model? Some people argued that the data must be restricted to non-commercial uses, because that’s likely to make negotiations with national weather services easier. But others argued that unrestricted licenses should be used, so that the databank can help to lay the foundation for the development of a climate services industry (which would create jobs, and therefore please governments). [Personally, I felt that if governments really want to foster the creation of such an industry, then they ought to show more willingness to invest in this initiative, and until they do, we shouldn’t pander to them. I’d go for a cc by-nc-sa license myself, but I think I was outvoted]. Again, existing agreements are likely to get in the way: 70% of the European data would not be available if the research-only clause clause was removed.

There was also some serious disagreement about timelines. Peter outlined a cautious roadmap that focussed on building momentum, and delivering the occasional reports and white papers over the next year or so. The few industrial folks in the audience (most notably, Amy Luers from Google) nearly choked on their cookies – they’d be rolling out a beta version of the software within a couple of weeks if they were running the project. Quite clearly, as Amy urged in her talk, the project needs to plan for software needs right from the start, release early, prepare for iteration and flexibility, and invest in good visualizations.

Oh, and there wasn’t much agreement on open source software either. The more software oriented participants (most notably, Nick Barnes, from the Climate Code Foundation) argued strongly that all software, including every tool used to process the data every step of the way should be available as open source. But for many of the scientists, this represented a huge culture change. There was even some confusion about what open source means (e.g. that ‘open’ and ‘free’ aren’t necessarily the same thing).

On the other hand, some great progress was made in many areas, including identifying many important data services, building on lessons learnt from other large climate and weather data curation efforts, offers of help from many of the international partners (including offers of data from NCDC, NCAR, EURO4M, from across Europe and North America, as well as Russia, China, Indonesia, and Argentina). Agreement was clear that version control and good metadata are vital, and need to be planned for right from the start, but also that providing full provenance for each data item is an important long term goal, but cannot be a rule from the start, as we will have to build on existing data sources that come with little or no provenance information. Oh, and I was very impressed with the deep thinking and planning around benchmarking for homogenization tools (I’ll blog more on this soon, as it fascinates me).

Oh, and on the size of the task. Estimates of the number of undigitized paper records in the basements of various weather services ran to hundreds of millions of pages. But I still didn’t get a sense of the overall size of the planned databank…

Things I learnt:

Steve Worley from NCAR, reflecting on lessons from running ICOADS, pointed out that no matter how careful you think you’ve been, people will end up mis-using the data because they ignore or don’t understand the flags in the metadata.

Steve also pointed out that a drawback with open datasets is the proliferation of secondary archives, which then tend to get out of date and mislead users (as they rarely direct users back to the authoritative source).

Oh, and the scope of the uses of such data is usually surprisingly large and diverse.

Jay Lawrimore, reflecting on lessons from NCDC, pointed out that monthly data and daily and sub-daily data are collected and curated along independent routes, which then makes it hard to reconcile them. The station names sometimes don’t match, the lat/long coords don’t match (e.g. because of differences in rounding), and the summarized data are similar but not exact.

Another problem is that it’s not always clear exactly which 24-hour period a daily summary refers to (e.g. did they use a local or UTC midnight?). Oh, and this also means that 3- and 6-hour synoptic readings might not match the daily summaries either.

Some data doesn’t get transmitted, and so has to be obtained later, even to the point of having to re-key it from emails. Long delays in obtaining some of the data mean the datasets frequently have to be re-released.

Personal contacts and workshops in different parts of the world play a surprisingly important role in tracking down some of the harder to obtain data.

NCDC runs a service called Datzilla (similar to Bugzilla for software) for recording and tracking reported defects in the dataset.

Albert Klein Tank, describing the challenges in regional assessment of climate change and extremes, pointed out that the data requirements for analyzing extreme events are much higher than for assessing global temperature change. For example, we might need to know not just how many days were above 25°C compared to normal, but also how much did it cool off overnight (because heat stress and human health depend much more on overnight relief from the heat).

John Christy, introducing the breakout group on data provenance, had some nice examples in his slides of the kinds of paper records they have to deal with, and a fascinating example of a surface station that’s now under a lake, and hence old maps are needed to pinpoint its location.

From Michael de Podesta, who insisted on a healthy dose of serious metrology (not to be confused with meteorology): All measurements ought to come with an estimation of uncertainty, and people usually make a mess of this because they confuse accuracy and precision.

Uncertainty information isn’t metadata, it’s data. [Oh, and for that matter anything that’s metadata to one community is likely to be data to another. But that’s probably confusing things too much]

Oh, and of course, we have to distinguish Type A and Type B uncertainty. Type A is where the uncertainty is describable using statistics, so that collecting bigger samples will reduce it. Type B is where you just don’t know, so that collecting more data cannot reduce the uncertainty.

From Matt Menne, reflecting on lessons from the GHCN dataset, explaining the need for homogenization (which is climatology jargon for getting rid of errors in the observational data that arise because of changes over time in the way the data was measured). Some of the inhomogeneities are due to abrupt changes (e.g. because a recording station was moved, or got a new instrument), and also gradual changes (e.g. because the environment for a recording station slowly changes, e.g. gradual urbanization of its location).

Matt has lots of interesting examples of inhomogeneities in his slides, includes some really nasty ones. For example, a station in Reno, Nevada, that was originally in town, and then moved to the airport. There’s a gradual upwards trend in the early part of the record, from an urban heat island effect, and another similar trend in the latter part, after it moved to the airport, as the airport was also eventually encroached by urbanisation. But if you correct for both of these, as well as the step change when the station moved, you’re probably over-correcting….

which led Matt to suggest the Climate Scientist’s version of the Hippocratic Oath: First, do not flag good data as bad; Then do not make bias adjustments where none are warranted.

While criticism from non-standard sources (that’s polite-speak for crazy denialists) is coming faster than any small group can respond to (that’s code for the CRU), useful allies are beginning to emerge, also from the blogosphere, in the form of serious citizen scientists (such as Zeke Hausfather) who do their own careful reconstructions, and help address some of the crazier accusations from denialists. So there’s an important role in building community with such contributors.

John Kennedy, talking about homogenization for Sea Surface Temperatures, pointed out that Sea Surface and Land Surface data are entirely different beasts, requiring totally different approaches to homogenization. Why? because SSTs are collected from buckets on ships, engine intakes on ships, drifting buoys, fixed buoys, and so on. Which means you don’t have long series of observations from a fixed site like you do with land data – every observation might be from a different location!

Things I hope I managed to inject into the discussion:

“solicitation of input from the community at large” is entirely the wrong set of terms for white paper #14. It should be about community building and engagement. It’s never a one-way communication process.

Part of the community building should be the support for a shared set of open source software tools for analysis and visualization, contributed by the various users of the data. The aim would be for people to share their tools, and help build on what’s in the collection, rather than having everyone re-invent their own software tools. This could be as big a service to the research community as the data itself.

We desperately need a clear set of use cases for the planned data service (e.g. who wants access to which data product, and what other information will they be needing and why?). Such use cases should illustrate what kinds of transparency and traceability will be needed by users.

Nobody seems to understand just how much user support will need to be supplied (I think it will be easy for whatever resources are put into this to be overwhelmed, given the scrutiny that temperature records are subjected to these days)…

The rate of change in this dataset is likely to be much higher than has been seen in past data curation efforts, given the diversity of sources, and the difficulty of recovering complete data records.

Nobody (other than Bryan) seemed to understand that version control will need to be done at a much finer level of granularity than whole datasets, and that really every single data item needs to have a unique label so that it can be referred to in bug reports, updates, etc. Oh and that the version management plan should allow for major and minor releases, given how often even the lowest data products will change, as more data and provenance information is gradually recovered.

And of course, the change process itself will be subjected to ridiculous levels of public scrutiny, so the rational for accepting/rejecting changes and scheduling new releases needs to be clear and transparent. Which means far more attention to procedures and formal change control boards than past efforts have used.

I had lots of suggestions about how to manage the benchmarking effort, including planning for the full lifecycle: making sure the creation of the benchmark is a really community consensus building effort, and planning for retirement of each benchmark, to avoid the problems of overfitting. Susan Sim wrote an entire PhD on this.

I think the databank will need to come with a regularly updated blog, to provide news about what’s happening with the data releases, highlight examples of how it’s being used, explain interesting anomalies, interpret published papers based on the data, etc. A bit like RealClimate. Oh, and with serious moderation of the comment threads to weed out the crazies. Which implies some serious effort is needed.

…and I almost but not quite entirely learned how to pronounce the word ‘inhomogeneities’ without tripping over my tongue. I’m just going to call them ‘bugs’.

Related

4 Comments

From my experience in user/data support from ESG, I could have attended this workshop as a somewhat-informed person. Ah well. At least Steve W and Bryan L were there – they know the same issues as I do very well.

Some of the issues come back to what the climate science community hasn’t fully digested – getting all this data out there for more-or-less general consumption isn’t a one-way process. I’ve gotten questions from users of our data that surprised me – how do I calculate correct global averages with Excel; what sort of climate should I expect in my area if I plan on starting a green housing business; what will happen to plague incidence in this county given climate change; I’m a high-school student with a science fair project and can you help; based on the data that I’ve downloaded, which climate model is best; what is netCDF; how come this later data is different than the older one; how can I use this data in my favorite GIS package…

These users are all honest and have good questions – some of which I can answer as a data manager, others require scientific expertise. The lesson is that compared to model data, observational data is *much* harder to correctly and adequately maintain and *explain*, which is the basic upshot of the whole enterprise. Given that very little, if any, money or resources are being devoted to the work, I’m pessimistic that things will get done as well as they ought to be. Sigh.

Neil: “Don’t inhomogeneities == heterogeneities?”
I guess so, but the term evolved from the label for the procedure for removing them: homogenization. Hence it makes more sense to say homogenization removes inhomogeneities.
Plus, heterogeneities doesn’t convey a sense of wrongness.