Bradley's Data Archiving

Here is a discussion of two of Bradley’s answers to questions from the House Committee letter pertaining to federal grants and archiving.

Questions and Answers

2. List all financial support you have received related to your research, including, but not limited to, all private, state, and federal assistance, grants, contracts (including subgrants or subcontracts), or other financial awards or honoraria….

A list of grants received for my research can be found in my curriculum vitae [which was attached as part of question 1, but which, to my knowledge, is not publicly available].

4. Provide the location of all data archives relating to each published study for which you were an author or co-author and indicate: (a) whether this information contains all the specific data you used and calculations you performed, including such supporting documentation as computer source code, validation information, and other ancillary information, necessary for full evaluation and application of the data, particularly for another party to replicate your research results; (b) when this information was available to researchers; (c) where and when you first identified the location of this information; (d) what modifications, if any, you have made to this information since publication of the respective study; and (e) if necessary information is not fully available, provide a detailed narrative description of the steps somebody must take to acquire the necessary information to replicate your study results or assess the quality of the proxy data you used.

Some of the data used in my research is archived at the World Data Center for Paleoclimatology (WDC-A), Boulder, Colorado. Other data are also available to the general public at NOAA or in other national data depositories around the world. When I or my students have generated data sets they are generally sent to the WDC-A once the results have been published. This is the normal procedure followed in my field. If someone is interested in specific data or procedures used, they generally write to me requesting that information. Data related to the Mann et al. (1998) paper are available at ftp://holocene.evsc.virginia.edu/pub/MBH98.

As a quick aside, I sent Bradley an inquiry about methods here without receiving a response.

NSF Awards to Bradley

NSF is only one potential source of paleoclimate funding, but it is the most important. NSF has a convenient function for extracting award information for each Principal Investigator (PI), so it’s hardly an onerous job to collate this information. Here’s a collation that took me about 5 minutes to prepare (though this is not a substitute for Bradley’s own collation). Awards from NSF in which Bradley was PI total over $3.5 million, so it’s not a small amount. Over $1.1 million has been awarded with David Verardo as the responsible NSF officer.

MBH98-MBH99 were funded in part by award ATM-9626833, amounting to $137,164. This study did not involve any primary data collection, but was limited to processing of other people’s data. MBH98 was published in March 1998 and, according to the information below, the award expired on June 30, 1999. Under the archiving policy of the Earth System History program in effect as of 1995, the data should have been archived at the time of publication or within 3 years of collection. Archiving of data in July 2004 as part of a Nature Corrigendum, after prior refusal to provide the data, hardly counts as compliance.

Most of Bradley’s awards shown below (including 2 active awards) pertain to the study of sediments in Arctic lakes, including the large expired awards 9819362, 9707081, 9322769 and 8922082. I’ll spot check the archiving on a couple of them below.

| Institution | Program | Award Number | Title | Expiration Date | Awarded to Date |
|-------------|---------|--------------|-------|-----------------|-----------------|
| NSF | ARC | 454959 | Collaborative research: a synthesis of the last 2000 years of climatic variability from Arctic lakes | 28-Feb-09 | 89,380 |
| NSF | ATM | 402421 | High-Resolution Studies of High Arctic Paleoclimate from Varved Lake Sediments | 31-May-07 | 425,147 |
| NSF | BCS | 221376 | Doctoral Dissertation Research: Varves and Varve-Forming Processes in a High Arctic Lake | | |
| | | | Paleoclimatic Significance of Laminated Lake Sediments From the Canadian High Arctic | 30-Nov-93 | 322,976 |
| NSF | EAR | 8400049 | Acquisition of Field and Laboratory Equipment for Lake Sediment Analysis | 30-Sep-85 | 50,000 |
| NSF | ATM | 8017745 | Climatic Fluctuations of Northernmost North America | 30-Jun-84 | 128,300 |
| NSF | ATM | 7715189 | The Secular Climatic History of the Arid and Semi-Arid Western United States | 31-Mar-81 | 65,000 |
| NSF | ATM | 7500975 | Past Glacial Activity in the High Arctic | 30-Apr-77 | 41,800 |
| | | | TOTAL | | 3,638,850 |
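A collation like the one above can be reproduced programmatically. Here is a minimal sketch; the api.nsf.gov endpoint and the parameter names `pdPIName` and `printFields` are assumptions based on NSF's public Award Search API, and are not necessarily the interface used for the table above.

```python
# Sketch of collating NSF awards for a given PI, as described above.
# ASSUMPTIONS: the api.nsf.gov endpoint and the parameter names
# pdPIName / printFields are based on NSF's public Award Search API
# and may not match the interface actually used for the collation.
from urllib.parse import urlencode

def award_search_url(pi_name: str) -> str:
    """Build a query URL for NSF's award search, filtered by PI name."""
    base = "https://api.nsf.gov/services/v1/awards.json"
    params = {
        "pdPIName": pi_name,
        "printFields": "id,title,fundsObligatedAmt,expDate",
    }
    return f"{base}?{urlencode(params)}"

def collate_total(awards: list) -> int:
    """Sum the obligated amounts across a list of award records."""
    return sum(int(a["fundsObligatedAmt"]) for a in awards)

# Two rows from the table above, in the shape such an API might return:
sample = [
    {"id": "454959", "fundsObligatedAmt": "89380"},
    {"id": "402421", "fundsObligatedAmt": "425147"},
]
```

Fetching the URL and summing the returned records is all the "collation" amounts to, which is the point: this is minutes of work, not an onerous job.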

Archiving

Award 9708071, expired December 2003, is entitled "High Resolution Lake Sediment Studies for Paleoclimatic Reconstruction in the Canadian High Arctic". Its abstract mentioned the potential of this dataset to provide information on the so-called Medieval Warm Period (adding to the interest in acquiring the dataset):

This award will support a project designed to extend the record of summer temperature derived from varved sediments in the Canadian High Arctic to cover the last 2-3000 years. This will place the dramatic changes observed over the last few centuries into a longer-term perspective. In particular, conditions around 1,000 years ago, during the so-called Medieval Warm Period, will be examined for their global significance. Multiple dating approaches will be used to provide confidence in the chronology and to establish a secular paleomagnetic record for the region. This can then be used as a chronological template to help in dating other sedimentary records which are low in organic carbon. Hydrological studies in selected basins will clarify the climatic controls on sediment flux to these lakes and improve paleoclimatic interpretations of the varved sediment records recovered. By combining these observations with earlier work, a comprehensive conceptual model of the climatic controls on sediment flux to arctic lakes will be developed.

ATM-9322769 is entitled: "Laminated Lake Sediments from the Canadian High Arctic: Understanding the Climatic Signal for Paleoclimatic Reconstruction Synopsis." Its abstract states:

Ice core and other paleoclimatic records from the High Arctic suggest that summer temperatures reached minimum levels for the entire Holocene during the last 500 years, but underwent a dramatic reversal in the last 100 years. This award, under the Paleoclimate from Arctic Lakes and Estuaries (PALE) program, is designed to study lake sediments from a number of sites to determine if this hypothesis is supported by the sedimentary record. To better understand the paleoclimatic signal in the sediments, a three-year process-based study is planned to determine the primary controls on sediment flux and varved sediment formation in Sophia Lake, a High Arctic hypersaline, meromictic lake. Sophia Lake provides a simple topographic environment, which will facilitate efforts to isolate the primary climatic forcing. Sediments from lakes on the margin of the Agassiz Ice Cap will also be recovered in order to link the paleoclimatic record of ice cores from the ice cap to sedimentary records from the glacier margin.

Both projects are long finished and have had ample time to be archived. Archiving would have been required long ago under the archiving policies of either PARCS or the Earth System History program (as discussed here).

ESH data should be submitted to the World Data Center-A (WDC-A) for Paleoclimatology (Boulder, CO) within three years of generation or at the time of publication, whichever comes first.

WDCP has a function enabling one to search by contributor. If one does a search on Bradley, one cannot locate any archiving of Arctic lake sediments. PARCS has a slightly different database. The only information archived by Bradley or his students at PARCS seems to be some 1990-1992 streamflow measurements from Taconite Inlet here, which is mirrored at the UMass project website, where weather station readings for the 1991-1992 seasons are also shown. (This information was webbed up in 1997 and I was unable to locate more recent archiving.) I sought confirmation from WDCP whether Arctic lake sediment data had been archived by Ray Bradley or his students (Braun, Retelle, probably others), noting that Sophia Lake, Lake Tuborg, Sawtooth Lake and the Taconite Inlet lakes were possible sites. I was advised by WDCP that none of this data had been archived with them; they were aware only of the old Taconite Inlet information from 1991-1992 noted above. It is very difficult to find evidence from Arctic lake sediments supporting Bradley’s statement to the House Committee that:

When I or my students have generated data sets they are generally sent to the WDC-A once the results have been published. This is the normal procedure followed in my field.

Why then has Bradley not archived data on lake sediments from these older studies?

The first results on varved lake sediments in Finland were available by 2000 or possibly earlier. According to these results, the Medieval Warm Period was clearly visible. Certain years matched some tree ring results from North America exactly, indicating that the MWP in Scandinavia was not merely regional. That suggests the tree rings do reflect temperatures in some way, although data manipulation may allow for quite diverse conclusions.
It is interesting to see that Bradley now has data with corresponding resolution from North America. These should be compared with the Scandinavian data. I am quite sure (based on the above-mentioned tree rings) that the North American data also show the MWP. That is possibly the reason why Bradley has so far not disclosed the data sets.

It’s quite simple. Data sets take large amounts of time, money and effort to obtain. Why would I want to give the data freely to someone with whom I might be competing for the next round of funding? I would certainly not do it before publication, and would go to a different journal. After publication might be different, but even then disclosure should be limited to data relating directly to the published work, not associated data. Climate science might be a special case because of the policy implications, but if it is going to be treated differently then the funding needs to be treated differently.

Steve: Paul, you’ve put your finger on a good issue here. From the point of view of developing data for policy purposes, it is bizarre to have it tied up with personal prestige campaigns. Geologists work hard in mineral exploration – probably harder than climate scientists; they are results-oriented as well and scientifically competent. However, they don’t own or have title to the drill results. These are owned by the company. The idea that tree ring scientists have some sort of personal title to their data (e.g. Jacoby deciding that he’ll only archive “good” data) is incomprehensible to me. Part of the U.S. problem is that NSF is not enforcing their data archiving requirements.

Re: #5
Paul, you’ve pointed out the small-minded (but all too common) approach to dealing with the situation — restricting access to data. The other option is to work hard and fast during a limited period of exclusivity (for this work, until 2 years after collection or until publication seems *very* generous), then move on.
The limiting of access to the Dead Sea Scrolls for over a generation(!) is another example of a bad outcome from the first approach. The vast success of the various genome projects is an excellent example of the second approach. In the latter cases, periods of exclusivity were/are as short as 30-90 days from data collection (not publication!); now, that’s some motivation for prompt, efficient analysis!
In any case, your funding example seems flawed. You’ve left out the issue of *what activities* the new grant is to fund. I have a hard time believing that it’s easy to get money simply to reanalyze a data set. Given that, what major funding advantage is someone giving to a competitor by providing access to a data set? You could similarly argue that one shouldn’t publish one’s results because a competitor might thereby gain a funding advantage.
If biologists have been able to adapt to an era of “free data,” even with ever-present patent issues, I can’t see any moral or practical justification for climate scientists’ not following suit.

Sounds like there needs to be some disintermediation here. Any particular public sector grant should be used either for gathering data sets, or for interpreting them, but not both. Start attracting people to the data gathering side who do not have an axe to grind in the interpretation side.

Re: #7
It’s hard to imagine how that would work in practice, as interpreting the data is pretty much the intellectual point of the whole enterprise. Besides being unlikely to find anyone willing to generate data without the opportunity to analyze it at all, experience with data analysis would seem to be important in properly designing & executing the data collection.
I like Steve’s approach here of contrasting the presumably relevant archiving policies with the actual archiving status for different researchers’ projects. Publicizing the current situation is likely to help remedy it, especially if journal editors begin to enforce broad archiving requirements.

What strikes me about Paul’s remarks is that they pretty much admit what skeptics have been saying all along. There appears to be at least as much incentive for those in Big Climate to dissemble to protect their livelihoods as there is for people working for Big Business.

Re #8
Hard, but perfectly possible. And, I fear, necessary.
Consider Steve’s point above about “Jacoby deciding that he’ll only archive “good” data”. Even if they start being totally open about archiving the data for their published papers – how do we know they haven’t pre-selected which data they enter into the archive ?
This whole area has become so polluted by the political activists that I do not believe the normal process of science alone will be enough in the future.

Re: #10
You say it is perfectly possible to get people to collect data but not allow them to analyze any of the data they’ve gathered. I’d be interested in hearing how this could be done. If I were a climate scientist, I certainly wouldn’t sign up for such a job (unless it paid $150k or even more; is this what you had in mind?).

I love the idea of authors’ commitment to original data sets by mandatory electronic archival *before* publication. It could create a superior trail of necessary changes and discourage later denial, fudging or fraud. A public embargo until publication would not upset me. I also think that such policy changes could pry open and expand the relatively closed and inbred world of “peer review” without requiring global disclosure before publication. I would like to hear more proposals to compel timely release of federally funded or regulated data, because I think this game is also a big problem in the medical-pharmaceutical world, where numbers like “5 out of 6 funded studies go unpublished” are tossed around. Shades of “tobacco science” self-censorship and pre-selection stalk the world.
Examining a number of AGW issues (numerous approaches in astrophysics, simple physics, predictive material balance and the 500my geological record), combined with the individual, group and institutional evasions and policy aggressions, my personal assessment is that “Piltdown Mann” is a big concern at this point in the AGW debate, if not yet definitively proved. Keep digging, please.

Armand MacMurray :
I don’t see why you should have to be a member of the Worshipful Guild of Climatologists to gather this sort of data. Let’s face it, drilling tree-rings or ice cores is not rocket science. I’m sure any of the big project management companies could put together a team to do it quickly and efficiently, and with a professional audit trail.

The whole point here is that if this data gathering is being bought with our taxes, then we should ensure that the money is well spent. Would the use of independent, disinterested data-gatherers work out more expensive than a university professor and a gang of graduate students ? Quite possibly, in the short term. But if that is the price we have to pay for professional data-gathering, it will work out a hell of a lot cheaper, in the long term, than the sort of nonsense that Steve is uncovering.

Steve: Wouldn’t it be interesting to see the measurement data for the Sheep Mountain bristlecone up to 2002, for example? Or the updated Quelccaya dO18 to 2003? Or the Puruogangri data from 2000? The timing of release is also a factor. In the mineral exploration business, you have to release your results on a consistent and timely basis. You can group the results a little, but you couldn’t hold off results for more than a month. There’s a huge temptation to hold off releasing bad results in the hopes that you’ll get some good results to offset them. It’s hard to avoid thinking that if these guys had proxy results that were off the charts we’d be hearing all about them. For someone with mineral exploration experience, their silence is deafening: it means that none of their results are particularly dramatic. But if any results come in with a big spike, we’d hear all about it. In the stock market, withholding results is illegal as a breach of an ongoing obligation of full, true and plain disclosure. It amuses me that climate scientists are so sanctimonious and yet continually fail to meet disclosure standards applicable to the sleaziest mining promoter.

I seem to have already gotten a response from Bradley. He hasn’t archived any data, but I am now blocked from the ftp://eclogite.mass.edu site as well as ftp://holocene.evsc.virginia.edu. Dare one say that the Hockey Team is being a little bit undignified? Can you imagine trying to explain to someone exactly why you set up access blocks?

As a contrast to the data archiving policy of the climate scientists one should point out what happens in cosmology and astronomy. For example, the WMAP satellite data, used to measure the cosmic microwave background left over from the Big Bang, is now available to everyone. The scientists that built the experiments got first crack at analyzing the data and were able to publish first, which is fair enough. Other groups are now re-analyzing the raw data with the help of the original scientists. Take a look at the most recent Scientific American where this is mentioned.

I believe data collected by the Hubble Space Telescope is treated in the same way. The astronomers making an observation have the privilege of analyzing the data first, but it is then put into a public archive.

True, these are multi-hundred million to billion dollar devices, but the principle is the same. It’s public money so the data should be freely available after a certain time, but the people who did the hard work are rewarded by exclusive access for a reasonable initial period of time.

Having said this, I also think that an archiving policy should only be in place where it is difficult or extremely expensive to create an independent data set. In the case of proxy data, it’s not expensive or difficult to generate a single dataset but the whole idea is to build up a very large database that can be used for climate studies. The value is in the database as a whole, which in its entirety is very difficult and expensive for any one scientist or group to replicate.

John A,
The problem is that some scientists aren’t acting like ladies or gentlemen (whichever is appropriate). However, some of the proposed solutions aren’t going to work.
John A, if you were a journal editor who wouldn’t publish papers until you had the relevant databases/programs in hand, there might be times when your journal had very few papers. You would also be increasing the cost of publishing in your journal because someone, somewhere, has to check that these databases contain the appropriate data. Nobody is being paid to do this right now.
If you are the funding authority, you have no formal control over where or when your Principal Investigators publish their papers. You are recommending that the country’s best scientists get money to do the science that they think is of the greatest interest. Your only “control” over them comes from the opinion of the referees about their future proposals.
The only solution to this problem that I see is that NSF/NASA/whoever do start convincing PIs who propose to obtain particular data that it is in their best interests to promptly archive the data that they have promised to obtain and share any analysis programs when called upon to do so.
This sharing of programs can be a can of worms as well. If someone has spent years developing a program to solve a particular scientific problem, is it reasonable to expect him/her to simply give it to someone else, who wants to do some of the same problems that the program developer wants to do? Note, however, that I’m not defending Mann’s actions, where the program in question seems relatively simple.

Re: #18 "If someone has spent years developing a program to solve a particular scientific problem, is it reasonable to expect him/her to simply give it to someone else, who wants to do some of the same problems that the program developer wants to do?"

Yes if the scientist is receiving government money. Scientists want it both ways. Many claim objective scientific superiority over their peers who are employed in industry because they aren’t tainted by the quest for profit. But the only reason not to publish all source code is for the scientist to profit. That profit may come in the form of larger grants and/or academic prestige.

If a scientist wants to keep his source code private they should get private funding. There should be heavy strings attached to all government grants both to discourage doing science on the backs of taxpayers and to entitle the taxpayer to the use of what they paid for.

Mann’s computer programs are not really “software” in the sense that the programs can be used operationally. They are really more like laboratory notebooks giving a detailed description of methodology.

Secondly, if someone doesn’t want to make their code public, then don’t publish articles. Get private funding and don’t publish any articles. But if your data and methods are being used in scientific prospectuses, then you’ve waived any privacy.

Also, and this is often neglected: Mann’s verbal descriptions have been proven to be inaccurate. In a business situation, once you’d found problems in one area (whether the inaccurate listing of series used or the inaccurate description of the PC method), you would not trust the rest of the verbal descriptions. Why should you?

I think there are good grounds for allowing working scientists time to analyse their own data first, and then to have a form of mandatory release after publication, or, for publicly funded work, after a set time if nothing is published. Some editing of data is obviously permissible (removing mistakes, and data that are irrelevant or otherwise unusable), but generally I suggest the requirement should be for all relevant data, with a broad definition of "relevant" applied.

I can understand the personal desires of scientists to hold onto data, especially if there is further work being done, but there comes a point where replicability becomes an issue. Also, privately funded researchers have no specific obligation to release data, but unless they do, public policy should never be based on such research.

Re: #18
Roger, regarding the practical power of a journal editor to compel archiving, the major journals and big generalist journals should be able to compel such w/o much of a problem. Nature and Science needn’t fear a lack of submissions. Thus, the past history of Nature in this case is somewhat disturbing.
As for the funding agencies’ control over grantees, that’s very simple: without archiving compliance, withdraw eligibility for future grants.

Re #22
I agree with all this, but with a big flag on “with a broad definition of relevant [data] applied”.
The only acceptable definition is all data observed. Otherwise, there is a risk of a partisan researcher simply excluding all observations that do not fit his thesis.
For example, as a reductio ad absurdum, MBH06 could prove the existence of the hockey stick by redoing their “analysis” based on the bristlecones only.

Steve (#21): "if someone doesn’t want to make their code public, then don’t publish articles. Get private funding and don’t publish any articles. But if your data and methods are being used in scientific prospectuses, then you’ve waived any privacy."

This summarizes it well. My take:

1. the researcher gets first crack at data analysis, with the stipulation that data be archived by the end of the grant
2. permit the researcher to post "messy" data — not necessarily requiring lots of time to clean it up and thoroughly document it. It can be a rather enormous task to do that. But Steve and others have asked for nothing more than working spreadsheets or raw data files, and that seems reasonable, and pretty easy (if one has nothing to hide)
3. there’s a federal metadata standard — the "FGDC metadata standard" (Federal Geographic Data Committee) that is in use, and software exists that makes it pretty easy to create files describing one’s data set. The metadata files themselves are pretty big and ugly, but nobody needs to create them from scratch. If you had an archived data set, the metadata file would tell the location (lat-long), the parameter measured, the file format, the units, where to find the data, who to contact with questions, etc. Answering questions about data can be a royal pain, and I’d suggest that NSF and others are NOT paying scientists to do that. But if one spends a little time creating good metadata, most questions are automatically answered — "a stitch in time saves nine."
4. I am bothered by the elitism I see among climate scientists. As a climate scientist myself, I believe that there should be a really "big tent" here, because the presence of people from other disciplines strengthens our field. Look at what Steve and Ross have done: using rigorous statistical and evaluation processes developed in other fields, they have exposed some true weaknesses in climate science, and fundamentally changed the way we look at paleo data — for the betterment of the science. No one within the field was either (a) capable or (b) willing to do what they did. I tip my hat to you two!

I’ve been working with an electronics engineer (he has some pretty cool signal processing ideas), a civil engineer, geologists, statisticians, Ag Engineering folks … the synergy you get from multiple viewpoints is quite amazing, and these folks are showing me things in my data sets that I never would have seen on my own (or if I only hung out with climate scientists). So my message to Steve and Ross and others is: welcome to climate science — I’m glad you came to the party!
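The metadata idea in point 3 above can be sketched as a simple completeness check on a discovery-level record. The field names below are illustrative stand-ins, not the actual FGDC/CSDGM element names, and the sample values are hypothetical:

```python
# Minimal sketch of a discovery-level metadata record for an archived
# proxy data set. ASSUMPTIONS: field names are illustrative stand-ins,
# not the actual FGDC/CSDGM XML element names, and the sample values
# (coordinates included) are hypothetical.
REQUIRED_FIELDS = {
    "site_name", "latitude", "longitude", "parameter",
    "units", "file_format", "data_location", "contact",
}

def missing_fields(record: dict) -> set:
    """Return whichever required fields a metadata record lacks."""
    return REQUIRED_FIELDS - record.keys()

record = {
    "site_name": "Taconite Inlet",   # hypothetical entry
    "latitude": 82.8,                # illustrative value
    "longitude": -78.0,              # illustrative value
    "parameter": "varve thickness",
    "units": "mm",
    "file_format": "tab-delimited text",
    "data_location": "WDC-A for Paleoclimatology, Boulder CO",
    "contact": "principal investigator",
}
```

A record this small answers most of the routine questions ("what, where, in what units, in what format, from whom"), which is exactly the "stitch in time" argument made above.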

Steve: Thanks. I seem to spend more time on disclosure and due diligence issues right now than anything. Simply viewing IPCC TAR as a “scientific prospectus” is an approach that would be hard to get to without the particular experience that I’ve had, but it’s a good framework. Obviously there’s not much experience with international science assessment reports and appropriate standards of disclosure and due diligence, but there’s a lot of experience with standards of disclosure and due diligence for mining promotions and I’ve never heard anyone suggest that IPCC should apply a lower standard than mining promotions. If anything, we hear from learned societies about what a remarkable standard IPCC sets. I’ve heard a lot of what seemed to me like whining and complaining by skeptics about IPCC. So I simply tried applying the minimum standards of due diligence and disclosure applicable to mining promotions, which are relatively objective and not mere whining, and tried to see how these things worked. Lots of surprises, that’s for sure.

I’d like to do some more mathematical work, showing the application of econometric methods dealing with serial autocorrelation to paleoclimate proxies. I’ve also done some really pretty work on making tree ring site chronologies using statistical methods rather than ad hoc recipes (with the byproduct of better statistical control of what you’ve really got in one of these chronologies). I’d like to do something on confidence interval estimation with proper allowance for autocorrelation, but haven’t figured out how to do it properly. But I’m so overwhelmed with other pressing stuff that I hardly ever get to it. I also have some surgery to do on the other multiproxy studies. There’s nothing flashy in these studies. But people keep throwing these other studies in our faces, so it’s impossible not to respond. There will be a few more red faces.
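The autocorrelation allowance mentioned above can be illustrated with the standard effective-sample-size adjustment for an AR(1)-like series, n_eff = n * (1 - r1) / (1 + r1). This is a sketch of one common correction, not necessarily the method intended:

```python
# Sketch of a confidence interval for a series mean that allows for
# lag-1 autocorrelation via the standard effective-sample-size
# adjustment n_eff = n * (1 - r1) / (1 + r1). This is one common
# correction for AR(1)-like persistence, not necessarily the method
# the comment has in mind.
import numpy as np

def mean_ci_ar1(x, z=1.96):
    """Approximate 95% CI for the mean of an autocorrelated series."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xm = x - x.mean()
    r1 = (xm[:-1] @ xm[1:]) / (xm @ xm)        # lag-1 autocorrelation
    n_eff = max(n * (1 - r1) / (1 + r1), 2.0)  # effective sample size
    se = x.std(ddof=1) / np.sqrt(n_eff)
    return x.mean() - z * se, x.mean() + z * se
```

For a strongly persistent series, n_eff is much smaller than n, so the interval comes out considerably wider than the naive i.i.d. formula would give, which is the whole point of the allowance.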

Paul, – re #15.
Yes, the Hubble Space Telescope data are archived automatically at the Space Telescope Science Institute. It’s easy to archive, since it is digital and it has been saved at the Institute since the time it was obtained. I think that data is also archived at the big ground-based telescopes but I may be wrong about this.
Roger Bell

The answers are nonresponsive. The question asked for specificity (a study-by-study description of what is archived where). That would take some work, of course, but that is the answer to the question. Actually, it would be a really good exercise for a PI to do for his own purposes (to know where his stuff is…).