NAS Report on Data and Methods Disclosure

Jeff Id on the Air Vent has written a post pointing out the recent publication online of a report by the Committee on Ensuring the Utility and Integrity of Research Data in a Digital Age from the National Academy of Sciences: Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age. I am starting this thread so that it can also be discussed here.

The committee and its objectives were discussed on CA, for example, here and in a number of other places. You can find the other threads easily by using the CA search feature at the top right of the page for the phrase “NAS Committee”.

An executive summary of the report is available in pdf format and access to an online version of the full text is available here.

It makes for interesting light reading. From the summary:

Legitimate reasons may exist for keeping some data private or delaying their release, but the default assumption should be that research data, methods (including the techniques, procedures, and tools that have been used to collect, generate, or analyze data, such as models, computer code, and input data), and other information integral to a publicly reported result will be publicly accessible when results are reported, at no more than the cost of fulfilling a user request. This assumption underlies the following principle of accessibility:

Data Access and Sharing Principle: Research data, methods, and other information integral to publicly reported results should be publicly accessible.

(bold in report)

Maybe the folks at HadCRU should pay attention…

Update (using my Comment 1):

It appears that there may be some caveats on those for whom the data should be accessible. From page 2 of the summary chapter (page 3 of the pdf) (all bold mine):

Documenting work flows, instruments, procedures, and measurements so that others can fully understand the context of data is a vital task, but this can be difficult and time-consuming. Furthermore, digital technologies can tempt those who are unaware of or dismissive of accepted practices in a particular research field to manipulate data inappropriately.

On the next page, this seems to be clarified somewhat:

The most effective method for ensuring the integrity of research data is to ensure high standards for openness and transparency. To the extent that data and other information integral to research results are provided to other experts, errors in data collection, analysis, and interpretation (intentional or unintentional) can be discovered and corrected. This requires that the methods and tools used to generate and manipulate the data be available to peers who have the background to understand that information.

The “public” appears to be those who are deemed to deserve it by the owners of the data and methods. After all, who knows what damage can be done when an examination of the data and methods is carried out by someone who doesn’t “understand the information” or associated “accepted practices”. ;)

93 Comments

Re: romanm (#1), I understand that paragraph in a different way. For me it says that it is especially important to make them available to those experts, and NOT that it should ONLY be made available to said experts.

Thanks, Roman — I was excited that this would apply to the Svalbard data being withheld by Elisabeth Isaksson, until I read the “members-only” weasel words in your comment. These are important enough that perhaps you should move them to the post itself. [name corrected]

Well, if we have shown examples of “experts” abusing basic data, just think of the damage that laypersons could do.

But wait a minute here: isn’t that what the peer review system is supposed to filter out, and isn’t it the peer-reviewed literature that the prestigious review organizations look to exclusively for material? Could it be that the concern is with non-peer-reviewed sources such as blogs and the influence they may have in a PR battle?

If I were able to get my hands on enough tree ring data, I could genetically engineer a single supertree which could control the weather (at least in the Northern Hemisphere).

I could cause every major league baseball game to be rained out, forcing male baseball fans to pay more attention to their spouses. When the escalation in domestic disturbances increases to sufficient levels, the world will cry out for a voice to calm the agitated households. I will then hypnotize the enraptured audience as they eat up the rancid psychobabble assuaging their endless neurosis, drafting them as drones as they complete my plan to rule over my newly conquered domain.—BRAINAMICA.

The only obstacle I foresee is that I cannot cause hockey games to be rained out, leaving the potential for a Hockey Stick front attacking from the North.

Yeah, I chime in with all of those above, with whom I am in total accord. Hoping that the following is not deemed off-topic and therefore snipped, the NAS “guidelines” being discussed here remind me of a comment [snip – sorry, no political statements please]. Ergo, “The NAS disclosure guidelines were drafted by the in-crowd and enacted by the in-crowd to favor the people that the in-crowd loves the most: (Who could that possibly be?)”

Don’t touch the data, you’ll kill somebody! The notion that data and methods are dangerous in the hands of people who don’t know what to do with them is just paternalistic alarmism. Consider this: what is the harm of giving tree ring data to Mann versus the harm of giving tree ring data to me? Me? I got no clue what to do with it (err, wait, after looking at a couple of series I figured that some kinda transform was needed for early years) and can do no damage. Dr. Mann, on the other hand, has caused considerable confusion by using open data with closed methods. Data won’t hurt you. It’s what you do with data that causes the problems. If data and methods are provided under a copyleft license, then users of the data are required to “copy back” their derivative works. But never mind. Here is my data.

Of all the issues raised here regarding the state of climate science I think data firewalling is potentially the most alarming. The bureaucratic response to being attacked is somewhat predictable but it remains totally unacceptable. Transparency is a key hallmark of good science and these stonewalling efforts only serve to increase public skepticism of usually perfectly valid results.

This seems like an odd sort of turf battle where the very parties that should be encouraging transparency are fighting it. That said, it’s very clear that at least some of the concerns raised here are very likely trivial errors that are then blown out of proportion in less responsible venues. That does not justify hiding the data however. I’d recommend you take advantage of the large number of readers here to power something of a grass roots effort to “free the climate data”. In the USA this would likely happen if only a handful of senators and congresspeople started to raise the data stonewalling publicly.

In the same way RealClimate routinely fails to identify, respect, or bring enough context to AGW-skeptical concerns, Climate Audit should be summarizing the *relevance* of the “audits” rather than leaving things to the imagination of the uninformed. Questionable temperature station siting is something of a religion here, but I’ve seen no reason to suspect that this undermines any aspect of the AGW hypothesis.

Questionable temperature station siting is something of a religion here, but I’ve seen no reason to suspect that this undermines any aspect of the AGW hypothesis.

The surface station and sea surface temperature data are the foundation upon which the GCMs (General Circulation Models) are built to foretell catastrophic global warming due to human influences upon carbon dioxide and other atmospheric gases. As in any mathematical equation or model, on or off of a computer, Garbage In results in Garbage Out (GIGO).

The temperature readings are, in fact, only a very, very small sampling of the actual physical thermal state of the atmosphere, yet they are used in the GCMs as if they are reliable proxies for the overall thermal state of the atmosphere, and perhaps the hydrosphere. Insofar as the temperature data is inaccurate as a proxy for the thermal state of the atmosphere and hydrosphere, the GCMs will produce exaggerated and erroneous results the farther they forecast into the future. Given the probable range of error in the initial temperature data, it remains to be demonstrated, much less proven, how it is possible for the GCMs to discriminate between the accumulated errors and the total possible influence of an atmospheric gas or gases such as carbon dioxide. In other words, the temperature data used as proxies in the GCMs are highly sensitive to initial errors, yet the observational errors seen in the physical world are not being acknowledged in the unphysical modeling, resulting in GIGO.

Gerd Leipold of Greenpeace, in an interview by the BBC, is quoted as saying “emotionalizing issues” is necessary to gain public support for the organization’s agenda of suppressing economic growth in the United States, even though its July 15th press release claiming the Arctic Sea would be ice free by the year 2030 was admittedly an untrue exaggeration. Given the NAS report’s statement about “those who are unaware of or dismissive of accepted practices in a particular research field to manipulate data inappropriately,” a person has to wonder what these NAS report authors regard as “manipulating data appropriately.” Do they perhaps regard manipulating the methods and data by organizations such as Greenpeace or the IPCC to convey alarm to the public, for the purpose of suppressing economic growth in the United States in pursuit of combating AGW, as “accepted practices in a particular research field”?

A fair question to ask is what exactly the National Academy of Sciences (NAS) regards as “accepted practices” with regard to climate science. Is it perhaps now an accepted practice by the National Academy of Sciences to “manipulate data appropriately” for the purpose of implementing a national political policy regarding climate change, or does the NAS regard such an example as an effort to “manipulate data inappropriately”? Then there is the question of what the NAS believes should happen when and if a peer in the field chooses to “manipulate data inappropriately” from the scientific point of view; and whose scientific point of view, and who qualifies as “peers who have the background to understand that information”? What happens when a “peer” in the field stubbornly demonstrates an inadequate “background to understand that information” and statistical methods, yet someone who has the necessary background from another field demonstrates the ability to “understand that information”? Does the NAS seriously propose to deny a peer such as NASA GISS or CRU access to data and methods in the event it is discovered such peers “manipulate data inappropriately” for scientific and/or political purposes? If so, how can the NAS propose to deny such access?

This NAS report creates far more questions than it appears to answer, or will the NAS clarify by addressing those questions?

Well, the key policy recommendation is that data and methods should be publicly available. This seems well worth supporting.

You also provide some paragraphs that seem explanatory. I do not think there is anything in there that is exceptional. People can abuse data when they do not know what they are doing. People need to understand the data to be able to analyse it sensibly, and discover and correct errors.

I do not see anything in those paragraphs that detracts from a recommendation that data and methods should be publicly available. The observation that data can be abused by those who don’t know what they are doing seems mundane.

How can “People…abuse data when they do not know what they are doing” any worse than a very limited number of People who think they know what they are doing and withhold data and methods from all other People?

This requires that the methods and tools used to generate and manipulate the data be available to peers who have the background to understand that information

To me this is indicating the level to which the methods and tools are to be explained. The level of explanation should be sufficient that an “expert” or “peer” could understand them. There is no requirement that the explanation be sufficient to supply all the information that a layman would require.

A patent must be sufficient so that someone with “ordinary skill in the art” can reproduce the invention. Ordinary skill in the art has a legal definition that seems to be quite similar to the idea of “peer” in the quotation above.

You also provide some paragraphs that seem explanatory. I do not think there is anything in there that is exceptional. People can abuse data when they do not know what they are doing. People need to understand the data to be able to analyse it sensibly, and discover and correct errors.

1. Data don’t bleed if you abuse them. Who exactly is harmed, and how?
2. Who determines
a. that a person “understands” data he hasn’t seen?
b. what sensible analysis is, and whether a third party CAN and WILL engage in “sensible” analysis?

The attitude you express is a paternalistic one. Someone, the person who controls the data, becomes the judge of who will receive it. We saw this with Steig, who argued that “legitimate” scientists could have the data. That rule would probably rule out the two Jeffs (lowly engineers) and SteveMc and RomanM. Basically, you are turning a scientist who understands science into a judge of other people’s abilities and motives, neither of which he has a demonstrated expertise in. If data and methods are open, will some idiots get their hands on the data? Of course. Will they do any harm? That is a hypothesis that is not supported by any data I know of. Is the harm, if real, curable? Yes, if you believe in the process of rational discourse. At its heart, the idea that people need to be protected from data, from knowledge, is a medieval trope that we destroyed long ago.

If data and methods are open will some idiots get their hands on the data? of course. Will they do any harm?

Of course! They could mislead weak minds who must be protected from charlatans who use it for their own purposes. ;)

I don’t know if anybody noticed, but the book format has a search feature called “Skim This Chapter”. It goes through a chapter showing the odd line indicating the content of the page. By using it, I found a reference to Mann and the hockey stick fracas on page 62 (but Steve Mc is unnamed).

I would agree that per has gotten the gist of the two part message. I think though that we have to consider that message in terms of what we know about other statements of policy in these regards.

Well, the key policy recommendation is that data and methods should be publicly available. This seems well worth supporting.

The statement would appear well worth supporting if indeed there is any meat in the policy statement that might in the end lead to something that can be enforced or influenced in a meaningful way by those making the policy statements. Certainly appeals of the flag, motherhood, and apple pie variety, with no intention of going further, should not be supported.

I do not see anything in those paragraphs that detracts from a recommendation that data and methods should be publicly available. The observation that data can be abused by those who don’t know what they are doing seems mundane.

If those comments were considered mundane, and not qualifiers for the previous ones, why did the author(s) bother to make them? It would appear that the author(s) went out of their way to imply that valid arguments and judgments can be made rather exclusively by peers, as in peer review, and to wag a finger at the contributions a thinking layperson might make, as opposed to a non-thinking peer.

If the reference were to a specialized field of chemistry or physics, the thinking layperson may have difficulties making judgments about the works it produces (and even less of a chance of upsetting the practitioners if one attempted a non informed review, although I suspect a good statistician could make an informed judgment on the handling of the data), but much of climate science and the peer reviewed papers it produces do not approach that level of difficulty.

I think the positive side of the report is the following, stated as three principles (not just recommendations). The recommendations all seem to be based on these “principles”, and in their statement there are no limitations of the type that I pointed out earlier. That puts them on a different level, providing a place from which to argue any application of the caveats in a public forum.

Data Integrity Principle: Ensuring the integrity of research data is essential for advancing scientific, engineering, and medical knowledge and for maintaining public trust in the research enterprise. Although other stakeholders in the research enterprise have important roles to play, researchers themselves are ultimately responsible for ensuring the integrity of research data.

Data Access and Sharing Principle: Research data, methods, and other information integral to publicly reported results should be publicly accessible.

Data Stewardship Principle: Research data should be retained to serve future uses. Data that may have long-term value should be documented, referenced, and indexed so that others can find and use them accurately and appropriately.

These are all items which we have been hoping would become established in climate science and I suspect that the presence of blogs such as CA have been at least partially instrumental in their appearance in this form. They could be useful tools for getting a more open CS community. I can already see relevance to “lost data” and “undisclosed methodology”, for example.

Why doesn’t the committee just simply adopt the data policies of a leading scientific journal – Nature for example? Seems to me that covers data disclosure pretty well, and reflects sound scientific practice.

Nature – Editorial Policy with respect to Availability of data and materials:

“An inherent principle of publication is that others should be able to replicate and build upon the authors’ published claims. Therefore, a condition of publication in a Nature journal is that authors are required to make materials, data and associated protocols promptly available to readers without preconditions. Any restrictions on the availability of materials or information must be disclosed to the editors at the time of submission. Any restrictions must also be disclosed in the submitted manuscript, including details of how readers can obtain materials and information. If materials are to be distributed by a for-profit company, this should be stated in the paper.

Supporting data must be made available to editors and peer-reviewers at the time of submission for the purposes of evaluating the manuscript. Peer-reviewers may be asked to comment on the terms of access to materials, methods and/or data sets; Nature journals reserve the right to refuse publication in cases where authors do not provide adequate assurances that they can comply with the journal’s requirements for sharing materials.

After publication, readers who encounter refusal by the authors to comply with these policies should contact the chief editor of the journal (or the chief biology/chief physical sciences editors in the case of Nature). In cases where editors are unable to resolve a complaint, the journal may refer the matter to the authors’ funding institution and/or publish a formal statement of correction, attached online to the publication, stating that readers have been unable to obtain necessary materials to replicate the findings.

Details about how to share some specific materials, data and methods can be found in the sections below. The preferred way to share large data sets is via public repositories. Some of these repositories offer authors the option to host data associated with a manuscript confidentially, and provide anonymous access to peer-reviewers before public release. These repositories coordinate public release of the data with the journal’s publication date (advance online publication (AOP) or, if the manuscript is not published AOP, print/online publication). This option should be used when possible, but it is the authors’ responsibility to communicate with the repository to ensure that public release is made promptly on the journal’s AOP (or print/online) publication date. Any supporting data sets for which there is no public repository must be made available as Supplementary Information files that will be freely accessible on nature.com upon publication. In cases where it is technically impossible for such files to be provided to the journal, the authors must make the data available to editors and peer-reviewers at submission, and directly upon request to any reader on and after the publication date, the author providing a URL or other unique identifier in the manuscript.”

Given those policies, how can there be a problem, at least with papers published in Nature?

So much for your reading of everyone’s experience and Trouble with Nature. Alfred Hitchcock could have had fun with that title. I can just see Hitchcock’s screenplay with the sleuth checking the basement at CRU looking for clues to the missing data.

Data Access and Sharing Principle: Research data, methods, and other information integral to publicly reported results should be publicly accessible.

Data Stewardship Principle: Research data should be retained to serve future uses. Data that may have long-term value should be documented, referenced, and indexed so that others can find and use them accurately and appropriately.

I have real doubts about the use of the word “should”:

Should can describe an ideal behaviour or occurrence and imparts a normative meaning to the sentence; for example, “You should never lie” means roughly, “If you always behaved perfectly, you would never lie”; and “If this works, you should not feel a thing” means roughly, “I hope this will work. If it does, you will not feel a thing.”

Should has lots of “wiggle room”: “It should happen but, what the heck …”
To me this is different from the formal use of the word “shall”:

Shall is also used in legal and engineering language to write firm laws and specifications as in these examples: “Those convicted of violating this law shall be imprisoned for a term of not less than three years nor more than seven years,” and “The electronics assembly shall be able to operate within its specifications over a temperature range of 0 degrees Celsius to 70 degrees Celsius.”

(Both quotes from “English modal verb” Wikipedia.)

We need a clear understanding of how the word “should” shall be understood. Otherwise, we would not know what they mean!

The IETF in RFC 2119 makes these definitions. “Should” does have some wiggle room:

1. MUST This word, or the terms “REQUIRED” or “SHALL”, mean that the
definition is an absolute requirement of the specification.

2. MUST NOT This phrase, or the phrase “SHALL NOT”, mean that the
definition is an absolute prohibition of the specification.

3. SHOULD This word, or the adjective “RECOMMENDED”, mean that there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.

4. SHOULD NOT This phrase, or the phrase “NOT RECOMMENDED” mean that
there may exist valid reasons in particular circumstances when the
particular behavior is acceptable or even useful, but the full
implications should be understood and the case carefully weighed
before implementing any behavior described with this label.

This requires that the methods and tools used to generate and manipulate the data be available to peers who have the background to understand that information

To me this is indicating the level to which the methods and tools are to be explained.


I think TAG’s point is well made. Most importantly, I do not see this as restricting to whom the information should be available, nor do I see it as reasonable to do so.

Re: Steven Mosher

The attitude you express is a paternalistic one.

I can only suggest you read what I wrote. The recommendation is that data should be publicly available; which I support, as do you. I have not suggested that there will be harm if data is analysed badly; but I do agree with what you wrote (and what the NAS wrote) which is that it is likely that some people will analyse data badly if they get their hands on it. For my part, I think that is part of the price that must be paid.

no one in chemistry has ever even suggested that, say, the laboratory synthesis of some dangerous (read “addictive”, etc.) compound be kept out of public access in the publication of the synthesis because this information could be “manipulated inappropriately”.

This is government policy in the UK, and I am certain it is in the USA as well. I know a paper was pulled from PNAS because it might give terrorists some good ideas. Anyone publishing a quick, easy synthesis of VX would be crucified.

I would also point out that some are ignoring the issue that non-electronic data is also covered by some of these statements. There are real difficulties with making some of that stuff available, archiving, etc. NAS’s remit is wider than MBH.

I tend to disagree somewhat on this. NAS does not have the sufficient authority to use these words. The best they can do is raise a moral imperative whose strongest language is pretty much “should”, not “will”,”shall” or “must”.

I tend to disagree somewhat on this. NAS does not have the sufficient authority to use these words. The best they can do is raise a moral imperative whose strongest language is pretty much “should”, not “will”,”shall” or “must”.

I agree with Roman, it’s about as strong as they could be expected to go.

(Interestingly, the Rules of Golf say that a player “should” mark their ball in any situation where it is legally lifted. However that is actually only a recommendation. If it was a requirement they would have said “must”. So you don’t have to mark your ball (unless on the green), but it’s a very good idea if it’s an important event.)

I tend to disagree somewhat on this. NAS does not have the sufficient authority to use these words. The best they can do is raise a moral imperative whose strongest language is pretty much “should”, not “will”,”shall” or “must”.

All the more reason to suspect that that organization is merely talking the talk when it knowingly is not in a position to walk the walk. This whole episode may be a big to-do about little or nothing.

Please wake me up when the need for transparency is finally realized and somebody in a position to do something about it is doing something concrete about it.

but I do agree with what you wrote (and what the NAS wrote) which is that it is likely that some people will analyse data badly if they get their hands on it.

I don’t understand the reason for the NAS committee raising the issue unless it is something to be avoided – presumably a consideration when a decision is being made as to whether the data should be given to a particular person or group of people.

I would also point out that some are ignoring the issue that non-electronic data is also covered by some of these statements. There are real difficulties with making some of that stuff available, archiving, etc.

NAS seems to have considered that issue to some extent; see the quote in my initial post: “at no more than the cost of fulfilling a user request”. If the data is in an inconvenient format, the actual cost of transmitting it could reasonably be charged to the requester.

Re: D. Patterson (#27), I believe we can honestly say that many scientists are slobs or absent minded. In many cases their work does not rise to the level of importance that anyone ever wants to check their work. But, as has been stated here many times, if you are going to claim the sky is falling, it might be good to have your data archived. Just saying.

With the growing importance of research results to certain areas of public policy, the rapid increase of interdisciplinary research that involves integration of data from different disciplines, and other trends, it is important for fields of research to examine their standards and practices regarding data and to make these explicit.

IMHO, it also implies that if they use statistical methods, they should be specific in disclosing the methodology.

This requires that the methods and tools used to generate and manipulate the data be available to peers who have the background to understand that information.

This merely says that the metadata should describe the methods and tools which are used. So it is enough to say that a certain version of a statistical software package was used, which functions were used, and provide the parameters. The available to peers phrasing means that you should state that you’re using “R” and what version, but you don’t have to provide an “Introduction to R” manual. The methods phrasing means that it is not sufficient to say that you used “R”, that you also should describe the data manipulation in enough detail for a peer to duplicate your work.
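As a concrete illustration of the point above, here is a minimal sketch of what such a machine-readable methods description might look like. This is my own illustration, not anything from the NAS report; the package names, versions, function names, and parameters are entirely hypothetical.

```python
import json
import platform
import sys


def methods_record(packages, functions, params):
    """Assemble a minimal machine-readable description of the
    tools and parameters behind a reported result, at the level
    a peer would need to duplicate the work."""
    return {
        "interpreter": f"Python {sys.version_info.major}.{sys.version_info.minor}",
        "os": platform.system(),
        "packages": packages,    # name -> version actually used
        "functions": functions,  # fully qualified calls applied to the data
        "parameters": params,    # every tunable the analysis depends on
    }


# Hypothetical example: an analysis done in R 2.9 with a loess smooth.
record = methods_record(
    packages={"R": "2.9.1", "stats": "base"},
    functions=["stats::loess", "stats::predict"],
    params={"span": 0.75, "degree": 2},
)
print(json.dumps(record, indent=2))
```

The point is the level of detail: naming the software and version, the functions applied, and the parameters used is enough for a peer to reproduce the manipulation, without needing an “Introduction to R” manual.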

the temperature data used as proxies in the GCM are highly sensitive to initial errors

I think this is more an advocacy position than a scientific one, and you are certainly exaggerating the degree to which climate temperature proxy data is garbage. Showing that temperature data is as unreliable as you suggest should not be all that difficult (if you are right) and as you note proving this would demolish a good deal of the current state of climate science affairs. There’s a Nobel prize waiting for you, so do it.

You are stringing together several reasonable concerns about contamination of the data but coming to a very dubious conclusion that the temperature data qualifies as “garbage in”. This is my big concern about the Climate Audit experience, which exaggerates the significance of certain data problems that are unlikely to affect an objective conclusion. There will never be pristine data sets in this field, so the relevant question is usually *the degree to which data problems interfere with valid conclusions* and not “is this data perfect?”. This is RealClimate’s (valid) point when they (unfairly) dismiss data issues brought up over here.

A large body of research and the majority of climate scientists believe that although the data is flawed it is indeed a reasonable proxy for global temperatures. Based on the research consistency and the widespread consensus on this I see little reason to dispute that notion simply because a few temperature stations are in parking lots.

you are certainly exaggerating the degree to which climate temperature proxy data is garbage.

I don’t think so. Take UHI. The main point of the existing temperature data is that it shows a rising temperature in the past century. And a particularly rapid increase since about 1980. But when the stations used to create these results are examined, they are found to be primarily located in areas where the degree of urbanization is increasing. This includes things like siting at Airports where growth is particularly rapid. It includes small towns where the general area may still be rural but the immediate area near the weather station has grown. It includes closing of truly rural stations for budget purposes; resulting in a less rural mix of stations. It includes ‘homogenization’ of the data in rural stations with less rural stations within grid cells.

Since the total temperature increase over the past century is only a degree C, it doesn’t require a very large UHI contamination of the dataset to substantially degrade its value as a temperature proxy.
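To put rough numbers on that concern, here is a minimal back-of-the-envelope sketch. Every number in it (the true trend, the size of the UHI bias, the fraction of affected stations) is a made-up assumption for illustration, not a measured value:

```python
# Back-of-the-envelope: how an uncorrected warming bias in part of a
# station network inflates the aggregate trend. Every number below is
# a made-up assumption for illustration, not real station data.

true_trend = 0.7        # assumed real warming, deg C per century
uhi_bias = 0.5          # assumed spurious urban warming, deg C per century
urban_fraction = 0.4    # assumed fraction of stations affected

# Simple station-weighted average of trends:
observed_trend = ((1 - urban_fraction) * true_trend
                  + urban_fraction * (true_trend + uhi_bias))
inflation = observed_trend - true_trend

print(f"observed trend:        {observed_trend:.2f} C/century")
print(f"spurious contribution: {inflation:.2f} C/century")
```

The point of the sketch is only that a bias affecting part of the network scales linearly into the aggregate trend, so even a modest per-station contamination matters when the total signal is about a degree per century.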

Now, contrary to your claim, it’s not the case that

Showing that temperature data is as unreliable as you suggest should not be all that difficult

We’re dealing with a fraction of a degree C. And this is where we get to the subject at hand. In order to show that there’s a fraction of a deg C error in the data, the data used must (and I repeat MUST) be available, and what has been done to produce the released dataset must be disclosed, so that anyone can determine whether or not the dataset can be relied on.

Now it would be no problem for any number of people here to throw together datasets which would show the opposite results of the AGW crowd. But A) it wouldn’t be published and B) even the people here would agree it shouldn’t be published because there’d be no particular reason to accept the assumptions made as correct. But why shouldn’t the AGW crowd have to prove their assumptions by releasing their datasets and methods? Now it’s true that many here (myself included but not our genial host) suspect the existing datasets are too badly flawed to use for the purpose they’re being used for. But we’re willing to be convinced otherwise provided the other “side” do due diligence and release their data and methods.

Re: Joe Hunkins (#31), It is not quite GIGO, but several lines of evidence (e.g., Pielke Sr.’s many papers, including a recent one with both father and son on it, and Phil Jones’s paper a year ago on UHI in China) suggest that up to half the estimated warming on land over the past 100 yrs could be due to instrumental issues, including uncorrected UHI. Not quite garbage, but if half the warming is not really there and the models are calibrated on the historical data…not really trivial either.

the temperature data used as proxies in the GCM are highly sensitive to initial errors

I think this is more an advocacy position than a scientific one, and you are certainly exaggerating the degree to which climate temperature proxy data is garbage. Showing that temperature data is as unreliable as you suggest should not be all that difficult (if you are right) and as you note proving this would demolish a good deal of the current state of climate science affairs. There’s a Nobel prize waiting for you, so do it.

Your faith is misplaced. The Nobel Prize is a highly politicized affair, with only the trappings of science even in the awards for scientific achievements. My wife’s cousin, Cordell Hull, received one for his peace efforts. It’s quite remarkable when you hold it. The gold content makes it quite heavy and very nice to look at. Anthony Watts and collaborators have demonstrated beyond question that the USHCN network is horribly flawed with physical errors, and I certainly do not expect to see them rewarded with any Nobel Prizes for their accomplishments in demonstrating the data to be significantly unreliable. I’ve personally witnessed enough corruptions of the reported observational record to know for a fact some of it has been significantly falsified.

I had a commander of a fighter squadron visibly upset and demanding to know why his fighter squadron could not launch on a mission because the weather conditions were below legal minimums on the airfield. He pointed across to the co-located commercial airline traffic on the other side of the field, which was not being similarly grounded. We had to show this commander how the FAA was using the same instrumentation we were using, and they were misreporting the weather conditions to allow the air traffic operations to continue despite less than minimum visibility, dew point, and air temperature conditions.

Likewise, if you talk to the same coop station operators I have known over the years, open your ears and mind, you may learn just how bad the reporting can get.

Nonetheless, there are always going to be people who are going to jump to an unsupported assumption and conclusion that “the experts simply MUST know what they are doing” and the problem MUST be too minuscule to have any significant impact on the big picture. Yes, well good luck trying to prove that, given the systematic experience of having the raw data, adjusted data, and methods withheld and obfuscated. There are many of us with years of experience in meteorology, climatology, climate science, and atmospheric sciences who are being ignored, demeaned, and marginalized by any available means to silence our voices and persuade the public to close their ears to our voices.

You would do well for yourself by listening carefully and not jumping to any faith based conclusions. There have been many posts by engineers on this blog who got it right when they reported how utterly misguided some of these academics can get with their pet ideas when it comes to applying them to the hard truths of reality. One of my former supervisors was a WWII pioneer in the Air Weather Service, expert hurricane forecaster, and a professor at a Florida University who sometimes despaired over the way environmental activists entering his profession were destroying its scientific integrity and effectiveness. His concerns have proven to be well justified. If you take the trouble to look, read, and understand the last several years of posts on this blog, you can find a plethora of hard evidence demonstrating how the temperature record uncertainty is regarded by many of the foremost experts in the discipline to be significantly greater than the tolerances of the instruments used to make the observations. So, you don’t need my expertise, when you have numerous scientists who literally wrote the textbooks in climate science also reporting the inadequacies of the temperature data and observation networks.

Then you also have the opportunity to use some of that common sense you also mentioned. For one example, take the published guidelines for the siting of the temperature measurement instruments, and determine the volume of air encompassed within the permitted site circumference free of outside influences. Take the number of observational stations and multiply that number by the volume of air deemed to be representative for a single observation site. Take this total global volume of air being sampled at these observation sites and compare it to the total volume of air within the global atmosphere. Ask yourself what the percentage difference in volume is, and then ask yourself what the percentage difference is when adjusted for the actual atmospheric mass of the sample volume versus the total mass of the atmosphere. Once you have those numbers, you can begin to see just how significant or insignificant the sampled air masses are in the real world.
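To put rough numbers on this exercise (all of them placeholder assumptions chosen only to show the order of magnitude of the ratio, not actual siting figures):

```python
# Rough sketch of the sampled-volume comparison described above.
# Every number here is a placeholder assumption; the point is the
# order of magnitude of the ratio, not any particular value.

import math

n_stations = 7000        # assumed number of surface stations
site_radius_m = 30.0     # assumed "representative" radius per site
sample_height_m = 2.0    # thermometers sample near 1.5-2 m height

# Volume of air represented per site: a shallow cylinder
per_site_volume = math.pi * site_radius_m**2 * sample_height_m
total_sampled = n_stations * per_site_volume   # m^3

# Total atmospheric volume, crudely: a shell one scale height (~8 km) deep
earth_radius_m = 6.371e6
shell_depth_m = 8.0e3
atmosphere_volume = 4 * math.pi * earth_radius_m**2 * shell_depth_m

fraction = total_sampled / atmosphere_volume
print(f"sampled fraction of atmospheric volume: {fraction:.3e}")
```

Under these assumed numbers the sampled fraction is vanishingly small; different assumptions change the exact figure but not the order-of-magnitude conclusion.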

Next, take your total sampled air mass and separate that mass into the fraction sampled within the PBL (Planetary Boundary Layer) and the fraction sampled above the PBL. Then determine how the temperature trends compare between the PBL air mass and the above-PBL air mass. In the Antarctic, the altitude of the interior of the continent puts the surface into the upper atmosphere with respect to the coastal stations. Consequently, there is typically a very strong inversion layer meters to tens of meters thick on the continent’s interior versus hundreds of meters thick at the coastal stations. The air mass above the inversion layer is typically much warmer than the air mass at the surface. Observed temperatures, therefore, can vary significantly if and when weather conditions disturb this inversion layer. Likewise in other parts of the world, the paucity of surface stations and accurate records, and the even smaller number of upper air soundings to report thermal conditions above the PBL, results in wide swaths of the Earth where thermal data is little more than a WAG (Wild Eyed Guess).

You are stringing together several reasonable concerns about contamination of the data but coming to a very dubious conclusion that temp data qualifies as “garbage in”. This is my big concern about the Climate Audit experience, which exaggerates the significance of certain data problems that are unlikely to affect an objective conclusion. There will never be pristine data sets in this field, so the relevant question is usually *the degree to which data problems interfere with valid conclusions* and not “is this data perfect?”. This is RealClimate’s (valid) point when they (unfairly) dismiss data issues brought up over here.

That’s a strawman argument. No one here is asking for “pristine data sets” at all. If RealClimate is using such an argument, you can judge for yourself the reliability of someone who would choose to use such a strawman argument and false attribution.

A large body of research and the majority of climate scientists believe that although the data is flawed it is indeed a reasonable proxy for global temperatures. Based on the research consistency and the widespread consensus on this I see little reason to dispute that notion simply because a few temperature stations are in parking lots.

Despite the “widespread” reports of consensus, the consensus of the meteorologists et al. I know disputes the AGW claims. Of course, AGW/Global Warming/Climate Change or next-flavor-of-the-month proponents deny the existence of a consensus or any significant non-kook group of scientists in opposition to their claimed consensus, but that has more to do with coercive politics and not the science we are trying to unravel. If you want to persuade or impress us, try using some reproducible science and not your faith in some majority belief. It only takes one person to prove everyone else has a wrong belief. REMEMBER PHLOGISTON!

Re: Joe Hunkins (#31), It’s been shown in the literature that very small errors in initial conditions cause GCM outputs to become incoherent, in perfect model tests. The incoherence with test climates is complete within one projection year. So, D. Patterson is essentially right in post #21.

The big problem with GCM projections is that none of the measurement uncertainties are propagated through a projection to give us a physical uncertainty in the prediction — the reliability of the projection, in other words.

Furthermore, digital technologies can tempt those who are unaware of or dismissive of accepted practices in a particular research field to manipulate data inappropriately.

Does anyone else see this as a subtle reference to science blogs that are challenging “accepted practices in a particular research field?”

This blog has uncovered evidence that suggests data is being manipulated, ignored, improperly entered, and tortured by “experts” until it tells them what they want to hear. It exists to combat “inappropriate manipulation of data.”

The title, “Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age,” doesn’t appear to me to ensure anything, other than the status quo in climate science, which, given the stakes involved, makes it useless.

There has to be a better way for the data and methodologies that underlie the seminal papers in AGW climatology to be reviewed for accuracy and scientific integrity.

WRT the “peer” issue. When I started coming to CA I knew nothing about the temperature record. Hansen could well have said (looking at my resume and lack of publication) that I was not a peer and could not understand the data. He could then have withheld the data and the methods. He would have been wrong. What would my recourse be? Bring a lawsuit? Beg? Hack (hehe)?

The idea that the “public” will abuse data has been used to resist releasing raw clinical trial data. That a patient died might be highlighted as proving the medicine is lethal when only aggregate info is valid would be an example. The other side of that coin is that public policy has in the past been based on data that were bogus (e.g., educational methods, mental health treatments, health food claims, etc) which would have benefitted from some sunshine. It is a messy business, but openness is still the best cure.

The NAS, the Royal Society… aren’t scientific institutions getting into a sad state? I imagine that every advance in data analysis has consisted of introducing methods that are not “accepted practices”.

There are 11 recommendations NAS puts forward, see them in outline at tAV. What concerns me is #5 which says

All researchers should make research data, methods, and other information integral to their publicly reported results publicly accessible in a timely manner to allow verification of published findings and to enable other researchers to build on published results, except in unusual cases in which there are compelling reasons for not releasing data. In these cases, researchers should explain in a publicly accessible manner why the data are being withheld from release.

and #8 which says

Research institutions should establish clear policies regarding the management of and access to research data and ensure that these policies are communicated to researchers. Institutional policies should cover the mutual responsibilities of researchers and the institution in cases in which access to data is requested or demanded by outside organizations or individuals.

(my highlighting). The principle here is that research directly impinging on global policy MUST be transparent to inspection by anyone with adequate statistical skills.

The principle here is that research directly impinging on global policy MUST be transparent to inspection by anyone with adequate statistical skills.

I think you and they are barking up the wrong tree. The implied assumption in this is that unskilled idiots must be denied the data so that they can’t gum up the works with incorrect conclusions. This is plainly wrong. Unskilled idiots are extremely skilled at gumming up the works, and if they don’t do it with improperly analyzed data, they’ll do it with crackpot theories. I also don’t see much evidence that people with the patience to work with these datasets are very often unskilled.

If the argument is that keeping the data out of the hands of crackpots will help the public discussion, it’s specious. What they don’t seem to understand is that it actually has the exact opposite effect; it makes them appear to be circling their wagons, and hiding something.

What this reminds me of is someone in court representing himself pro-se. The lawyers don’t like it, the judges don’t like it, and it makes everyone’s job more difficult, but they understand that they have to allow it, because not allowing it would create bigger problems. So they grin and bear it, and 99% of the time, the pro-se litigant trips on his own undies.

The NAS is new at this kind of thing, and hasn’t thought it through all of the way. They need to go back and think about it some more.

Re: Calvin Ball (#47), Calvin, so sorry! my bad wording! I agree with you. Shouldn’t have been so ready to hit the “send” key before going out of reach of computers. I meant, it seems to me that these are the paragraphs to beware of, because they suggest closing the doors to outside inspection by folk like the readership of CA, some of whom are more than adequate to the statistics, some like me are not, but understand that we need openness, not closed doors.

Re: Lucy Skywalker (#84), And for whatever this is worth, it was the warming activists who opened up this can of worms by allowing a non-scientist with virtually no scientific depth whatsoever (Gore) to masquerade as a scientific expert. I don’t seem to recall hearing anywhere near as much of a hue and cry from the climate science community over his dilettante escapades. So I don’t think they’re in much of a position now to start manning the gates against the hordes of statisticians and engineers saying “ur doin it rong”.

Some institutions gather climate data for purposes other than climate research. For example, the military machine might be interested in conditions at airports as they affect aircraft load capability, etc.

There is no compelling reason to make some military data public, so it is reasonable to include exceptions to full and open release criteria.

The problem is more one of where to draw the line. An institution such as CRU at East Anglia is clearly out of step. But what of NASA, who would have both civil and military data. Are they obliged to reveal significant observational differences? I suggest not, though it might be nice if they did.

For me (and now I’m not including any political quotes–thanks for cluing me in and my apologies for my ignorance) this relates to “book burning” and such. Yes, we say we are all for “public disclosure” of scientific data and methods, but then we (NAS) give ourselves an out with the “available to peers” phrase: i.e., we reserve the right to “burn the books” if we think that “science is better off if you don’t have access”. Paternalistic, as mentioned above, and cowardly to boot. Next, “…manipulate data inappropriately.” C’mon! Again, more paternalism and cowardice. I LOVE (irony) that word “inappropriately”. Yes, go ahead and hide behind it. My field is chemistry, and (to the best of my knowledge) no one in chemistry has ever even suggested that, say, the laboratory synthesis of some dangerous (read “addictive”, etc.) compound be kept out of public access in the publication of the synthesis because this information could be “manipulated inappropriately”. I don’t have a good feeling about all this, despite the fact that much/most of it sounds excellent.

Yes george, I allude to this in my comments. But we need not complete the analogy. I’m merely pointing out a similarity of argumentation. When governments or scientists or whoever argue that your access to knowledge will create conditions where you can cause damage, they are essentially making a “we know better” argument. That pattern of argument has been around for AGES, used by many powers that be to maintain their power. Knowledge is power. Also, inherent in this argument is that you “lay people” need to be protected from yourselves or others who would take advantage of you. You don’t know what is best for you. The holders of knowledge (power) on the other hand, can be trusted to give it out to those who are fit vessels. Funny, As a layperson I don’t know enough to protect myself from myself or others, but I do know enough to trust the authorities. hehe.
There are other arguments that climate science has appropriated. As I’ve pointed out repeatedly, the precautionary principle is nothing more than Pascal’s wager in climate science clothes.

As I was reminded recently at another blog, “Only error, and not truth, shrinks from inquiry.” The NAS document is a classic case of pure, unadulterated doublespeak, the evidence for which are the above 40-something comments trying to figure out what it really means.

As others have noted, when you don’t have authority to issue orders then you produce recommendations or guidelines or policies. That is why “should” and “may” are invoked.

The NAS Report may (yep, may) strengthen one’s case but it won’t blast data and methods from anyone determined to withhold them.

Publications can do more. If they really want to.

Each publication, not the authors, can add a comment about any significant article. It would outline what supporting data is available without restrictions. And where. If there is none then simply say so and let the reader know.

Furthermore, digital technologies can tempt those who are unaware of or dismissive of accepted practices in a particular research field to manipulate data inappropriately.

And how many times has a team paper been critiqued because climate scientists have shown a lack of understanding about statistics, or are “dismissive of accepted practices in a particular research field”?

If data and methods are only accessible to people the scientist believes to be trustworthy, then by definition that is not “publicly accessible”.

I cannot believe that the National Academy of Sciences would actually put such stupid and controversial caveats into the data accessibility criteria, but my capacity for being shocked by scientific bodies making anti-scientific policy appears to be diminishing all the time.

I am only a mere physicist, unaware of the nuances of data sets in climatology and their statistical analysis. Nevertheless, I am worried by the suggestion (and perhaps more) that data should be withheld. The “public” does not understand quantum field theory, but does that mean the experimental data and theoretical analysis should not be available to all? Of course not. The strength of science lies in its openness and democracy. Yes, there are nutters out there – and folks with agendas – who will twist and misrepresent. But so what – just learn to live with them.

Exactly. People who don’t understand the methods used in a field can’t do any harm, except maybe to themselves by demonstrating their lack of understanding. I recall one case that I was involved in where someone outside my field wrote (and got published) an article saying that everyone in my field was wrong on some point. The response was that people wrote comments pointing out the errors in the criticism, the author agreed he was wrong and everything was forgotten (all I remember about it now is the author’s name).

Re: C Baxter (#70), I think some context needs to be applied to climate science. Normally, say in chemistry or physics, any result can ultimately be tested by somebody else saying ‘stuff it, I’ll do it myself’. In fact this is the best way to test it. Don’t use someone else’s code, don’t follow their exact method. Define a method that is clear and logical enough and follow that and see where it leads.
The peculiarity with climate science is that a lot of the data is unique. You can’t regrow a tree ring. Or an ice core. Now you could go and get some more, but you get the picture. The obvious question that follows is: if so, then why aren’t you more open, in fact so open that anybody off the street, if they wished, could find out how to make an ice core tell you the temperature by using your methods? Especially as it is supposed to be really really important. Any logical thinker would assume that this happens. But human nature is not always logical, hence we have people burying their correlations in technical speak and requiring NAS retina scans to see how the machine works, rather than a straightforward here’s how you do it, here’s the data, because your money paid for it. You might get bored but that’s not my problem. Enjoy yourself.
As I write this I’m reminded of the Emperor’s New Clothes. Funny that.

I think a major point is being missed in this discussion. The authors have been working on this for over 2 years, and the best they can come up with are recommendations? Really? I can come up with recommendations in less than a day. After a lot of time (2 years) and a lot of money (god knows), the authors should have come up with concrete guidelines for everybody. What the heck were they doing besides wasting time and money? The authors should be hammered for incompetence. This report is meaningless, and just shows everyone how toothless and incompetent the NAS has been on this issue. I would like to add that the staff at the NAS have been very gracious and polite, and they (staff) have answered all of my emails in a timely fashion- it is the authors I have a beef with.

As Roman pointed out in #56, the NAS really doesn’t have the authority to force any agency to do anything. They make recommendations to those organizations that they review. About the only power they hold is that when an organization is reviewed, they can comment on how well the organization is upholding the standard set by the NAS.

Think of it this way… the USDA & HHS have created a recommendation on what we eat called the “2010 Dietary Guideline”. They release it, publicize it and can report on how well we are following it, but they cannot force us to live by it. They don’t have that authority. The NAS is in the same boat. They can release a recommendation, publicize it, and report on how well groups/agencies are following it, but they don’t control the agencies so they can’t force any of them to comply.

Most agencies have a multitude of groups that review them. Very few (if any) of these groups have any authority over the agency being reviewed. The reason is that some of these groups have recommendations that are flat-out opposites of each other and it would be impossible to comply. Therefore, all such organizations can only submit recommendations.

This brings to mind Richard Feynman’s reason for resigning from the NAS:

I had trouble when I became a member of the National Academy of Sciences, and I had ultimately to resign because here was another organization which spent most of its time choosing who was “illustrious enough” to join.

There’s an interesting article in Scientific American titled Fossils for All: Science Suffers by Hoarding. The article highlights paleontology, not climate science, though existence of paleoclimatology makes an amusing coincidence. The same issues are at play. (I won’t quote too much, considering the snotty behavior of Scientific American when Bjørn Lomborg quoted them to defend himself from one of their hit pieces.)

The scientists who expend the blood, sweat and tears to unearth the remnants of humanity’s past deserve first crack at describing and analyzing them. But there should be clear limits on this period of exclusivity. Otherwise, the self-correcting aspect of science is impeded: outside researchers can neither reproduce the discovery team’s findings nor test new hypotheses.

In 2005 the National Science Foundation took steps toward setting limits, requiring grant applicants to include a plan for making specimens and data collected using NSF money available to other researchers within a specified time frame. But paleoanthropologists assert that nothing has really changed.

Hmmm… this sounds familiar.

Ultimately, the adoption of open-access practices will depend in large part on paleoanthropologists themselves and the institutions that store human fossils—most of which originate outside the U.S.—doing the right thing. But the NSF, which currently considers failure to make data accessible just one factor in deciding whether to fund a researcher again, should take a firmer stance on the issue and reject without exception those repeat applicants who do not follow the access rules.

Lots of words and rules, which are meaningless unless they are enforced.

As for the public display of these fragments of our shared heritage, surely taxpayers, who finance much of this research, deserve an occasional glimpse of them.

I think that the public display of the climate data and code would be nice.

I’ve tried posting this twice and it apparently gets bounced by the spam filter so I’m cutting out the links to the all of the weblinks for the funding. If this gets through I’ll try to post them.

While the whole UK FOI action against CRU was an interesting experiment in how the UK treats FOI requests and embarrassed CRU for not having data traceability, it did not really resolve getting the data. To get resolution to this, you have to follow the money. The amount of government funding to climate scientists is staggering.

Phil Jones (University of East Anglia or UEA) has been funded by the U.S. Department of Energy (DOE) Office of Biological & Environmental Research (BER) since 1993; often with Tom Wigley. [Note, that there is also another Phil Jones at Los Alamos National Laboratory (LANL) who receives funding for climate change research; this is not the same Phil Jones (UEA).] Partial details of the grants and funding are provided at the bottom of this post and greater details on the grants are on the webpages cited [note that they have been deleted to try to get past the spam filter]. This includes the DOE grant that Steve M. kept asking about: DE-FG02-98ER62601.

As Phil Jones (UEA) has been funded by these grants he is subject to the data sharing policy of the DOE granting office and division. The DOE BER Climate Change Research Division Climate Change Prediction Program at http://www.sc.doe.gov/ober/CCRD/model.html only states: “Funding of projects by the program is contingent on adherence to the BER data sharing policy.”

Program Data Policy
The program considers all data collected using program funds, all results of any analysis or synthesis of information using program funds, and all model algorithms and codes developed with program funding to be “program data”. Open sharing of all program data among researchers (and with the interested public) is critical to advancing the program’s mission.
Specific terms of the program’s data sharing policy are: (1) following publication of research results, a copy of underlying data and a clear description of the method(s) of data analysis must be provided to any requester in a timely way; (2) following publication of modeling methods or results, a copy of model code, parameter values, and/or any input dataset(s) must be provided to any requester in a timely way; and (3) recognition of program data sources, either through co-authorship or acknowledgments within publications and presentations, is required.
The program assumes that costs for sharing data are nominal and are built into each grant application or field work proposal. In cases where costs of sharing are not nominal, the burden of costs will be assumed by the requester. The Program Manager should be informed whenever a requester is expected to pay for the costs of obtaining program data, whenever a data request is thought to be unreasonable, and whenever requested program data is undelivered.

Funding of projects by the program is contingent on adherence to this data sharing policy.

Several things pop out in this policy:

“Open sharing of all program data among researchers (and with the interested public) is critical to advancing the program’s mission”

“a copy of underlying data and a clear description of the method(s) of data analysis must be provided to any requester in a timely way”

“following publication of modeling methods or results, a copy of model code, parameter values, and/or any input dataset(s) must be provided to any requester in a timely way;”

“The program assumes that costs for sharing data are nominal and are built into each grant application”

This says nothing about sharing with “academics only”. Sharing with the “interested public” is clearly specified. As Phil Jones (UEA) has published multiple research and modeling results, I’m sure that if a member of the “interested public” politely asked Dr. Jones (UEA) for a copy of the underlying data, a clear description of the method(s) of data analysis, and a copy of the model code, parameter values, and/or any input dataset(s) that are the results of DOE funding, these would be provided in a timely way, as is required by the DOE data sharing policy. After all, the US taxpayers paid for this work and the interested public should be able to get this info.

If the requested information is “undelivered” then a polite request to the head of Climate Change Prediction Program:

would probably be in order. Perhaps a query about the quality assurance standards for these grants, the traceability of data requirements for these grants, and whether continued funding would be provided to principal investigators who do not comply with data requests could also be made.

If that does not work, then a FOIA request to DOE regarding data availability, quality assurance standards, and the traceability of the data might get results. The DOE takes FOIA requests seriously. FOIA worked when Steve M. asked for info from Santer at the DOE lab.

If there are problems with this, then DOE would be open to FOIA requests as to why their Principal Investigators are not living up to data sharing agreements; why the DOE is not enforcing data sharing agreements; why DOE is funding individuals who do not abide by the DOE data sharing agreements; why the data gathered under the DOE program are not traceable; and if the data are not traceable, then what good are the data.