A Partial Victory for the R Philosophy

Obviously I think that R is a great language. But one of the reasons that it’s great is that it’s open source, and another is the incredible energy and ingenuity of the packages contributed by the R Community for the use of others.

In a real sense (as opposed to a realsense), this sort of open source philosophy represents what a lot of us thought that climate science would be like (and should be like). I have a story today which shows a small victory for open source philosophy.

As a preamble, I don’t think that any of us were really prepared for stonewalling by leading members of the “Community”, such as Phil Jones:

We have 25 or so years invested in the work. Why should I make the data available to you, when your aim is to try and find something wrong with it. There is IPR to consider.

Most members of the public expected an open source philosophy – the attitude of the R Community as opposed to the attitude of the “Community” [of climate scientists] and particularly the “Team”.

None of us could have expected Michael Mann’s remarkable responses, first to a reporter from the Wall St Journal:

Dr. Mann refuses to release [the source code]. “Giving them the algorithm would be giving in to the intimidation tactics that these people are engaged in,” he says

Or later his truly extraordinary response to a request for his source code from the House Energy and Commerce Committee, who were nonplussed at the above answer. The Committee asked Mann:

Q5: according to The Wall Street Journal, you have declined to release the exact computer code you used to generate your results… (a) Is that correct? (b) What policy on sharing research and methods do you follow? (c) What is the source of that policy? (d) Provide this exact computer code used to generate your results.

Mann’s answers are well worth re-reading. Mann refers to “intellectual property” no less than seven times in his answer, a few of which are quoted here:

My computer program is a piece of private, intellectual property, as the National Science Foundation and its lawyers recognize. It is a bedrock principle of American law that the government may not take private property “without [a] public use,” and “without just compensation.” …

It also bears emphasis that my computer program is a private piece of intellectual property, as the National Science Foundation and its lawyers recognized….

under long-standing Foundation policy, the computer codes referred to by The Wall Street Journal are considered the intellectual property of researchers and are not subject to disclosure.

Even more recently, the National Science Foundation confirmed its view that my computer codes are my intellectual property.

Actually, there’s an interesting argument that the computer code was not actually Mann’s private property but the property of his employer at the time (University of Massachusetts, I presume) – the NSF letter merely said that they did not claim title to the code, but did not opine on Mann’s ownership rights as opposed to those of the university. In a law school examination sense, Mann’s assertion of personal ownership of the code is arguably an assertion of a right inconsistent with the right of the true owner and thereby a tort (of conversion.) See CA discussion here.

Notwithstanding the above statements, on July 18, 2005, Gavin Schmidt stated that the “data and computer code are in the public domain”:

These responses emphasise two main points that we have explained in great detail in earlier postings [linking to a Feb 18, 2005 thread] on this site:
1. There is no case for casting doubt on the scientific value and integrity of the studies by Mann et al. – they have been replicated by other scientists, the data and the computer code are available in the public domain (including the actual fortran program used to implement the MBH98 procedure) …

If the data and computer code had always been “available in the public domain, including the actual fortran program used to implement MBH98”, it’s hard to figure out why Mann would have made such an issue about it. And, of course, the computer code hadn’t always been in the public domain. It had been put online a few days earlier (July 12, 2005) in response to the Committee request. On other occasions when data has been produced after a long campaign, we’ve seen the same pattern of pretending that it had always been there. Gavin Schmidt pulled a similar stunt in connection with non-infilled data for Mann et al 2008. (And BTW, the code wasn’t complete or workable with available data. No one knows to this day how confidence intervals were calculated or principal components were retained.)

Anyway, back to today’s story. A few days ago, I posted on coral calcification, an issue to which I had been referred by Peter Ridd of James Cook University in Australia. Although the article involved linear mixed effects modeling using lmer (a technique with which I’m familiar), this is a relatively complicated methodology and you really need to see the equations (i.e. code) to see what people are doing. I read the SI and couldn’t decode it. With enough time and effort, I might have been able to, but, realistically, if I couldn’t decode it right away – and this was a program that I happened to have experimented with – how many paleoclimate scientists could even get a foothold on it?

Following my post, Peter Ridd sent an email to the author group requesting actual source code. Peter Doherty gave an answer more or less in keeping with Mann’s “private property” attitude, though couched more pleasantly:

I won’t ask Glenn to provide his R code because I don’t believe that it is public property. If you think that he got the analysis wrong, you have the ability to reanalyse the exact same data set and publish an alternative interpretation of it. I’m sorry that this doesn’t meet your latest request but I hope that our actions to date are consistent with your high expectations of scientific probity.

Peter Ridd forwarded this to me. Needless to say, the assertion “private property” was a red flag for me. In this case however, adding insult to injury, De’ath had written his programs in R, the quintessential open source language, and, whatever the standards of the [climate science] “Community”, the stance was at odds with that of the R Community. So I wrote:

Dear Dr Doherty,

I am familiar with the lme4 package.

I consulted the SI to De’ath et al and, from that information, I am unable to determine how the lmer models were constructed. Could you therefore please provide a substantially expanded description of your methodology for the construction of your lmer models that is sufficient to permit replication. It would probably be just as easy to provide code.

Aside from whatever obligations you may have under Australian policies or journal policies of Science, R is an open-source language. Your attitude here is very much in opposition to the R philosophy. You were willing enough to use open source software and your unwillingness to reciprocate will undoubtedly be viewed unfavorably by this community.

Thank you for your attention,

Stephen McIntyre

Shortly thereafter I received a squib of code (uploaded to CA here) providing the code for the response to a comment by Ridd et al (but not for the original paper), which I’ll discuss on another occasion. De’ath said:

Not sure what your problem is:

From the SOM: “The final models for calcification, extension and density were fitted using the R package mgcv and partial effects plots were used to illustrate the effects of year and SST on calcification, extension and density. For calcification, extension and density, the temporal trends were selected with ~9 df.”

The code for the comparison in the rebutal of Ridd et al (with the final years omitted etc) is in the zip and the data are in the .RData.
It also produces the graphics.
All very straightforward really.

Stay cool — all’s well in the world of R & Ubuntu 🙂

I liked the closing. I’ve had limited success in getting access to code – with the two main exceptions (Mann’s MBH98 and Hansen’s GISTEMP) both coming under exceptional circumstances more or less over their dead bodies and only after enormous adverse publicity. Appeals to journal policy, scientific mores, journals and funding agencies have usually been unsuccessful.

But the mention of the R philosophy worked. There’s a moral here: the Community needs to learn from the R Community.

UPDATE: The concession was only partial. I wrote to De’ath asking him for the code for the actual article, rather than the code for his rebuttal to Ridd. He replied:

Steve

OK — one final go.

(1) You’ve got the code for the purely temporal analysis as part of the code I previoulsy sent you.
It reads:
####################################
load("Calcification.RData")
library(mgcv)
z <- gamm(cal~s(year,k=10,fx=T),random=list(rf=~1,id=~1),data=corals.2000)
summary(z$gam)
plot(z$gam,se=T,shift=mean(corals.2000$cal),sca=0,ylim=c(-0.27,0.09),xlab="Year", ylab="Calcification")
####################################

That together, with that previously sent code, address ALL Ridd et al issues with the analysis.

The lmer analyses (of which ALL the outputs are in the SOM, and hence ALL the models are stated as part of that ouput) are used (as stated in the SOM) to identify the models in terms of (a) the random effects structure and (b) which of the spatial & temperature covariates are included for the second analysis.

To quote the SOM:
“In this section the details of the analysis for data set 1900 – 2005 that included SST and spatial predictors are presented in detail. Similar models and fitting procedures were followed for the purely temporal analyses of the data 1900 – 2005 (all colonies) and the data 1572 – 2001 (10 colonies), but details are omitted as the procedures were simpler since no selection of predictors was required.”

Couple that with (also from the SOM):
“The final models for calcification, extension and density were fitted using the R package mgcv and partial effects plots were used to illustrate the effects of year and SST on calcification, extension and density. For calcification, extension and density, the temporal trends were selected with ~9 df.”
Then you have the answer!

** You’ve got the code for the temporal analysis **

I have had several discussion with R/statistically trained people and they seem to get it fine.

This was actually quite helpful as it shows the actual equation that they used. I don’t know why he wouldn’t send the corresponding equations for other parts of the article. AFAIK, the reason for not doing so is simply to make it more difficult to see what he did. With some time and effort, I can probably sort it out, but with the code, I could do it at lightning speed. The purpose of providing code in econometrics – as nicely articulated by Bruce McCullough – is to improve the efficiency of verification studies. The more trouble that you put people to, the less likely it is that anyone will verify it. How hard would it have been for De’ath to send the code out? He’s already spent more time consulting about not doing it.
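For readers who don’t use mgcv, the gamm call in De’ath’s email can be read as a model equation. What follows is a sketch, not something taken from the paper or the SOM: it assumes that rf indexes reefs and id indexes colonies within a reef (the email defines neither), and the subscripts and symbols are illustrative.

```latex
\mathrm{cal}_{ijt} = \beta_0 + f(\mathrm{year}_t) + a_i + b_{ij} + \varepsilon_{ijt},
\qquad
a_i \sim N(0, \sigma_{\mathrm{rf}}^2), \quad
b_{ij} \sim N(0, \sigma_{\mathrm{id}}^2), \quad
\varepsilon_{ijt} \sim N(0, \sigma^2)
```

Here f is the smooth from s(year,k=10,fx=T): with fx=T the spline is unpenalized, so a basis dimension of 10 leaves 9 degrees of freedom once the centering constraint is applied, which would match the “~9 df” quoted from the SOM. The terms a_i and b_{ij} are the random intercepts from random=list(rf=~1,id=~1), which gamm hands to lme as nested grouping levels, id within rf.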

On a scale of 1 to Lonnie Thompson, this is not a particularly bad case. The online data doesn’t tie in to the article, but the authors provided the data to Peter Ridd on request. The SI is pretty decent as these things go. After a first refusal, they provided a squib of code, but only grudgingly and incompletely. Annoying, but nothing like Lonnie Thompson.

73 Comments

I’m sorry, but the philosophy of tool production does not carry over to tool use. There is absolutely no duty to turn over how a tool is used or extended, legal or ethical, that stems from the tool itself. R is certainly not unique in being an open source language with a large set of public libraries.

There is absolutely no duty to turn over how a tool is used or extended, legal or ethical, that stems from the tool itself.

I think Steve was arguing on philosophical grounds. Steve was using the contradiction of using an open-source community tool to create something for the scientific community that will not be shared. It is akin to jumping into the OS community to create an application everyone wants that works perfectly in Linux, but the code isn’t shared. While not technically wrong, it contradicts and undermines the very social structure which enables one’s own work to be successful.

These points are *argued* over in the open-source community, which is a relatively new community. The scientific community is supposed to operate in an open manner by-default, and it’s been that way for centuries. This modern (and flawed) concept of intellectual property has done more damage to our scientific progress than most people realize.

Re: Soronel Haetir (#1), Steve’s argument has to do with scientific verification. A conscientious research effort carefully documents hypotheses, methods and results. If the results are questioned – cannot be replicated – then the researcher has the information available to show just exactly how the results were obtained. There is no scientific content in an assertion such as, “my results are correct, but how I got them is my secret.” If those results are questioned, and the questioner shows that a reasonable approach, with reasonable assumptions and the same data does not yield the same results, that openly documented analysis has more scientific weight than a thousand studies whose methods are closely held and protected as “IP.”

You want to also recall that the open source movement takes its inspiration from the Scientific Method, not leftist philosophy as is often assumed by the illerati. That ability to test and verify is why R and other OS products work so very, very well.

I don’t know why he wouldn’t send the corresponding equations for other parts of the article.

I am not going to assign motives to this kind of behavior here, but I think we can all develop hypotheses to explain this phenomenon.

It’s very hard to rigorously test hypotheses in social science. I think the best test for my (unstated) hypothesis would be to compare how well central findings of published papers verify from those that freely shared code and data, and those that did not.

That’s only relevant if you intend to create a derivative work of R itself – do that, and you’d need to make all of your work in extending the tool open source as well. Using R as-is as a tool to analyze data carries no obligation (legal or otherwise) that your code and data be made publicly available. I think that’s a good idea when doing non-commercial scientific research, but the GNU GPL doesn’t really apply here.

I bang my head against the wall over stuff like this. Is this arguable?: After research and publication, the next step in the (true) scientific process is replication. If we can’t replicate, no point in even discussing the findings. In fact, one could argue that, ideally, replication would be/should be the first thing done in “peer review” (If it can’t be replicated, why publish it?). Of course, this is not practicable. Therefore, every author needs to provide EVERYTHING necessary for replication at the time of publication (or before). Otherwise, it is NOT SCIENCE. To me–and I hope I am channeling Karl Popper here–, this “intellectual property” argument is garbage. It can’t legitimately exist in science. Can you see Richard Feynman ever claiming “intellectual property”?

I am perfectly happy saying an ethical duty to disclose stems from the process of science. That proposition is not the theme of this post. I see this post claiming that such a duty flows from the choice of tools, and that is something I just don’t see having a valid basis.

In terms of formal obligation, I think that there are weightier obligations e.g. journal policies, US federal policies – which unfortunately are not typically enforced. What amused me here was De’ath’s line indicating that he set some store upon R open source philosophy (though this proved not to be his most important consideration):

I see this post claiming that such a duty flows from the choice of tools, and that is something I just don’t see having a valid basis.

You should study up on the GPL (GNU General Public License) before making such claims. I am not qualified to render expert legal opinion but I do use GPL software at times and this certainly would be debated as a gray legal area.

Soronel is correct – I deal with GNU GPL routinely, and it only covers the use and distribution of the tool itself. Not any user content or data massaged by that tool, and not even other tools which use the GPL tool (as long as they do so from an “arm’s length” distance, and don’t integrate to it at the code or library level). In other words, if someone wanted to take the R code and extend it into an “R++” tool, they would be bound by the GPL – R++ would be required to be open source. And if someone wanted to link to any libraries within R and ship that as part of their own product, that product would be covered by GPL too, and they’d have to make full source available. But using R to analyze climate data? Nope. Not covered. This has been true from the very first version of GPL up to the latest one (GPLv3).

I’ll certainly agree that publicly funded scientific research should be operated with an open-source philosophy, but that has nothing to do with whether GPL applies to the tools used in that research.

as long as they do so from an “arm’s length” distance, and don’t integrate to it at the code or library level

That’s where the “derivative work” issues come to play. In particular, if you write code, include calls to a library and statically link that library, then distribute the executable, the recipient has a right to request all of your source. This came up with the use of FFTW, btw, and the leading belief (among those I’ve discussed this with), is that the “solution” is to use dynamically linked libraries.

I’ll certainly agree that publicly funded scientific research should be operated with an open-source philosophy

Which is really Steve’s point, and he seems happy, so I’m going with the flow and being happy, too! 🙂

If someone describes a R program to sufficient detail that it can be reproduced and invites others to reproduce it, is this a distribution of the program?

Further, if the description is not of a generic software program but consists of specific references to the operation and interaction of R features, is this a distribution of the program? The R features are dependent on GNU and so the description itself is dependent on GNU.

The point is to describe a program, that itself is covered by GNU, to such an extent that somebody with ordinary skill can create it or something functionally equivalent to it. I can see that a strong argument can be made that this practice is covered by the GNU licence.

Re: TAG (#22), In patent law, someone may be found to have infringed a patent if he induces others to practice the patented technology. He need not practice the technology himself but merely has to supply the tools so that others can practice it.

I see this as directly analogous to someone who merely describes a GNU protected program with the intention that others could copy it.

So if someone creates a program that relies on calls to a GNU library but does not include that in the distribution but supplies instructions so that others can create the link, is his distribution covered by the GNU licence?

Not that it matters, buuut… I’m talking to my GPL guru buddy and he seems to think that as long as there was no delivered binary, i.e., an executable, the source is likely not legally required to be distributed (upon request), either. The R FAQ is not really explicit on this, btw, but I tend to agree with him (and thus, Soronel).

Professor Eddington’s analysis of photographs of a solar eclipse confirmed the correctness of Einstein’s equations. When asked by colleagues at the November 19th Royal Society Meeting to produce the data to support his claims regarding Einstein’s theory Eddington replied,

“Giving them the equations and source code would be giving in to the intimidation tactics that these people are engaged in,” he added ” We have 1000’s of hours of time invested in the work. Why should I make the data available to you, when your aim is to try and find something wrong with it?”

Professor Eddington’s analysis of photographs of a solar eclipse confirmed the correctness of Einstein’s equations. When asked by colleagues at the November 19th Royal Society Meeting to produce the data to support his claims regarding Einstein’s theory Eddington replied,
“Giving them the equations and source code would be giving in to the intimidation tactics that these people are engaged in,” he added ” We have 1000’s of hours of time invested in the work. Why should I make the data available to you, when your aim is to try and find something wrong with it?”

Professor Eddington then demanded that the Industrial Revolution should be cancelled in order to maintain the stars in their proper positions in the sky, and that he should be awarded a dukedom, and complete authority to implement the New Ludd Act by Royal Prerogative.

Following on from that idea, it’s instructive that Einstein handed over all of his equations and calculations on the precession of Mercury for peer review. There was no question of Einstein claiming that his equations and methodology were his own intellectual property. From reading Einstein and biographies written about him, the notion of private ownership of scientific ideas would have been anathema.

There are many other examples of “open source” scientific discoveries in the literature. It didn’t stop the authors of new ideas being attacked, of course, but it did squelch any idea that those discoveries were papering over large holes in the methodology or the data.

While open books and methodology are not absolute guarantors of probity, consistently in science a refusal to allow full replication and inspection of data, especially when a spectacular result is claimed, is not infrequently correlated with some form of misconduct.

I agree with Shallow Climate. Mann’s arguments are lame at best. Issues of intellectual property are only relevant when concerned with software that has commercial applications. Software used for scientific research should be made available to all, so that the results may be replicated. Failure to do so can only cast doubt on the validity of the research.

Steve, the reason for obstructionism is so that if you can’t replicate their work exactly, they can say (a) you are doing it wrong, (b) you are incompetent, (c) you are intentionally presenting false results, or (d) some combination of (a), (b) and (c).

If any scientist wants results to be taken seriously, (s)he should bend over backwards to make available source code and data. I don’t trust results of anyone that doesn’t. In fact, the very act of withholding same is a good reason to doubt integrity of both the scientist and the results.

This question of IP rights boils down to contracts. If Mann was doing this work under a federal or state contract, then the code is 100% in the public domain, and beyond this there is a clause in every federal contract called “government purpose license”. This is a government code word (look it up) that states that anything developed under a federal contract not only is in the public domain, but can be taken by the government and given to other contractors.

If Mann is doing this work under some private grant other terms may apply. If Mann is doing this as a professor, then the university or educational institution for which he works owns the data and the code.

Above and beyond all of this, as a scientist who is producing work with possible world wide civilization changing ramifications, he has a moral obligation to go above and beyond any call of proprietary nature to release any and all of his research so that the most basic code of science, that of replication of results, can be undertaken and audited. If he is not doing this, he is not doing science in any meaningful sense.

This whole concept of personal retention of IP in the public sector is beyond me.

Try spending 6 months in the private sector producing a report then tell your employers that they can’t see it because you have invested 6 months of your time producing it and that it is your own personal property! (not to mention that they might find something wrong with it)

When I am named as an inventor on a patent my rights are acquired for $1. Yes, I do receive additional payments but they are to encourage further inventions and not a legal requirement.

There is an interesting point to make relative to Steig and Mann. Both use (and modify?) the RegEM code produced by Schneider (was that his name??). Anyway, he just made his code publicly available with no license. It might be interesting to create a version of GPL targeted at applications in the sciences, whereby if you modify the code and use it in a published study, you agree to make the modification available to others under the same terms.

I’ll also take this opportunity to promote an old friend (Peter Suber) and his thoughts on Open Science.

I must disagree with those posters who have said that failure to disclose methods (code) and data is “not science”. It most certainly is – or at least can be. I think we need to make a large distinction between “what is science” and “what is credible”. You can call it science if you like (or not), but it is only credible science if disclosure of methods (inc. code) and data is complete.

It seems to me that this is the path Steve took (intentionally or not) with the hockey stick, and is the path most likely to lead to a mutually acceptable resolution.

Re: John A (#32), and others: What of privately funded research with commercial implications? Such research may have limited disclosure, but still result, after some engineering work, in a novel and useful product or service. There is also the situation where you cannot publish your research due to restrictions from the funder for a certain period (maybe forever). Is it only science after it’s published? Is secret, military funded science not science? I would argue that it is science, but has limited credibility – such limitations may not be a concern for the organisation providing the funding (and therefore, the owners of that research).

Having said that, if the research is published in a journal such as Science or Nature, and is to be used for public policy decisions, then full disclosure should be mandatory. Indeed, journal policies usually specify such disclosure as a condition of publication. Unfortunately, it seems that such policies are not always enforced, as we have seen all too often with climate science. That such breaches of policy, once discovered and reported to the publishing journal, have not resulted in either immediate disclosure or withdrawal of any and all papers that rely on non-disclosed methods and data is what is, in my opinion, truly unforgivable.

What of privately funded research with commercial implications? Such research may have limited disclosure, but still result, after some engineering work, in a novel and useful product or service. There is also the situation where you cannot publish your research due to restrictions from the funder for a certain period (maybe forever). Is it only science after it’s published?

The answer to that is that privately funded research with commercial implications has to answer to the same rigor as science elsewhere. The obvious example would be pharmaceuticals where scientific claims as to efficacy, toxicity of the drug treatment have to be objectively measured and the molecule itself must be patented. As Michael Crichton related, the requirements for pharmaceutical trials are far higher than anything encountered in climate science where basic data and methodological procedures to ensure that claims made can be independently verified are regularly flouted.

“Scientific” claims without verifiable evidence and falsifiable predictions are not credible as science.

I note that Roger Pielke Sr has weighed in on the topic, arguing that climate models predicting the future atmosphere twenty or more years ahead without any falsifiable criteria over shorter timespans are short-circuiting the scientific method.

The scientific method involves developing a hypothesis and then seeking to refute it. If all attempts to discredit the hypothesis fails, we start to accept the proposed theory as being an accurate description of how the real world works.

A useful summary of the scientific method is given on the website sciencebuddies.org where they list six steps

Unfortunately, in recent years papers have been published in the peer reviewed literature that fail to follow these proper steps of scientific investigation. These papers are short circuiting the scientific method.

There’s no good “one size fits all” answer as to what constitutes science in this regard. There are scientists out there who regard String Theory as unscientific because it has so far failed to make one unambiguous, unique testable prediction (plenty of people disagree on that point and there’s a healthy bunfight about it in the scientific community)

It’s perhaps worth pointing out that large parts of what we call scientific theories began as fringe hypotheses: heliocentric theory, evolution, relativity, continental drift etc.

But in so far as scientific claims are meant to be testable, which means that replication is a key function of testing, then claims which cannot be tested or reproduced are not science.

This situation obtains for every single product and service the use of which could adversely impact the health and safety of the public. Look at a list of the regulatory agencies set up by your local National Leadership Organizations. All of them will require full disclosure of every aspect of the product or service.

The same thing will also obtain for isolated individuals and groups of individuals; flight control systems for military aircraft, for example. (There is a potential for military aircraft to adversely impact non-military people through accidents.) The list is for all practical purposes unlimited.

So far, it seems that the climate change community is the sole exception everywhere on the planet. Unintended consequences will occur; it’s only a matter of time.

The answer to that is that privately funded research with commercial implications has to answer to the same rigor as science elsewhere.

In that particular case, there are implications for public health – clearly, in situations where the general public is affected by the research in a significant way, we need a level of disclosure and replication of research that goes well beyond the simple peer review used by the major journals. IMO, this obviously applies to climate change research as it applies to changes in public policy. Just as clearly, not all “science” has such an impact, and it is difficult to know in advance which results are “significant” in this respect and which are not. For instance, not many people would have suspected that climate research would have had such an impact if they had examined the field in the 1960’s. I have no idea how we can “fix” this dilemma, except to say that we need to apply the same principles to research cited in support of public policy decisions as we do to your example of medical research – and at least in terms of climate, we have not done so to date. Perhaps one of the results of the current climate “debate” (such as it is) will be such a system.

There are peer-reviewed journals, just as secret as the work that is discussed in them, to handle secret scientific, engineering, and technology work. And secret reports that are peer-reviewed and replicated much more so than so-called peer-reviewed papers in the open literature.

Certainly – the military, who generally hold such research, are much more “advanced” in this respect than universities etc. As well they must be. Alas, govt. bodies other than the military do not appear to hold such high standards, which is somewhat regrettable. As above, I suspect this may change in the future, which can only be a good thing.

For instance, not many people would have suspected that climate research would have had such an impact if they had examined the field in the 1960’s. I have no idea how we can “fix” this dilemma, except to say that we need to apply the same principles to research cited in support of public policy decisions as we do to your example of medical research – and at least in terms of climate, we have not done so to date. Perhaps one of the results of the current climate “debate” (such as it is) will be such a system.

You’ll not find much disagreement in these parts. I think that McIntyre and McKitrick (2005) said much the same thing.

Re: Neil Fisher (#47),
There are peer-reviewed journals, just as secret as the work that is discussed in them, to handle secret scientific, engineering, and technology work. And secret reports that are peer-reviewed and replicated much more so than so-called peer-reviewed papers in the open literature.

The Manhattan Project was most certainly “science” in the sense that the basic physics had to be understood and verified by at least a group of people, secret or otherwise, in order to build the first bomb and replicate it for military use. But… the point of climate “science” is not to have a small “secret” group understand it and leave the rest of the world in the dark. That in the broadest sense is against the conventional view of science. Hypothesis -> Collect Facts -> Theory -> Test Theory -> Repeat

In case there is any confusion, there are 2 scientists named Dr Peter Doherty in Australia (at least). One advances molecular biology, is a Nobel Laureate and has written a book named “The Beginner’s Guide to Winning the Nobel Prize”. In short interactions with him, he has been most cooperative.

Another Dr Peter Doherty is from the Australian Institute of Marine Science and leads the ‘Maintaining Ecosystem Quality’ program. He is Task Supervisor for the Seabed Biodiversity Project. I have not met him, though AIMS works closely with my alma mater.

The speed of progress is greatly enhanced by virtue of the fact that the practitioners of Science publish not only results, but methodology and techniques. In programmatic terms, this is equivalent to publishing both the binary and the source code.

Good scientists leave open the possibility that a new discovery will advance understanding. There is no common concept of being “proved wrong” except in blatant, conceded cases where the error is obvious. There is a more usual concept that “Unless or until something better comes along, our best understanding of this subject is that …..”

The thrust of this thread is that science will advance faster and better when people eagerly expose their work to the scrutiny of others and participate in the “best understanding” process. If a particular computer program assists in the coordination, then that is excellent.

More than one list voted the printing process as the most significant advance of the last millennium. Sharing its science and engineering with the ROW created enlightenment.

I wonder what it would take to create an open source science model that would allow disparate scientists to contribute? Obviously, they are out there. Perhaps personal PCs could be networked together to get some processing time. I am a programmer, but currently don’t have a clue how to do that. It can be done, however. It would not equal a super computer, but more time would be available.

I’ve always taken the position that software written and paid for by federal grants is public domain, and software that is privately developed belongs to the author.

I have some critical code that I’ve written on my own over the years for data acquisition that I don’t freely share with others, but other data acquisition systems exist so I don’t exactly feel that is hindering other people from reproducing my results.

As another example of proprietary codes, I have a colleague who sells an acoustic propagation code that he uses for publication purposes at times. However, all of the basic algorithms are published as are simpler versions of the code (one that doesn’t have all the cool GUI-based aspects). This is in my opinion sufficient disclosure.

You don’t have to release your proprietary code in all cases, just a simpler version (e.g. a MATLAB script) that allows other people to replicate your work.

The problem I have with Mann is how seldom his descriptions of his algorithms match how he really implements them. I know of no examples where his code was privately developed, nor of any commercial applications for it.

Perhaps he could give a reasonable justification for non-disclosure of his code beyond “it’s mine dammit. You can’t have it!”. I can’t think of one.

This discussion is interesting. I’ve worked in software development for several years, and have recently gotten back into studying computer science. If I had stumbled upon this discussion a few years ago I probably would be talking about how the GNU license works and the ethics of OSS, like we’ve seen above. I agree with the more science-oriented rationales.

What I’ve realized more recently is that source code in effect represents a theory of a computational process–a model. You already know this, but it’s a new idea to me. This theory is then tested (the model is put in motion) by executing the code on a computer. This just shows that it runs in a computationally sound way. It says nothing about what the model actually represents in terms of the real world. Since the model in motion has mathematical properties (it can represent anything, real or unreal, but it does follow certain rules and constraints), in a scientific context it is in effect an instrument that needs to be tested against real phenomena to see if it is helpful in making predictions, and is worthy of being used for further research. Seeing a model run is at best an anecdote. Indeed it should be tested against historical data to see how accurately it models past climates. However, in order for it to be used for prediction, the source code needs to be examined to see how many of the concepts that make up the model, and the parameters, are based on well established scientific principles and known data, and how much is experimental and speculative. This way, limits can be established for how much weight should be given to the model’s results. No scientist would want to use an instrument without knowing its limitations and potential for misleading the observer. I don’t say this to suggest that models mislead by their nature. Even scientific instruments of other types (take telescopes for example) can mislead those who use them if observers are not made aware of the instrument’s limitations and tolerances.
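
As a rough illustration of the "test it against historical data" step described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `mse` and `hindcast_skill` are hypothetical, the numbers are made up rather than real temperature data, and scoring against a "predict the observed mean" baseline is just one simple choice among many. It is not how any particular climate model is evaluated.

```python
def mse(predicted, observed):
    """Mean squared error between two equal-length sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def hindcast_skill(model_output, observed):
    """Skill of a hindcast relative to a naive baseline.

    The baseline always predicts the mean of the observations.
    1.0 means a perfect match, 0.0 means no better than the baseline,
    and negative values mean worse than the baseline.
    """
    mean_obs = sum(observed) / len(observed)
    baseline_error = mse([mean_obs] * len(observed), observed)
    if baseline_error == 0:
        raise ValueError("observations are constant; skill is undefined")
    return 1.0 - mse(model_output, observed) / baseline_error


# Made-up numbers purely for illustration; not real temperature data.
observed = [0.0, 0.1, 0.3, 0.2, 0.4]
model_output = [0.0, 0.2, 0.2, 0.3, 0.4]

print(round(hindcast_skill(model_output, observed), 3))
```

A score like this only says how well the model in motion matched the past. It says nothing about which parts of the source code rest on established principles and which are speculative, which is why the code itself still has to be open to examination.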

I’m a chemical engineer; even my back-of-the-envelope calculations for pipe dimensions and energy balances get checked and signed off by other engineers. Full disclosure of assumptions, calculation methods etc. (Okay so that’s not the same as peer reviewed, because I’m getting paid, as are my colleagues checking my work)

Knowing the little I do about forecasting (okay, I’ll admit straight up I’m VERY opinionated, and a little bit of knowledge is a VERY bad thing to have) in the realm of process control (where we might be controlling 1-5 variables on 1 piece of equipment – which is a lot simpler, better understood, and instantly verified by realtime results vs. the 100’s of variables that must go into a climate model), I cannot believe their accuracy, simply because forward predicting models will always overemphasise any and all errors (in my experience). Hence why there are only a handful of cases where they are useful, or practical in my profession. Hence why more complex control systems are the more reliable (and $) solution to many control problems. I’d like to see how a boring (and very simple) control system taken from my field would go at “forecasting” the global mean temperature.

Basically, I’ve decided, that unless I’m looking at the raw data, something is up, OR they better have a damn good reason for screwing with the data.

It bothers me greatly to see individuals passing judgement on all climate scientists and all climate models based on a cursory examination. I wouldn’t presume to make universal claims on an entire field, and all the scientists within it, based on perusing the occasional paper; what makes climate science less worthy of respect than other disciplines?
Steve: Blog policy here asks people not to “pile on” and make generalizations of the type that you object to here. I’ve snipped a couple of posts that breached this policy and will consider others if you bring them to my attention. Having said that, the “silence of the lambs”, the acquiescence of the broader Community to the conduct of certain people, is something that can be held against the “Community”.

I agree with Gary Strand’s point here. Blog policy asks people not to “pile on” by extrapolating from the conduct of some to the conduct of all.

Readers should also bear in mind that such “piling on” comments are counter-productive to the point that they wish to make.

Having said that, as I observed in an online comment, it is my view that the “silence of the lambs” is something that can be held against a larger Community. As I observed in a post by this title, there was no way that any dendro was going to criticize Mann et al 2008 for, say, using dendro proxies upside-down. A number of them were aware of problems and kept silent.

When Mann attracted national attention by refusing to “disclose his algorithm”, no climate scientist or professional association stood up and said “C’mon, Mike, this isn’t doing any of us any good. Just release your algorithm and drive on.” The silence of the lambs. But when Mann, who had voluntarily appeared in 2003 before a congressional subcommittee to admonish them, was asked to produce the code, all hell broke loose. Every climate science association you can think of stood up and objected – without any sense of the irony that they had previously stood by silently. NAS offered to investigate the contentious “verification r2” question for the House subcommittee and then wrote terms of reference for their panel that excluded this issue and ultimately did not investigate why this statistic was not reported in MBH98.

Thank you for your clarification on Mann’s willingness to disclose more code than others. It is concerning that others have taken the route of treating non-disclosure as precedent. It seems that for those individuals, their reasoning may be that the results of their work need no validation as their conclusions align well with the accepted knowledge of the day. Their borrowing precept upon precept and repeating the same errors of methodology, absent a means of independent verification, serves only to reinforce a form of knowledge that will fail partially or wholly when the voice of wisdom – the momentarily silent lambs – demands that the knowledge be justified.

Fortunately, not all lambs are silent as you have demonstrated. Thank you for your contributions to the process of scientific inquiry. I hope the other lambs will see they need not be silent and follow by your example.

Also, thanks for the snip of my last sentence. In retrospect, I would snip it myself if I could. Disclosure, reproducibility, and verification are indeed a passionate issue for some (like me), sometimes leading to careless generalized statements – your point, as well as Gary’s is well received and appreciated.

Mann’s whole intellectual property (IP) argument is total nonsense in the context of published abstracts and manuscripts which, by the nature of publication, invite verification through reproducibility. True, Mann’s team did all of the grunt work but that is what research is all about – being the first to publish a meaningful conclusion, then promoting independent confirmation of your conclusion through full disclosure of your methods and data.

Given that Mann’s stated intellectual property is computer code, it would be protected as an IP under U.S. copyright law. Searching the U.S. Copyright Office records, I can’t find where Mann has copyrighted any computer code nor has even attempted to file for copyright protection. That doesn’t mean his code isn’t a personal intellectual work which he can try to protect through non-disclosure. It does mean that Mann has made no reasonable attempt to legitimize any of his claims of his code being materially an intellectual property. If Mann truly saw his code as needing protection from some imagined infringement, he would have made half an effort to protect it like everyone else does.

His reason for refusing to disclose his code is bogus academically, legally, and professionally. As a many-times-published medical researcher, I can’t understand how people like Mann feel they have ascended to incorruptibility and have the authority to operate outside of the foundational principles of scientific research with impunity.
-snip

Steve: At this point, through public pressure, Mann has actually disclosed more of his code than others and, in Mann et al 2008 and several other recent articles, has responded to this criticism by providing copious poorly documented and poorly written code – which people have been working through. After the RegEM work on Steig, where knowledge of this method has vastly improved, we’ll go back and revisit Mann et al 2008 sometime. I’ve snipped one piling on sentence.

Given that Mann’s stated intellectual property is computer code, it would be protected as an IP under U.S. copyright law. Searching the U.S. Copyright Office records, I can’t find where Mann has copyrighted any computer code nor has even attempted to file for copyright protection.

While I agree with the general point you have made regarding Mann’s IP argument, this point is a bit incorrect. At least, Berne Convention signatory countries no longer require registration of literary/artistic works to obtain copyright protection. It is automatic the moment you create the work (assuming it is original). Copyright protection via registration simply makes it an easier task to defend should any litigation occur.

Perhaps you could point us to the climate scientists who, in your view, are meeting normally expected standards, and for that matter, the archiving requirements of the journals in which they publish their ‘peer reviewed’ papers.

Then, having done that, you might want to indicate to us which, in your view, of the climate scientists are not complying with expected standards. Your wording seems to suggest that you agree that at least some climate scientists are not doing the right thing.

Yes, I agree with your comment. I’m assuming it was partly (or in full?) directed at my comment. Whenever I re-read what I tend to send in emails to people or posts on the internet I always tend to cringe at the thought of how my tone and opinion comes across. I take back the over-arching “All climate scientists are bad” tone, and will stick with my final point about raw data being more powerful than trends.

I should add, as noted in the wiki article, the US joined the Berne Convention on March 1, 1989. The US apparently does not allow damages and/or attorney’s fees awards unless a work is registered prior to infringement, however, so there is some benefit to registration contrary to my last statement in #67.

Keep in mind that the Berne Convention usually only comes into play in a case involving international infringement. Typically, it isn’t cited in cases where the infringement occurs inside the country of the work’s origin simply because the Convention recognizes that each country has their own IP laws that apply internally. Signatories agree that there are some common rules of law that apply universally but in the case of the U.S. those common rules of law have long existed.

Owning a number of registered IP’s, I find the legal discussion interesting. However, what I find more interesting and a bit bizarre is that arguments of IP ownership would even arise in the context of published abstracts and manuscripts. If the author of the abstract feels they own an IP that will be infringed by disclosure to other researchers then the scope of information based on the IP should be excluded from publication. If excluding it means they have no abstract to publish then so be it – they have nothing to publish. By publishing the abstract containing the information, they accept, under the age-old tenets of scientific research, that all of the methodologies are open to disclosure, including the IP concerned.

This business of scientific publishing and disclosure is getting kind of funny.

I got a review back on a paper questioning what I meant when I claimed that a certain result was obtained by performing a symbolic integration using Maple. You sometimes have to read the tea leaves to find out what a reviewer is getting at with such a question. But mindful that a paper is only a partial disclosure of what was done to get a result owing to page limits, I answered that Maple is a commercial software package for symbolic solution of equations (what do I know, maybe the reviewer has heard of Mathematica but not Maple), and in the rebuttal to the review, I attached the Maple “formula sheet”, essentially the source code.

The Associate Editor responds something to the effect that people know what Maple is, (what do I know, maybe the reviewer has heard of Mathematica but not Maple), and that my scientific integrity was not being questioned.

I mean, what is the problem? If the reviewer questions the intermediate steps used for a formula in a paper, and the author comes up with the one-page input to Maple along with a comment section explaining the meaning of the variables, what is so extraordinary about any of this? What is so extraordinary about providing “back story” to reviewers given the space limitations of many journals?