Reproducibility

If your research cannot be reproduced, you might end up on 60 Minutes. Two days ago the news show ran a story about irreproducible research at Duke. You can find the video clip here.

I believe the 60 Minutes piece was somewhat misleading. It focused on data manipulation and implied that the controversial results followed from the manipulated data. As Keith Baggerly explains here, that is not the case. The conclusions do not follow from the (erroneous) data. The analysis itself was irreproducible. That discovery started the whole saga.

Update: Here’s some footage that 60 Minutes recorded but did not include on Sunday. “The systems we have in academia, especially with something this complicated, shield sloppy science and fraud.”

Emacs org-mode lets you manage blocks of source code inside a text file. You can execute these blocks and have the output display in your text file. Or you could export the file, say to HTML or PDF, and show the code and/or the results of executing the code.

On the #+begin_src line, specify the programming language. Here I’ll demonstrate Python and R, but org-mode currently supports C++, Java, Perl, etc. for a total of 35 languages.

Suppose we want to compute √42 using R.

#+begin_src R
sqrt(42)
#+end_src

If we put the cursor somewhere in the code block and type C-c C-c, org-mode will add these lines:

#+results:
: 6.48074069840786

Now suppose we do the same with Python:

#+begin_src python
from math import sqrt
sqrt(42)
#+end_src

This time we get disappointing results:

#+results:
: None

What happened? The org-mode manual explains:

… code should be written as if it were the body of such a function. In particular, note that Python does not automatically return a value from a function unless a return statement is present, and so a ‘return’ statement will usually be required in Python.

If we change sqrt(42) to return sqrt(42) then we get the same result that we got when using R.
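To see why the first Python block produced None, it may help to mimic what the manual describes in plain Python. This is only a sketch, assuming org-mode behaves roughly as if it pasted the block into a function body. The function names below are made up for illustration and are not part of org-mode.

```python
from math import sqrt

def block_without_return():
    # Body of the first Python block: the value is computed, then discarded.
    sqrt(42)

def block_with_return():
    # Body of the corrected block: the value is handed back to the caller.
    return sqrt(42)

print(block_without_return())  # a function with no return statement yields None
print(block_with_return())
```

The first call prints None for the same reason the org-mode block did: nothing was returned.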

By default, evaluating a block of code returns a single result. If you want to see the output as if you were interactively using Python from the REPL, you can add :results output :session following the language name.

Without the :session tag, the second line would not appear because there was no print statement.

I had to do a couple of things before I could get the examples above to work. First, I had to upgrade org-mode. The version of org-mode that shipped with Emacs 23.3 was quite out of date. Second, the only language you can run by default is Emacs Lisp. You have to turn on support for other languages in your .emacs file. Here’s the code to turn on support for Python and R.

The Economist posted an article online this weekend about the scandal over irreproducible cancer research by Anil Potti. My colleagues Keith Baggerly and Kevin Coombes have been crying foul about this since 2007. I first blogged about it in January 2008.

The story started getting widespread attention last summer when the Cancer Letter reported that Dr. Potti had lied on grant applications. Since then there have been articles in the popular press, and people are starting to file lawsuits.

Apparently the tipping point in the story was finding a fib on Potti’s resume. According to The Economist:

He falsely claimed to have been a Rhodes Scholar in Australia (a curious claim in any case, since Rhodes scholars only attend Oxford University).

So what finally got people to pay attention was not accusations of incompetent or fraudulent science, but résumé padding. As Keith Baggerly commented,

I find it ironic that we have been yelling for three years about the science, which has the potential to be very damaging to patients, but that was not what has started things rolling.

A recent article in The New Yorker gives numerous examples of scientific results fading over time. Effects that were large when first measured become smaller in subsequent studies. Firmly established facts become doubtful. It’s as if scientific laws are being gradually repealed. This phenomenon is known as “the decline effect.” The full title of the article is The decline effect and the scientific method.

The article brings together many topics that have been discussed here: regression to the mean, publication bias, scientific fashion, etc. Here’s a little sample.

“… when I submitted these null results I had difficulty getting them published. The journals only wanted confirming data. It was too exciting an idea to disprove, at least back then.” … After a new paradigm is proposed, the peer-review process is tilted toward positive results. But then, after a few years, the academic incentives shift—the paradigm has become entrenched—so that the most notable results are now those that disprove the theory.

This excerpt happens to be talking about “fluctuating asymmetry,” the idea that animals prefer more symmetric mates because symmetry is a proxy for good genes. (I edited out references to fluctuating asymmetry from the quote to emphasize that the remarks could equally apply to any number of topics.) Fluctuating asymmetry was initially confirmed by numerous studies, but then the tide shifted and more studies failed to find the effect.

When such a shift happens, it would be reassuring to believe that the initial studies were simply wrong and that the new studies are right. But both the positive and negative results confirmed the prevailing view at the time they were published. There’s no reason to believe the later studies are necessarily more reliable.
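Publication bias and regression to the mean, two of the topics mentioned above, can be illustrated with a small simulation. This is a sketch under made-up assumptions, not a model of any study in the article: a small true effect of 0.1, unit-variance noise, 50 observations per study, and a journal that only publishes results with z > 1.96.

```python
import random
import statistics

def simulate_study(true_effect, n, rng):
    """Return the estimated effect and whether it is 'significant' (z > 1.96)."""
    sample = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    mean = statistics.fmean(sample)
    z = mean * n ** 0.5  # variance is known to be 1, so the standard error is 1/sqrt(n)
    return mean, z > 1.96

def published_vs_all(true_effect=0.1, n=50, studies=2000, seed=0):
    """Compare the average estimate over all studies with the 'published' ones."""
    rng = random.Random(seed)
    results = [simulate_study(true_effect, n, rng) for _ in range(studies)]
    all_means = [m for m, _ in results]
    published = [m for m, significant in results if significant]
    return statistics.fmean(all_means), statistics.fmean(published), len(published)

all_mean, pub_mean, count = published_vs_all()
print(f"mean estimate over all studies:        {all_mean:.3f}")
print(f"mean estimate over 'published' studies: {pub_mean:.3f} ({count} studies)")
```

Under these assumptions the published estimates are necessarily inflated, since a study has to overshoot the true effect to clear the significance bar. Unfiltered follow-up studies would then look like a decline.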

When I was in college, a friend of mine told me he liked to take his code out for a walk every now and then. By that he meant recompiling and running all of his programs, say once a week. I asked him why he would want to do that. If a program compiled and ran the last time you touched it, why shouldn’t it compile and run now? He simply said I might be surprised.
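Here is one way a weekly walk might be automated. It is only a sketch: the commands listed are hypothetical stand-ins, and you would replace them with your own build and run commands.

```python
import subprocess
import sys

def walk_the_code(commands):
    """Run each command and return the ones that no longer succeed."""
    failures = []
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append((cmd, result.stderr.strip()))
    return failures

# Hypothetical stand-ins for real builds: one that still works, one that has broken.
commands = [
    [sys.executable, "-c", "print('still works')"],
    [sys.executable, "-c", "import sys; sys.exit(1)"],
]

for cmd, err in walk_the_code(commands):
    print("broken:", " ".join(cmd))
```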

Even when your source code isn’t changing, the environment around it is changing. That’s why your code can break without anyone touching it. Peter Deutsch made this observation in the context of networks in his Eight Fallacies of Distributed Computing.

The network is reliable.

Latency is zero.

Bandwidth is infinite.

The network is secure.

Topology doesn’t change.

There is one administrator.

Transport cost is zero.

The network is homogeneous.

Kevin Kelly made the same observation in the context of data storage. Because data formats change and physical media decay, you’ve got to keep copying your data to save it. He coined the term movage to describe the active process of preserving data.

The only way to archive digital information is to keep it moving. I call this movage instead of storage. Proper movage means transferring the material to current platforms on a regular basis … anything you want moved to the future has to be given attention to keep it moving forward.

This morning I had problems running LaTeX (with Beamer) on an old presentation, and that made me think of a post I wrote for the Reproducible Ideas blog that I’ve since shut down. In the spirit of Kevin Kelly’s movage, I’ve kept the ideas in my old post alive by updating them here.

The more active a research area is, the less reliable its results are.

In his paper Why most published research findings are false, John Ioannidis suggested that popular areas of research publish a greater proportion of false results. Of course popular areas produce more results, and so they will naturally produce more false results. But Ioannidis is saying that they also produce a greater proportion of false results.

First, in highly competitive fields there might be stronger incentives to “manufacture” positive results by, for example, modifying data or statistical tests until formal statistical significance is obtained. This leads to inflated error rates for individual findings: actual error probabilities are larger than those given in the publications. … The second effect results from multiple independent testing of the same hypotheses by competing research groups. The more often a hypothesis is tested, the more likely a positive result is obtained and published even if the hypothesis is false.

In other words,

In a popular area there’s more temptation to fiddle with the data or analysis until you get what you expect.

The more people who test an idea, the more likely someone is going to find data in support of it by chance.

The authors produce evidence of the two effects above in the context of papers written about protein interactions in yeast. They conclude that “The second effect is about 10 times larger than the first one.”
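The second effect can be made concrete with a little arithmetic. The sketch below assumes each group runs one independent test of a false hypothesis at the conventional 0.05 significance level. Both the independence and the level are my assumptions for illustration, not figures from the paper.

```python
def prob_false_positive(groups, alpha=0.05):
    """Chance that at least one of `groups` independent tests of a
    false hypothesis comes out significant at level `alpha`."""
    return 1 - (1 - alpha) ** groups

for groups in (1, 5, 10, 20):
    print(groups, round(prob_false_positive(groups), 3))
```

With one group the chance of a spurious positive is just alpha. With ten groups it is about 40%, and with twenty it is closer to two in three.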

Last week .NET Rocks mentioned a good idea in passing: start a screencast tool like Camtasia before you do a software install. Michael Learned told the story of a client that asked him to take screenshots of every step in the installation of Microsoft’s Team Foundation Server. Carl Franklin commented “What a great idea to throw Camtasia on there and record the whole process.”

It would be better if the installation process were scripted and not just recorded, but sometimes that’s not practical. Sometimes clicking a few buttons is absolutely necessary or at least far easier than writing a script. And even if you think your entire process is automated with a script, a screencast might be a good idea. It could record little steps you have to do in order to run your script, details that are easily forgotten.

Another way to use this idea would be to have one person do a practice install on a test server while recording the process. Then another person could document and script the process by studying the video. This would be helpful when the person who knows how to do the installation lacks either the verbal skills to explain the process or the scripting skills to automate it.

As part of this process, I’m winding down the blog that I started last July as part of the ReproducibleResearch.org site. I plan to keep the links to my old posts valid, but I do not know whether the new site will have a new blog. I wrote about reproducible research on this blog before starting the ReproducibleResearch.org site, and I will go back to writing about reproducible research here. (See reproducibility in the tag cloud.)

I wanted to point out an article by Steve Eddins posted this morning: Reproducible research in signal processing. His article comments on the article by Patrick Vandewalle, Jelena Kovačević, and Martin Vetterli announced recently on ReproducibleResearch.org.

Readers interested in reproducible research may also want to take a look at the Science in the open blog.

I just posted an article on my other blog, Reproducible Ideas, called Musical chairs and reproducibility drills. The post is about rotating programmers, in classes and in professional software development. The post ends with some thoughts on having a build master and rotating that position.

I’d like to see this become a community site. Depending on how much interest the site stirs up, I may add a blog, a Wiki, etc. For now, if you’d like to contribute, send me articles or links and I’ll add them to the site. You can send email to “contribute” at the domain name.