Scientific reproducibility, for fun and profit

A new project will confirm scientists' results are reproducible—if they pay.

Reproducibility is a key part of science, even though almost nobody does the same experiment twice. A lab will generally repeat an experiment several times and look for consistent results before publishing. But once a paper is published, people tend to look for reproducibility in other ways, testing the consequences of a finding or extending it to new contexts and different populations. Almost nobody goes back and repeats something that's already been published, though.

But maybe they should. At least that's the thinking behind a new effort called the Reproducibility Initiative, a project hosted by the Science Exchange and supported by Nature, PLoS, and the Rockefeller University Press.

There are good reasons that scientists usually don't do straight-up repeats of published experiments. Funding agencies have little interest in paying for work that's redundant or derivative, and few journals are willing to run something that's essentially a do-over. Plus, as a researcher, it's simply hard to get excited about doing an experiment where you think you already know what the answer is going to be. With so little incentive for reproducing results, it's not surprising that most people only try to reproduce something if they think the original report was wrong.

How does the Reproducibility Initiative hope to get past this? They've got a partial solution. PLoS ONE has agreed to create a special reproducibility section, where it will publish both the original finding and any results that come out of attempts to reproduce it. That should allow researchers the possibility of getting a second paper out of a single set of results. If the paper being reproduced was published in a Nature or Rockefeller Press journal, that journal will link to the report of the reproduction. Data from the verification will be hosted on the Figshare site.

That still leaves a couple of big issues: who does the work, and how does it get paid for? This is where a bit of enlightened self-interest may be at play. The Initiative is hosted by the Science Exchange, which makes money by linking researchers in need of expertise to labs that have it. A researcher could advertise that they need a specific assay done—say, a challenging bit of mouse genotyping—and labs that are good at genotyping can submit bids to perform the work. When a bid is accepted, Science Exchange takes a cut of the price.

Science Exchange is interested in the Reproducibility Initiative because it's set up so that, when a lab wants to see its own work reproduced, it is supposed to find a contractor to do so via the company's service.

The missing piece? Someone willing to pay to see an experiment replicated as precisely as possible. The site promises that announcements are coming soon regarding groups willing to put up the money, but so far there are no specifics.

If that can be sorted out, then there's no reason this wouldn't work. Researchers have an incentive—a second publication for minimal effort—and the people who actually do the experiments get paid for doing something they're presumably good at.

But is it really necessary? Here, the answer is a bit more complicated. In principle, it would be good to know what percentage of results can actually be reproduced, but my expectation is that the rate would vary dramatically from field to field. A lot of behavioral studies are done on small populations of undergrads from a single university, and it's probably safe to assume there's a risk that undergrads in Beijing, Boston, and BYU could produce significantly different results. But that's probably a minimal risk in the case of something like structural biology.

There, unless someone messes up the data or an algorithm, it's hard for things to go wrong, since generating a structure is mostly a matter of well-understood calculations. For that and similar fields, problems with reproducibility mostly focus on the code that performs these calculations, which could be restricted by a variety of licenses, which may or may not allow others to even look at the code involved.

Between these extremes, the value of direct reproduction is probably going to be hit or miss. A highly significant result will end up being tested in various ways, simply as a result of different labs following up on it. But some ideas that have been wrong have stuck around and influenced thinking for a while, and sorting those out quickly through reproduction could move science along faster than it would have on its own.

Whether it succeeds or not, the effort is a tacit admission that, with the huge volume of scientific publication and continuing problems with both honest mistakes and outright fraud, it's time to at least consider ways to provide a greater degree of confidence in scientific findings.

25 Reader Comments

"For that and similar fields, problems with reproducibility mostly focus on the code that performs these calculations, which could be restricted by a variety of licenses, which may or may not allow others to even look at the code involved."

Can you give me an example of a programming environment/language whose licensing terms prohibit the publishing of code written in that language?

Some common programming languages/environments used in science are: C/C++, Fortran (believe it or not), Matlab, R, SAS. As far as I know there are no restrictions on publishing source code written in those languages.

It would seem to me that direct payment from the original researcher to the reproduction house would result in pressure for positive findings. Don't want to disappoint the client and show that their hard-earned published result might not show a real effect after all.

The payment structure needs to insulate the reproducers from the payers. Possibly instead of having the Exchange act merely as a clearinghouse/ad agency, they should take the funding from the original researchers and add it to an overall fund pool, and then *they* should pick the reproducers and pay them from the fund pool. The Exchange has little incentive towards positive findings, just accurate reproduction of the original experiment - breaking the link between a funder with a particular interest in the result and the reproducer should provide more reliable results.

"For that and similar fields, problems with reproducibility mostly focus on the code that performs these calculations, which could be restricted by a variety of licenses, which may or may not allow others to even look at the code involved."

Can you give me an example of a programming environment/language whose licensing terms prohibit the publishing of code written in that language?

Some common programming languages/environments used in science are: C/C++, Fortran (believe it or not), Matlab, R, SAS. As far as I know there are no restrictions on publishing source code written in those languages.

I don't think that was the point. Distribution of the code (or part of it) can be restricted independently of the language.

One important question still isn't answered: who would be willing to pay for this?

Experiments, especially in biological sciences, can be EXTREMELY expensive. There's just no incentive for a lab to pay for very expensive experiments to be done twice as many times as necessary for good statistics and a publication. Even labs with tons of money aren't going to want to do this because they'd rather use the money they have to hire new students and post-docs and to buy new equipment.

I recognize the value in verifying reproducibility -- if we could determine the % of published results that aren't reproducible I think it would be a scandal -- but the cost makes it seriously hard to do.

Quote:

One important question still isn't answered: who would be willing to pay for this?

Experiments, especially in biological sciences, can be EXTREMELY expensive. There's just no incentive for a lab to pay for very expensive experiments to be done twice as many times as necessary for good statistics and a publication. Even labs with tons of money aren't going to want to do this because they'd rather use the money they have to hire new students and post-docs and to buy new equipment.

I recognize the value in verifying reproducibility -- if we could determine the % of published results that aren't reproducible I think it would be a scandal -- but the cost makes it seriously hard to do.

This...first and foremost...times ten. The funding agencies will have to hand out money to do these kinds of things. Otherwise, investigators will focus their limited resources on finding something novel to attract more grant money.

A second publication from the original lab is of little value to the original lab and to others. We assume they can reproduce their own data (leaving fraud out of the discussion for now, although this might be a nice way to discover those instances faster). What's really needed for biology is to test the limits of the model (how reproducible the result is in mice from a different colony, a different genetic background, etc.).

Also, reproducing a result often requires skill in multiple techniques. Finding a commercial entity to do genotyping is one thing. Finding a collection of people/labs with all the skills necessary to reproduce modern biology, and then coordinating them, is another. Oftentimes, a lab publishes a discovery because they've also pioneered the techniques for doing it.

"For that and similar fields, problems with reproducibility mostly focus on the code that performs these calculations, which could be restricted by a variety of licenses, which may or may not allow others to even look at the code involved."

Can you give me an example of a programming environment/language whose licensing terms prohibit the publishing of code written in that language?

Some common programming languages/environments used in science are: C/C++, Fortran (believe it or not), Matlab, R, SAS. As far as I know there are no restrictions on publishing source code written in those languages.

Quote:

I don't think that was the point. Distribution of the code (or part of it) can be restricted independently of the language.

Well, if that's the case then it's an institutional problem where universities or research institutions are being overly protective of their intellectual property. The fix is "easy" - reform the IP policy for published material, or place a caveat on published results which are not entirely reproducible due to lack of source code: "We've described the methods in as much detail as is possible without violating our University's policy on intellectual property, but still there may be bugs or idiosyncrasies in the actual programs which may prevent exact reproduction of these results."

There are some moves toward this in computer science. The joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE) has an "artifact award" where, if the authors include references to materials that include their raw data and programs, they can win an award. Basically, it's encouraging the reuse of data sets and programs. I suppose the intention is that someone with access to an interesting data set might use it for exploring other questions, or, if someone shares their programs, others can use the source code to apply the methods and techniques to new datasets and areas.

This of course isn't the same as the service presented in the article but shows a trend of trying to increase transparency in research.

This is a very much needed initiative. Some statistics show that more than half of the research publications in life sciences are actually plain wrong. Usually, labs make a business of partially correcting and supplementing the data with follow-up publications. The whole process resembles a tango at times.

I am less sure about the funding bit, i.e., it's unclear who's going to provide the resources for repeating experiments. One thing that I have imagined in the past is giving the research problem anonymously to other labs to replicate prior to publication, and providing the verifying lab with some partial credit.

Some mechanism has to be created anyway. The focus that life sciences put on novelty at the expense of reproducibility is ludicrous. Engineering disciplines are much better in that regard, and it isn't uncommon for papers to expend significant effort on implementing a previously published algorithm for the sake of comparison.

While this might help, I'm skeptical. It's not solely or even primarily because funding is unavailable that most experiments will never be reproduced. Rather, I'd suggest that the lack of replication is because there's little prestige to be had in confirming something which is already "known." Even if fully funded, time is a precious commodity in most researchers' lives and few will want to spend so much of it on the unglamorous job of confirming the results of others--it only really makes sense to pursue when you have a strong hunch that a significant prior result is untrue. Most researchers I know have no shortage of ideas of their own they'd rather be exploring--the most likely takers of this funding are those who can't find funding for their new ideas and are desperate for anything that will keep their labs in business. I'm not sure that's a recipe for good science.

A real solution would require changing academic culture to shift some prestige from the latest and greatest "discoveries" to the day-in-day-out exercise of careful lab work. I'm not sure how that could best be realized.

Quote:

The payment structure needs to insulate the reproducers from the payers. Possibly instead of having the Exchange act merely as a clearinghouse/ad agency, they should take the funding from the original researchers and add it to an overall fund pool, and then *they* should pick the reproducers and pay them from the fund pool.

That's actually what they're doing. From the FAQ (https://www.scienceexchange.com/reproducibility): "Submitted studies are matched blindly to an appropriate Science Exchange provider for independent validation - you may not choose who reproduces your results. Once a study has been matched to a specific provider, you can communicate with the selected provider regarding details of your study's methodology, protocols, and results."

Quote:

Even if fully funded, time is a precious commodity in most researchers' lives and few will want to spend so much of it on the unglamorous job of confirming the results of others--it only really makes sense to pursue when you have a strong hunch that a significant prior result is untrue. Most researchers

To me, that's the genius of this initiative. You as the original author don't have the time or inclination, so you outsource that work to a commercial entity, whose only motivation is producing high quality work and getting paid for it. You get another publication out of it, the extra prestige that comes with the reproducibility stamp, and the public, including drug companies that might want to base a drug on your finding or funding agencies that want to know their grants are achieving their goals, will know that the work is robust. Funding agencies or drug companies might even start to put up money for this, and journals might start requiring it.

I should point out at this point that I work for Mendeley, which is one of the partners on the initiative. There's a Mendeley group which will provide exposure, group- and article-level analytics, and discussion for the original and replicated publications submitted to the initiative. http://www.mendeley.com/groups/2473351/ ... nitiative/

Quote:

Even if fully funded, time is a precious commodity in most researchers' lives and few will want to spend so much of it on the unglamorous job of confirming the results of others--it only really makes sense to pursue when you have a strong hunch that a significant prior result is untrue. Most researchers

To me, that's the genius of this initiative. You as the original author don't have the time or inclination, so you outsource that work to a commercial entity, whose only motivation is producing high quality work and getting paid for it. You get another publication out of it, the extra prestige that comes with the reproducibility stamp, and the public, including drug companies that might want to base a drug on your finding or funding agencies that want to know their grants are achieving their goals, will know that the work is robust. Funding agencies or drug companies might even start to put up money for this, and journals might start requiring it.

I should point out at this point that I work for Mendeley, which is one of the partners on the initiative. There's a Mendeley group which will provide exposure, group- and article-level analytics, and discussion for the original and replicated publications submitted to the initiative. http://www.mendeley.com/groups/2473351/ ... nitiative/

This isn't right though, is it?

People are meant to check reproducibility themselves before publication, so there is no extra prestige and the "extra paper" will be ignored.

Drug companies have more than enough soft funding to give out if they want to test a certain compound and funding agencies already know they are getting their money's worth from the first paper/papers.

Any journals requiring this will probably see a fall in submissions as it will increase the already tight financial and time obligations academics are under.

If we are honest about this, we already know that people get ripped apart, in subsequent papers and at conferences, for publishing things they haven't shown to be reproducible.

This feels like someone has found a way to try and milk money out of already overstretched academic budgets and I have to say I really don't like the idea of it.

Quote:

People are meant to check reproducibility themselves before publication...

Unfortunately, what people are supposed to do and what they actually do are very different, hence the idea of giving a positive incentive for people to engage in the right behavior, and getting an extra validated publication with little extra work would be considered desirable by all the scientists I know.

patterson_hood wrote:

Drug companies have more than enough soft funding to give out if they want to test a certain compound and funding agencies already know they are getting their money's worth from the first paper/papers.

Sorry, but this isn't true at all. Drug companies are spending millions of dollars trying to reproduce published experiments and finding that quite often (greater than 70% in the two studies mentioned in the article) they can't. There's a clear motive for drug companies to be involved with this. As far as funders go, several of them have already publicly supported the initiative. Just look at the Advisory Board: https://www.scienceexchange.com/reprodu ... sory_board

To speak to the cost issue, core facilities are equipped to do the kind of work they do in volume and on a cost-recovery basis, so the cost of the replication will only be a fraction of the original cost of the study.

Quote:

...People are meant to check reproducibility themselves before publication, so there is no extra prestige and the "extra paper" will be ignored.

...This feels like someone has found a way to try and milk money out of already overstretched academic budgets and I have to say I really don't like the idea of it.

Funny thing you should mention checking for reproducibility and strained academic budgets in the same post. I don't believe many researchers can afford checking beyond the absolute minimum required by the methodology. From what I've experienced, people are actually wary of verifying results too deeply because then seemingly interesting findings might turn out to be false positives prior to publication. Let's not forget that scientists get funded for papers, not for the objective truth.

Quote:

From what I've experienced, people are actually wary of verifying results too deeply because then seemingly interesting findings might turn out to be false positives prior to publication. Let's not forget that scientists get funded for papers, not for the objective truth.

You've hit the nail right on the head! That's exactly the issue. The incentives are set up the wrong way, and here's an opportunity to give some positive incentives for focusing on verifiable, reproducible truth as opposed to just novelty and interest.

Quote:

A lot of behavioral studies are done on small populations of undergrads from a single university, and it's probably safe to assume there's a risk that undergrads in Beijing, Boston, and BYU could produce significantly different results.

And this is the problem with behavioral studies done on small samples: they are often not large enough to distinguish a real effect from noise. It should be a requirement to replicate these small studies IMO, to see if they actually mean something or are a result of sampling bias.
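To put a rough number on the sampling-bias worry, here is a minimal simulation sketch. It assumes Python with numpy and scipy (neither appears in the article or thread); it draws thousands of hypothetical two-group studies of 20 subjects each from a population with no real effect, then counts how often a standard t-test still declares significance, and how widely the apparent effect size swings from study to study.

```python
# Illustrative sketch only (assumes numpy and scipy): with NO true effect,
# how often does a small two-group study still look "significant," and how
# much does its apparent effect size bounce around?
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_per_group = 20      # a typical small undergrad sample
n_studies = 10_000    # many hypothetical labs running the same study

false_positives = 0
effects = []
for _ in range(n_studies):
    # Both groups are drawn from the SAME population: the true effect is zero.
    a = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    b = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1
    effects.append(a.mean() - b.mean())

print(f"Null studies that still hit p < 0.05: {false_positives / n_studies:.1%}")
print(f"Std. dev. of apparent effect sizes:   {np.std(effects):.2f}")
```

By construction, roughly 5% of these null studies come out "significant," and the apparent effect of any single small study scatters with a standard deviation around 0.3, which is why a one-off result from a small sample can look meaningful until someone replicates it.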

Quote:

People are meant to check reproducibility themselves before publication...

Unfortunately, what people are supposed to do and what they actually do are very different, hence the idea of giving a positive incentive for people to engage in the right behavior, and getting an extra validated publication with little extra work would be considered desirable by all the scientists I know.

patterson_hood wrote:

Drug companies have more than enough soft funding to give out if they want to test a certain compound and funding agencies already know they are getting their money's worth from the first paper/papers.

Sorry, but this isn't true at all. Drug companies are spending millions of dollars trying to reproduce published experiments and finding that quite often (greater than 70% in the two studies mentioned in the article) they can't. There's a clear motive for drug companies to be involved with this. As far as funders go, several of them have already publicly supported the initiative. Just look at the Advisory Board: https://www.scienceexchange.com/reprodu ... sory_board

To speak to the cost issue, core facilities are equipped to do the kind of work they do in volume and on a cost-recovery basis, so the cost of the replication will only be a fraction of the original cost of the study.

I really don't see the incentive. People who do poor work will not use it and people who do good work don't have to. We all know who is good in our fields and who is shoddy. It's also usually fairly easy to work out, although I can only comment on my own field for this obviously.

Also, you say some journals may require this? So this won't be a new publication, it'll be the first one with even more overheads someone is going to have to pay for.

I can't see drug companies turning down promising compounds just because they don't have this reproducibility certification. The whole pharmaceutical industry is currently undergoing massive change as their big-ticket compounds come off patent and everyone wants the next big thing. Sure, they might get behind it, but if people don't use it I doubt they will care.

As for the advisory board, you'll have to help me out, as I only see one person, Dr. Booth, and I'm not even sure he is involved in funding but guessed that based on the name of where he works. Maybe the others are on boards for funding bodies, but their short bios don't say so.

As for the costs, they may be small but so are academic budgets and that's money that could be used elsewhere.

I don't want to sound disagreeable but honestly I think this would either have to be forced on me or you'd need to show me a real incentive I'm currently missing.

Quote:

...People are meant to check reproducibility themselves before publication, so there is no extra prestige and the "extra paper" will be ignored.

...This feels like someone has found a way to try and milk money out of already overstretched academic budgets and I have to say I really don't like the idea of it.

Funny thing you should mention checking for reproducibility and strained academic budgets in the same post. I don't believe many researchers can afford checking beyond the absolute minimum required by the methodology. From what I've experienced, people are actually wary of verifying results too deeply because then seemingly interesting findings might turn out to be false positives prior to publication. Let's not forget that scientists get funded for papers, not for the objective truth.

The minimum is still what is dictated by the currently acceptable experimental method though.

The problem is that people can't publish negative results, results that are in fact very valuable. This is what should really be addressed, not verifying positive results that should have been verified already.

This is a great idea to stop the exaggeration, and sometimes invention, of results.

Imagine if someone had actually tried to reproduce studies that purported to show that MMR vaccines caused autism. Or had reviewed the papers of the Japanese scientist who has allegedly been inventing stuff for the last two decades. Or reviewed drug trials that had been performed by the company trying to sell the product.

The concept that someone might come out with a paper within six months to a year blowing the previous results out of the water is a sound one based on what we've seen in the past, and would at the very least encourage more diligence by the original publisher.

Quote:

The minimum is still what is dictated by the currently acceptable experimental method though.

The currently acceptable method often allows for a 95% confidence level, which leaves plenty of room for a rubbish hypothesis to be supported by "statistically significant" data. People also do multiple hypothesis testing and fail to take that little detail into account.
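To make the multiple-testing point concrete, here is a small sketch, again assuming Python with numpy and scipy (an illustration, not anything from the thread itself): run 20 independent tests on pure noise and see how often at least one clears the conventional 0.05 bar, with and without a Bonferroni correction.

```python
# Illustrative sketch (assumes numpy and scipy): 20 independent tests on
# pure noise at alpha = 0.05 yield at least one false "discovery" with
# probability 1 - 0.95**20, roughly 64%, unless the threshold is corrected.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_tests, alpha, n_trials = 20, 0.05, 2_000

naive_hits = corrected_hits = 0
for _ in range(n_trials):
    # 20 hypotheses, all truly null: each sample is N(0, 1) noise.
    pvals = [stats.ttest_1samp(rng.normal(size=30), 0.0).pvalue
             for _ in range(n_tests)]
    if min(pvals) < alpha:
        naive_hits += 1
    if min(pvals) < alpha / n_tests:   # Bonferroni-corrected threshold
        corrected_hits += 1

print(f"At least one false positive, uncorrected: {naive_hits / n_trials:.0%}")
print(f"Same, with Bonferroni correction:         {corrected_hits / n_trials:.0%}")
```

The uncorrected rate comes out near two-thirds, while the corrected rate stays around the nominal 5%, which is exactly the "little detail" that gets glossed over.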

Quote:

The problem is that people can't publish negative results, results that are in fact very valuable. This is what should really be addressed, not verifying positive results that should have been verified already.

I'm inclined to agree with you, but it's not really about "positive" or "negative" results; it's rather about trivial versus interesting findings. If one can show that a treatment is ineffective against an ailment when it is reasonable to expect that it would be effective, that's a valid result worth publishing.