The science of science itself is critically important. Improvements in our understanding of the world and our technological ability to affect it are arguably the strongest factor determining many aspects of our quality of life. We invest billions of dollars in scientific research to improve medical practice, feed the world, reduce our impact on the environment, make better use of resources, and do more with less.

It seems obvious that it is in our best interest for that scientific research to be as efficient and effective as possible. Bad scientific research wastes resources, wastes time, and may produce spurious results that are then used to waste further resources.

This is why I have paid a lot of attention to studies which look at the process of science itself, from the lab to the pages of scientific journals. To summarize the identified problems: most studies that are published are small and preliminary (meaning they are not highly rigorous), and this leads to many false positives in the literature. This is exacerbated by the current pressure to publish in academia.

There is researcher bias – researchers want positive outcomes. It is easy to exploit so-called “researcher degrees of freedom” in order to manufacture positive results even out of dead-negative data. Researchers can also engage in citation bias to distort the apparent consensus of the published literature.

Traditional journals want to maximize their impact factor, which means they are motivated to publish new and exciting results, which are the ones most likely to be false. Insufficient space is given to replications, which are critically important in science to know what is really real. We are also now faced with a large number of open-access journals with frightfully low standards, some with predatory practices, flooding the literature with low-grade science.

All of this biases published science in the same direction, that of false positive studies. In most cases the science eventually works itself out, but this arguably takes a lot longer than it has to, and scientists pursue many false leads that could have been avoided with better research up front.

Attention is being paid to this problem, although not enough, in my opinion. One specific intervention aimed at reducing false positive studies is pre-registration of clinical trials (at clinicaltrials.gov, for example). The idea here is that scientists have to register a scientific study on people before they start gathering data. This means they cannot simply hide the study in a file drawer if they don’t like the results. Further, they have to declare their methods ahead of time, including what outcomes they are going to measure.

Pre-registering scientific studies, therefore, has the effect of reducing researcher degrees of freedom. Researchers cannot simply decide after they collect the data which outcomes to follow or which comparisons to make, in order to tease out a positive result. Does this practice actually work? The answer seems to be yes, according to a new study published in PLOS One: “Likelihood of Null Effects of Large NHLBI Clinical Trials Has Increased over Time.”

The researchers looked at 30 large National Heart, Lung, and Blood Institute (NHLBI) funded trials published between 1970 and 2000. Of those studies, 17 or 57% showed a significant positive result. They then compared that to 25 similar studies published between 2000 and 2012. Of those, only 2 or 8% were positive. That is a significant drop – from 57% to 8% positive studies.
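As a rough check on how unlikely a drop of this size would be by chance, the counts above can be put into a 2x2 table and run through Fisher's exact test. This is only an illustrative sketch using the numbers quoted in this post; it is not the analysis the paper's authors performed, and the function name is my own.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x positives landing in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Counts taken from the post: 17 of 30 positive before 2000, 2 of 25 after.
pre_positive, pre_total = 17, 30
post_positive, post_total = 2, 25

pre_rate = pre_positive / pre_total
post_rate = post_positive / post_total
p_value = fisher_exact_two_sided(
    pre_positive, pre_total - pre_positive,
    post_positive, post_total - post_positive,
)

print(f"pre-2000 positive rate:  {pre_rate:.0%}")
print(f"post-2000 positive rate: {post_rate:.0%}")
print(f"Fisher exact p-value:    {p_value:.2g}")
```

A small p-value here only says the two periods differ; it says nothing about why they differ.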

They also found that there was no difference in the design of the studies, whether they were placebo-controlled, for example. There was also no effect from industry funding.

What was different was that starting in 2000 these trials had to be pre-registered in clinicaltrials.gov. Pre-registration strongly correlated with a negative outcome. In addition to pre-registration there was also the adoption of transparent reporting standards.

These results are simultaneously very encouraging and a bit frightening. This is only one study; although it is fairly straightforward and the results are clear, it still needs to be replicated with other databases. Taken at face value, however, it means that at least half of all published clinical trials are false positives, while only about 10% are true positives, and 40% are negative (both true and false negatives). Also keep in mind – these were large studies, not small preliminary trials.
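The face-value breakdown can be sketched as simple arithmetic. The key assumption, which is mine for the sake of illustration, is that the post-2000 positive rate reflects the "true" positive rate and that the earlier trials were otherwise comparable. The unrounded shares come out near 49%, 8%, and 43%, which the paragraph above rounds.

```python
# Positive rates quoted in the post.
pre_positive_rate = 17 / 30   # before pre-registration
post_positive_rate = 2 / 25   # after pre-registration

# Assumption: the pre-registered positive rate is the true positive rate,
# so any excess positives in the earlier period are false positives.
true_positive_share = post_positive_rate
false_positive_share = pre_positive_rate - post_positive_rate
negative_share = 1 - pre_positive_rate

print(f"true positives:  {true_positive_share:.0%}")
print(f"false positives: {false_positive_share:.0%}")
print(f"negatives:       {negative_share:.0%}")
```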

This study seems to confirm what all the other studies I reviewed above appear to be saying, that loose scientific methods are leading to a massive false positive bias in the medical literature. The encouraging part, however, is that this one simple fix seems to work remarkably well.

Conclusion

This study should be a wake-up call, but it is not getting as much play in the media as I would hope or like. I do not go as far as to say that science is broken. In the end it does work, it just takes a lot longer to get there than it should because we waste incredible resources and time chasing false positive outcomes.

The infrastructure of doing and reporting science has significant and effective built-in quality control, but it is currently not sufficient. The research is showing glaring holes and biases in the system. In some cases we know how to fix them.

At this point there is sufficient evidence to warrant requiring that all human research be registered prior to collecting data, with methods and outcomes to be measured declared in advance. We need high standards of scientific rigor with full transparency in reporting. These measures are already working.

We further need an overhaul of the system by which we publish scientific studies. There is too much of a bias in traditional journals toward exciting results that are unlikely to be replicated, and too little toward boring replications that are actually the workhorses of scientific progress. We also need to rein in the new open-access journals, weed out the predators, and institute better quality control.

With online publishing it is actually easier to accomplish these goals than before. Journals can no longer argue they don’t have “space” or that it is too expensive.

The scientific community, in my opinion, needs to pay more attention to these issues.

9 Responses to “Registering Studies Reduces Positive Outcomes”

“Taken at face value, however, it means that at least half of all published clinical trials are false positives, while only about 10% are true positive”

Steve – How do you know that it means that at least half of the studies are false positives? When in the second half of the study the positive rate went from 57% to 8%, couldn’t it just be that there were a lot more negative studies published because they were preregistered and couldn’t be deep-sixed? This would drop the proportion of positives without indicating that the positive ones were false.

Wyse – it’s possible that publishing more negative studies has an effect, but it is likely minor. That is not what the study suggested, however; its explanation is that researchers could no longer change the outcomes they were reporting in order to select the data they could make look positive. Remember, this was looking at studies with a specific funding source. It would be hard to hide large numbers of negative studies.

Also, in order to have any significant effect on the positive rate, the number of otherwise unpublished negative studies would have to be massive.

For example, suppose the entire decline in the positive rate were explained by publishing more negative studies. Of the 25 studies looked at from 2000-2012, two were positive. To have a 50% positive rate, only 2 negative studies would have been published, which means that 21 of those 25 studies would not have been published.

The most plausible explanation is simply a flip of studies from positive to negative.
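The file-drawer arithmetic in this reply can be written out as a small calculation. This is a minimal sketch; `file_drawer_counts` is a hypothetical helper name, and the 50% figure is the round number used in the reply rather than the exact 57%.

```python
def file_drawer_counts(positives: int, observed_total: int,
                       assumed_positive_rate: float) -> tuple[int, int]:
    """Return (published negatives, unpublished studies) under the old rate.

    Assumes the positive studies would all have been published, and asks how
    few studies could have been published to keep the assumed positive rate.
    """
    published = round(positives / assumed_positive_rate)
    published_negatives = published - positives
    unpublished = observed_total - published
    return published_negatives, unpublished


# 2 positive studies among the 25 post-2000 trials, old rate taken as 50%.
published_negatives, unpublished = file_drawer_counts(2, 25, 0.5)
print(f"negative studies published: {published_negatives}")
print(f"studies left unpublished:   {unpublished}")
```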

Steve – Thanks for the explanation. I agree that if we want science-based medicine to be respected and to be the gold standard for evaluating the efficacy of treatments, we have to clean up our own house and be our own toughest critics when it comes to reporting study results. Most doctors read journals, but many only read the headlines or the abstracts without critically analyzing the methods and statistics. I know I am guilty of this, although I am trying to learn this art better. Most of us simply don’t have the time. Thank you for trying to get to the root of the problem.

I was sure a while ago that increasing registration of studies would reduce the proportion which came to a significant positive result. But, wow, a reduction from 57% to 8%? I am amazed. But, on the other hand – maybe the large size of the reduction is an artifact of the sample size of only 25. I do wonder if a “regression to the mean” type phenomenon will be seen when future larger studies are done.
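One way to put numbers on the small-sample worry is to compute a confidence interval around each proportion. The sketch below uses the Wilson score interval at roughly 95% (z = 1.96); that choice of interval is an assumption made for illustration, not something taken from the paper.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Counts quoted in the post.
pre_low, pre_high = wilson_interval(17, 30)
post_low, post_high = wilson_interval(2, 25)

print(f"pre-2000:  {pre_low:.1%} to {pre_high:.1%}")
print(f"post-2000: {post_low:.1%} to {post_high:.1%}")
```

With only 25 trials the post-2000 interval is wide, so the true rate could be well above 8%, yet under these assumptions it still sits below the lower end of the pre-2000 interval.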

How did we ever get to a point where negative studies were not seen as valuable? Unless I am misunderstanding the situation, it appears as though success bias has crept into the process, and millions of data points have now been discarded that could have been, at the very least, used as guides for paths of research NOT to once again go down, because they have been examined and had a null result. That is information, regardless of the fact that it didn’t pan out as the researchers had hoped.

Success is not built on success. It’s built on failure. It’s built on frustration. Sometimes it’s built on catastrophe.

I still have the same quibble as WyseMD. The rate of studies published from 1970 to 2000 was 1 per year. The rate from 2000 to 2012 was 2 per year. Another thing to consider is the number of drugs or devices being tested during those time frames. If more ideas were being tested (1970-2000) but the null results were not reported, I think that could be another factor. In fact, the pipelines in pharmaceutical companies have struggled in the last 15 years, due in part to the promise and failure of combinatorial chemistry.

tmac – “How did we ever get to a point where negative studies were not seen as valuable?”

I couldn’t agree more!! I’ve often wondered about this, as I agree it would help researchers avoid replicating studies that have already been done and failed to find effects. Part of the reason could be due to the nature of null-hypothesis significance testing (NHST). The way it is strictly interpreted is:

p > .05 means there is insufficient evidence at this time to suggest a real effect (no support against null).

In other words, strictly speaking, from a stats perspective, failure to find an effect doesn’t mean there isn’t one hiding in there... perhaps for lots of reasons having to do with experimental design. So the way experiments are generally set up makes it difficult to conclude “There is no effect here, go study something else” or “we are very confident in saying there is no effect here”.

So, null results are often interpreted as “nothing to see here” and thus, nothing worth publishing. I agree with this interpretation *statistically*, but as you correctly pointed out, it’s often important to know of previous studies failing to find effects, and to know how replications are panning out, etc. There is a practical vs. statistical conflict here, and it seems to be a main reason people pick on NHST.
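The point that a non-significant result does not show there is no effect can be illustrated with a quick power calculation. This is a hedged sketch using a normal approximation for a two-group comparison; the effect size (0.3 standard deviations) and the group sizes are made-up values chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(effect_size: float, n_per_group: int,
                     alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the true difference in means in standard-deviation units.
    """
    standard_normal = NormalDist()
    z_crit = standard_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability the test statistic lands in either rejection region.
    return standard_normal.cdf(shift - z_crit) + standard_normal.cdf(-shift - z_crit)


small_study = two_sample_power(0.3, 20)    # hypothetical small trial
large_study = two_sample_power(0.3, 200)   # hypothetical large trial

print(f"chance of p < .05 with 20 per group:  {small_study:.0%}")
print(f"chance of p < .05 with 200 per group: {large_study:.0%}")
```

Under these assumed numbers the small study would miss a real effect most of the time, which is why a single null result is weak evidence of no effect, while still being worth knowing about.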