When news broke this year that Diederik Stapel, a prominent Dutch social psychologist, was faking his results on dozens of experiments, the fallout was swift, brutal and global.

Science and Nature, the world’s top chroniclers of science, were forced to retract papers that had received wide popular attention, including one that seemed to link messiness with racism by claiming that “disordered contexts (such as litter or a broken-up sidewalk and an abandoned bicycle) indeed promote stereotyping and discrimination.”

As a result, some of Prof. Stapel’s junior colleagues lost their entire publication output; Tilburg University launched a criminal fraud case; Prof. Stapel himself returned his PhD and sought mental health care; and the entire field of social psychology — in which human behaviour is statistically analyzed — fell under a pall of suspicion.

One of the great unanswered questions about the Stapel affair, however, is how he got away with such blatant number-fudging, especially in a discipline that claims to be chock full of intellectual safeguards, from peer review to replication by competitive colleagues. How can proper science go so wrong?

The answer, according to a growing number of statistical skeptics, is that without release of raw data and methodology, this kind of research amounts to little more than “‘trust me’ science,” in which intentional fraud and unintentional bias remain hidden behind the numbers. Only the illusion of significance remains.

S. Stanley Young and Alan Karr of the US National Institute of Statistical Sciences, for example, point to several shocking published claims that were not borne out by the data on which they were based, including coffee as a cause of pancreatic cancer, Type A personality causing heart attacks, and eating breakfast cereal increasing the odds that a woman will give birth to a boy.

“The more startling the claim, the better,” they wrote in a recent issue of the journal Significance. “These results are published in peer-reviewed journals, and frequently make news headlines as well. They seem solid. They are based on observation, on scientific method, and on statistics. But something is going wrong. There is now enough evidence to say what many have long thought: that any claim coming from an observational study is most likely to be wrong – wrong in the sense that it will not replicate if tested rigorously.”

Victor Ivrii, a University of Toronto math professor, described the problem similarly on his blog: “While Theoretical Statistics is (mainly) a decent albeit rather boring mathematical discipline (Probability Theory is much more exciting), so called Applied Statistics is in its big part a whore. Finding dependence (true or false) opens exciting financing opportunities and since the true dependence is a rare commodity many ‘scientists’ investigate the false ones.”

“If jumping to wrong conclusions brings a scorn of colleagues and a shame, they will be cautious. But this does not happen these days,” Prof. Ivrii said in an email. “Finding that eating cereals does not affect your cardio [for example] brings neither fame nor money, but discovering that there is some connection allows you to apply for a grant to investigate this dependence.”

Science, at its most basic, is the effort to prove new ideas wrong. The more startling the idea, the stronger the urge to disprove it. That was illustrated last month, when European physicists appeared to have seen particles (neutrinos) travel faster than light, a shocking result that has prompted a massive effort to replicate (or, more likely, debunk) it.

Although science properly gets credit for discovery and progress, falsifiable hypotheses are its true currency, and when scientists fail to disprove a false hypothesis, they are left with a false positive.

Technically known as the incorrect rejection of the null hypothesis, a false positive is “perhaps the most costly error” a scientist can make, according to a trio of leading American researchers who this fall published “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.”

At worst, a false positive, and the work of correcting it, can cost lives and fortunes. At best, it is a distraction. In the paper, the authors argued that modern academic psychologists have so much flexibility with numbers that they can prove virtually anything. False positivism, so to speak, has gone rogue.

By seeming to prove, through widely accepted statistical methods, an effect that could not possibly be real (that listening to the Beatles song “When I’m Sixty-Four” made people younger), the authors vividly illustrated the problem. In “many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not,” they wrote.
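The mechanism the authors describe can be illustrated with a small simulation. What follows is a minimal Python sketch under simplifying assumptions of our own, not the paper’s actual procedure: every measurement is pure noise, so no real effect exists; groups are compared with a z-test that assumes a known standard deviation of 1; the two outcome measures are treated as independent (a simplification); and the “flexible” researcher may report whichever measure works and may add ten participants per group once. All function names are hypothetical.

```python
import math
import random

Z_CRIT = 1.96  # two-sided threshold for p < 0.05 on a standard normal


def z_stat(a, b):
    """Two-sample z statistic, assuming both groups have a known SD of 1."""
    diff = sum(a) / len(a) - sum(b) / len(b)
    return diff / math.sqrt(1 / len(a) + 1 / len(b))


def is_significant(a, b):
    return abs(z_stat(a, b)) > Z_CRIT


def simulate_study(rng, flexible, n=20, extra=10):
    """One study in which the true effect is zero by construction."""
    def draw(k):
        return [rng.gauss(0, 1) for _ in range(k)]

    # Two outcome measures per group; all four samples are pure noise.
    a1, b1, a2, b2 = draw(n), draw(n), draw(n), draw(n)
    if not flexible:
        # Disciplined analysis: one pre-chosen measure, one fixed sample size.
        return is_significant(a1, b1)
    # Flexibility 1: report whichever measure "worked".
    if is_significant(a1, b1) or is_significant(a2, b2):
        return True
    # Flexibility 2 (optional stopping): add participants and test again.
    for sample in (a1, b1, a2, b2):
        sample.extend(draw(extra))
    return is_significant(a1, b1) or is_significant(a2, b2)


def false_positive_rate(flexible, trials=5000, seed=1):
    """Fraction of simulated null studies that come out 'significant'."""
    rng = random.Random(seed)
    hits = sum(simulate_study(rng, flexible) for _ in range(trials))
    return hits / trials


print("fixed plan:   ", false_positive_rate(flexible=False))
print("flexible plan:", false_positive_rate(flexible=True))
```

With a fixed plan the rate of spurious findings should sit near the nominal 5 per cent; with just these two kinds of flexibility it more than doubles, even though nothing real is being measured.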

Psychology has long been insecure about its standing as an objective science, now far removed from its roots in philosophy. It asserts that standing through statistics, which seem objective but can be interpreted either well or poorly. With nothing but theory to fall back on, psychology is particularly vulnerable to the illusion of significance.

Critics point to the prevalence of data dredging, in which computers search a massive pool of data for any effect at all rather than testing a specific hypothesis. But other important factors are the media’s hyping of counterintuitive studies, the academic imperative to “publish or perish,” and the natural human bias toward positive findings: to show an effect rather than confirm its absence.
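Data dredging is easy to demonstrate. The sketch below is a hypothetical Python illustration, not any critic’s actual method: it generates one outcome and a couple of hundred candidate “predictors” that are pure random noise, then tests each for correlation with the outcome using the Fisher z-transformation at roughly the 5 per cent level. Because every candidate is noise, every “hit” is a false positive, and about one in twenty is expected by chance alone. The sample sizes and function names are assumptions chosen for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def looks_significant(r, n, z_crit=1.96):
    """Fisher z-transform test of 'no correlation' at roughly the 5% level."""
    return abs(math.atanh(r)) * math.sqrt(n - 3) > z_crit


def dredge(n_subjects=200, n_variables=200, seed=7):
    """Test many unrelated noise variables against one outcome; keep the 'hits'."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_subjects)]
    hits = []
    for i in range(n_variables):
        # Every candidate variable is pure noise, unrelated to the outcome.
        candidate = [rng.gauss(0, 1) for _ in range(n_subjects)]
        if looks_significant(pearson_r(candidate, outcome), n_subjects):
            hits.append(i)
    return hits


hits = dredge()
print(f"{len(hits)} of 200 noise variables appear to 'predict' the outcome")
```

A study that reported only those hits, and not the hundreds of failed tests behind them, would look like a collection of discoveries.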

Even in the Stapel case, his exposure as a fraud received less coverage than some of his bogus claims had.

In a much-cited article in The New Yorker last year that highlighted problems with the scientific method, Jonah Lehrer wrote: “Just because an idea is true doesn’t mean it can be proved. And just because an idea can be proved doesn’t mean it’s true. When the experiments are done, we still have to choose what to believe.”

In their paper, Mr. Young and Mr. Karr shoot down this radical notion that science is a matter of personal choice, and proof a luxury. They point out that the magazine’s example, the “decline effect” seen in studies of extrasensory perception, in which an initially high success rate drops off steeply (a pattern later explained by the statistical concept of regression to the mean), was simply “wrong and therefore should not be expected to replicate.”
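Regression to the mean can produce a “decline effect” with no real ability involved. The following is a hypothetical Python sketch, not a reconstruction of any actual ESP study: it assumes a guessing task of 100 trials in which every participant is purely guessing (each call is a fair coin flip), selects the apparent stars above the top-5-per-cent cutoff of a first session, and retests them. All parameters and names are illustrative assumptions.

```python
import random


def guessing_score(rng, trials=100):
    """Correct calls out of `trials` when every guess is a fair coin flip."""
    return sum(rng.random() < 0.5 for _ in range(trials))


def decline_effect(n_people=2000, top_fraction=0.05, seed=3):
    """Average score of first-round 'stars' before and after a retest."""
    rng = random.Random(seed)
    first = [guessing_score(rng) for _ in range(n_people)]
    # Select the apparent "psychics": everyone above the top-5% cutoff.
    cutoff = sorted(first, reverse=True)[int(n_people * top_fraction)]
    stars = [score for score in first if score > cutoff]
    # Retest the same people. Nobody has real ability, so scores fall back.
    retest = [guessing_score(rng) for _ in stars]
    before = sum(stars) / len(stars)
    after = sum(retest) / len(retest)
    return before, after


before, after = decline_effect()
print(f"first session: {before:.1f} / 100, retest: {after:.1f} / 100")
```

The selected group’s first-session average sits well above 50, while its retest falls back to about 50: a steep “decline” that needs no explanation beyond having picked out the luckiest scores in the first place.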

It might not be the most exciting topic for the faculty lounge, and reporters might ignore it, but at the end of the day, a debunked false claim remains one of the highest achievements in science.