Psychology Rife with Inaccurate Research Findings

The case of a Dutch psychologist who fabricated experiments out of whole cloth for at least a decade is shining a spotlight on systemic flaws in the reporting of psychological research.

Diederik Stapel, a well-known and widely published psychologist in the Netherlands, routinely falsified data and made up entire experiments, according to an investigative committee.

But according to Benedict Carey of the New York Times, the scandal is just one in a string of embarrassments in "a field that critics and statisticians say badly needs to overhaul how it treats research results":

In recent years, psychologists have reported a raft of findings on race biases, brain imaging and even extrasensory perception that have not stood up to scrutiny....

Dr. Stapel was able to operate for so long, the committee said, in large measure because he was "lord of the data," the only person who saw the experimental evidence that had been gathered (or fabricated). This is a widespread problem in psychology, said Jelte M. Wicherts, a psychologist at the University of Amsterdam. In a recent survey, two-thirds of Dutch research psychologists said they did not make their raw data available for other researchers to see. "This is in violation of ethical rules established in the field," Dr. Wicherts said.

In a survey of more than 2,000 American psychologists scheduled to be published this year, Leslie John of Harvard Business School and two colleagues found that 70 percent had acknowledged, anonymously, to cutting some corners in reporting data. About a third said they had reported an unexpected finding as predicted from the start, and about 1 percent admitted to falsifying data.

Also common is a self-serving statistical sloppiness. In an analysis published this year, Dr. Wicherts and Marjan Bakker, also at the University of Amsterdam, searched a random sample of 281 psychology papers for statistical errors. They found that about half of the papers in high-end journals contained some statistical error, and that about 15 percent of all papers had at least one error that changed a reported finding—almost always in opposition to the authors' hypothesis....
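The kind of error Wicherts and Bakker hunted for can often be caught mechanically: given a reported test statistic, the p-value can be recomputed and compared against the one in print. Here is a minimal, hypothetical sketch of that check for a z statistic (not the authors' actual procedure, and the numbers are invented for illustration):

```python
import math

def check_reported_p(z, reported_p, alpha=0.05, tol=0.01):
    """Recompute the two-tailed p-value for a z statistic and flag
    discrepancies, including 'decision errors' where the recomputed
    value falls on the other side of the significance threshold."""
    recomputed = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal p
    inconsistent = abs(recomputed - reported_p) > tol
    decision_error = (recomputed < alpha) != (reported_p < alpha)
    return recomputed, inconsistent, decision_error

# Hypothetical example: z = 1.90 reported as p = .04
p, bad, gross = check_reported_p(1.90, 0.04)
# The true two-tailed p is about .057 -- not significant, so the
# reported .04 is exactly the kind of error that changes a finding.
```

A "decision error" here is the serious case Wicherts and Bakker describe: a reported p-value that crosses the .05 line when the recomputed one does not.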

While inaccurate and even fabricated findings make the field of psychology look silly, they take on potentially far more serious ramifications in forensic contexts, where the stakes can include six-figure payouts or extreme deprivations of liberty.

For example, claims based on fMRI brain-scan studies are increasingly being allowed into court in both criminal and civil contexts. Yet, a 2009 analysis found that about half of such studies published in prominent scientific journals were so "seriously defective" that they amounted to voodoo science that "should not be believed."

Similarly, researcher Jay Singh and colleagues have found that meta-analyses purporting to show the efficacy of instruments used to predict who will be violent in the future are plagued with problems, including failure to adequately describe study search procedures, failure to check for overlapping samples or publication bias, failure to investigate the confound of sample heterogeneity, and use of a problematic statistical technique, the Area Under the Curve (AUC), to measure predictive accuracy.

Particularly troubling to me is a brand-new study finding that researchers' willingness to share their data is directly correlated with the strength of the evidence and the quality of reporting of statistical results. (The analysis is available online from the journal PLoS ONE.)

I have heard about several researchers in the field of sex offender risk assessment who stubbornly resist efforts by other researchers to obtain their data for reanalysis. As noted by Dr. Wicherts, the University of Amsterdam psychologist, this is a violation of ethics rules. Most importantly, it makes it impossible for us to be confident about the reliability and validity of these researchers' claims. Despite this, potentially unreliable instruments -- some of them not even published -- are routinely introduced in court to establish future dangerousness.

Critics say the widespread problems in the field strongly support the need for mandatory reforms, including the establishment of policies requiring that researchers archive their data to make it available for inspection and analysis by others. This reform is important for the credibility of psychology in general, but absolutely essential in forensic psychology.

What exactly is the problem with the Area Under the Curve statistic? I'm not aware of any problems with the stat itself except for the sometimes erroneous way people COMPARE two incompatible ROC curves.

Anonymous,
Jay Singh and colleagues have been conducting a line of research into measurements of predictive accuracy. They find that the AUC statistic does poorly as a method for measuring the predictive accuracy of risk assessment instruments, when compared with other methods such as the Diagnostic Odds Ratio (DOR). I have more information on that research, as well as contact information for Dr. Singh in case you would like to request his article(s), at my professional blog (http://bit.ly/niKm0j). An upcoming issue of Behavioral Sciences and the Law will also address problems with the AUC statistic (see my blog announcement at http://bit.ly/tA6OXE).
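For readers curious how the two statistics differ in practice, here is a minimal sketch (this is my own illustration with invented scores, not Dr. Singh's method) computing both from the same hypothetical risk-assessment data. The AUC is threshold-free, while the DOR is tied to a specific cut-off score:

```python
def auc(scores_pos, scores_neg):
    """Area Under the ROC Curve via the rank-sum (Mann-Whitney) identity:
    the probability that a randomly chosen recidivist scores higher than
    a randomly chosen non-recidivist (ties count half)."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

def diagnostic_odds_ratio(tp, fp, fn, tn):
    """DOR = (TP/FN) / (FP/TN): the odds of testing positive among
    recidivists divided by the same odds among non-recidivists,
    computed at one chosen cut-off."""
    return (tp / fn) / (fp / tn)

# Hypothetical risk scores for recidivists vs. non-recidivists
recid = [7, 5, 6, 8]
non_recid = [3, 5, 2, 4]
print(auc(recid, non_recid))              # 0.96875
print(diagnostic_odds_ratio(3, 1, 1, 3))  # 9.0
```

Because the DOR depends on the cut-off while the AUC averages over all of them, the two can rank instruments differently, which is part of why the choice of accuracy statistic matters.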

It would also be nice if, during a time of reform, they addressed the file drawer problem, which leads many researchers to lie about their results because otherwise they can't publish and can't keep their jobs.