Thursday, 1 December 2011

Questionable research practices, including testing increasing numbers of participants until a result is found, are the "steroids of scientific competition, artificially enhancing performance". That's according to Leslie John and her colleagues, who've found evidence that such practices are worryingly widespread among US psychologists. The results are currently in press at the journal Psychological Science, and they arrive at a time when the psychological community is still reeling from the fraud of a leading social psychologist in the Netherlands. Psychology is not alone. Previous studies have raised similar concerns about the integrity of medical research.

John's team quizzed 6,000 academic psychologists in the USA via an anonymous electronic survey about their use of 10 questionable research practices including: failing to report all dependent measures; collecting more data after checking if the results are significant; selectively reporting studies that "worked"; and falsifying data.

As well as declaring their own use of questionable research practices and their defensibility, the participants were also asked to estimate the proportion of other psychologists engaged in those practices, and the proportion of those psychologists who would likely admit to this in a survey.

For the first time in this context, the survey also incorporated an incentive for truth-telling. Some survey respondents were told, truthfully, that a larger charity donation would be made by the researchers if they answered honestly (based on a comparison of a participant's self-confessed research practices, the average rate of confession, and averaged estimates of such practices by others). Just over two thousand psychologists completed the survey. Comparing psychologists who received the truth incentive with those who didn't showed that it led to higher admission rates.

Averaging across the psychologists' reports of their own and others' behaviour, the alarming results suggest that one in ten psychologists has falsified research data, while the majority has: selectively reported studies that "worked" (67 per cent), not reported all dependent measures (74 per cent), continued collecting data to reach a significant result (71 per cent), reported unexpected findings as expected (54 per cent), and excluded data post-hoc (58 per cent). Participants who admitted to more questionable practices tended to claim that they were more defensible. Thirty-five per cent of respondents said they had doubts about the integrity of their own research. Breaking the results down by sub-discipline, relatively higher rates of questionable practice were found among cognitive, neuroscience and social psychologists, with fewer transgressions among clinical psychologists.

John and her colleagues said that many of the iffy methods they'd investigated were in a "grey-zone" of acceptable practice. "The inherent ambiguity in the defensibility of research practices may lead researchers to, however inadvertently, use this ambiguity to delude themselves that their own dubious research practices are 'defensible'." It's revealing that a follow-up survey that asked psychologists about the defensibility of the questionable practices, but without asking about their own engagement in those practices, led to far lower defensibility ratings.

John's team think the findings of their survey could help explain the "decline effect" in psychology and other sciences - that is, the tendency for effect sizes to decline with replications of previous results. Perhaps this is because the original, large effect size was obtained via questionable practices.
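To make the reasoning concrete, here is a minimal sketch of how selective reporting alone could produce a decline effect. It is an illustration, not the analysis in John and colleagues' paper. The true effect size (0.2 standard deviations), the group size (20) and the rule that only significant originals get written up are all assumptions chosen for the example, and the function names are hypothetical.

```python
import math
import random


def observed_difference(rng, true_d, n):
    """Mean difference between two groups of n when the true effect is true_d SDs."""
    treated = [rng.gauss(true_d, 1) for _ in range(n)]
    control = [rng.gauss(0, 1) for _ in range(n)]
    return sum(treated) / n - sum(control) / n


def simulate_decline(rng, true_d=0.2, n=20, n_studies=5000):
    """Return (mean published effect, mean replication effect)."""
    # Smallest difference reaching p < .05 in a z-test (population SD assumed = 1).
    cutoff = 1.96 * math.sqrt(2 / n)
    published, replications = [], []
    for _ in range(n_studies):
        d = observed_difference(rng, true_d, n)
        if d > cutoff:  # assumption: only "successful" originals are written up
            published.append(d)
            # each published result is replicated once and reported whatever it shows
            replications.append(observed_difference(rng, true_d, n))
    return sum(published) / len(published), sum(replications) / len(replications)


if __name__ == "__main__":
    pub, rep = simulate_decline(random.Random(2011))
    print(f"mean published effect:   {pub:.2f}")
    print(f"mean replication effect: {rep:.2f}")
```

Under these assumptions the published effects must exceed the significance cutoff (about 0.62), while the replications hover around the true value of 0.2, so the effect appears to shrink even though nothing about it has changed.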

The current study also complements a recent paper published in Psychological Science by Joseph Simmons and colleagues that used simulations and a real experiment to show how toying with dependent variables, sample sizes and other factors (the kind of practices explored in the current study) can massively increase the risk of a false-positive finding - that is, claiming a positive effect where there is none.
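The sample-size part of that argument can be sketched in a few lines. This is a rough illustration and not the simulation from the Simmons paper: it assumes two groups with no true difference, a z-test with a known standard deviation of 1, and a hypothetical researcher who starts with 20 participants per group, then adds 10 more per group and re-tests until p < .05 or 100 per group is reached. All names and parameter values are made up for the example.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def study_is_significant(rng, peek, start_n=20, step=10, max_n=100, alpha=0.05):
    """Simulate one two-group study in which the true effect is zero.

    With peek=False the data are tested once, at start_n per group.
    With peek=True the researcher re-tests after every extra `step`
    participants per group and stops at the first p < alpha.
    """
    a = [rng.gauss(0, 1) for _ in range(start_n)]
    b = [rng.gauss(0, 1) for _ in range(start_n)]
    while True:
        n = len(a)
        # z-test for a difference in means, population SD assumed known (= 1)
        z = (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)
        if two_sided_p(z) < alpha:
            return True
        if not peek or n >= max_n:
            return False
        a.extend(rng.gauss(0, 1) for _ in range(step))
        b.extend(rng.gauss(0, 1) for _ in range(step))


def false_positive_rate(rng, peek, n_studies=3000):
    """Share of null studies that end up 'significant'."""
    hits = sum(study_is_significant(rng, peek) for _ in range(n_studies))
    return hits / n_studies


if __name__ == "__main__":
    rng = random.Random(2011)
    print("single test:     ", false_positive_rate(rng, peek=False))
    print("peek and top up: ", false_positive_rate(rng, peek=True))
```

With a single planned test the false-positive rate should sit near the nominal 5 per cent. With repeated peeking it should come out well above that, because every extra look is another chance for noise to cross the threshold.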

"[Questionable research practices] ... threaten research integrity and produce unrealistically elegant results that may be difficult to match without engaging in such practices oneself," John and her colleagues concluded. "This can lead to a 'race to the bottom', with questionable research begetting even more questionable research."
_________________________________

15 comments:

Some of these really are very light grey grey-areas, IMO. For example "selectively reporting studies that work". I'd love to have time and opportunity to write up every study I did, but since I don't have time, and journals are less likely to accept non-results, I prioritise things that work. Note that the class of "non-working" studies is very diverse. Sometimes a failure to get an effect is interesting, many times you just did something wrong (in a very boring way).

["selectively reported studies that "worked" (67 per cent), not reported all dependent measures (74 per cent), continued collecting data to reach a significant result (71 per cent), reported unexpected findings as expected (54 per cent), and excluded data post-hoc (58 per cent)".]

Of these, I find the first one highly unlikely: are we to believe that 33% of psychologists have either i) never had a study that yielded insignificant results or ii) have managed to get all such studies published? That doesn't gel with common knowledge. And as pointed out by Tom, above, it is not always practical or even worthwhile to report every such study.

As for the rest, excluding data post-hoc is not wrong if you clearly state that you did so and honestly say why in your report, while reporting unexpected findings as expected is plain wrong and scientific mis-practice. Continuing to collect data until significance is a bit more of a grey area, particularly in areas of research where there's no clear way of estimating an effect size beforehand. But there's a good case for saying that such a practice should also be fully reported, so that reviewers could, for example, make a distinction between a study that was clearly underpowered in which the sample size was increased once to achieve reasonable power, versus a study in which the researchers continuously checked their stats with each new addition and stopped as soon as they achieved the magical 0.05.

The common factor, I believe, is that it's not so much a matter of what we do as scientists that can be questionable [leaving aside ethics of course], as what we report and how we report it. If we are clear and honest about exactly what we've done in our research, then the reviewers have all the necessary information with which to either accept our research, suggest modifications, or send us packing.

But seriously, it is very hard to interpret these results because (as Tom implies) many of the questionable practices are actually quite hard to avoid doing within the current academic publishing system.

Reporting negative results being the obvious one, but also you have things like peer reviewers suggesting you do post-hoc tests at the review stage...

Fundamentally I think studies like this one miss the point somewhat because it's not the fault of individual scientists (except in extreme cases), it is the fault of a broken system.

Thanks for your comments. I think the authors of the paper would agree with you. They say that some of the practices are worse than others (falsifying data obviously being the worst), they talk about "reforms" to the system, and they reference the Simmons paper, which contains some explicit recommendations for how to improve publishing habits and conventions, including: greater disclosure of things like research conditions, dependent variables, co-variates etc; and it calls for reviewers to be more tolerant of imperfections in results (thus encouraging more disclosure), but also to require that authors show their results don't hinge on arbitrary decisions etc.

As a partial remedy to the problem of individuals and journals not publishing negative results, we have created a repository, PsychFileDrawer, mainly designed for non-replications of published results. The idea is that posting a notice (and possibly data) there is much less effort than publishing in a journal.

Your website provides a very useful tool/resource to the Psychology research community.

One potential future development might be agreement with some Psychology Journals to support the publication of meta-analyses based on studies reported in PsychFileDrawer. If a series of unsuccessful (or for that matter successful) replications of a previously published result were reported on your web site, then it would be nice for some or all of the contributing authors to write up a meta-analysis paper. If a few leading Journals encouraged such an endeavour, then there would be added incentive for researchers to report replications or non-replications, which can only be a good thing.

As a number of commentators have pointed out, this is largely a systemic problem rather than a matter of individual malpractice. To take the example of omitting to describe every dependent measure used (something that I am guilty of myself): this is partly because of space requirements in journals. Why waste words describing a measure that didn't come off (especially if, as Tom points out, this was due to your own ineptitude rather than any theoretically interesting reason) when there are so many more interesting things to say about your results?

Journal space requirements are an anachronism in the digital age, and I hope we are now moving towards a situation where every empirical article will have two parts: a highly edited, word-limited summary write-up (ideally written in accessible English!), and an unedited "Supplementary Material" section for online-only publication in which authors write a full and detailed report of the entire research process that led to the article, including pilot studies, unsupported hypotheses, and even raw data (fully anonymised of course). This would go a long way towards improving transparency, as well as being an invaluable aid for students. It might need a bit of encouragement to get started, but this should be easy enough: a few influential journals just need to say that articles that include it will be more likely to get published. Once it becomes the norm it could be purely voluntary, since any article without it would be less likely to be taken seriously. To save time, brief articles could be provisionally published on condition that the Supplementary Material is added within (say) 6 months, and a "health warning" added to the article if this fails to appear by the deadline. But actually I would hope that people would write the full report first and then refine it into an article.

"this is largely a systemic problem rather than a matter of individual malpractice"

Yes, but it is rather disturbing that 2/3 of psychologists admit to using these questionable practices, yet only 1/3 admit to having doubts about the integrity of their research. Using some of the "lighter-grey" practices may well be justified; but shouldn't they at least give us pause for thought? If psychologists are not being self-critical about using them, it is very hard to be sure that they are not being influenced by implicit bias towards a priori theories. This would be much more damaging than if they are just being used randomly.

Also, I don't know why this kind of methodological self-criticism seems to be focussed on psychology. The root of the problem is nothing to do with psychology, it's a problem with any field of enquiry with the same "structure" in the sense that people do experiments, get data and look for (usually) group differences or correlations in those data.

It doesn't apply to say chemistry, theoretical physics or mathematics, and some biology is immune but most isn't.

In general I would say that if your headline results are in the form of p values, your science is vulnerable to this, because a) you won't publish p values >0.05 and b) you can jiggle the stats you use until you get one.

The larger the sample size, the more accurate the data. Any good research scientist knows this. So, it's not a surprise that a research scientist would try to collect more data after results aren't significant. It's after many trials, with a large enough sample, that one would have to start admitting defeat.

The focus of this article on psychologists is irresponsible. The DVs they measured and their methods are about as flawed as they are suggesting psychological science is. Incentives for truth? What is the objective measure of truth? They just incentivized reactivity effects. And since when is not reporting all DVs a flaw? It's called page limits in journal articles. As a published researcher and editor, we cannot publish all DVs measured. If we spent the time on attempting to publish all DVs and nonsig findings we'd be out of jobs. Way to make psychology the fall guy for systemic issues. Then again, the first author only has 3 pubs in pubmed so apparently is not the authority on any kind of science. Someone better donate to a charity for my comment.

Neuroskeptic said..."...I don't know why this kind of methodological self-criticism seems to be focussed on psychology...It doesn't apply to say chemistry, theoretical physics or mathematics, and some biology is immune but most isn't."

One problem with the "questionable research practices" used for research on ME or CFS patients during the last 20 years is that the questionable (sometimes fraudulent) results have discouraged funding of "...say chemistry, theoretical physics or mathematics, and some biology." So far, psychology has trumped biology. Perhaps that will change with the Rituximab studies coming out of Norway.
