It’s a snail but it seems to be moving in the right direction: On progress in the experimental social sciences …

I have previously written about the sorry state of the experimental (social) sciences (see here and here). Recent news about publishers having to withdraw (again) scores of papers in the wake of a rigged peer-review process (see here and here) and a flurry of retractions in psytown (e.g., Stapel, Sanna, Smeesters, Foerster; see also the excellent discussions on Rolf Zwaan’s blog) have not been encouraging. But there is good news …

Exhibit 1: Dixon & Jones recently published a reanalysis of survey data that allegedly showed that conspiracist ideation predicts scepticism regarding the reality of anthropogenic climate change. For a more detailed version of the Dixon & Jones reanalysis, see here.

The reanalysis demonstrates that, rather than assuming a linear relationship between ideation and views on climate science, Lewandowsky and his colleagues should have tested that assumption. Turns out, linearity is not a good assumption to make. Once non-linear regressions are used, the results – which are weak in effect size in the first place, as all parties now seem to agree – disappear, or so Dixon & Jones suggest: “The curvilinear relationship identified in (both survey data sets that Lewandowsky et al. used, AO) … suggests that both respondents convinced of anthropogenic climate change and respondents sceptical about such change were less likely to accept conspiracy theories than were those who were less decided about climate change.” Which makes intuitive sense to me. Lewandowsky et al. beg to differ but it seems that their logic is indeed a bit tortured; see also Dixon’s response to Lewandowsky’s response.
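A minimal sketch of the statistical point, using simulated data (not the actual survey responses; the shape and noise level are my assumptions): if the true relationship is curvilinear, a straight-line fit finds little, while adding a quadratic term recovers the inverted-U that Dixon & Jones describe.

```python
import numpy as np

# Hypothetical illustration: simulate an inverted-U relationship between
# climate-change views (x, from "convinced" at -1 to "sceptical" at +1)
# and conspiracist ideation (y), where the undecided middle scores highest.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
y = 1.0 - x**2 + rng.normal(0, 0.1, 500)  # peak at x = 0, the undecided

# Fit a straight line and a quadratic; compare residual sums of squares.
lin = np.polyfit(x, y, 1)
quad = np.polyfit(x, y, 2)
rss_lin = np.sum((y - np.polyval(lin, x)) ** 2)
rss_quad = np.sum((y - np.polyval(quad, x)) ** 2)

print(rss_quad < rss_lin)        # the quadratic fits far better
print(abs(quad[0] + 1.0) < 0.1)  # and recovers the negative curvature
```

The point is simply that a model constrained to be linear can report "no relationship" (or a spurious weak one) when the data are in fact curved; testing the linearity assumption costs one extra regression term.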

Of course, it should not take two years and four rounds of revision for such a critique to be published, but given the sorry state of the experimental social sciences, it is progress that this critique was published at all. It has unfortunately become a bad habit of many journals to publish sensationalist findings but to categorically refuse to even look at deconstructions of those findings. This is an irresponsible practice, if only because the original findings typically get cited an order of magnitude more frequently. That in itself represents a serious problem of evidence production and evaluation, and journals that engage in this practice ought to be named and shamed.

Although they arguably should show up in journals, replications and critical questions about particularly egregious pieces frequently end up on blogs and other social media such as Facebook. One of the two Lewandowsky et al. pieces, for example, had previously been savaged by José Duarte, a Ph.D. candidate at Arizona State University, who did not mince words, with which I do not agree, and called for one of the studies to be retracted. Given that the journal involved is Psychological Science, I guess it ain't going to happen. See also Rolf Zwaan's pointed, and excellent, questions about another piece published in Psychological Science that somehow made it through the review process when it clearly should not have.

Exhibit 2: The behavioural economics replication project, organized by 17 experimental and behavioural economists, some of them quite prominent. They intend to replicate 18 experimental studies published between 2011 and 2014 in the American Economic Review or the Quarterly Journal of Economics, two of the most prominent economics journals. Sample sizes were chosen so that each replication has at least a 90% chance of detecting the original result, assuming the true effect size is of the same magnitude as in the original study. A replication is here defined as obtaining a test statistic (using the same analysis as the original paper) with a hypothesis-test p-value less than .05. In an interesting twist, the organizers of the replication project have also organized a prediction market, which is cool although by invitation only. It will be interesting to see what comes of both the replication project and the prediction market.
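The power logic behind that sample-size choice can be sketched with a normal approximation (the d = 0.5 effect size below is a hypothetical number of mine, not one from the project, and `n_per_group` is my helper, not theirs):

```python
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.90):
    """Normal-approximation sample size per group for a two-sample test,
    powered to detect a true standardized effect of size d."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # critical value for a two-sided test
    z_power = z(power)          # quantile for the desired power
    return 2 * ((z_alpha + z_power) / d) ** 2

# Hypothetical numbers (mine, not the project's): a medium original
# effect of d = 0.5 already demands a sizeable replication sample.
print(round(n_per_group(0.5)))  # → 84 per group (normal approximation)
```

The exact noncentral-t calculation adds a participant or two, but the approximation shows why even medium effect sizes demand samples far larger than many original studies used, and why smaller effects blow the requirement up quadratically.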

Exhibit 3: The Erev et al. prediction competition, already up and running. The good news is that it is open to all and that its registration deadline has just been extended by two weeks (registration now closes April 20). The authors have previously organized such choice prediction competitions (which yours truly and a collaborator have discussed here …). In their new and improved competition, Erev and his co-authors have identified 14 well-known decision problems that have been used in the past to question the predictive power of Expected Utility Theory, and which include experimental workhorses such as the Allais questions, the Ellsberg paradox, the St Petersburg paradox, etc. The organizers of the competition have also riffed on one of their favourite themes, the difference in experimental results derived from Decisions by Description and Decisions by Experience. They then replicated these choice “anomalies” under one “standard” setting: choice with real stakes in a space of experimental tasks wide enough to accommodate all 14 “anomalies”.

The results of this replication (“Experiment 1”) suggested that all 14 phenomena emerge in their setting. “Yet, their magnitude tends to be smaller than their magnitude in the original demonstrations.” (This result is interesting since it speaks to the perennial issue of the effects of financial incentives in lab experiments. Had the stakes been higher, the magnitude might have been smaller still.)

In “Experiment 2”, the organizers of the competition then randomly drew parameterizations for these 14 “anomalies” to address the critique that any one parameterization of an anomaly might lead to seriously biased results, in particular if specific parameterizations were selected through careful pretesting meant to identify catchy results. This so-called estimation set has been published in detail, and participants in the prediction competition proper (“Experiment 3”) can use these results to figure out winning strategies.
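A toy version of that random-parameterization idea (the payoff and probability ranges below are my own illustrative assumptions, not Erev et al.'s actual task space):

```python
import random

random.seed(1)  # reproducible draws

def draw_problem():
    """Draw one randomly parameterized choice between a safe and a risky option."""
    safe = round(random.uniform(1, 10), 1)     # sure payoff
    high = round(random.uniform(safe, 30), 1)  # risky option's high payoff
    p = round(random.uniform(0.05, 0.95), 2)   # probability of the high payoff
    return {"safe": safe, "risky_high": high, "p_high": p}

# An "estimation set" of many randomly drawn problems, so that no single
# hand-picked (possibly pretested) parameterization drives the results.
estimation_set = [draw_problem() for _ in range(30)]
print(len(estimation_set))  # → 30
```

Publishing such a set in advance, as the organizers did, lets competitors fit their models to known draws before being scored on their predictions for fresh ones.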

I am happy to see these developments (attention to appropriate sample sizes, replication projects, tournaments, and so on) because past practices in experimental and behavioural economics have been dismal at best, with at least one item on the Bullshit Bingo chart making an appearance in most studies. And that is before we even get to proper contextualization of studies in the literature, or to reflections on external validity, even though authors almost routinely try to rationalize their studies by appealing to real-world problems.

Thanks for drawing attention to our work. One small correction: our comment at Psychological Science is not pay-walled but is in fact open access, although the reply from Lewandowsky et al. is pay-walled.

I think there’s a vast difference between the type of arguments that occurred over Lewandowsky’s paper and the sort of problems that Psych Science has (causes). In one case we are talking about statistical arguments that have at least some merit. In the other cases, we are talking about data fabrication, whizz-bang effects with media appeal but no theory, and simply poor quality control over statistical things that aren’t even worth arguing about, like your observation about the creativity study. So I find Lewandowsky’s stuff misleading in terms of the real problems, and the amount of press it seems to get has distracted from the main debates.
