Low-power pose

“The samples were collected in privacy, using passive drool procedures, and frozen immediately.”

Anna Dreber sends along a paper, “Assessing the Robustness of Power Posing: No Effect on Hormones and Risk Tolerance in a Large Sample of Men and Women,” which she published in Psychological Science with coauthors Eva Ranehill, Magnus Johannesson, Susanne Leiberg, Sunhae Sul, and Roberto Weber.

I can’t find a copy of the paper online but here’s the Open Science Framework page for the project, and here’s how the paper begins:

In a growing body of research, psychologists have studied how physical expression influences psychological processes . . . A recent strand of literature within this field has focused on how physical postures that express power and dominance (power poses) influence psychological and physiological processes, as well as decision making . . . Carney et al. found that power posing affected levels of hormones such as testosterone and cortisol, financial risk taking, and self-reported feelings of power in a sample of 42 participants . . .. We conducted a conceptual replication study with a similar methodology as that employed by Carney et al. but using a substantially larger sample (N = 200) and a design in which the experimenter was blind to condition. . . .

And here’s what they find:

Consistent with the findings of Carney et al., our results showed a significant effect of power posing on self-reported feelings of power. However, we found no significant effect of power posing on hormonal levels or in any of the three behavioral tasks.

I just have a couple of statistical comments:

1. Ranehill et al. write, “Our statistical power to detect an effect of the magnitude reported by Carney et al. was more than 95%.” Sure, but a key principle of design calculation (my preferred term, because I think that conventional “power” is unduly focused on statistical significance) is to hypothesize effect sizes using external information, not to simply use a published point estimate that is highly vulnerable to noise and selection.

I’m not saying Ranehill et al. did anything wrong in their analysis here. It’s just that it should be no surprise that this purportedly high-power study did not replicate the original finding, as the assumed power comes from a biased and noisy effect size estimate.
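To make the point concrete, here is a minimal sketch of how much a power calculation depends on the effect size you plug in. Everything numerical here is an assumption for illustration: the standardized effect sizes are hypothetical (they are not the estimates from Carney et al. or Ranehill et al.), the N = 200 is assumed to split evenly into two groups of 100, and the function `approx_power` is my own name for a standard normal-approximation formula for a two-sided, two-sample comparison.

```python
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    d is the assumed true effect in standard-deviation units; the
    difference in means then has standard error sqrt(2 / n_per_group).
    This is a normal approximation, not an exact t-test calculation.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Assumed true effect divided by its standard error.
    ncp = d * (n_per_group / 2) ** 0.5
    # Probability of landing beyond either critical value.
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


# Assumption: N = 200 split evenly between the two conditions.
n = 100

# Hypothetical effect sizes, from "as large as a noisy published
# estimate might suggest" down to "small but plausible".
for d in (0.6, 0.4, 0.2, 0.1):
    print(f"assumed effect d = {d:.1f}: power is about {approx_power(d, n):.2f}")
```

Under these assumptions, the same sample size gives very high power if the true effect is as large as the biggest value tried, and low power if the true effect is a fraction of that. The "more than 95%" figure is only as good as the effect size that goes into it.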

2. After the non-replication, they write, “It is possible that subtle differences between the experimental protocols in Carney et al. and those in our study, originally designed as an extension of Carney et al., led to the omission of factors crucial for power poses to influence hormonal levels and behavior.” Let me just emphasize that just about all effects of interest vary across people and across scenarios. But when someone does a noisy study that fails to replicate in a larger sample, I have no reason, in general, to take that first result seriously.

By the way, in case you’re wondering, no, Anna Dreber is not some sort of a professional skeptic. The papers listed on her webpage include:

Were at least the standard errors smaller in the larger study? I always thought that would obviously be true, but it isn’t always so in practice, and this probably has to do with Andrew’s comments about increasing sample size not necessarily being enough of a solution.

I hope this isn’t considered OT for a stats blog, but: “a design in which the experimenter was blind to condition” <-- that is a huge deal. As far as I can tell, the two studies may even be considered unrelated. One has blinded experimenter(s) and the original simply does not. There is a reason for double-blinding medical trials…