Preface: Debunkers criticised the stats in that quantum ESP paper, but I’m not sure they’re any worse than the stats in similar papers. To explore this, I’m reading the Journal of Personality and Social Psychology in sequence. All my readings will be quick, so I reserve the right to be wrong, but please take it that I am arguing in good faith.

Experiment 1:

Design: “Participants were randomly assigned to read about one or eight children from Darfur. Half of these participants were given the expectation that they would have to report a donation amount later in the experiment, whereas the other half were told that they would just be asked to rate their emotions toward the children. The resulting design was a 2 (number of victims) × 2 (expectation to help) between-subjects design. The critical dependent variable was self-reported emotion toward the children.”

Results: “A two-way between-subjects analysis of variance (ANOVA) was conducted to examine the effects of help request and number of victims on compassion. There were no significant main effects of help request, F(1, 116) = 1.15, p = .29, ηp² = .01, or number of victims, F(1, 116) = 0.34, p = .56, ηp² = .00. However, there was a significant interaction between help request and number of victims, F(1, 116) = 4.61, p = .03, ηp² = .04.”
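Before complaining, it’s worth checking that the reported numbers are at least internally consistent: each p-value can be recomputed from its F statistic and degrees of freedom. The F values below are the paper’s; the code is mine.

```python
from scipy.stats import f

# Reported F statistics, all with (1, 116) degrees of freedom,
# paired with the p-values the paper reports.
reported = {
    "help request main effect":      (1.15, 0.29),
    "number of victims main effect": (0.34, 0.56),
    "interaction":                   (4.61, 0.03),
}

for name, (F_stat, p_reported) in reported.items():
    # Survival function: P(F >= F_stat) under the null hypothesis
    p = f.sf(F_stat, 1, 116)
    print(f"{name}: recomputed p = {p:.3f} (reported {p_reported})")
```

All three recomputed p-values round to the reported figures, so whatever else is wrong, the arithmetic checks out.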

What statisticians might dislike: Claiming an interaction when neither main effect is significant. If you’re modelling interactions in a 2×2 table, you’re really estimating four separate cell means, and your tests should reflect that, with a correction for the multiple comparisons among the cells. I don’t have the data, but the graph suggests all four 95% CIs overlap, which makes me doubt that any of the cell differences are real.
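To make that concrete, here is the kind of cell-level analysis I’d rather see, run on simulated data (the cell labels, means, and sample sizes are invented for illustration, not taken from the paper):

```python
import numpy as np
from scipy.stats import tukey_hsd

rng = np.random.default_rng(0)

# Four cells of a hypothetical 2x2 design, 30 participants each.
# The means are made up; the point is the analysis, not the numbers.
cells = {
    "1 victim / rate only":  rng.normal(3.0, 1.0, 30),
    "8 victims / rate only": rng.normal(3.2, 1.0, 30),
    "1 victim / donate":     rng.normal(3.3, 1.0, 30),
    "8 victims / donate":    rng.normal(2.9, 1.0, 30),
}

# Tukey's HSD compares all six pairs of cells while controlling
# the familywise error rate, instead of leaning on one omnibus
# interaction F-test.
res = tukey_hsd(*cells.values())
print(res)
```

If none of the six pairwise comparisons survives the correction, a bare interaction p = .03 is thin evidence that any particular pair of cells differs.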

Other quibbles: If you have to throw out outliers to make the data normal, the data aren’t normal. The F-test is not your friend.
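A deterministic sketch of that quibble: take data that are normal by construction, add two gross outliers, and a Shapiro–Wilk test flips from “looks normal” to “not normal”. Trimming the outliers makes the test pass again only because you removed the evidence against normality.

```python
import numpy as np
from scipy.stats import norm, shapiro

# 50 evenly spaced normal quantiles: as normal as a sample can look
clean = norm.ppf(np.linspace(0.01, 0.99, 50))

# The same data with two gross (8-to-9-sigma) outliers added
contaminated = np.concatenate([clean, [8.0, -9.0]])

_, p_clean = shapiro(clean)
_, p_contaminated = shapiro(contaminated)
print(f"clean: p = {p_clean:.3f}, with outliers: p = {p_contaminated:.2e}")
# Deleting the two outliers restores a high p-value, but a process
# that produces 8-sigma points was never normal to begin with.
```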

Obligatory disclaimer: I like psychology! I just want the statistics to be better.