Large Bayes Factor changes with exclusion of single subject (Bayesian ANOVA)

I have run both frequentist and Bayesian 2x2x3 repeated measures ANOVAs for my analysis of some reaction time data. At one point I noticed an interesting discrepancy between the two: in the frequentist analysis, a main effect (of SESSION) was not significant (p = 0.09), but it received fairly strong support in the Bayesian analysis (BF10 of the model with only this effect: 56, inclusion BF: 40).

This struck me as odd: I had only encountered the opposite situation before (significant p-value, BF in favor of the null). So I decided to look at the individual subject data (for a plot, see data.png file). It turns out there is one subject out of 26 who showed a much stronger effect than everyone else (one of the orange lines in the figure). I have no reason to exclude this subject, but because I'm still learning about Bayesian statistics I decided to try and see what would happen if I reran the analyses without this subject.

To my surprise, while the p-value changed only moderately (from 0.09 to 0.14), the Bayes factor plummeted from 57 to around 2. So by excluding just one subject I have gone from "strong" to "anecdotal" evidence for H1.

I don't really have a good intuition for Bayes factors yet, but this large change seems quite odd, especially compared to the frequentist analysis. Also, I am really unsure how to interpret and report this effect now, if at all.

I'm left wondering under what circumstances such discrepancies between frequentist and Bayesian statistics can occur? Could it be that Bayes factors are indeed in a sense more sensitive to "outliers"?

Any input would be much appreciated. In case that's helpful, I've attached two .jasp files, each containing the frequentist and Bayesian ANOVAs either for all data (all.jasp) or for the dataset without this one subject (without.jasp).

Comments

That's interesting. I'll have to look at this a little later (some deadlines now, remind me if I haven't responded in a week), but some of the discrepancies may be due to violations of assumptions (homogeneity of variances). I'll take a look.

Hmm OK, that is a rather big effect of leaving out this single participant. Of course, you could also argue "how is it that the p-value only changes from .09 to .14 when I leave out this huge outlier in my relatively small data set?" but that's not constructive, and the change in the BF remains surprisingly large. To understand the problem better, perhaps you could break it down into a t-test. For a t-test, the t-value and sample size will determine the BF (hence the "Summary Stats module"). Do you find a similar discrepancy? (i.e., eliminate the noncrucial ANOVA variables and let's talk t-test instead)
E.J.
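
To illustrate the point that the t-value and sample size alone determine the Bayes factor in a t-test, here is a minimal sketch in plain Python. It assumes the default JZS setup (a Cauchy prior on effect size with scale r = sqrt(2)/2, the "medium" default in BayesFactor and JASP) for a one-sample or paired design, and evaluates the one-dimensional integral with a simple trapezoid rule. The function name `jzs_bf10`, the integration limits, and the t-values in the loop are illustrative choices, not values from the attached files, and this is not the ANOVA model comparison.

```python
import math

def jzs_bf10(t, n, r=math.sqrt(2) / 2, steps=20000):
    """Approximate BF10 for a one-sample/paired t-test under a Cauchy(0, r) prior.

    Hypothetical helper: integrates over g on a log scale (g = exp(u)),
    where g has an inverse-gamma(1/2, 1/2) prior and the effect-size
    variance is g * r^2.
    """
    nu = n - 1  # degrees of freedom

    def integrand(u):
        g = math.exp(u)
        a = 1.0 + n * g * r * r
        log_f = (
            -0.5 * math.log(a)
            - 0.5 * (nu + 1) * math.log1p(t * t / (a * nu))
            - 0.5 * math.log(2.0 * math.pi)
            - 1.5 * u
            - 1.0 / (2.0 * g)
            + u  # Jacobian of the substitution g = exp(u)
        )
        return math.exp(log_f)

    lo, hi = -15.0, 40.0  # assumed limits; the integrand is negligible outside
    h = (hi - lo) / steps
    total = 0.5 * (integrand(lo) + integrand(hi))
    for i in range(1, steps):
        total += integrand(lo + i * h)
    marginal_h1 = total * h

    # Same quantity under H0 (effect size fixed at zero)
    marginal_h0 = (1.0 + t * t / nu) ** (-0.5 * (nu + 1))
    return marginal_h1 / marginal_h0

# Illustrative t-values only, for a sample of 26 subjects
for t_value in (1.0, 1.8, 2.5):
    print(t_value, jzs_bf10(t_value, 26))
```

With the sample size held fixed, the resulting BF10 depends on the data only through t, so in the t-test setting a change in the p-value and a change in the BF both trace back to the same change in t.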

Thanks so much for the super swift reply! I agree that you could be equally surprised about the behavior of the p-value; it was more the discrepancy that struck me.

Interestingly, averaging over the other variables and just doing t-tests indeed leads to a completely different picture. The p-values are the same of course, but now the BFs in both cases are also modest and don't diverge much anymore:

So a discrepancy between frequentist and Bayesian tests remains: for the former it doesn't matter how you specify the model, but for the latter it matters a lot.

I take it this is because the "Bayesian t-test" and "Bayesian ANOVA" are just descriptive names, but the underlying tests differ quite a lot? That is, the t-test/ANOVA are equivalent in the frequentist framework in the case of one factor with two levels, but this does not hold for the Bayesian analogs.
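
The frequentist equivalence mentioned here (one within-subject factor with two levels) can be checked numerically: the repeated measures ANOVA F statistic equals the squared paired t statistic, which is why the p-values match. The following is a small sketch in plain Python; the reaction times are invented for illustration and are not the data from this thread, and the helper names `paired_t` and `rm_anova_f` are hypothetical.

```python
import math

# Hypothetical per-subject mean reaction times (ms) in two sessions.
# These numbers are made up for illustration only.
session1 = [512.0, 478.0, 530.0, 495.0, 601.0, 550.0]
session2 = [498.0, 470.0, 541.0, 480.0, 560.0, 539.0]

def paired_t(x, y):
    """Paired-samples t statistic computed from the difference scores."""
    n = len(x)
    d = [a - b for a, b in zip(x, y)]
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)
    return mean_d / math.sqrt(var_d / n)

def rm_anova_f(x, y):
    """F statistic of a one-way repeated measures ANOVA with two levels."""
    n, k = len(x), 2
    grand = (sum(x) + sum(y)) / (k * n)
    level_means = [sum(x) / n, sum(y) / n]
    subject_means = [(a + b) / k for a, b in zip(x, y)]
    ss_effect = n * sum((m - grand) ** 2 for m in level_means)
    ss_subject = k * sum((m - grand) ** 2 for m in subject_means)
    ss_total = sum((v - grand) ** 2 for v in x + y)
    ss_error = ss_total - ss_effect - ss_subject  # subject-by-level residual
    df_effect = k - 1
    df_error = (k - 1) * (n - 1)
    return (ss_effect / df_effect) / (ss_error / df_error)

t = paired_t(session1, session2)
f = rm_anova_f(session1, session2)
print(t ** 2, f)  # the two values agree up to rounding error
```

No such identity is shown here for the Bayesian analogs: this sketch covers only the frequentist side of the comparison.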

Would you say this means that (main) effects in Bayesian ANOVAs should generally be followed up with Bayesian t-tests? Or are the results of Bayesian t-tests and ANOVAs generally similar, which would mean that my data are just weird / violate certain assumptions?

Thank you so much for taking another look at my post. Indeed, Richard's simulation seems to behave differently, so it must be something particular about my data?

I've included the data (aov_data.csv) and the code that gets me these results (I mainly use the BayesFactor package directly; I only posted the JASP results first because I thought that would be more convenient). Sorry about the length; I wanted it to be reproducible and was unable to make it shorter. In case it's still not reproducible: the p and BF values I obtained are also included as comments in between the code.

Please let me know if there's any more information I can provide you with, and thanks again for all your help so far!