If one assumes that group = subjects (which is quite often the case in factorial experiments), this reduction looks (in my eyes) very similar to the reduction from m3 to m4 in Henrik Singmann’s post. As m4 corresponds to the RM-ANOVA model (which implicitly assumes compound symmetry/sphericity), I assume that the equivalence of m1 and m2 given compound symmetry is the same as the equivalence of m3 and m4 given compound symmetry:

The slides are a little bit old, so I am not sure whether what Doug Bates said in 2008 still holds in the current lmer version. Nevertheless, it is worthwhile to think about what each model actually estimates. Let us do this with some random data generated based on a stats.stackexchange question from some time ago.

Somewhat surprisingly, these models all provide essentially the same account (if we are lenient with m6). The same holds for all models that include the correlation: they, too, are essentially identical to one another.

In general, it should hold that all models without the correlation provide the same account and all models with the correlation provide the same account (as long as REML = FALSE in both cases), independent of their exact parameterization (i.e., set_sum_contrasts plays no role here; this is only relevant for interactions of fixed effects). I am not sure what is happening with m6 here, but let us ignore this for now. I think it follows from this that m1 and m2 are equivalent if the correlation is 0. I am not sure whether this is implied by Doug Bates's quote, but it is what I take from it.

Independent of the parameterization, models with two variances allow each level of group to have its own random intercept and its own idiosyncratic effect of factor, and these two effects are assumed to be uncorrelated. If we now add the correlations, we get a slightly better fit because the effects do not appear to be uncorrelated.
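
As a rough sketch of this distinction (a simplification for a factor with two levels; the symbols $b_{0j}$, $b_{1j}$, $\tau_0$, $\tau_1$ and $\rho$ are my own notation and are not meant to reproduce any specific model above), the random intercept $b_{0j}$ and the random effect of factor $b_{1j}$ for level $j$ of group can be written as jointly normal:

$$
\begin{pmatrix} b_{0j} \\ b_{1j} \end{pmatrix}
\sim \mathcal{N}\left(
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \tau_0^2 & \rho\,\tau_0\tau_1 \\ \rho\,\tau_0\tau_1 & \tau_1^2 \end{pmatrix}
\right)
$$

Under this sketch, the models without the correlation fix $\rho = 0$ and estimate only the two variances $\tau_0^2$ and $\tau_1^2$, whereas the models with the correlation estimate $\rho$ as an additional parameter. Because the first kind is nested in the second, the maximized likelihood of the model with the correlation should, in principle, be at least as high.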

Interestingly, this also holds if we compare m6 (without correlation) with the equivalently parameterized model m3 (with correlation):

For models estimated with maximum likelihood, the parameterization generally does not matter (this obviously breaks down in certain cases, as seen for m6, because of differences in hierarchical shrinkage, but it is approximately true).

In contrast, for models estimated with restricted maximum likelihood (i.e., REML = TRUE), the random effect estimates (i.e., the so-called conditional modes, which are not actual parameters here) are obtained independently of the fixed effects estimates. If I recall correctly, the random effects are estimated first and then the fixed effects. Here the differences in hierarchical shrinkage play a substantially larger role because the two sets of estimates are obtained independently, and we basically always see differences due to parameterization. Note that in the grand scheme of things these differences are still comparatively small.

The individual deviations (or offsets) for each random effect term are assumed to be normally distributed, centered on the corresponding fixed effect parameter under the given parameterization of the model. For example, for a model with treatment contrasts the offsets for the intercept (i.e., the random intercept) are distributed around the first factor level, and the offsets for the random slopes are distributed around the deviation from the first factor level. In contrast, for a model with contr.sum parameterization the random intercepts are distributed around the grand mean and the random slopes are distributed around the differences from the grand mean.
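
To make the two parameterizations concrete, here is a sketch for a factor with only two levels, A and B (a simplifying assumption; $\mu_{Aj}$ and $\mu_{Bj}$ denote the cell means of level $j$ of group, and $\beta_{0j}$, $\beta_{1j}$ its individual intercept and slope, all in my own notation):

$$
\begin{aligned}
\text{treatment (reference A):} \quad & \beta_{0j} = \mu_{Aj}, \quad \beta_{1j} = \mu_{Bj} - \mu_{Aj} \\
\text{contr.sum (A} = +1\text{, B} = -1\text{):} \quad & \beta_{0j} = \tfrac{1}{2}\,(\mu_{Aj} + \mu_{Bj}), \quad \beta_{1j} = \tfrac{1}{2}\,(\mu_{Aj} - \mu_{Bj})
\end{aligned}
$$

In both cases the random effect offsets are these individual quantities minus the same expressions computed from the population means, so the two parameterizations describe the same cell means but split them into intercept and slope differently.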

These differences in parameterization can lead to different results if shrinkage is taken into account. For example, the estimate for a participant with an extreme effect for the first factor level would receive comparatively large shrinkage for the random intercept with treatment contrasts. This would then affect all other random effect estimates as well (as the intercept is always included). With a different parameterization the shrinkage on the random intercept would be lower and the other factor levels thus less influenced by the large outlier for one case.
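
A small worked example may help here (a sketch under simplifying assumptions: a factor with two levels A and B, population means $\mu_A$ and $\mu_B$, and a participant $j$ who deviates by $\delta$ on level A only; all symbols are my own notation). With $\mu_{Aj} = \mu_A + \delta$ and $\mu_{Bj} = \mu_B$, the offsets for the intercept, $b_{0j}$, and for the slope, $b_{1j}$, before any shrinkage is applied are:

$$
\begin{aligned}
\text{treatment (reference A):} \quad & b_{0j} = \delta, \quad b_{1j} = (\mu_B - \mu_A - \delta) - (\mu_B - \mu_A) = -\delta \\
\text{contr.sum (A} = +1\text{, B} = -1\text{):} \quad & b_{0j} = \tfrac{\delta}{2}, \quad b_{1j} = \tfrac{1}{2}(\mu_A + \delta - \mu_B) - \tfrac{1}{2}(\mu_A - \mu_B) = \tfrac{\delta}{2}
\end{aligned}
$$

With treatment contrasts the full deviation $\delta$ lands on the random intercept, whereas with contr.sum only half of it does. How strongly each offset is then shrunk depends on the estimated variances, which this sketch does not model.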