Andrew Gelman is a professor of statistics and political science at Columbia University, and arguably the statistics field’s biggest public intellectual.

And I thought you were the biggest methodological terrorist.

By: Ben Goodrich
http://andrewgelman.com/2017/10/11/partial-pooling-informative-priors-hierarchical-variance-parameters-next-frontier-multilevel-modeling/#comment-584620
Wed, 11 Oct 2017 18:31:58 +0000

No, but I can’t say we have any actual research about that. Heavy-tailed priors on variances or standard deviations seem not to work as well as heavy-tailed priors on coefficients.
By: Christopher
http://andrewgelman.com/2017/10/11/partial-pooling-informative-priors-hierarchical-variance-parameters-next-frontier-multilevel-modeling/#comment-584538
Wed, 11 Oct 2017 16:05:50 +0000

Would you suggest something like a horseshoe prior for the group variances (e.g., beta_group[i] ~ Normal(0, lambda_group[i] * tau_global), where tau_global and the lambdas have half-Cauchy or half-t distributions)?
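For concreteness, the prior hierarchy Christopher sketches can be simulated as follows. This is a minimal NumPy sketch, not Stan code; the number of groups (26, echoing the (1|A) + … + (1|Z) example below) and the use of half-Cauchy local and global scales are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 26  # illustrative: one coefficient per grouping factor

# Horseshoe-style hierarchy: half-Cauchy scales, i.e., |Cauchy(0, 1)|
tau_global = np.abs(rng.standard_cauchy())       # global shrinkage scale
lambda_group = np.abs(rng.standard_cauchy(K))    # local (per-group) scales
# One prior draw of the coefficients: beta_i ~ Normal(0, lambda_i * tau)
beta_group = rng.normal(0.0, lambda_group * tau_global)
```

The heavy tails of the local half-Cauchy scales let individual betas escape shrinkage, while a small tau_global shrinks the rest toward zero.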
By: Ben Goodrich
http://andrewgelman.com/2017/10/11/partial-pooling-informative-priors-hierarchical-variance-parameters-next-frontier-multilevel-modeling/#comment-584514
Wed, 11 Oct 2017 15:29:08 +0000

The stan_{g}lmer functions in the **rstanarm** R package use a Gamma (by default, exponential) prior on the standard deviations of group-specific terms like (1|A). But if you have (1|A) + (1|B) + … + (1|Z), you get 26 independent priors on the standard deviations rather than partial pooling. Ed Vul seems to be referring to something more like Andrew’s co-authored paper with Sophie Si ( http://www.stat.columbia.edu/~gelman/research/unpublished/modelweighting.pdf ).

However, if you do something like (1 + x1 + x2 | g), then there is a 3×3 covariance matrix to estimate. Stan developers have long encouraged decomposing a covariance matrix into a correlation matrix and standard deviations, and more recently have been encouraging people to decompose the correlation matrix into its Cholesky factor. But people are still putting independent (often half-Cauchy, which is dubious) priors on the standard deviations, whereas **rstanarm** has always put a Dirichlet prior on the proportions of the unknown trace of the covariance matrix. By setting the concentration hyperparameter of the Dirichlet distribution to some number greater than 1, you can encourage the variances to be similar to each other. The unknown trace is set equal to the size of the matrix multiplied by the square of a scale parameter, which has a Gamma prior.
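The trace decomposition described above can be made concrete with a small numerical sketch. This is NumPy rather than rstanarm's actual implementation, the hyperparameter values are made up, and the correlation matrix is taken to be the identity for simplicity:

```python
import numpy as np

rng = np.random.default_rng(0)
J = 3  # size of the covariance matrix for (1 + x1 + x2 | g)

# One draw from the prior described above (hyperparameters illustrative):
tau = rng.gamma(shape=1.0, scale=1.0)   # overall scale, Gamma prior
trace = J * tau**2                       # unknown trace = matrix size * tau^2
pi = rng.dirichlet(np.full(J, 2.0))      # concentration > 1 pulls the
                                         # variance proportions together
variances = pi * trace                   # split the trace into J variances
sds = np.sqrt(variances)

# Combine with a correlation matrix R (identity here) to get the covariance
R = np.eye(J)
Sigma = np.diag(sds) @ R @ np.diag(sds)
```

By construction the variances sum to the trace, so np.trace(Sigma) equals J * tau**2 exactly; a larger Dirichlet concentration makes the draws of pi, and hence the variances, more nearly equal.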