A new R package for fitting multilevel models

Joscha Legewie points to this article by Lars Ronnegard, Xia Shen, and Moudud Alam, “hglm: A Package for Fitting Hierarchical Generalized Linear Models,” which just appeared in The R Journal. This new package has the advantage, compared to lmer(), of allowing non-normal distributions for the varying coefficients. On the downside, they seem to have reverted to the ugly lme-style syntax (for example, “fixed = y ~ week, random = ~ 1|ID” rather than “y ~ week + (1|ID)”). The old-style syntax has difficulties handling non-nested grouping factors. They also say they can estimate models with correlated random effects, but isn’t that just the same as varying-intercept, varying-slope models, which lmer (or Stata alternatives such as gllamm) can already do? There’s also a bunch of stuff on h-likelihood theory, which seems pretty pointless to me (although it probably won’t do much harm either).
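To make the contrast concrete, here is a small base-R sketch of our own (not from the paper) that simulates counts whose varying intercepts follow a gamma rather than a normal distribution, with the two call styles shown as comments. The data, variable names, and parameter values are hypothetical. The `hglm()` arguments (`fixed`, `random`, `family`, `rand.family`) reflect our understanding of the package's interface; we have not run those calls here, so check `?hglm` before relying on them.

```r
# Hypothetical data: counts on 200 subjects over 5 weeks, with multiplicative
# subject effects u drawn from a gamma distribution (mean 1, variance lambda)
set.seed(42)
n_id   <- 200
n_week <- 5
ID     <- rep(seq_len(n_id), each = n_week)
week   <- rep(seq_len(n_week) - 1, times = n_id)
lambda <- 0.5
u <- rgamma(n_id, shape = 1 / lambda, rate = 1 / lambda)
y <- rpois(length(ID), exp(0.5 + 0.1 * week) * u[ID])
d <- data.frame(y = y, week = week, ID = factor(ID))

# lme4 style: one formula; the random part sits in parentheses and the
# varying intercepts are normal on the log (linear-predictor) scale
#   lme4::glmer(y ~ week + (1 | ID), family = poisson, data = d)

# hglm style (as we understand it, not run here): separate fixed and random
# formulas, with a gamma distribution allowed for the random effects
#   hglm::hglm(fixed = y ~ week, random = ~ 1 | ID,
#              family = poisson(link = log),
#              rand.family = Gamma(link = log), data = d)
```

glmer() would fit these data with log-normal subject effects (normal on the log scale), whereas the gamma specification matches how this particular simulation generated them.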

In any case, this package might be useful to some of you, hence this note.

Meng's "Decoding the H-likelihood" should be required reading before using the package. As I understand it, hglm methods are integrated nested Laplace approximation (INLA) methods, applied without considering whether that is a good idea. In many cases INLA (and hglm) can work, but it's not a general procedure, and knowing when it won't work is tricky.

Yes, a paper by Harry Joe [1] suggests that they get very close, as long as you aren't on the boundary of the variance-component (VC) parameter space. Also, if the hglm package implements the second-order Laplace approximation for general designs [2], that would be great; the other GLMM packages should steal that right away.
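To make "Laplace approximation" concrete, here is a minimal base-R sketch: the first-order Laplace approximation to one cluster's marginal likelihood in a hypothetical Poisson random-intercept model, compared against numerical quadrature. The counts and parameter values are made up, and this illustrates the generic technique only, not hglm's own algorithm or the second-order correction discussed above.

```r
# One cluster of a hypothetical Poisson random-intercept model:
#   y_j ~ Poisson(exp(beta + b)), j = 1..n,   b ~ N(0, sigma^2)
y     <- c(3, 5, 4)  # hypothetical counts for one cluster
beta  <- 1           # fixed intercept, treated as known
sigma <- 1           # random-intercept SD, treated as known

# Log integrand h(b) = log p(y | b) + log p(b), vectorised over b
h <- function(b) vapply(b, function(bb)
  sum(dpois(y, exp(beta + bb), log = TRUE)) + dnorm(bb, 0, sigma, log = TRUE),
  numeric(1))

# Mode of h and its analytic second derivative there
b_hat <- optimize(h, c(-10, 10), maximum = TRUE, tol = 1e-10)$maximum
h2 <- -length(y) * exp(beta + b_hat) - 1 / sigma^2

# First-order Laplace approximation: exp(h(b_hat)) * sqrt(2 * pi / -h''(b_hat))
laplace <- exp(h(b_hat)) * sqrt(2 * pi / -h2)

# Reference value by adaptive quadrature over +/- 10 approximate SDs of b
sd_b  <- 1 / sqrt(-h2)
exact <- integrate(function(b) exp(h(b)), b_hat - 10 * sd_b, b_hat + 10 * sd_b,
                   rel.tol = 1e-8, abs.tol = 0)$value

print(c(laplace = laplace, quadrature = exact, rel_error = laplace / exact - 1))
```

With several moderately large counts, as here, the two values agree closely. With few or all-zero counts the integrand is more skewed, and the first-order approximation typically degrades; editing `y` is an easy way to see this.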

We would like to thank Andrew Gelman for commenting on our recent paper in The R Journal.

Concerning the syntax, there are two alternatives for the input syntax in hglm. One is the lme-style and the other is a design-matrix style. The latter makes it possible to specify an a priori correlation between groups/clusters (as Andi from Germany points out). These kinds of correlation structures appear frequently in genetics, and there is a need to allow for such models in R.
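For readers unfamiliar with the idea, here is a sketch of one standard way to encode a known a priori correlation matrix A (for example, a genetic relationship matrix) through the design matrix. If A = L L' (Cholesky), then u = L v with independent components in v, so Z u = (Z L) v, and a model with iid random effects on Z* = Z L implies Var(Z u) = sigma^2 Z A Z'. The matrices below are hypothetical, and whether this is exactly what one passes to hglm's design-matrix interface (arguments `X`, `y`, `Z`) is our assumption; the package vignette is the place to confirm it.

```r
# Hypothetical relationship matrix for 4 individuals (must be positive definite)
A <- matrix(c(1,   0.5,  0.5,  0,
              0.5, 1,    0.25, 0,
              0.5, 0.25, 1,    0,
              0,   0,    0,    1), nrow = 4, byrow = TRUE)

# Two observations per individual: Z maps 8 records to 4 individuals
Z <- kronecker(diag(4), matrix(1, nrow = 2, ncol = 1))

# chol() returns upper-triangular R with A = t(R) %*% R,
# so L = t(R) is lower-triangular with A = L %*% t(L)
L <- t(chol(A))
Zstar <- Z %*% L

# With v ~ N(0, s2 * I), u = L v has covariance s2 * A and Z u = Zstar v,
# so both parameterisations imply the same covariance for the data
print(all.equal(Z %*% A %*% t(Z), Zstar %*% t(Zstar)))

# Hypothetical fit through a design-matrix interface (not run here):
#   fit <- hglm::hglm(X = X, y = y, Z = Zstar)
# Correlated effects would then be recovered as u_hat = L %*% v_hat
```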

We summarize previous discussions on h-likelihood theory in Section 10 (Discussion on h-likelihood theory) of our vignette, available with the hglm package. This should be a good starting point for those interested in the theory. The vignette also explains in more detail how the hglm package uses h-likelihood theory. Basically, the package fits a set of interconnected GLMs, which give good approximations to the maximum h-likelihood estimates.
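To illustrate what "a set of interconnected GLMs" can mean in the simplest case, here is a sketch for a Gaussian linear mixed model only: an augmented weighted least-squares fit (a Gaussian GLM) for the fixed and random effects, alternated with dispersion updates. For intercept-only gamma GLMs on leverage-adjusted squared residuals, those updates reduce to closed-form ratios. This is our simplified reading of the h-likelihood approach, not the hglm source code; the function name `fit_lmm_augmented` and the simulated data are hypothetical. As a check, in a balanced one-way layout the fixed point should coincide with the REML estimates, which there equal the classical ANOVA estimates (when the between-group estimate is positive).

```r
# Sketch: Gaussian LMM y = X b + Z u + e, u ~ N(0, su2 I), e ~ N(0, se2 I)
fit_lmm_augmented <- function(y, X, Z, se2 = 1, su2 = 1,
                              maxit = 5000, tol = 1e-12) {
  n <- length(y); p <- ncol(X); q <- ncol(Z)
  # Augmented GLM: data rows plus q pseudo-observations "0 = u_j + noise"
  Ta <- rbind(cbind(X, Z), cbind(matrix(0, q, p), diag(q)))
  ya <- c(y, rep(0, q))
  obs <- seq_len(n); ran <- n + seq_len(q)
  for (iter in seq_len(maxit)) {
    w <- c(rep(1 / se2, n), rep(1 / su2, q))   # prior weights = 1 / dispersion
    A_inv <- solve(crossprod(Ta, Ta * w))      # (Ta' W Ta)^{-1}
    coef <- A_inv %*% crossprod(Ta, w * ya)    # solves Henderson's equations
    h <- rowSums((Ta %*% A_inv) * Ta) * w      # leverages (hat-matrix diagonal)
    d <- as.vector(ya - Ta %*% coef)^2         # squared (pseudo-)residuals
    # Intercept-only gamma GLMs for each dispersion reduce to these ratios
    se2_new <- sum(d[obs]) / sum(1 - h[obs])
    su2_new <- sum(d[ran]) / sum(1 - h[ran])
    done <- max(abs(c(se2_new - se2, su2_new - su2))) < tol
    se2 <- se2_new; su2 <- su2_new
    if (done) break
  }
  list(beta = coef[seq_len(p)], u = coef[p + seq_len(q)],
       se2 = se2, su2 = su2, iter = iter)
}

# Hypothetical balanced one-way data: 10 groups of 5
set.seed(1)
g <- 10; m <- 5
id <- factor(rep(seq_len(g), each = m))
y  <- 3 + rnorm(g, sd = 2)[as.integer(id)] + rnorm(g * m)
X  <- matrix(1, nrow = g * m, ncol = 1)
Z  <- model.matrix(~ id - 1)
fit <- fit_lmm_augmented(y, X, Z)

# Classical ANOVA estimators for the balanced one-way layout
msw <- sum((y - ave(y, id))^2) / (g * (m - 1))
msb <- m * sum((tapply(y, id, mean) - mean(y))^2) / (g - 1)
print(rbind(augmented = c(se2 = fit$se2, su2 = fit$su2),
            anova     = c(se2 = msw, su2 = (msb - msw) / m)))
```

In a non-Gaussian HGLM the mean-model step would itself be an iteratively reweighted GLM fit and the dispersion models could carry covariates, which is where the "interconnected" structure matters more than in this closed-form special case.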

The R package HGLMMM, developed by Marek Molas at Erasmus MC, Rotterdam, explicitly maximizes the h-likelihood and gives higher-order corrections, at the expense of being slightly slower.

We appreciate suggestions for further development of the hglm package.

Parenthesis matching in R is a big nuisance and is often the source of errors. A simpler syntax that uses fewer parentheses will reduce the number of coding errors. (Smart editors help, but do not eliminate this problem.)

R already has way too many characters that mean special things, which makes it hard for new users to read code. Every single special character on a U.S. keyboard is used by R in some fashion: ~ ` ! @ # $ % ^ & * ( ) { } [ ] : ; | / etc. After running out of single symbols, there are things like [[ %% %*% .( :: :::

Using () to denote random terms continues the infamous tradition of coming up with yet more new uses for symbols, making the code ever harder to read. Look at the following two lines

For a new user (such as a person coming from PROC MIXED), which syntax is easier to understand? Which syntax has fewer parentheses to haggle with? Which syntax avoids the wretched implied intercept that even Bill Venables said is a difficulty for users?