pglmm: Phylogenetic Generalised Linear Mixed Model for Community...

Description

This function performs Generalized Linear Mixed Models for binary
and continuous phylogenetic data, estimating regression
coefficients with approximate standard errors. It is modeled after
lmer but is more general by allowing
correlation structure within random effects; these correlations can
be phylogenetic among species, or any other correlation structure,
such as geographical correlations among sites. It is, however, much
more specific than lmer in that it can
only analyze a subset of1 the types of model designed handled by
lmer. It is also much slower than
lmer and requires users to specify
correlation structures as covariance
matrices. communityPGLMM can analyze models in Ives and
Helmus (2011). It can also analyze bipartite phylogenetic data,
such as that analyzed in Rafferty and Ives (2011), by giving sites
phylogenetic correlations.

communityPGLMM.binary calculates the statistical
significance of the random effects in the generalized linear mixed
model from the marginal profile likelihood.

communityPGLMM.binary.LRT tests statistical significance of
the phylogenetic random effect on species slopes using a likelihood
ratio test

Arguments

formula

a two-sided linear formula object describing the
fixed-effects of the model; for example, Y ~ X.

data

a data.frame containing the variables
named in formula. The data frame should have long format with
factors specifying species and sites. communityPGLMM will
reorder rows of the data frame so that species are nested within
sites. Please note that calling
as.data.frame.comparative.comm will return your
comparative.comm object into this format for you.

family

either gaussian for a Linear Mixed Model, or
binomial for binary dependent data.

sp

a factor variable that identifies species

site

a factor variable that identifies sites

random.effects

a list that contains, for
non-nested random effects, lists of triplets of the form
list(X, group = group, covar = V). This is modeled after the
lmer formula syntax (X | group)
where X is a variable and group is a grouping
factor. Note that group should be either your sp or
site variable specified in sp and site. The
additional term V is a covariance matrix of rank equal to
the number of levels of group that specifies the covariances among
groups in the random effect X. For nested variable random
effects, random.effects contains lists of quadruplets of the
form list(X, group1 = group1, covar = V, group2 = group2)
where group1 is nested within group2.

REML

whether REML or ML is used for model fitting. For the
generalized linear mixed model for binary data, these don't have
standard interpretations, and there is no log likelihood function
that can be used in likelihood ratio tests.

s2.init

an array of initial estimates of s2 for each random
effect that scales the variance. If s2.init is not provided for
family="gaussian", these are estimated using in a clunky way
using lm assuming no phylogenetic signal. A better
approach is to run link[lme4:lmer]{lmer} and use the output
random effects for s2.init. If s2.init is not
provided for family="binomial", these are set to 0.25.

B.init

initial estimates of B, a matrix containing
regression coefficients in the model for the fixed effects. This
matrix must have dim(B.init)=c(p+1,1), where p is the
number of predictor (independent) variables; the first element of
B corresponds to the intercept, and the remaining elements
correspond in order to the predictor (independent) variables in the
formula. If B.init is not provided, these are estimated
using in a clunky way using lm or glm
assuming no phylogenetic signal. A better approach is to run
lmer and use the output fixed effects for
B.init.

reltol

a control parameter dictating the relative tolerance
for convergence in the optimization; see optim.

maxit

a control parameter dictating the maximum number of
iterations in the optimization; see optim.

tol.pql

a control parameter dictating the tolerance for
convergence in the PQL estimates of the mean components of the
binomial GLMM.

maxit.pql

a control parameter dictating the maximum number
of iterations in the PQL estimates of the mean components of the
binomial GLMM.

verbose

if TRUE, the model deviance and running
estimates of s2 and B are plotted each iteration
during optimization.

which random.effect in x to be
tested

...

ss

which of the random.effects to produce

object

communityPGLMM object to be summarised

digits

minimal number of significant digits for printing, as
in print.default

show.plot

if TRUE (default), display plot

Details

The vignette 'pez-pglmm-overview' gives a gentle
introduction to using PGLMMS. For linear mixed models (family
= 'gaussian'), the function estimates parameters for the model of
the form, for example,

y = beta_0 + beta_1x + b_0 + b_1x

b_0 ~ Gaussian(0, sigma_0^2I_(sp))

b_0 ~ Gaussian(0, sigma_0^2V_(sp))

e ~ Gaussian(0,sigma^2)

where beta_0 and beta_1 are fixed
effects, and V_(sp) is a variance-covariance matrix
derived from a phylogeny (typically under the assumption of
Brownian motion evolution). Here, the variation in the mean
(intercept) for each species is given by the random effect
b_0 that is assumed to be independent among
species. Variation in species' responses to predictor variable
x is given by a random effect b_0 that is
assumed to depend on the phylogenetic relatedness among species
given by V_(sp_; if species are closely related,
their specific responses to x will be similar. This
particular model would be specified as

The covariance matrix covar is standardized to have its determinant
equal to 1. This in effect standardizes the interpretation of the
scalar sigma^2. Although mathematically this is
not required, it is a very good idea to standardize the predictor
(independent) variables to have mean 0 and variance 1. This will
make the function more robust and improve the interpretation of the
regression coefficients. For categorical (factor) predictor
variables, you will need to construct 0-1 dummy variables, and
these should not be standardized (for obvious reasons).

For binary generalized linear mixed models (family =
'binomial'), the function estimates parameters for the model of
the form, for example,

y = beta_0 + beta_1x + b_0 + b_1x

Y = logit^(-1)(y)

b_0 ~ Gaussian(0, sigma_0^2I_(sp))

b_0 ~ Gaussian(0, sigma_0^2V_(sp))

where beta_0 and beta_1 are fixed
effects, and V_(sp) is a variance-covariance matrix
derived from a phylogeny (typically under the assumption of
Brownian motion evolution).

As with the linear mixed model, it is a very good idea to
standardize the predictor (independent) variables to have mean 0
and variance 1. This will make the function more robust and improve
the interpretation of the regression coefficients. For categorical
(factor) predictor variables, you will need to construct 0-1 dummy
variables, and these should not be standardized (for obvious
reasons).

predicted mean values for the generalized linear mixed model. Set to NULL for linear mixed models

sp, sp

matrices used to construct the nested design matrix

Zt

the design matrix for random effects

St

convcode

the convergence code provided by optim

niter

number of iterations performed by optim

Note

These function do not use a
comparative.comm object, but you can use
as.data.frame.comparative.comm to
create a data.frame for use with these functions. The power
of this method comes from deciding your own parameters parameters
to be determined (the data for regression, the random effects,
etc.), and it is our hope that this interface gives you more
flexibility in model selection/fitting.