Table of Contents

Testing Factors and Linear Combinations of Parameters

The examples below show how to test factors and linear combinations of parameters in (mixed-effects) meta-regression models.

Testing Factors

Meta-regression models can easily handle categorical predictors (factors) via appropriate coding of factors in terms of dummy variables. We can use the dataset for the BCG vaccine meta-analysis (Colditz et al., 1994) as an illustration.

The (log) risk ratios and corresponding sampling variances for the 13 studies can be obtained with:

For easier interpretation of the results, we will "center" the publication year and absolute latitude moderators at their minimum values (1948 and 13 degrees, respectively):

dat$year <- dat$year -1948
dat$ablat <- dat$ablat -13

Now let's fit a mixed-effects meta-regression model to these data, including publication year (year), absolute latitude (ablat), and the method of treatment allocation (alloc) as predictors/moderators.1) The standard way of handling a factor in regression models is to create dummy variables representing the various levels. One of the dummies is then left out of the model, with the corresponding level becoming the "reference" level. Instead of creating the dummy variables manually, we can let R handle the dummy coding of the allocation factor for us:

By default, alternate was made the reference level (since the first letter "a" comes before "r" and "s"), so the corresponding dummy variable for this level has been left out.2) Therefore, the model intercept (coefficients $b_0$) is the estimated (average) log risk ratio for alternate allocation in 1948 at 13 degrees absolute latitude. Coefficients $b_1$ and $b_2$ indicate how much lower/higher the estimated (average) log risk ratio is for random and systematic allocation, respectively, compared to alternate allocation. Finally, the coefficients for year and absolute latitude (i.e., $b_3$ and $b_4$) estimate how the (average) log risk ratio changes for a one-unit increase in the respective moderator.

The results under "Test of Moderators" indicate that we can (just) reject the null hypothesis $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$ (i.e., this is a so-called omnibus test of coefficients 2 through 5 – the first being the intercept, which is typically denoted as $\beta_0$ in model equations and is not included in the test by default). In other words, it appears as if at least part of the heterogeneity in the true effects is related to some of the predictors/moderators included in the model.

Now consider the two coefficients for the allocation factor. Neither coefficient is significantly different from 0 (as indicated by the corresponding p-values), but this isn't a proper test of the factor as a whole. Instead, we want to simultaneously test whether both coefficients are simultaneously equal to 0. We can do this with:

This is a (Wald-type) chi-square test of the null hypothesis $H_0: \beta_1 = \beta_2 = 0$ with two degrees of freedom. It indicates that the factor as a whole is not significant.

The car package includes the linearHypothesis() function that can be used for the same purpose. One can either specify a matrix (or vector) of one or more linear combinations of coefficients to test or a character vector giving the hypothesis in symbolic form. Therefore, both of the following commands produce the same output:

Likelihood Ratio Tests

We can also use likelihood ratio tests (LRT) to test sets of model coefficients. For this, we fit both the model with and the model without the treatment allocation factor (i.e., the "full" and the "reduced" model) and then conduct a model comparison. When the two models to be compared differ in terms of their fixed effects (i.e., model coefficients) – as in the present example – then one must fit both models using maximum likelihood (ML) estimation, and not restricted maximum likelihood (REML) estimation (which is the default). So, the LRT can be conducted with:

In the present example, the test statistic for the LRT is equal to $1.4510$, which (asymptotically) follows a chi-square distribution with two degrees of freedom (i.e., the difference in the number of parameters in the full and the reduced model) under the null hypothesis (i.e., that the reduced model fits the data as well as the full model). This leads to a p-value of $.48$, which does not lead us to reject the null hypothesis and hence to the conclusion that the inclusion of the allocation factor in the model does not provide a significantly better fit.

Knapp & Hartung Adjustment

Knapp and Hartung (2003) described an adjustment to the standard Wald-type tests typically used in mixed-effects meta-regression models that leads to tests with better statistical properties (i.e., better control of the Type I error rate). For individual model coefficients, the method leads to t-test for individual model coefficients. The adjustment can also be applied when testing sets of model coefficients, which leads to F-tests. In the present example, the allocation factor can be tested with:

We obtain $F(2,8) = .65, p = .55$, which yields the same conclusion as obtained earlier.

Here, the btt argument was used directly to only include the two coefficients corresponding to the allocation factor in the omnibus test (with the results again given under "Test of Moderators"). Equivalently, the same results could have been obtained with (output not shown):

Again, it only makes sense to request an F-test here when the original model was fitted with test="knha".

Permutation Test

Yet another possibility is to conduct a permutation test (Higgins & Thompson, 2004), which is implemented in the permutest() function. For this, the rows of the model matrix are permuted, so that any association between the predictors and the observed outcomes is purely due to chance. This process is repeated many times (ideally for all possible unique permutations), yielding the distribution of the test statistic under the null hypothesis. For the omnibus test, the p-value is the proportion of times that the test statistic under the permuted data is as extreme or more extreme than the actually observed one.

We can try to carry out an exact permutation test (for all possible unique permutations) with:

So, the exact test would require more than $6 \times 10^9$ model fits, which is just not feasible (even on a fast computer, this could easily take years to complete). So, instead, we can approximate the exact permutation-based p-value by going through a smaller number (as specified by the iter argument) of random permutations (the default is 1000). Let's go with 10000 iterations (this may still take a minutes or so to complete):

As another example, we could test whether the estimated average log risk ratio based on random allocation in 1970 at an absolute latitude of 30 degrees is significantly different from zero. Recall that these two moderators were "centered" at 1948 and 13, respectively. So, this linear hypothesis can be tested with:

In addition, the corresponding standard error, confidence interval, and credibility/prediction interval are given. Note that the intercept is automatically added (by default) and hence should not be included in the vector specified for the newmods argument for the predict() function.

Since the F-value given for the test above is based on one degree of freedom in the numerator, the value must in fact be identical with the squared ratio of the predicted value to its standard error:

(tmp$pred/tmp$se)^2

[1] 14.14316

As we see, that works out as expected.

All Pairwise Comparisons

Finally, when including a factor in a model, one may be interested in obtaining all pairwise comparisons (contrasts) between the factor levels. Let's start with the model without the Knapp and Hartung adjustment.

Note that coefficients $b_1$ and $b_2$ are already contrasts (between random and alternate allocation and between systematic and alternate allocation). The contrast between systematic and random allocation is just $b_2-b_1$ (the direction here is arbitrary, but we will use this to make things match up further below).

These are univariate confidence intervals (that have 95% coverage probability for each contrast individually). Simultaneous (family-wise) confidence intervals can also be obtained by leaving out the calpha=univariate_calpha() argument (these will have 95% coverage probability for all three contrasts simultaneously).

We can also use the predict() function to obtain the CIs (and also the credibility/prediction intervals). Recall that, by default, the intercept is automatically added to the vector(s) specified via the newmods argument (when the model actually includes an intercept), which we do not want to happen here. So, we use:

This will again yield the same results as above. An advantage is that the contrasts are labeled more informatively in the output.

The p-values for the contrasts can be adjusted using various methods (see help(summary.glht) and help(p.adjust) for possible options). For example, Holm's method is often a good choice, as it provides strong control of the family-wise error rate, but is not as overly conservative as the Bonferroni correction. We can obtain the adjusted p-values with:

Since alloc is a character variable, the coercion to a factor via the factor() function isn't strictly necessary (since this would have happened automatically anyway), but it's better to be explicit. In some cases, a numeric variable may in fact be used to represent levels of a categorical variable, in which case use of the factor() function is crucial.

When running the permutation test with REML estimation (or some other iterative estimation method, such as ML or EB), there may be the occasional warning that the "Fisher scoring algorithm did not converge". Unless this happens a lot, this can be safely ignored. Alternatively, one could also fit the model with one of the non-iterative methods (such as DL), which will avoid non-convergence issues altogether.