Models with Multiple Factors and Their Interaction

The example below shows how to test/examine multiple factors and their interaction in (mixed-effects) meta-regression models.

Data Preparation

For the example, we will use the data from the meta-analysis by Raudenbush (1984) (see also Raudenbush & Bryk, 1985) of studies examining teacher expectancy effects on pupil IQ (help(dat.raudenbush1985) provides a bit more background). The data can be loaded with:
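Assuming the metafor package is installed, loading the package and the dataset could look like this:

```r
# load the metafor package (the dataset ships with it)
library(metafor)

# copy the dataset into 'dat'
dat <- dat.raudenbush1985
```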

I copy the dataset into 'dat', which is a bit shorter and therefore easier to type further below.

For illustration purposes, we will categorize the weeks variable (i.e., the number of weeks of contact prior to the expectancy induction) into three levels (0 weeks = "none", 1-9 weeks = "some", and 10+ weeks = "high"):
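One way to do this is with the cut() function (a sketch; the breakpoints reflect the grouping described above):

```r
# categorize weeks: 0 weeks = "none", 1-9 weeks = "some", 10+ weeks = "high"
dat$weeks <- cut(dat$weeks, breaks=c(0, 1, 10, Inf),
                 labels=c("none", "some", "high"), right=FALSE)
```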

Variable yi provides the standardized mean differences and vi the corresponding sampling variances for these 19 studies. For this illustration, we will focus on variables weeks (as described above) and tester (whether the test administrator was aware or blind to the researcher-provided designations of the children's intellectual potential) as possible moderator variables.

The cut() command as used above automatically turns the weeks variable into a factor, with the none level set as the reference level (followed by levels some and high). To turn the tester variable into a factor and to ensure that the blind level becomes the reference level in the models to be fitted below, we can use the following command:
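A sketch of this step, followed by the fit of the main-effects model whose results are discussed next:

```r
# make tester a factor with "blind" as the reference level
dat$tester <- relevel(factor(dat$tester), ref="blind")

# fit a mixed-effects meta-regression model with both factors as moderators
res <- rma(yi, vi, mods = ~ weeks + tester, data=dat)
res
```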

Let's start with the table at the bottom containing the model results. The first coefficient in the table is the model intercept ($b_0 = .4020$), which is the estimated average standardized mean difference for levels none and blind of the weeks and tester factors, respectively. For this combination of levels, the effect is significantly different from zero ($p = .0020$), although the 95% confidence interval ($.1472$ to $.6567$) indicates quite some uncertainty about how large the effect really is.

The second and third coefficients correspond to the dummy variables weekssome and weekshigh ($b_1 = -.2893$ and $b_2 = -.4422$, respectively). These coefficients estimate how much higher/lower the average effect is for these two levels of the weeks factor, compared to the reference level (i.e., none). Therefore, these coefficients directly indicate the contrast between the none and the other two levels. Both coefficients are significantly different from zero ($p = .0334$ and $p = .0025$, respectively), so the average effect is significantly different (and in fact lower) when teachers had some or a high amount of contact with the students prior to the expectancy induction (instead of no contact).

Note that we have not yet tested the weeks factor as a whole. To do so, we have to test the null hypothesis that both coefficients are equal to zero simultaneously (see also the illustration of how to test factors and linear combinations of parameters in meta-regression models). We can do this with:
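For instance, via the btt (betas to test) argument of anova():

```r
# Wald-type test of coefficients 2 and 3 (the two 'weeks' dummies) simultaneously
anova(res, btt=2:3)
```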

This is a Wald-type chi-square test with 2 degrees of freedom, indicating that the factor is in fact significant ($p = .0103$).

We have also not yet tested whether there is a difference between levels some and high of this factor. One could change the reference level of the weeks factor and refit the model to obtain this test. Alternatively, and more elegantly, we can just test the difference between these two coefficients directly. We can do this with:
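A sketch using linearHypothesis() from the car package, assuming the coefficients are ordered as in the model table above (intercept, weekssome, weekshigh, testeraware):

```r
library(car)

# test H0: b1 - b2 = 0, i.e., no difference between levels 'some' and 'high'
linearHypothesis(res, c(0, 1, -1, 0))
```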

So, we actually do not have enough evidence to conclude that there is a significant difference between these two levels ($p = .1323$).

The glht() function from the multcomp package also allows for such tests and actually makes it easy to conduct all pairwise comparisons between factor levels (with or without adjusted p-values due to multiple testing). To test all three linear combinations against each other, we would use:
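A sketch of these pairwise comparisons (the contrast vectors assume the coefficient order from the model table above):

```r
library(multcomp)

# all pairwise comparisons among the three 'weeks' levels
summary(glht(res, linfct=rbind("some - none" = c(0,  1, 0, 0),
                               "high - none" = c(0,  0, 1, 0),
                               "high - some" = c(0, -1, 1, 0))),
        test=adjusted("none"))
```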

No adjustment to the p-values has been used for these tests (test=adjusted("none")), but other options are possible for the test argument (see help(summary.glht) and the illustration of how to test factors and linear combinations of parameters). Note that the p-values for the three tests above are exactly the same as the corresponding two p-values from the model table (coefficients 2 and 3) and the one obtained with the linearHypothesis() function above.

Finally, the last (4th) coefficient in the model table corresponds to the dummy variable testeraware ($b_3 = -.0511$) and estimates how much higher/lower the average effect is for studies where the test administrator was aware of the children's designation, compared to studies where the test administrator was blind. So, again, this is a contrast between these two levels. The coefficient is not significantly different from zero ($p = .5815$), so this factor does not appear to account for (some of the) heterogeneity in the effects.

The output includes some additional results. First of all, the result given under Test of Moderators (coefficient(s) 2,3,4) is the omnibus test of coefficients 2, 3, and 4 (the first coefficient being the intercept, which is not included in the test by default). This Wald-type chi-square test is significant ($Q_M = 9.97, df = 3, p = .02$), so we can reject the null hypothesis that all of these coefficients are equal to zero (which would imply that there is no relationship between the size of the effect and these two factors). Moreover, the pseudo-$R^2$ value given suggests that roughly 50% of the total amount of heterogeneity is accounted for when including these two factors in the model. Finally, the Test for Residual Heterogeneity just falls short of being significant ($Q_E = 24.14, df = 15, p = .06$), which would suggest that there is no significant amount of residual heterogeneity in the true effects when including these two factors in the model.

Based on the fitted model, we can also obtain the estimated average effect for each combination of the factor levels. For this, we can use the predict() function, specifying how the dummy variables should be filled in. For levels none, some, and high of the weeks factor and level blind of the tester factor, we would use:
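For example (the three columns of newmods correspond to the dummies weekssome, weekshigh, and testeraware):

```r
# predicted average effects for weeks = none/some/high, with tester = blind
predict(res, newmods=rbind(c(0, 0, 0),
                           c(1, 0, 0),
                           c(0, 1, 0)))
```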

Note that, by default, the intercept is automatically included in the calculation of these predicted values (so only a vector of length 3 or a matrix with 3 columns should be specified via the newmods argument). The values under ci.lb and ci.ub are the bounds of the 95% confidence intervals, while the values under cr.lb and cr.ub are the bounds of the 95% credibility/prediction intervals (see help(predict.rma) for more details).

We can use an alternative model specification, where we leave out the model intercept:
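A sketch of this specification (removing the intercept via -1 in the model formula):

```r
# same model, but without the model intercept
res <- rma(yi, vi, mods = ~ weeks + tester - 1, data=dat)
res
```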

Note that the first three coefficients now directly correspond to the estimated average effects for levels none, some, and high of the weeks factor and level blind of the tester factor. The last (4th) coefficient again indicates how much higher/lower the average effect is when the level of the tester factor is changed to aware. Also, note that the omnibus test now includes all four coefficients by default and has a different interpretation: It is essentially testing the null hypothesis that the true average effect is zero across all factor level combinations.

The model including only the main effects of the two factors assumes that the influence of the two factors is additive. We can illustrate the implications graphically by plotting the estimated average effects for all factor level combinations:
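A sketch of such a plot with base R graphics, assuming res is the model fitted without an intercept (so its four coefficients are weeksnone, weekssome, weekshigh, and testeraware):

```r
# predicted average effects for all six factor level combinations
pred <- predict(res, newmods=rbind(c(1,0,0,0), c(0,1,0,0), c(0,0,1,0),   # blind
                                   c(1,0,0,1), c(0,1,0,1), c(0,0,1,1)))  # aware

plot(NA, xlim=c(1,3), ylim=range(pred$pred), xaxt="n",
     xlab="Weeks of Prior Contact", ylab="Estimated Average Effect")
axis(side=1, at=1:3, labels=c("none", "some", "high"))
lines(1:3, pred$pred[1:3], type="o", pch=19)          # tester = blind
lines(1:3, pred$pred[4:6], type="o", pch=17, lty=2)   # tester = aware
legend("topright", legend=c("blind", "aware"), pch=c(19,17), lty=c(1,2))
```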

Note how the lines are parallel to each other, as expected under this model. Also, the tester factor appears to have only a minor influence on the effect as compared to the weeks factor. Finally, while we have treated weeks as a factor, the decrease in the estimated average effect appears roughly linear as we move from the none to the some and high levels. Therefore, one could also consider fitting a model where this variable is treated as a quantitative covariate. An example along those lines can be found in the discussion on how to test factors and linear combinations of parameters in mixed-effects meta-regression models.

Model with Interaction

Now let's move on to a model where we allow the two factors in the model to interact (note that it is questionable whether such a complex model should even be considered for these data; again, these analyses are for illustration purposes only). Such a model can be fitted with:
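In R's formula notation, the * operator adds both the main effects and their interaction:

```r
# model with the two factors and their interaction
res <- rma(yi, vi, mods = ~ weeks * tester, data=dat)
res
```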

As before, the first coefficient in the table is the model intercept ($b_0 = .3067$), which is the estimated average effect for levels none and blind of the weeks and tester factors, respectively. The second and third coefficients ($b_1 = -.1566$ and $b_2 = -.3267$, respectively) indicate how much higher/lower the estimated average effect is for levels some and high of the weeks factor when the level of the tester factor is held at blind. The fourth coefficient ($b_3 = .1759$) provides an estimate of how much higher/lower the average effect is for the aware level of the tester factor when the level of the weeks factor is held at none. Finally, the last two coefficients ($b_4 = -.2775$ and $b_5 = -.2634$, respectively) indicate how much higher/lower the contrast between the aware and blind levels of the tester factor are for levels some and high of the weeks factor (or equivalently, how much higher/lower the contrasts between the some and the none levels and the high and the none levels of the weeks factor are for level aware of the tester factor).

These last two coefficients allow the influence of the tester factor to differ across the various levels of the weeks factor (and vice-versa). Therefore, if these two coefficients are in reality both equal to zero, then there is in fact no interaction. So, in order to test whether an interaction is actually present, we just need to test these two coefficients, which we can do with:
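A sketch of the Wald-type test, together with a likelihood ratio test comparing the main-effects and interaction models (for the LRT, both models are refitted with ML estimation):

```r
# Wald-type test of coefficients 5 and 6 (the two interaction terms)
anova(res, btt=5:6)

# likelihood ratio test: refit both models with ML and compare
res0 <- rma(yi, vi, mods = ~ weeks + tester, data=dat, method="ML")
res1 <- rma(yi, vi, mods = ~ weeks * tester, data=dat, method="ML")
anova(res0, res1)
```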

The results from the likelihood ratio test (LRT; $\chi^2 = 0.67, df = 2, p = .71$) lead to the same conclusion as the Wald-type test, namely that there is little evidence of an interaction between the two factors.

For illustration purposes, let us nevertheless proceed with a further examination of this model. As before, the anova(), linearHypothesis(), and glht() functions could be used to test factor levels against each other and to obtain pairwise comparisons. Also, the predict() function can be used to obtain the estimated average effect for particular factor level combinations. For example, to obtain the estimated average effect for level high of the weeks factor and level aware of the tester factor, we would use:
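A sketch, assuming the interaction model's coefficients are ordered as described above (intercept, weekssome, weekshigh, testeraware, weekssome:testeraware, weekshigh:testeraware):

```r
# predicted average effect for weeks = high, tester = aware
# (dummies: weekssome, weekshigh, testeraware, some:aware, high:aware)
predict(res, newmods=c(0, 1, 1, 0, 1))
```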

Alternatively, we can reparameterize the model so that it has no intercept and one coefficient per factor level combination (i.e., a cell-means model). The table with the model results then directly provides the estimated average effect for each factor level combination. It is now also quite easy to test particular factor level combinations against each other. For example, to test the difference between levels some and high of the weeks factor within the blind level of the tester factor, we could use:
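A sketch of the cell-means refit and the corresponding contrast (the six coefficients are ordered with the weeks levels varying first within tester = blind, then within tester = aware; the X argument of anova() specifies the linear combination to test):

```r
# cell-means parameterization: one coefficient per factor level combination
res <- rma(yi, vi, mods = ~ weeks:tester - 1, data=dat)
res

# test 'some' vs 'high' within tester = blind
anova(res, X=c(0, 1, -1, 0, 0, 0))
```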

If one were inclined to do so, one could even test all 6 factor level combinations against each other (i.e., all possible pairwise comparisons). The glht() function combined with the contrMat() function makes this relatively easy:
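A sketch, assuming res is the cell-means model from above; contrMat() builds the matrix of all pairwise (Tukey-type) contrasts among the six coefficients. The final lines redraw the earlier plot under the interaction model, where the six coefficients are the estimated average effects for the factor level combinations directly:

```r
library(multcomp)

# all 15 pairwise comparisons among the six cell means
summary(glht(res, linfct=contrMat(rep(1,6), type="Tukey")),
        test=adjusted("none"))

# redraw the plot of the estimated average effects under the interaction model
b <- coef(res)  # none/some/high at tester = blind, then at tester = aware
plot(NA, xlim=c(1,3), ylim=range(b), xaxt="n",
     xlab="Weeks of Prior Contact", ylab="Estimated Average Effect")
axis(side=1, at=1:3, labels=c("none", "some", "high"))
lines(1:3, b[1:3], type="o", pch=19)          # tester = blind
lines(1:3, b[4:6], type="o", pch=17, lty=2)   # tester = aware
legend("topright", legend=c("blind", "aware"), pch=c(19,17), lty=c(1,2))
```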

Now, the lines are no longer constrained to be parallel and in fact cross each other (but as we saw before, there is hardly any evidence to suggest that there is a significant interaction).

References

Raudenbush, S. W. (1984). Magnitude of teacher expectancy effects on pupil IQ as a function of the credibility of expectancy induction: A synthesis of findings from 18 experiments. Journal of Educational Psychology, 76(1), 85–97.

Raudenbush, S. W., & Bryk, A. S. (1985). Empirical Bayes meta-analysis. Journal of Educational Statistics, 10(2), 75–98.