The metafor Package

Increasing Value of $\tau^2$ When Adding Moderators

In the meta-analytic random-effects model, the parameter $\tau^2$ denotes the amount of heterogeneity (also called 'between-study variance'), that is, the variability in the underlying true effects or outcomes. When the data suggest that the underlying true effects or outcomes vary (i.e., the estimate of $\tau^2$ is larger than 0 and/or the Q-test is significant), a common goal is to examine if this heterogeneity can be accounted for based on one or more 'moderator variables'. For this, we can fit a (mixed-effects) meta-regression model, where we examine the association between the moderator variable(s) of interest and the effect sizes / outcomes. In addition, the model again provides an estimate of $\tau^2$, which now denotes the amount of residual heterogeneity, that is, the variability in the underlying true effects or outcomes that is not accounted for by the moderator variable(s) included in the model. Intuitively, we would expect that the estimate of $\tau^2$ in the meta-regression model must be lower (or at least, no larger) than the estimate of $\tau^2$ from the random-effects model. However, this is not guaranteed. Let's examine an example to illustrate this phenomenon.
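For reference, the two models can be written out as follows. The notation is the standard one and is assumed here rather than taken from the text above: $y_i$ is the observed effect size or outcome of study $i$, $v_i$ is its sampling variance, and $x_{i1}, \ldots, x_{ip}$ are the values of the moderator variables. The random-effects model is

$$y_i = \mu + u_i + \varepsilon_i, \qquad u_i \sim N(0, \tau^2), \qquad \varepsilon_i \sim N(0, v_i),$$

while the mixed-effects meta-regression model is

$$y_i = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip} + u_i + \varepsilon_i, \qquad u_i \sim N(0, \tau^2), \qquad \varepsilon_i \sim N(0, v_i),$$

where $\tau^2$ in the second model denotes the amount of residual heterogeneity.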

For this illustration, we will work with the data from the meta-analysis by Bangert-Drowns et al. (2004) on the effectiveness of school-based writing-to-learn interventions on academic achievement. In each of the studies included in this meta-analysis, an experimental group (i.e., a group of students that received instruction with increased emphasis on writing tasks) was compared against a control group (i.e., a group of students that received conventional instruction) with respect to some content-related measure of academic achievement (e.g., final grade, an exam/quiz/test score).

library(metafor)
dat <- dat.bangertdrowns2004
dat

 id      author year grade length .  ni    yi    vi
  1    Ashworth 1992     4     15 .  60  0.65 0.070
  2       Ayers 1993     2     10 .  34 -0.75 0.126
  3      Baisch 1990     2      2 .  95 -0.21 0.042
  4       Baker 1994     4      9 . 209 -0.04 0.019
  5      Bauman 1992     1     14 . 182  0.23 0.022
  6      Becker 1996     4      1 . 462  0.03 0.009
  7 Bell & Bell 1985     3      4 .  38  0.26 0.106
  8     Brodney 1994     1     15 . 542  0.06 0.007
  9      Burton 1986     4      4 .  99  0.06 0.040
 10   Davis, BH 1990     1      9 .  77  0.12 0.052
  .           .    .     .      . .   .     .     .
 45       Wells 1986     1      8 . 250  0.04 0.016
 46      Willey 1988     3     15 .  51  1.46 0.099
 47      Willey 1988     2     15 .  46  0.04 0.087
 48   Youngberg 1989     4     15 .  56  0.25 0.072

Variable yi provides the standardized mean differences (with positive values indicating a higher mean level of academic achievement in the intervention group), while variable vi provides the corresponding sampling variances.

For this illustration, we will remove two studies that have missing values on one of the moderators we will examine further below (so that all analyses are based on the exact same data, as otherwise differences may also arise due to the inclusion of different subsets of the data in the analyses).

dat <- dat[!is.na(dat$length),]

Now, let's fit a random-effects model to these data.

res <- rma(yi, vi, data=dat)
res

Random-Effects Model (k = 46; tau^2 estimator: REML)
 
tau^2 (estimated amount of total heterogeneity): 0.0465 (SE = 0.0191)
tau (square root of estimated tau^2 value):      0.2156
I^2 (total heterogeneity / total variability):   57.09%
H^2 (total variability / sampling variability):  2.33
 
Test for Heterogeneity:
Q(df = 45) = 99.9681, p-val < .0001
 
Model Results:
 
estimate      se    zval    pval   ci.lb   ci.ub
  0.2120  0.0459  4.6172  <.0001  0.1220  0.3020  ***
 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The results suggest that groups receiving a writing-to-learn intervention perform on average better than those in the control condition (the estimated average standardized mean difference of 0.21 is significantly different from 0). However, the Q-test is clearly significant (p < .0001) and the estimate of $\tau^2$ is larger than 0 ($\hat{\tau}^2 = 0.0465$). Therefore, it appears that the effectiveness of writing-to-learn interventions differs across studies.

We can include one or more moderators in the model via the mods argument. To start, let's examine if studies with a longer treatment length (in weeks) tend to yield larger effects than those with a shorter treatment length.

res <- rma(yi, vi, mods = ~ length, data=dat)
res

Mixed-Effects Model (k = 46; tau^2 estimator: REML)
 
tau^2 (estimated amount of residual heterogeneity):     0.0441 (SE = 0.0188)
tau (square root of estimated tau^2 value):             0.2100
I^2 (residual heterogeneity / unaccounted variability): 55.26%
H^2 (unaccounted variability / sampling variability):   2.24
R^2 (amount of heterogeneity accounted for):            5.08%
 
Test for Residual Heterogeneity:
QE(df = 44) = 96.2810, p-val < .0001
 
Test of Moderators (coefficient 2):
QM(df = 1) = 4.2266, p-val = 0.0398
 
Model Results:
 
         estimate      se    zval    pval    ci.lb   ci.ub
intrcpt    0.0692  0.0825  0.8384  0.4018  -0.0925  0.2309
length     0.0149  0.0073  2.0559  0.0398   0.0007  0.0292  *
 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The positive slope for the length moderator (just significantly different from 0) provides some support for this hypothesis. Note that the estimate of $\tau^2$ from this model is slightly lower than the one from the random-effects model (by a little over 5%, as indicated by the $R^2$ value, so we estimate that about 5% of the heterogeneity is accounted for by this moderator).
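The link between the two $\tau^2$ estimates and the $R^2$ value can be sketched with a few lines of base R. This assumes that $R^2$ is the proportional reduction in the $\tau^2$ estimate, truncated at 0 and expressed as a percentage; the function name `r2_from_tau2` is a hypothetical helper introduced only for this illustration.

```r
# Proportional reduction in the tau^2 estimate, truncated at 0 and
# expressed as a percentage (r2_from_tau2 is a hypothetical helper name)
r2_from_tau2 <- function(tau2_re, tau2_me) {
  100 * max(0, (tau2_re - tau2_me) / tau2_re)
}

# Rounded tau^2 estimates as printed in the two outputs above
r2_length <- r2_from_tau2(tau2_re = 0.0465, tau2_me = 0.0441)
round(r2_length, 2)
```

Since the inputs here are the rounded values from the printed output, the result (roughly 5.2%) differs slightly from the 5.08% shown above, which is based on the unrounded estimates. Because of the truncation at 0, the helper returns 0 whenever the meta-regression estimate exceeds the random-effects estimate.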

Now we will examine if the effectiveness of writing-to-learn interventions differs across grade levels (variable grade is coded as follows: 1 = elementary, 2 = middle, 3 = high-school, 4 = college; however, we will treat this variable as a categorical moderator in this analysis by coding it as a factor).

res <- rma(yi, vi, mods = ~ factor(grade), data=dat)
res

Mixed-Effects Model (k = 46; tau^2 estimator: REML)
 
tau^2 (estimated amount of residual heterogeneity):     0.0501 (SE = 0.0210)
tau (square root of estimated tau^2 value):             0.2239
I^2 (residual heterogeneity / unaccounted variability): 57.82%
H^2 (unaccounted variability / sampling variability):   2.37
R^2 (amount of heterogeneity accounted for):            0.00%
 
Test for Residual Heterogeneity:
QE(df = 42) = 94.5390, p-val < .0001
 
Test of Moderators (coefficients 2:4):
QM(df = 3) = 6.1159, p-val = 0.1061
 
Model Results:
 
                estimate      se     zval    pval    ci.lb    ci.ub
intrcpt           0.2621  0.0876   2.9909  0.0028   0.0903   0.4338  **
factor(grade)2   -0.3682  0.1672  -2.2023  0.0276  -0.6959  -0.0405   *
factor(grade)3    0.0445  0.1380   0.3226  0.7470  -0.2260   0.3150
factor(grade)4   -0.0423  0.1141  -0.3705  0.7110  -0.2659   0.1814
 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The $Q_M$-test is not quite significant (p = 0.11) and hence this analysis does not provide support for the hypothesis that there are differences in the average effectiveness of writing-to-learn interventions across the various grade levels (although the contrast between grade levels 2 and 1 is actually just significant). Note that the estimate of $\tau^2$ in this model is actually larger than the one from the random-effects model (accordingly, the value of $R^2$ is given as 0%).
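To compare the two estimates directly, the models can be stored under separate names instead of overwriting res. The following is a minimal sketch; the object names res.re, res.grade, and tau2.diff are illustrative and do not appear in the analysis above. The calls to confint() are added here only to show how imprecise each $\tau^2$ estimate is.

```r
library(metafor)

dat <- dat.bangertdrowns2004
dat <- dat[!is.na(dat$length),]

# Fit both models under separate names (res.re and res.grade are
# illustrative names; the text above reuses 'res')
res.re    <- rma(yi, vi, data=dat)
res.grade <- rma(yi, vi, mods = ~ factor(grade), data=dat)

# Compare the two tau^2 estimates directly
tau2.diff <- res.grade$tau2 - res.re$tau2
round(c(RE = res.re$tau2, grade = res.grade$tau2, diff = tau2.diff), 4)

# Confidence intervals for tau^2 indicate the precision of each estimate
confint(res.re)
confint(res.grade)
```

A positive value of tau2.diff corresponds to the increase in the estimate seen in the output above.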

At first sight, this is counterintuitive, since the true amount of heterogeneity that is unaccounted for in a meta-regression model cannot be larger than the total amount of heterogeneity that there is in the data to begin with. However, the values of $\tau^2$ provided above are estimates, each subject to sampling error, and hence the estimate from the meta-regression model can exceed the one from the random-effects model, as we can see above. Although rare, this can even happen if the moderator variable is significantly related to the effect sizes!
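A small simulation can give a rough sense of how often an increase in the estimate occurs when a moderator is entirely unrelated to the true effects. This is only a sketch under assumed conditions: the number of studies, the range of the sampling variances, the true mean effect, the true value of $\tau^2$, and all object names (sim, larger, prop.larger) are arbitrary choices made for this illustration and are not taken from the Bangert-Drowns et al. (2004) data.

```r
library(metafor)

set.seed(42)    # arbitrary seed, for reproducibility only
k     <- 46     # number of studies per simulated meta-analysis
iters <- 500    # number of simulated meta-analyses
larger <- rep(NA, iters)

for (i in seq_len(iters)) {
  sim <- data.frame(
    vi = runif(k, 0.01, 0.10),  # assumed range of sampling variances
    x  = rnorm(k)               # moderator unrelated to the true effects
  )
  # true effects have mean 0.2 and tau^2 = 0.05 (assumed values)
  sim$yi <- rnorm(k, mean = 0.2, sd = sqrt(0.05 + sim$vi))

  # skip iterations in which one of the models fails to converge
  tau2 <- tryCatch(
    c(rma(yi, vi, data=sim)$tau2,
      rma(yi, vi, mods = ~ x, data=sim)$tau2),
    error = function(e) c(NA, NA)
  )
  larger[i] <- tau2[2] > tau2[1]
}

# proportion of simulated meta-analyses in which the tau^2 estimate
# increased after adding the moderator
prop.larger <- mean(larger, na.rm=TRUE)
prop.larger
```

The resulting proportion depends on the assumed conditions, so it should be read as an illustration of the phenomenon and not as a general rate.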

This phenomenon is well known in the multilevel modeling literature in general (e.g., Snijders & Bosker, 1999; Gelman & Hill, 2007), but perhaps less so in the meta-analytic context (but see López-López et al., 2014). The little illustration above can hopefully help to raise awareness of this issue.

References

Gelman, A. & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge, UK: Cambridge University Press.

López-López, J. A., Marín-Martínez, F., Sánchez-Meca, J., Van den Noortgate, W. & Viechtbauer, W. (2014). Estimation of the predictive power of the model in mixed-effects meta-regression: A simulation study. British Journal of Mathematical and Statistical Psychology, 67(1), 30-48. https://doi.org/10.1111/bmsp.12002

Snijders, T. A. B. & Bosker, R. J. (1999). Multilevel analysis: An introduction to basic and advanced multilevel modeling. Thousand Oaks, CA: Sage.