Table of Contents
Comparing Estimates of Independent Meta-Analyses or Subgroups
Suppose we have summary estimates (e.g., estimated average effects) obtained from two independent meta-analyses or two subgroups of studies within the same meta-analysis and we want to test whether the estimates are different from each other. A Wald-type test can be used for this purpose. Alternatively, one could run a single meta-regression model including all studies and using a dichotomous moderator to distinguish the two sets. Both approaches are conceptually very similar with a subtle difference that will be illustrated below with an example.
Data Preparation
We will use the 'famous' BCG vaccine meta-analysis for this illustration. First, we compute the log risk ratios (and corresponding sampling variances) for each study and then dichotomize the alloc
variable.
library(metafor) dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg) dat$alloc <- ifelse(dat$alloc == "random", "random", "other") dat
trial author year tpos tneg cpos cneg ablat alloc yi vi 1 1 Aronson 1948 4 119 11 128 44 random -0.8893 0.3256 2 2 Ferguson & Simes 1949 6 300 29 274 55 random -1.5854 0.1946 3 3 Rosenthal et al 1960 3 228 11 209 42 random -1.3481 0.4154 4 4 Hart & Sutherland 1977 62 13536 248 12619 52 random -1.4416 0.0200 5 5 Frimodt-Moller et al 1973 33 5036 47 5761 13 other -0.2175 0.0512 6 6 Stein & Aronson 1953 180 1361 372 1079 44 other -0.7861 0.0069 7 7 Vandiviere et al 1973 8 2537 10 619 19 random -1.6209 0.2230 8 8 TPT Madras 1980 505 87886 499 87892 13 random 0.0120 0.0040 9 9 Coetzee & Berjak 1968 29 7470 45 7232 27 random -0.4694 0.0564 10 10 Rosenthal et al 1961 17 1699 65 1600 42 other -1.3713 0.0730 11 11 Comstock et al 1974 186 50448 141 27197 18 other -0.3394 0.0124 12 12 Comstock & Webster 1969 5 2493 3 2338 33 other 0.4459 0.5325 13 13 Comstock et al 1976 27 16886 29 17825 33 other -0.0173 0.0714
Separate Meta-Analyses
First, we fit two separate random-effects models within each subset defined by the alloc
variable:
res1 <- rma(yi, vi, data=dat, subset=alloc=="random") res2 <- rma(yi, vi, data=dat, subset=alloc=="other")
We then combine the estimates and standard errors from each model into a data frame. We also add a variable to distinguish the two models and, for reasons to be explained in more detail below, we add the estimated amounts of heterogeneity within each subset to the data frame.
dat.comp <- data.frame(alloc = c("random", "other"), estimate = c(coef(res1), coef(res2)), stderror = c(se(res1), se(res2)), tau2 = c(res1$tau2, res2$tau2)) dfround(dat.comp, 3)
alloc estimate stderror tau2 1 random -0.971 0.276 0.393 2 other -0.481 0.217 0.212
We can now compare the two estimates (i.e., the estimated average log risk ratios) by feeding them back to the rma()
function and using the variable to distinguish the two estimates as a moderator. We use a fixed-effects meta-regression model for this purpose, because the (residual) heterogeneity within each subset has already been accounted for by fitting random-effects models above.
rma(estimate, sei=stderror, mods = ~ alloc, method="FE", data=dat.comp, digits=3)
Fixed-Effects with Moderators Model (k = 2) Test for Residual Heterogeneity: QE(df = 0) = 0.000, p-val = 1.000 Test of Moderators (coefficient(s) 2): QM(df = 1) = 1.946, p-val = 0.163 Model Results: estimate se zval pval ci.lb ci.ub intrcpt -0.481 0.217 -2.218 0.027 -0.907 -0.056 * allocrandom -0.490 0.351 -1.395 0.163 -1.178 0.198 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
While we find that studies using random assignment obtain on average larger (i.e., more negative) effects than studies not using random assignment ($b_1 = -0.490$, $SE = 0.351$), the difference between the two estimates is not statistically significant ($z = -1.395$, $p = .163$).
The test of the difference between the two estimates is really just a Wald-type test, given by the equation $$z = \frac{\hat{\mu}_1 - \hat{\mu}_2}{\sqrt{SE[\hat{\mu}_1]^2 + SE[\hat{\mu}_2]^2}},$$ where $\hat{\mu}_1$ and $\hat{\mu}_2$ are the two estimates and $SE[\hat{\mu}_1]$ and $SE[\hat{\mu}_2]$ the corresponding standard errors. The test statistics can therefore also be computed with:
with(dat.comp, round(c(zval = (estimate[1] - estimate[2])/sqrt(stderror[1]^2 + stderror[2]^2)), 3))
zval -1.395
This is the same value that we obtained above.
Meta-Regression with All Studies
Now let's take a different approach, fitting a meta-regression model with alloc
as a categorical moderator based on all studies:
rma(yi, vi, mods = ~ alloc, data=dat, digits=3)
Mixed-Effects Model (k = 13; tau^2 estimator: REML) tau^2 (estimated amount of residual heterogeneity): 0.318 (SE = 0.178) tau (square root of estimated tau^2 value): 0.564 I^2 (residual heterogeneity / unaccounted variability): 89.92% H^2 (unaccounted variability / sampling variability): 9.92 R^2 (amount of heterogeneity accounted for): 0.00% Test for Residual Heterogeneity: QE(df = 11) = 138.511, p-val < .001 Test of Moderators (coefficient(s) 2): QM(df = 1) = 1.833, p-val = 0.176 Model Results: estimate se zval pval ci.lb ci.ub intrcpt -0.467 0.257 -1.816 0.069 -0.972 0.037 . allocrandom -0.490 0.362 -1.354 0.176 -1.199 0.219 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The result is very similar to what we saw earlier: The coefficient corresponding to the alloc
dummy is equal to $b_1 = -0.490$ ($SE = 0.362$) and not significant ($p = .176$).
However, the results are not exactly identical. The reason for this is as follows. When we fit separate random-effects models in the two subsets, we are allowing the amount of heterogeneity within each set to be different (as shown earlier, the estimates were $\hat{\tau}^2 = 0.393$ and $\hat{\tau}^2 = 0.212$ for studies using and not using random assignment, respectively). On the other hand, the mixed-effects meta-regression model fitted above has a single variance component for the amount of residual heterogeneity, which implies that the amount of heterogeneity within each subset is assumed to be the same ($\hat{\tau}^2 = 0.318$ in this example – note that this falls somewhere between the two $\hat{\tau}^2$ values we obtained within the subsets).
Meta-Regression with All Studies but Different Amounts of (Residual) Heterogeneity
Using the rma.mv()
function, we can easily fit a meta-regression model using all studies where we allow the amount of residual heterogeneity to be different in each subset:
rma.mv(yi, vi, mods = ~ alloc, random = ~ alloc | trial, struct="DIAG", data=dat, digits=3)
Multivariate Meta-Analysis Model (k = 13; method: REML) Variance Components: outer factor: trial (nlvls = 13) inner factor: alloc (nlvls = 2) estim sqrt k.lvl fixed level tau^2.1 0.212 0.460 6 no other tau^2.2 0.393 0.627 7 no random Test for Residual Heterogeneity: QE(df = 11) = 138.511, p-val < .001 Test of Moderators (coefficient(s) 2): QM(df = 1) = 1.946, p-val = 0.163 Model Results: estimate se zval pval ci.lb ci.ub intrcpt -0.481 0.217 -2.218 0.027 -0.907 -0.056 * allocrandom -0.490 0.351 -1.395 0.163 -1.178 0.198 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Note that the two estimates of $\tau^2$ are now identical to the ones we obtained earlier from the separate random-effects models. Also, the coefficient, standard error, and p-value for the moderator now matches the results obtained earlier.1)
A discussion/comparison of these two approaches (i.e., assuming a single $\tau^2$ value or allowing $\tau^2$ to differ across subsets) can be found in the following articles:
Rubio-Aparicio, M., Sánchez-Meca, J., López-López, J. A., Botella, J. & Marín-Martínez, F. (2017). Analysis of categorical moderators in mixed-effects meta-analysis: Consequences of using pooled versus separate estimates of the residual between-studies variances. British Journal of Mathematical and Statistical Psychology, 70(3), 439-456. https://doi.org/https://doi.org/10.1111/bmsp.12092
Rubio-Aparicio, M., López-López, J. A., Viechtbauer, W., Marín-Martínez, F., Botella, J., & Sánchez-Meca, J. (2020). Testing categorical moderators in mixed-effects meta-analysis in the presence of heteroscedasticity. Journal of Experimental Education, 88(2), 288-310. https://doi.org/10.1080/00220973.2018.1561404
We can also conduct a likelihood ratio test (LRT) to examine whether there are significant differences in the $\tau^2$ values across subsets. This can be done with:
res1 <- rma.mv(yi, vi, mods = ~ alloc, random = ~ alloc | trial, struct="DIAG", data=dat) res0 <- rma.mv(yi, vi, mods = ~ alloc, random = ~ alloc | trial, struct="ID", data=dat) anova(res1, res0)
df AIC BIC AICc logLik LRT pval QE Full 4 29.2959 30.8875 35.9626 -10.6480 138.5113 Reduced 3 27.5948 28.7885 31.0234 -10.7974 0.2989 0.5845 138.5113
So in this example, we would not reject the null hypothesis $\mbox{H}_0{:}\; \tau^2_1 = \tau^2_2$ ($p = .58$).
Other Types of Models
The issue discussed above also arises for other types of models (e.g., multilevel meta-analytic models). When fitting a particular model within several subgroups, then the variance components of the model are automatically allowed to differ across the subgroups. On the other hand, when fitting the same type of model to all studies combined (but including a moderator to allow the mean effect size to differ across subgroups), then the variance components are assumed to be the same within each subgroups (unless one takes extra steps as illustrated above to allow the variance components to differ across subgroups). Consequently, the results obtained with these two approaches will not be identical.