The metafor Package

A Meta-Analysis Package for R

User Tools

Site Tools


$\boldsymbol{I^2}$ for Multilevel and Multivariate Models

The $I^2$ statistic was introduced by Higgins and Thompson in their seminal 2002 paper and has become a rather popular statistic to report in meta-analyses, as it facilitates the interpretation of the amount of heterogeneity present in a given dataset.

For a standard random-effects models, the $I^2$ statistic is computed with $$I^2 = 100\% \times \frac{\hat{\tau}^2}{\hat{\tau}^2 + \tilde{v}},$$ where $\hat{\tau}^2$ is the estimated value of $\tau^2$ and $$\tilde{v} = \frac{(k-1) \sum w_i}{(\sum w_i)^2 - \sum w_i^2},$$ where $w_i = 1/v_i$ is the inverse of the sampling variance of the $i^{th}$ study. The equation for $\tilde{v}$ is equation 9 in Higgins & Thompson (2002) and can be regarded as the 'typical' within-study (or sampling) variance of the observed effect sizes or outcomes.1)

Sidenote: As the equation above shows, $I^2$ estimates the amount of heterogeneity relative to the total amount of variance in the observed effects or outcomes (which is composed of the variance in the true effects, that is, $\hat{\tau}^2$, plus sampling variance, that is, $\tilde{v}$). Therefore, it is not an absolute measure of heterogeneity and should not be interpreted as such. For example, a practically/clinically irrelevant amount of heterogeneity (i.e., variance in the true effects) could lead to a large $I^2$ value if all of the studies are very large (in which case $\tilde{v}$ will be small). Conversely, when all of the studies are small (in which case $\tilde{v}$ will be large), $I^2$ may still be small, even if there are large differences in the size of the true effects. See also chapter 16 in Borenstein et al. (2009), which discusses this idea very nicely.

However, this caveat aside, $I^2$ is a very useful measure because it directly indicates to what extent heterogeneity contributes to the total variance. In addition, most people find $I^2$ easier to interpret than estimates of $\tau^2$.

Standard Random-Effects Model

Let's try out the computation for a standard random-effects model (see Berkey et al. (1995) and help( for more details on the dataset used). First, we use the rma() function for this:

dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg,
res <- rma(yi, vi, data=dat)
[1] 92.22139

So, we estimate that roughly 92% of the total variance is due to heterogeneity (i.e., variance in the true effects), while the remaining 8% can be attributed to sampling variance.

Manually computing $I^2$ as described above yields the same result:

k <- res$k
wi <- 1/res$vi
vt <- (k-1) * sum(wi) / (sum(wi)^2 - sum(wi^2))
100 * res$tau2 / (res$tau2 + vt)
[1] 92.22139

General Equation for $\boldsymbol{I^2}$

Before we continue with more complex models, it is useful to point out a more general equation for computing $I^2,$ which also applies to models involving moderator variables (i.e., mixed-effects meta-regression models). This will also become important when dealing with models where sampling errors are no longer independent. So, let us define $$\mathbf{P} = \mathbf{W} - \mathbf{W} \mathbf{X} (\mathbf{X}' \mathbf{W} \mathbf{X})^{-1} \mathbf{X}' \mathbf{W},$$ where $\mathbf{W}$ is (for now) a diagonal matrix with the inverse sampling variances (i.e., $1/v_i$) along the diagonal and $\mathbf{X}$ is the model matrix. In the random-effects model, $\mathbf{X}$ is just a column vector with 1's, but in meta-regression models, it will contain additional columns with the values of the moderator variables. Then we define $$I^2 = 100\% \times \frac{\hat{\tau}^2}{\hat{\tau}^2 + \frac{k-p}{\mathrm{tr}[\mathbf{P}]}},$$ where $\mathrm{tr}[\mathbf{P}]$ denotes the trace of the $\mathbf{P}$ matrix (i.e., the sum of the diagonal elements) and $p$ denotes the number of columns in $\mathbf{X}$.

Let's try this out for the example above:

W <- diag(1/res$vi)
X <- model.matrix(res)
P <- W - W %*% X %*% solve(t(X) %*% W %*% X) %*% t(X) %*% W
100 * res$tau2 / (res$tau2 + (res$k-res$p)/sum(diag(P)))
[1] 92.22139

For a model with moderators, this is also how rma() computes $I^2$:

res <- rma(yi, vi, mods = ~ ablat, data=dat)
[1] 68.39313
X <- model.matrix(res)
P <- W - W %*% X %*% solve(t(X) %*% W %*% X) %*% t(X) %*% W
100 * res$tau2 / (res$tau2 + (res$k-res$p)/sum(diag(P)))
[1] 68.39313

(although instead of using solve(), which can be numerically unstable in some cases, rma() uses the QR decomposition to obtain the $(\mathbf{X}' \mathbf{W} \mathbf{X})^{-1}$ part).

In models with moderators, the $I^2$ statistic indicates how much of the unaccounted variance in the observed effects or outcomes (which is composed of unaccounted variance in the true effects, that is, residual heterogeneity, plus sampling variance) can be attributed to residual heterogeneity. Here, we estimate that roughly 68% of the unaccounted variance is due to residual heterogeneity.

Multilevel Models

Multilevel structures arise when the estimates can be grouped together based on some higher-level clustering variable (e.g., paper, lab or research group, species). In that case, true effects belonging to the same group may be more similar to each other than true effects for different groups. Meta-analytic multilevel models can be used to account for the between- and within-cluster heterogeneity and hence the intracluster (or intraclass) correlation in the true effects. See Konstantopoulos (2011) for a detailed illustration of such a model.

In fact, let's use the same example here. First, we can fit the multilevel random-effects model with:

dat <- dat.konstantopoulos2011
res <-, vi, random = ~ 1 | district/school, data=dat)
Multivariate Meta-Analysis Model (k = 56; method: REML)
Variance Components:
            estim    sqrt  nlvls  fixed           factor
sigma^2.1  0.0651  0.2551     11     no         district
sigma^2.2  0.0327  0.1809     56     no  district/school
Test for Heterogeneity:
Q(df = 55) = 578.8640, p-val < .0001
Model Results:
estimate       se     zval     pval    ci.ub
  0.1847   0.0846   2.1845   0.0289   0.0190   0.3504        *
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Note that the model contains two variance components ($\sigma^2_1$ and $\sigma^2_2$), for the between-cluster (district) heterogeneity and the within-cluster (school within district) heterogeneity.

Based on the discussion above, it is now very easy to generalize the concept of $I^2$ to such a model (see also Nakagawa & Santos, 2012). That is, we can first compute:

W <- diag(1/res$vi)
X <- model.matrix(res)
P <- W - W %*% X %*% solve(t(X) %*% W %*% X) %*% t(X) %*% W
100 * sum(res$sigma2) / (sum(res$sigma2) + (res$k-res$p)/sum(diag(P)))
[1] 95.18731

Note that we have summed up the two variance components in the numerator and denominator. Therefore, this statistic can be thought of as the overall $I^2$ value that indicates how much of the total variance can be attributed to the total amount of heterogeneity (which is the sum of between- and within-cluster heterogeneity). In this case, the value is again very large, with approximately 95% of the total variance due to heterogeneity.

However, we can also break things down to estimate how much of the total variance can be attributed to between- and within-cluster heterogeneity separately:

100 * res$sigma2 / (sum(res$sigma2) + (res$k-res$p)/sum(diag(P)))
[1] 63.32484 31.86248

Therefore, about 63% of the total variance is estimated to be due to between-cluster heterogeneity, 32% due to within-cluster heterogeneity, and the remaining 5% due to sampling variance.

Multivariate Models

Now we will consider the same type of generalization, but for a multivariate model with non-independent sampling errors. Therefore, not only do we need to account for heterogeneity and dependency in the underlying true effects, but we also now need to specify covariances between the sampling errors. For an illustration of such a model, see Berkey et al. (1998), which we can also use for illustration purposes here.

dat <- dat.berkey1998
V <- lapply(split(dat[,c("v1i", "v2i")], dat$trial), as.matrix)
V <- bldiag(V)
res <-, V, mods = ~ 0 + outcome, random = ~ outcome | trial, struct="UN", data=dat)
Multivariate Meta-Analysis Model (k = 10; method: REML)
Variance Components:
outer factor: trial   (nlvls = 5)
inner factor: outcome (nlvls = 2)
            estim    sqrt  k.lvl  fixed  level
tau^2.1    0.0327  0.1807      5     no     AL
tau^2.2    0.0117  0.1083      5     no     PD
    rho.AL  rho.PD    AL  PD
AL       1  0.6088     -  no
PD  0.6088       1     5   -
Test for Residual Heterogeneity:
QE(df = 8) = 128.2267, p-val < .0001
Test of Moderators (coefficient(s) 1,2):
QM(df = 2) = 108.8616, p-val < .0001
Model Results:
           estimate      se     zval    pval    ci.ub
outcomeAL   -0.3392  0.0879  -3.8589  0.0001  -0.5115  -0.1669  ***
outcomePD    0.3534  0.0588   6.0057  <.0001   0.2381   0.4688  ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Two things are worth noting here. First of all, we allow the amount of heterogeneity to differ for the two outcomes (AL = attachment level; PD = probing depth) by using an unstructured variance-covariance matrix for the true effects (i.e., struct="UN"). Second, V is the variance-covariance matrix of the sampling errors, which is no longer diagonal. The $\mathbf{W}$ matrix described earlier is actually the inverse of the variance-covariance matrix of the sampling errors, so in general, we should write $\mathbf{W} = \mathbf{V}^{-1}$.

Therefore, a possible generalization of $I^2$ to this model is:

W <- solve(V)
X <- model.matrix(res)
P <- W - W %*% X %*% solve(t(X) %*% W %*% X) %*% t(X) %*% W
100 * res$tau2 / (res$tau2 + (res$k-res$p)/sum(diag(P)))
[1] 93.07407 82.84449

Hence, about 93% of the total (unaccounted for) variance is due to heterogeneity in the true effects for outcome AL and about 83% due to heterogeneity in the true effects for outcome PD.

The approach above computes the 'typical' sampling variance based on all studies for both $I^2$ values. However, we may want to compute two separate values of the the 'typical' sampling variance, one for each outcome. Doing so leads to these two $I^2$ values:

c(100 * res$tau2[1] / (res$tau2[1] + (sum(dat$outcome == "AL")-1)/sum(diag(P)[dat$outcome == "AL"])),
  100 * res$tau2[2] / (res$tau2[2] + (sum(dat$outcome == "PD")-1)/sum(diag(P)[dat$outcome == "PD"])))
[1] 94.8571 75.1876

Not much of a difference, but if sampling variances had been very dissimilar for the two outcomes, then this could make more of a difference.

Jackson et al. (2012) Approach

For multivariate models, Jackson et al. (2012) describe a different approach for computing $I^2$-type statistics that is based on the variance-covariance matrix of the fixed effects under the model with random effects and the model without. So, we fit these two models:

res.R <-, V, mods = ~ 0 + outcome, random = ~ outcome | trial, struct="UN", data=dat)
res.F <-, V, mods = ~ 0 + outcome, data=dat)

Then $I^2$-type statistics for the two outcomes can be computed with:

c(100 * (vcov(res.R)[1,1] - vcov(res.F)[1,1]) / vcov(res.R)[1,1],
  100 * (vcov(res.R)[2,2] - vcov(res.F)[2,2]) / vcov(res.R)[2,2])
[1] 95.49916 76.42214

These values are very similar to the ones obtained above when computing separate values for the 'typical' sampling variance for the two outcomes. For more details on this approach, see Jackson et al. (2012).


Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to meta-analysis. Chichester, UK: Wiley.

Higgins, J. P. T., & Thompson, S. G. (2002). Quantifying heterogeneity in a meta-analysis. Statistics in Medicine, 21(11), 1539–1558.

Jackson, D., White, I. R., & Riley, R. D. (2012). Quantifying the impact of between-study heterogeneity in multivariate meta-analyses. Statistics in Medicine, 31(29), 3805–3820.

Konstantopoulos, S. (2011). Fixed effects and variance components estimation in three-level meta-analysis. Research Synthesis Methods, 2(1), 61–76.

Nakagawa, S., & Santos, E. S. A. (2012). Methodological issues and advances in biological meta-analysis. Evolutionary Ecology, 26(5), 1253–1274.

Takkouche, B., Cadarso-Suárez, C., & Spiegelman, D. (1999). Evaluation of old and new tests of heterogeneity in epidemiologic meta-analysis. American Journal of Epidemiology, 150(2), 206–215.

Takkouche, B., Khudyakov, P., Costa-Bouzas, J., & Spiegelman, D. (2013). Confidence intervals for heterogeneity measures in meta-analysis. American Journal of Epidemiology, 178(6), 993-1004.

There are also other suggestions in the literature for defining the 'typical' or 'average' sampling variance of the estimates. For example, Takkouche et al. (1999, 2013) suggest to use the harmonic mean of the sampling variances in an analogous computation of an $I^2$-type statistic, but this never really caught on in practice.
tips/i2_multilevel_multivariate.txt · Last modified: 2024/06/09 16:23 by Wolfgang Viechtbauer