The metafor Package

A Meta-Analysis Package for R

User Tools

Site Tools


Frequently Asked Questions

General Questions

The name 'metafor' stands for 'META-analysis FOr R' (so the package name is not 'metaphor', even if your spellchecker insists on that spelling ...).

Based on some code I wrote as part of my dissertation research, I developed a function called mima() which provided the basic functionality for fitting equal/fixed- and random/mixed-effects (meta-regression) models. Around 2005-2006, I placed the function on my website (together with a short tutorial) and it was picked up by several researchers who used the function in their meta-analyses. However, while the mima() function provided the basic functionality for fitting standard meta-analytic models and conducting meta-regression analyses, the metafor package was written in response to several requests to expand the function into a full package for conducting meta-analyses with additional options and support functions. The mima() function is therefore now obsolete and has been removed from my website.

Various attempts have been made to validate the functions in the metafor package. First of all, when corresponding analyses could be carried out, I have compared the results provided by the metafor package with those obtained with other software packages for several data sets. In particular, results have been compared with those provided by the metan, metareg, metabias, and metatrim commands in Stata (for more details on these commands, see Sterne, 2009). Results were also compared with those provided by SAS using the proc mixed command (for more details, see van Houwelingen, Arends, & Stijnen, 2002), by SPSS using the macros developed by David Wilson (Lipsey & Wilson, 2001), by the meta (CRAN Link) and rmeta (CRAN Link) packages in R, and by Comprehensive Meta-Analysis, MetaWin, and the Review Manager of the Cochrane Collaboration. Results either agreed completely or fell within a margin of error expected when using numerical methods.

Second, results provided by the metafor package have been compared with published results described in articles and books (the assumption being that those results are in fact correct). On this website, I provide a number of such analysis examples that you can examine yourself. All of these examples (and some more) are also encapsulated in automated tests using the testthat package, so that any changes to the code that would lead to these examples becoming non-reproducible are automatically detected.

Third, I have conducted extensive simulation studies for many of the methods implemented in the package to ensure that their statistical properties are as one would expect based on the underlying theory. To give a simple example, under the assumptions of an equal-effects model (i.e., homogeneous true effects, normally distributed effect size estimates, known sampling variances), the empirical rejection rate of $\mbox{H}_0{:}\; \theta = 0$ must be nominal (within the margin of error one would expect when randomly simulating such data). This is in fact the case, providing support that the rma() function is working appropriately for this scenario. If you want to confirm this yourself, here is some basic code for a mini Monte-Carlo simulation study, demonstrating that the rejection rate is indeed equal to $\alpha = .05$ for this scenario:

k <- 5                                      # number of studies to simulate in each iteration
pval <- rep(NA, 10000)                      # set up vector to store p-values in
for (i in 1:10000){
   vi <- runif(k, .01, 1)                   # simulate sampling variances
   yi <- rnorm(k, 0, sqrt(vi))              # simulate effect sizes with heteroscedastic sampling variances
   pval[i] <- rma(yi, vi, method="EE")$pval # fit model and store p-value
mean(pval <= .05)                           # compute empirical rejection rate (should be ~0.05)

Similar (and much more thorough/extensive) tests have been conducted for the more intricate methods in the package.

It may also be useful to note that there is now an appreciable user base of the metafor package. The Viechtbauer (2010) article describing the package has been cited over 10,000 times. Many of the citations are from applied meta-analyses and/or methodological/statistical papers that have used the metafor package as part of their research. This increases the chances that any bugs would be detected, reported, and corrected.

Finally, I have become very proficient at hitting the Ballmer Peak.

For the most part, the development of the package has been funded through my own precious time. Through some collaborative work on the 'Open Meta-Analyst' software from the Center for Evidence-Based Medicine at Brown University, I have received some funding as part of a subcontract on a grant. Also, Sandra Wilson and Mark Lipsey of the Peabody Research Institute at Vanderbilt University have provided funding that gave me some time and incentive for making the more efficient and for adding multicore capabilities to the profile() functions. In this context, I also would like to thank Jason Hoeksema from the Department of Biology at the University of Mississippi, who invited me to be part of a working group at the National Evolutionary Synthesis Center (NESCent), which gave me time over the course of several meetings to really develop the capabilities of the function. This collaboration also helped me better appreciate how important it is to listen to the users' needs when prioritizing additions to the package. However, further developments of the package could proceed much quicker if additional funding was available. If you are aware of any funding possibilities, feel free to let me know!

First of all, thanks for trying to do so in the first place. The best way of citing the package is to cite the following paper:

Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1–48.

By the way, try citation("metafor") in R (this is not a command specific to the metafor package; you can try this with other package names; and citation() – without a package name between the parentheses – will tell you how to cite R itself!).

There are actually many R packages available for conducting meta-analyses. To get an appreciation for what the "meta-analysis package ecosystem" currently looks like, take a look at the Task View for Meta-Analysis, which provides a pretty thorough overview of the different packages and their capabilities.

First of all, meta-analytic models (as can be fitted with the rma.uni() and functions) make different assumptions about the nature of the sampling variances (that indicate the (im)precision of the estimates) compared to models fitted by the lm(), lme(), and lmer() functions, which assume that the sampling variances are known only up to a proportionality constant (when using their weights arguments). Extra steps must therefore be taken to fix up the output to bring the results in line with standard meta-analytic practices. For more details, I have written up a more comprehensive comparison of the rma() and the lm(), lme(), and lmer() functions.

Furthermore, there are various additional statistics, analyses, and graphical outputs that typically need to be obtained/reported when conducting a meta-analysis. Therefore, while you could use functions such as lm(), lme(), and lmer() (possibly with some tweaking) to fit certain types of meta-analytic models, a package such as metafor gives you an integrated toolbox that should cover the entire meta-analytic workflow.

Technical Questions

For random-effects models, the $I^2$ statistic is computed with $$I^2 = 100\% \times \frac{\hat{\tau}^2}{\hat{\tau}^2 + \tilde{v}},$$ where $\hat{\tau}^2$ is the estimated value of $\tau^2$ and $$\tilde{v} = \frac{(k-1) \sum w_i}{(\sum w_i)^2 - \sum w_i^2},$$ where $w_i = 1 / v_i$ is the inverse of the sampling variance of the $i$th study ($\tilde{v}$ is equation 9 in Higgins & Thompson, 2002, and can be regarded as the 'typical' within-study variance of the observed effect sizes or outcomes). The $H^2$ statistic is computed with $$H^2 = \frac{\hat{\tau}^2 + \tilde{v}}{\tilde{v}}.$$ Analogous equations are used for mixed-effects models.

Therefore, depending on the estimator of $\tau^2$ used, the values of $I^2$ and $H^2$ will change. For random-effects models, $I^2$ and $H^2$ are often computed in practice with $I^2 = 100\% \times (Q-(k-1))/Q$ and $H^2 = Q/(k-1)$, where $Q$ denotes the statistic for the test of heterogeneity and $k$ the number of studies (i.e., observed effects or outcomes) included in the meta-analysis. The equations used in the metafor package to compute these statistics are based on more general definitions and have the advantage that the values of $I^2$ and $H^2$ will be consistent with the estimated value of $\tau^2$ (i.e., if $\hat{\tau}^2 = 0$, then $I^2 = 0$ and $H^2 = 1$ and if $\hat{\tau}^2 > 0$, then $I^2 > 0$ and $H^2 > 1$).

These two sets of equations for $I^2$ and $H^2$ actually coincide when using the DerSimonian-Laird estimator of $\tau^2$ (i.e., the commonly used equations are actually special cases of the more general definitions given above). Therefore, if you prefer the more conventional definitions of these statistics, use method="DL" when fitting the random/mixed-effects model with the rma() function.

See the analysis example for Raudenbush (2009) for an example of this.

The pseudo $R^2$ statistic (Raudenbush, 2009) is computed with $$R^2 = \frac{\hat{\tau}_{RE}^2 - \hat{\tau}_{ME}^2}{\hat{\tau}_{RE}^2} = 1 - \frac{\hat{\tau}_{ME}^2}{\hat{\tau}_{RE}^2},$$ where $\hat{\tau}_{RE}^2$ denotes the estimated value of $\tau^2$ based on the random-effects model (i.e., the total amount of heterogeneity) and $\hat{\tau}_{ME}^2$ denotes the estimated value of $\tau^2$ based on the mixed-effects model (i.e., the residual amount of heterogeneity). It can happen that $\hat{\tau}_{RE}^2 < \hat{\tau}_{ME}^2$, in which case $R^2$ is set to zero. Again, the value of $R^2$ will change depending on the estimator of $\tau^2$ used. Also note that this statistic is only computed when the mixed-effects model includes an intercept (so that the random-effects model is clearly nested within the mixed-effects model). You can also use the anova() function to compute $R^2$ for any two models that are known to be nested.

By default, the interval is computed with $$\hat{\mu} \pm z_{1-\alpha/2} \sqrt{\mbox{SE}[\hat{\mu}]^2 + \hat{\tau}^2},$$ where $\hat{\mu}$ is the estimated average true outcome, $z_{1-\alpha/2}$ is the $100 \times (1-\alpha/2)$th percentile of a standard normal distribution (e.g., $1.96$ for $\alpha = .05$), $\mbox{SE}[\hat{\mu}]$ is the standard error of $\hat{\mu}$, and $\hat{\tau}^2$ is the estimated amount of heterogeneity (i.e., the variance in the true outcomes across studies). If the model was fitted with the Knapp and Hartung (2003) method (i.e., with test="knha" in rma()), then instead of $z_{1-\alpha/2}$, the $100 \times (1-\alpha/2)$th percentile of a t-distribution with $k-1$ degrees of freedom is used.

Note that this differs slightly from Riley et al. (2011), who suggest to use a t-distribution with $k-2$ degrees of freedom for constructing the interval. Neither a normal, nor a t-distribution with $k-1$ or $k-2$ degrees of freedom is correct; all of these are approximations. The computations in metafor are done in the way described above, so that the prediction interval is identical to the confidence interval for $\mu$ when $\hat{\tau}^2 = 0$, which could be argued is the logical thing that should happen. If the prediction interval should be computed exactly as described by Riley et al. (2011), one can use argument pi.type="riley" in predict().

The escalc() function offers the possibility to transform raw proportions and incidence rates with the Freeman-Tukey transformation (Freeman & Tukey, 1950). For proportions, this is also sometimes called the 'Freeman-Tukey double arcsine transformation'.

For proportions, the transformation (measure="PFT") is computed with the equation $y_i = 1/2 \times (\mbox{asin}(\sqrt{x_i/(n_i+1)}) + \mbox{asin}(\sqrt{(x_i+1)/(n_i+1)}))$, where $x_i$ denotes the number of individuals experiencing the event of interest and $n_i$ denotes the total number of individuals (i.e., sample size). The variance of $y_i$ is then computed with $v_i = 1/(4n_i + 2)$.

For incidence rates, the transformation (measure="IRFT") is computed with the equation $y_i = 1/2 \times (\sqrt{x_i/t_i} + \sqrt{(x_i+1)/t_i})$, where $x_i$ denotes the total number of events that occurred and $t_i$ denotes the total person-time at risk. The variance of $y_i$ is then computed with $v_i = 1/(4t_i)$.

One can also find definitions of these transformations without the multiplicative constant $1/2$ (the equations for the variance should then be multiplied by $4$). Since the $1/2$ is just a constant, it does not matter which definition one uses (as long as one uses the correct equation for the sampling variance). The metafor package uses the definitions given above, so that values obtained from the arcsine square-root (angular) transformation (measure="PAS") and from the Freeman-Tukey double arcsine transformation (measure="PFT") are approximately of the same magnitude (without the $1/2$ multiplier, PFT values would be about twice as large). The same applies to square-root transformed incidence rates (measure="IRS") and Freeman-Tukey transformed rates (measure="IRFT").

When used with the default settings, the function in metafor can indeed provide results that differ from those obtained with other meta-analytic software, such as the metan function in Stata, the Review Manager (RevMan) from the Cochrane Collaboration, or Comprehensive Meta-Analysis (CMA). By default, metafor does not apply any adjustments to the cell counts in studies with zero cases in either group when applying the Mantel-Haenszel method, while other software may do this automatically. For more details, take a look at the comparison of the Mantel-Haenszel method in different software and what settings to use to make metafor provide the exact same results as other software.


Freeman, M. F., & Tukey, J. W. (1950). Transformations related to the angular and the square root. Annals of Mathematical Statistics, 21(4), 607–611.

Higgins, J. P. T., & Thompson, S. G. (2002). Quantifying heterogeneity in a meta-analysis. Statistics in Medicine, 21(11), 1539–1558.

van Houwelingen, H. C., Arends, L. R., & Stijnen, T. (2002). Advanced methods in meta-analysis: Multivariate approach and meta-regression. Statistics in Medicine, 21(4), 589–624.

Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-Analysis. Sage, Thousand Oaks, CA.

Raudenbush, S. W. (2009). Analyzing effect sizes: Random effects models. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis (2nd ed., pp. 295–315). New York: Russell Sage Foundation.

Riley, R. D., Higgins, J. P. T. & Deeks, J. J. (2011). Interpretation of random effects meta-analyses. British Medical Journal, 342, d549.

Sterne, J. A. C. (Ed.) (2009). Meta-analysis in Stata: An updated collection from the Stata Journal. Stata Press, College Station, TX.

faq.txt · Last modified: 2024/06/18 19:26 by Wolfgang Viechtbauer