Table of Contents
A Comparison of the rma() and the lm(), lme(), and lmer() Functions
Commonly used meta-analytic models are just special cases of the general linear (mixed-effects) model with the only peculiar aspect that the variances of the error terms (i.e., the sampling variances) are known. Not surprisingly, the question therefore comes up occasionally why the lm()
, lme()
, and lmer()
functions cannot be used to conduct a meta-analysis. After all, the functions can be used to fit various types of linear (mixed-effects) models and both functions allow the user to specify the sampling variances via the weights
argument. So it seems that one should also be able to fit meta-analytic models with these functions. Below, I describe and illustrate how the models fitted via the lm()
, lme()
, and lmer()
functions differ from the models fitted by the rma()
function and why the those functions are therefore not suitable for fitting meta-analytic models.
Equal-Effects Model
Let's start with an equal-effects model. As an example, consider the data from the meta-analysis by Molloy et al. (2014) on the relationship between conscientiousness and medication adherence. For each study, we can compute the r-to-z transformed correlation coefficient and corresponding sampling variance with:
library(metafor) dat <- escalc(measure="ZCOR", ri=ri, ni=ni, data=dat.molloy2014) dat[,-c(5:10)]
authors year ni ri yi vi 1 Axelsson et al. 2009 109 0.187 0.1892 0.0094 2 Axelsson et al. 2011 749 0.162 0.1634 0.0013 3 Bruce et al. 2010 55 0.340 0.3541 0.0192 4 Christensen et al. 1999 107 0.320 0.3316 0.0096 5 Christensen & Smith 1995 72 0.270 0.2769 0.0145 6 Cohen et al. 2004 65 0.000 0.0000 0.0161 7 Dobbels et al. 2005 174 0.175 0.1768 0.0058 8 Ediger et al. 2007 326 0.050 0.0500 0.0031 9 Insel et al. 2006 58 0.260 0.2661 0.0182 10 Jerant et al. 2011 771 0.010 0.0100 0.0013 11 Moran et al. 1997 56 -0.090 -0.0902 0.0189 12 O'Cleirigh et al. 2007 91 0.370 0.3884 0.0114 13 Penedo et al. 2003 116 0.000 0.0000 0.0088 14 Quine et al. 2012 537 0.150 0.1511 0.0019 15 Stilley et al. 2004 158 0.240 0.2448 0.0065 16 Wiebe & Christensen 1997 65 0.040 0.0400 0.0161
The yi
values are the r-to-z transformed correlations and the vi
values the corresponding sampling variances.
We can now fit an equal-effects model to these data with:
res.ee <- rma(yi, vi, data=dat, method="EE") res.ee
Equal-Effects Model (k = 16) I^2 (total heterogeneity / total variability): 60.69% H^2 (total variability / sampling variability): 2.54 Test for Heterogeneity: Q(df = 15) = 38.1595, p-val = 0.0009 Model Results: estimate se zval pval ci.lb ci.ub 0.1252 0.0170 7.3642 <.0001 0.0919 0.1585 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Now let's try to fit the same model with the lm()
function by specifying the inverse of the sampling variances as weights:
res.lm <- lm(yi ~ 1, weights = 1/vi, data=dat) summary(res.lm)
Call: lm(formula = yi ~ 1, data = dat, weights = 1/vi) Weighted Residuals: Min 1Q Median 3Q Max -3.1919 -1.0719 0.6674 1.3173 2.4695 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.12518 0.02711 4.617 0.000335 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 1.595 on 15 degrees of freedom
Two things are of note here:
- The estimated intercept (
0.12518
) is exactly the same as the model coefficient obtained earlier with therma()
function (the value reported by therma()
function is rounded to 4 decimals, but that can be changed withprint(res.ee, digits=5)
yielding the same value). - The standard error (of the estimated intercept) is different (and hence, the test statistic and p-value also differ).
The reason for this discrepancy is that the model fitted by the lm()
function assumes that the weights are not exactly known, but only up to a proportionality constant (namely $\sigma^2_e$, that is, the error or residual variance), which is estimated and then factored into the weights.
This can be demonstrated by extracting the estimated error variance from the lm
object, multiplying the sampling variances by that value, and re-fitting the model with the rma()
function:
rma(yi, vi*sigma(res.lm)^2, data=dat, method="EE")
Equal-Effects Model (k = 16) I^2 (total heterogeneity / total variability): 0.00% H^2 (total variability / sampling variability): 1.00 Test for Heterogeneity: Q(df = 15) = 15.0000, p-val = 0.4514 Model Results: estimate se zval pval ci.lb ci.ub 0.1252 0.0271 4.6171 <.0001 0.0720 0.1783 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Now these results are exactly the same as those obtained by the lm()
function. Note that multiplying the sampling variances by some constant value does not affect the model estimate, but only the corresponding standard error.
Therefore, if we want to obtain the same standard error from lm()
as from rma()
, we must adjust the standard error by $\hat{\sigma}_e$ (i.e., the square-root of the estimated error variance or what is called the "residual standard error" in the output from the lm()
function):
coef(summary(res.ee))
estimate se zval pval ci.lb ci.ub 1 0.1251767 0.01699805 7.364176 1.782441e-13 0.09186109 0.1584922
coef(summary(res.lm))[1,2] / sigma(res.lm)
[1] 0.01699805
This is now the exact same standard error of the pooled estimate from an equal-effects model.
Random-Effects Model
A random-effects model can be fitted to the same data using the rma()
function with:
res.re <- rma(yi, vi, data=dat) res.re
Random-Effects Model (k = 16; tau^2 estimator: REML) tau^2 (estimated amount of total heterogeneity): 0.0081 (SE = 0.0055) tau (square root of estimated tau^2 value): 0.0901 I^2 (total heterogeneity / total variability): 61.73% H^2 (total variability / sampling variability): 2.61 Test for Heterogeneity: Q(df = 15) = 38.1595, p-val = 0.0009 Model Results: estimate se zval pval ci.lb ci.ub 0.1499 0.0316 4.7501 <.0001 0.0881 0.2118 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Now let's try to fit the same model with the lme()
function from the nlme
package. Here, the weights
argument is used to specify a variance function with fixed variances (note: the name of the weights
argument is a bit of a misnomer for the lme()
function, as one does not use it to specify the weights – as in the lm()
function – but a variance function structure). In addition, we need to add the random effects for each study via the random
argument.
library(nlme) dat$study <- 1:nrow(dat) res.lme <- lme(yi ~ 1, random = ~ 1 | study, weights = varFixed(~ vi), data=dat) summary(res.lme)
Linear mixed-effects model fit by REML Data: dat AIC BIC logLik -8.781569 -6.657418 7.390784 Random effects: Formula: ~1 | study (Intercept) Residual StdDev: 0.06818354 1.259548 Variance function: Structure: fixed weights Formula: ~vi Fixed effects: yi ~ 1 Value Std.Error DF t-value p-value (Intercept) 0.1415557 0.03071906 16 4.608073 3e-04 Standardized Within-Group Residuals: Min Q1 Med Q3 Max -1.1596768 -0.6903414 0.1964221 0.7117538 1.4616798 Number of Observations: 16 Number of Groups: 16
Alternatively, we could use the lmer()
function from the lme4
package. The syntax would be:
library(lme4) res.lmer <- lmer(yi ~ 1 + (1 | study), weights = 1/vi, data=dat, control=lmerControl(check.nobs.vs.nlev="ignore", check.nobs.vs.nRE="ignore")) summary(res.lmer)
Linear mixed model fit by REML ['lmerMod'] Formula: yi ~ 1 + (1 | study) Data: dat Weights: 1/vi Control: lmerControl(check.nobs.vs.nlev = "ignore", check.nobs.vs.nRE = "ignore") REML criterion at convergence: -14.8 Scaled residuals: Min 1Q Median 3Q Max -1.1597 -0.6903 0.1964 0.7117 1.4617 Random effects: Groups Name Variance Std.Dev. study (Intercept) 0.004649 0.06818 Residual 1.586461 1.25955 Number of obs: 16, groups: study, 16 Fixed effects: Estimate Std. Error t value (Intercept) 0.14156 0.03072 4.608
Note that some of the default checks need to be switched off before lmer()
will go through with fitting this model.
A couple things are of note here:
- The estimated intercept differs from the model coefficient obtained earlier with the
rma()
function. - The standard error is also different (and hence, the test statistic and p-value also differ).
- The estimated standard deviation of the random effects also differs (
0.0682
forlme()
andlmer()
compared to0.0901
forrma()
).
These discrepancies arise for the same reason described earlier. In other words, the lme()
and lmer()
functions assume that the sampling variances are not exactly known, but again just up to a proportionality constant.
To illustrate this, we can again factor in that constant into the sampling variances and refit the model with rma()
:
rma(yi, vi*sigma(res.lme)^2, data=dat)
Random-Effects Model (k = 16; tau^2 estimator: REML) tau^2 (estimated amount of total heterogeneity): 0.0046 (SE = 0.0049) tau (square root of estimated tau^2 value): 0.0682 I^2 (total heterogeneity / total variability): 36.82% H^2 (total variability / sampling variability): 1.58 Test for Heterogeneity: Q(df = 15) = 24.0532, p-val = 0.0642 Model Results: estimate se zval pval ci.lb ci.ub 0.1416 0.0307 4.6081 <.0001 0.0813 0.2018 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Now these results match exactly what was obtained with the lme()
function.
In terms of the underlying models, we can also easily describe the difference as follows. The underlying model fitted by the rma()
function is: $$y_i = \mu + u_i + e_i,$$ where $u_i \sim N(0, \tau^2)$ and $e_i \sim N(0, v_i)$, where the $v_i$ values are the known sampling variances. On the other hand, the model fitted by the lme()
and lmer()
functions is: $$y_i = \mu + u_i + e_i,$$ where $u_i \sim N(0, \tau^2)$ and $e_i \sim N(0, \sigma^2_e v_i)$, where $\sigma^2_e$ is that proportionality constant.
Meta-Regression
The examples above show models without any covariates/predictors/moderators (i.e., the models only include an intercept term). However, the exact same discrepancy will be found when including covariates in the models. In this context, Thompson and Sharp (1999) also describe the difference discussed above. They denote the model fitted by lm()
as a model that accounts for potential heterogeneity by a multiplicative factor (i.e., the residual variance), while the random-effects model fitted via the rma()
function is a model that accounts for potential heterogeneity by an additive factor. They note that the "rationale for using a multiplicative factor for variance inflation is weak" (p. 2705) and clearly recommend to use a model with an additive heterogeneity component.
The possibility of using both an additive and multiplicative component simultaneously was not discussed by Thompson and Sharp (1999). This is in fact what the model fitted with the lme()
function does. However, as far as I am aware of, no motivation supporting such a model has ever been given. In fact, it is my impression that those who use the lme()
and lmer()
functions for meta-analytic purposes are typically not aware of the issues discussed above.
More Complex Models
Note that the same issue arises when fitting more complex meta-analytic models (e.g., multivariate/multilevel models). The rma.mv()
function in the metafor package can be used to fit such models (see the analysis examples section for some illustrations). Using the lme()
and lmer()
functions to fit such models will again lead to a mix of additive and multiplicative variance components, which is not how multivariate/multilevel meta-analytic models are typically defined.
S-Plus Version of lme()
The nlme
package was developed by José Pinheiro and Douglas Bates for both R and S-Plus. Interestingly, the S-Plus version has a special control
argument that allows the user to set the error variance to a fixed constant. By setting this to 1, one can fit the exact same model as the rma()
function does:
res.lme <- lme(yi ~ 1, random = ~ 1 | study, weights = varFixed(~ vi), control=lmeControl(sigma = 1), data=dat) summary(res.lme)
Linear mixed-effects model fit by REML Data: dat AIC BIC logLik -10.44653 -9.03043 7.223265 Random effects: Formula: ~ 1 | study (Intercept) Residual StdDev: 0.09005302 1 Variance function: Structure: fixed weights Formula: ~ vi Fixed effects: yi ~ 1 Value Std.Error DF t-value p-value (Intercept) 0.1499171 0.03155994 16 4.750234 0.0002 Standardized Within-Group Residuals: Min Q1 Med Q3 Max -1.222824 -0.5462851 0.09989517 0.6159687 1.305632 Number of Observations: 16 Number of Groups: 16
These are the exact same results as obtained earlier with the rma()
function. Unfortunately, the R version of the nlme
package does not provide this functionality.
Update: The R version of the nlme
package does allow the use of the lmeControl(sigma = 1)
control argument (this was added in version 3.1-123, which was released 2016-01-17). However, using this does not yield the same results as obtained above (the results are close but not the same). The results given above (for S-Plus and rma()
) have also been compared to those obtained with Stata (metareg command) and SAS (proc mixed) and are definitely correct. This issue has also been raised on the R-sig-ME mailing list (see here). See also this blog post by James Pustejovsky for some further comparisons between the lme()
, rma()
, and rma.mv()
functions.
Summary
The models fitted by the rma()
function assume that the sampling variances are known. The models fitted by the lm()
, lme()
, and lmer()
functions assume that the sampling variances are known only up to a proportionality constant. These are different models than typically used in meta-analyses.
References
Molloy, G. J., O'Carroll, R. E., & Ferguson, E. (2014). Conscientiousness and medication adherence: A meta-analysis. Annals of Behavioral Medicine, 47(1), 92–101.
Thompson, S. G., & Sharp, S. J. (1999). Explaining heterogeneity in meta-analysis: A comparison of methods. Statistics in Medicine, 18(20), 2693–2708.