The metafor Package

Multiple Imputation with the mice and metafor Packages

A meta-analytic dataset often looks like Swiss cheese – there are lots of holes in it! For example, due to missing information, it may not be possible to compute the effect size estimates (or the corresponding sampling variances) for some of the studies. Similarly, it may not be possible to code certain moderator variables for some studies. Below, I illustrate how to use multiple imputation as a possible way to deal with the latter issue, using the mice package in combination with metafor.

Data Preparation

For the example, I will use data from the meta-analysis by Bangert-Drowns et al. (2004) on the effectiveness of school-based writing-to-learn interventions on academic achievement (help(dat.bangertdrowns2004) provides a bit more background). The data can be loaded with:

library(metafor)
dat <- dat.bangertdrowns2004

(I copy the dataset into dat, which is a bit shorter and therefore easier to type further below). We can look at the first 10 and the last 10 rows of the dataset with:

rbind(head(dat, 10), tail(dat, 10))

   id          author year grade length minutes wic feedback info pers imag meta                subject  ni    yi    vi
1   1        Ashworth 1992     4     15      NA   1        1    1    1    0    1                Nursing  60  0.65 0.070
2   2           Ayers 1993     2     10      NA   1       NA    1    1    1    0          Earth Science  34 -0.75 0.126
3   3          Baisch 1990     2      2      NA   1        0    1    1    0    1                   Math  95 -0.21 0.042
4   4           Baker 1994     4      9      10   1        1    1    0    0    0                Algebra 209 -0.04 0.019
5   5          Bauman 1992     1     14      10   1        1    1    1    0    1                   Math 182  0.23 0.022
6   6          Becker 1996     4      1      20   1        0    0    1    0    0             Literature 462  0.03 0.009
7   7     Bell & Bell 1985     3      4      NA   1        1    1    1    0    1                   Math  38  0.26 0.106
8   8         Brodney 1994     1     15      NA   1        1    1    1    0    1                   Math 542  0.06 0.007
9   9          Burton 1986     4      4      NA   0        1    1    0    0    0                   Math  99  0.06 0.040
10 10       Davis, BH 1990     1      9      10   1        0    1    1    0    0         Social Studies  77  0.12 0.052
39 39 Ross & Faucette 1994     4     15      NA   0        1    1    0    0    0                Algebra  16  0.70 0.265
40 40           Sharp 1987     4      2      15   1        0    0    1    0    1                Biology 105  0.49 0.039
41 41         Shepard 1992     2      4      NA   0        1    1    0    0    0                   Math 195  0.20 0.021
42 42         Stewart 1992     3     24       5   1        1    1    1    0    1                Algebra  62  0.58 0.067
43 43         Ullrich 1926     4     11      NA   0        1    1    0    0    0 Educational Psychology 289  0.15 0.014
44 44 Weiss & Walters 1980     4     15       3   1        0    1    0    0    0             Statistics  25  0.63 0.168
45 45           Wells 1986     1      8      15   1        0    1    1    0    1                   Math 250  0.04 0.016
46 46          Willey 1988     3     15      NA  NA        0    1    1    0    1                Biology  51  1.46 0.099
47 47          Willey 1988     2     15      NA  NA        0    1    1    0    1         Social Studies  46  0.04 0.087
48 48       Youngberg 1989     4     15      10   1        1    1    0    0    0                Algebra  56  0.25 0.072

Variable yi contains the effect size estimates (standardized mean differences) and vi the corresponding sampling variances. There are 48 rows of data in this dataset.

For illustration purposes, the following variables will be examined as potential moderators of the treatment effect (i.e., the size of the treatment effect may vary in a systematic way as a function of one or more of these variables):

length: treatment length (in weeks)
wic: writing tasks were completed in class (0 = no; 1 = yes)
feedback: feedback on writing was provided (0 = no; 1 = yes)
info: writing contained informational components (0 = no; 1 = yes)
pers: writing contained personal components (0 = no; 1 = yes)
imag: writing contained imaginative components (0 = no; 1 = yes)
meta: prompts for metacognitive reflection (0 = no; 1 = yes)

More details about the meaning of these variables can be found in Bangert-Drowns et al. (2004). For the purposes of this illustration, it is sufficient to understand that we have 7 variables that are potential (and a priori plausible) predictors of the size of the treatment effect.

To make some of the code below simpler, I will only keep the variables needed for the analyses:

dat <- dat[c("yi", "vi", "length", "wic", "feedback", "info", "pers", "imag", "meta")]

Complete Case Analysis

Standard model fitting functions in R (including those from the metafor package) will usually apply 'listwise deletion' when fitting models with data that contains missing values. In the present case, this implies that any study where the effect size, sampling variance, or any of the moderator variables included in the model is missing will be removed from the dataset before fitting the model. While the effect sizes and sampling variances (i.e., variables yi and vi) are complete, there are some missing values in most of the moderator variables:

data.frame(k.NA=colSums(is.na(dat)))

         k.NA
yi          0
vi          0
length      2
wic         2
feedback    1
info        2
pers        2
imag        0
meta        2

We can also check how many missing values there are for each study:

table(rowSums(is.na(dat)))

 0  1  3
41  5  2

So, for 41 studies, the data are complete, but 5 studies have one missing value, while 2 studies have three missing values. Therefore, when fitting a meta-regression model with all moderators of interest included simultaneously, only the 41 studies with complete data will be included in the analysis.

Let's try this:

res <- rma(yi, vi, mods = ~ length + wic + feedback + info + pers + imag + meta, data=dat)
res

Note that the rma() function alerts the user of the fact that studies with missing values were omitted from the model fitting:

Warning message:
In rma(yi, vi, mods = ~length + wic + feedback + info + pers + imag +  :
  Studies with NAs omitted from model fitting.

The output of res is:

Mixed-Effects Model (k = 41; tau^2 estimator: REML)
 
tau^2 (estimated amount of residual heterogeneity):     0.0223 (SE = 0.0153)
tau (square root of estimated tau^2 value):             0.1494
I^2 (residual heterogeneity / unaccounted variability): 36.91%
H^2 (unaccounted variability / sampling variability):   1.59
R^2 (amount of heterogeneity accounted for):            21.01%
 
Test for Residual Heterogeneity:
QE(df = 33) = 51.4963, p-val = 0.0211
 
Test of Moderators (coefficients 2:8):
QM(df = 7) = 11.7175, p-val = 0.1102
 
Model Results:
 
          estimate      se     zval    pval    ci.lb   ci.ub
intrcpt     0.2689  0.2154   1.2484  0.2119  -0.1533  0.6910
length      0.0072  0.0078   0.9240  0.3555  -0.0081  0.0225
wic        -0.0472  0.1097  -0.4308  0.6666  -0.2622  0.1677
feedback    0.0677  0.1080   0.6265  0.5310  -0.1440  0.2793
info       -0.2233  0.2029  -1.1006  0.2711  -0.6210  0.1744
pers       -0.1137  0.1898  -0.5992  0.5490  -0.4857  0.2582
imag        0.4106  0.1847   2.2233  0.0262   0.0486  0.7726  *
meta        0.2010  0.1742   1.1537  0.2486  -0.1404  0.5424
 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

As can be seen from the output, $k = 41$ studies were included in the analysis (i.e., the ones with complete data). So, roughly 15% (i.e., 7 out of 48) of the studies were excluded from the analysis. While this isn't the most horrible example, it illustrates the loss of data (and information) that can occur when conducting a complete case analysis.

Multiple Imputation

One way of dealing with missing data is to make use of imputation techniques. The advantage of using multiple imputation is that we not only impute once (and then pretend that the imputed values are free of any uncertainty), but multiple times from appropriate distributions, so that several imputed datasets are generated. The same analysis is then applied to each dataset and the results are pooled, taking into consideration not only the uncertainty in each fitted model, but also across models.

The mice package allows us to automate this process and can be used in combination with the metafor package. First, we install and load the mice package:

install.packages("mice")
library(mice)

Most of the moderators are dummy variables (coded 0 vs 1). Although it isn't typically necessary to encode such variables as "proper" factors, this does become relevant in the present case, because mice (by default) will impute factors with two levels based on logistic regression models (as opposed to numeric variables that are by default imputed using predictive mean matching). We can turn the dummy variables into factors with:

dat$wic      <- factor(dat$wic)
dat$feedback <- factor(dat$feedback)
dat$info     <- factor(dat$info)
dat$pers     <- factor(dat$pers)
dat$imag     <- factor(dat$imag)
dat$meta     <- factor(dat$meta)

Next, we will set up the predictor matrix for the imputations. For this, we run the make.predictorMatrix() function.

predMatrix <- make.predictorMatrix(dat)
predMatrix

         yi vi length wic feedback info pers imag meta
yi        0  1      1   1        1    1    1    1    1
vi        1  0      1   1        1    1    1    1    1
length    1  1      0   1        1    1    1    1    1
wic       1  1      1   0        1    1    1    1    1
feedback  1  1      1   1        0    1    1    1    1
info      1  1      1   1        1    0    1    1    1
pers      1  1      1   1        1    1    0    1    1
imag      1  1      1   1        1    1    1    0    1
meta      1  1      1   1        1    1    1    1    0

A value of 1 in this matrix indicates that the corresponding column variable is used to predict the corresponding row variable (so each row can be regarded as specifying the predictors for the model used to predict each row variable). Now we will make a few changes to this matrix. First, I set the entire column corresponding to the vi variable to 0, so that the sampling variances are not used for imputing any of the missing values.¹⁾ Also, I set the rows for variables yi and vi to 0, so that these variables are never imputed (this isn't really relevant here, since these two variables do not actually contain any missing values).

predMatrix[,"vi"] <- 0 # don't use vi for imputing
predMatrix["yi",] <- 0 # don't impute yi (since yi has no NAs, this is actually irrelevant here)
predMatrix["vi",] <- 0 # don't impute vi (since vi has no NAs, this is actually irrelevant here)
predMatrix

         yi vi length wic feedback info pers imag meta
yi        0  0      0   0        0    0    0    0    0
vi        0  0      0   0        0    0    0    0    0
length    1  0      0   1        1    1    1    1    1
wic       1  0      1   0        1    1    1    1    1
feedback  1  0      1   1        0    1    1    1    1
info      1  0      1   1        1    0    1    1    1
pers      1  0      1   1        1    1    0    1    1
imag      1  0      1   1        1    1    1    0    1
meta      1  0      1   1        1    1    1    1    0

Next, I create a vector that specifies the method used to predict (and hence impute) each variable:

impMethod <- make.method(dat)
impMethod

yi       vi   length      wic feedback     info     pers     imag     meta
""       ""    "pmm" "logreg" "logreg" "logreg" "logreg"       "" "logreg"

By default, predictive mean matching (pmm) is used for numeric variables, while logistic regression (logreg) is used for two-level factors. One could change these defaults to other methods (see help(mice) for other imputation methods that could be used), but we will stick to the (usually sensible) defaults here. Note that no imputation method is specified for the yi, vi, and imag variables, since these variables do not contain any missing values.

Now we are ready to generate the multiple imputations. Often (and by default), only 5 datasets are generated, but I increase this to 20 below. I also set the seed (for the random number generator) to make the following results fully reproducible:

imp <- mice(dat, print=FALSE, m=20, predictorMatrix=predMatrix, method=impMethod, seed=1234)

Next, we can fit the model of interest to each of the 20 imputed datasets with:

fit <- with(imp, rma(yi, vi, mods = ~ length + wic + feedback + info + pers + imag + meta))

And finally, we can pool (and round) the results with:

pool <- summary(pool(fit))
pool[-1] <- round(pool[-1], digits=4)
pool

       term estimate std.error statistic      df p.value
1 intercept   0.3819    0.2415    1.5814 36.6335  0.1224
2    length   0.0135    0.0088    1.5353 36.6917  0.1333
3      wic1  -0.0567    0.1298   -0.4369 35.7300  0.6648
4 feedback1   0.0007    0.1222    0.0061 33.6335  0.9952
5     info1  -0.3099    0.2280   -1.3594 37.1289  0.1822
6     pers1  -0.3096    0.1972   -1.5701 35.2466  0.1253
7     imag1   0.2015    0.2111    0.9545 37.4837  0.3459
8     meta1   0.4482    0.1778    2.5204 35.7384  0.0163

For easier comparison, let's look at the coefficient table based on the complete case analysis obtained earlier:

round(coef(summary(res)), digits=4)

            estimate     se    zval   pval   ci.lb  ci.ub
intrcpt       0.2689 0.2154  1.2484 0.2119 -0.1533 0.6910
length        0.0072 0.0078  0.9240 0.3555 -0.0081 0.0225
wic          -0.0472 0.1097 -0.4308 0.6666 -0.2622 0.1677
feedback      0.0677 0.1080  0.6265 0.5310 -0.1440 0.2793
info         -0.2233 0.2029 -1.1006 0.2711 -0.6210 0.1744
pers         -0.1137 0.1898 -0.5992 0.5490 -0.4857 0.2582
imag          0.4106 0.1847  2.2233 0.0262  0.0486 0.7726
meta          0.2010 0.1742  1.1537 0.2486 -0.1404 0.5424

Leaving aside a discussion of the usefulness of p-values, there is an interesting discrepancy in the findings. In particular, while moderator imag was statistically significant in the complete case analysis, the results based on multiple imputation suggest that another moderator (i.e., meta) is significant. I obviously cannot tell you which of these findings is the "correct" one (and possibly neither is), but the difference is definitely noteworthy.

For more details on multiple imputation and the mice package, I would suggest to take a look at the book by Van Buuren (2018), which you can also read online (https://stefvanbuuren.name/fimd/).

References

Van Buuren, S. (2018). Flexible imputation of missing data (2nd ed.). Boca Raton, FL: Chapman & Hall/CRC.

¹⁾

In principle, one could also use the sampling variances as a predictor, but this might be considered a bit unusual, so I refrain from doing so in this example.

Table of Contents

Multiple Imputation with the mice and metafor Packages

Data Preparation

Complete Case Analysis

Multiple Imputation

References