Suppose the goal of a meta-analysis is to aggregate the results from studies contrasting two groups (e.g., treatment versus control), where each study measured an outcome of interest on some quantitative scale. A commonly used effect size measure for quantifying the size of the group difference is then the standardized mean difference (also known as Cohen's d).
As an example, consider the data reported in Normand (1999) on the length of the hospital stay of stroke patients under specialized care and under conventional/routine non-specialist care:
library(metafor)
dat.normand1999
  study             source n1i m1i sd1i n2i m2i sd2i
1     1          Edinburgh 155  55   47 156  75   64
2     2     Orpington-Mild  31  27    7  32  29    4
3     3 Orpington-Moderate  75  64   17  71 119   29
4     4   Orpington-Severe  18  66   20  18 137   48
5     5      Montreal-Home   8  14    8  13  18   11
6     6  Montreal-Transfer  57  19    7  52  18    4
7     7          Newcastle  34  52   45  33  41   34
8     8               Umea 110  21   16 183  31   27
9     9            Uppsala  60  30   27  52  23   20
Variables n1i and n2i indicate the number of patients under specialized and under routine care, respectively, variables m1i and m2i indicate the respective mean length of stay (in days) in the two groups during the acute hospital admission, and sd1i and sd2i are the corresponding standard deviations of the length of stay.
With this information, we can compute the standardized mean difference (and corresponding sampling variance) for each study with:
dat1 <- escalc(measure="SMD", m1i=m1i, sd1i=sd1i, n1i=n1i, m2i=m2i, sd2i=sd2i, n2i=n2i, data=dat.normand1999)
dat1
  study             source n1i m1i sd1i n2i m2i sd2i      yi     vi
1     1          Edinburgh 155  55   47 156  75   64 -0.3552 0.0131
2     2     Orpington-Mild  31  27    7  32  29    4 -0.3479 0.0645
3     3 Orpington-Moderate  75  64   17  71 119   29 -2.3176 0.0458
4     4   Orpington-Severe  18  66   20  18 137   48 -1.8880 0.1606
5     5      Montreal-Home   8  14    8  13  18   11 -0.3840 0.2054
6     6  Montreal-Transfer  57  19    7  52  18    4  0.1721 0.0369
7     7          Newcastle  34  52   45  33  41   34  0.2721 0.0603
8     8               Umea 110  21   16 183  31   27 -0.4246 0.0149
9     9            Uppsala  60  30   27  52  23   20  0.2896 0.0363
Finally, a random-effects model can be fitted to these data with:
res1 <- rma(yi, vi, data=dat1)
res1
Random-Effects Model (k = 9; tau^2 estimator: REML)

tau^2 (estimated amount of total heterogeneity): 0.7908 (SE = 0.4281)
tau (square root of estimated tau^2 value):      0.8893
I^2 (total heterogeneity / total variability):   95.49%
H^2 (total variability / sampling variability):  22.20

Test for Heterogeneity:
Q(df = 8) = 123.7293, p-val < .0001

Model Results:

estimate      se     zval    pval    ci.lb   ci.ub
 -0.5371  0.3087  -1.7401  0.0818  -1.1421  0.0679  .

---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Therefore, stroke patients receiving specialized care are estimated to spend on average about half a standard deviation fewer days in the hospital compared to patients receiving routine care ($\hat{\mu} = -0.54$ with 95% CI: $-1.14$ to $0.07$), but there is a considerable amount of heterogeneity in the findings (as indicated by the large estimate of $\tau^2$, the large $I^2$ value, and the significant $Q$-test).
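Given the large amount of heterogeneity, it may also be informative to report a prediction interval for the true standardized mean differences, which can be obtained, for example, with:

predict(res1, digits=2)

The prediction interval will be considerably wider than the confidence interval for $\mu$, reflecting how much the true effects are estimated to vary across studies.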
Note: Since the exact same variable (number of days of the hospital stay) was measured in each study, it would also be possible (and probably better) to conduct the meta-analysis with the raw (unstandardized) mean difference as the effect size measure (or possibly with the (log transformed) ratio of means, since the variable was measured on a ratio scale). The data are used here just for illustration purposes.
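For example, an analysis based on raw mean differences could be carried out in essentially the same way (a minimal sketch; the object names dat1b and res1b are just for illustration):

# raw mean differences (measure="MD") instead of standardized mean differences
dat1b <- escalc(measure="MD", m1i=m1i, sd1i=sd1i, n1i=n1i, m2i=m2i, sd2i=sd2i, n2i=n2i, data=dat.normand1999)
res1b <- rma(yi, vi, data=dat1b)
res1b

Using measure="ROM" instead would yield the analysis based on the (log transformed) ratios of means.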
Now suppose that the means and standard deviations of the length of stay variable are not reported in all studies, but that the following dataset could be assembled based on information reported in the studies:
dat2 <- dat.normand1999
dat2$dval <- with(dat2, (m1i - m2i) / sqrt(((n1i-1)*sd1i^2 + (n2i-1)*sd2i^2)/(n1i + n2i - 2)))
dat2$tval <- with(dat2, dval / sqrt(1/n1i + 1/n2i))
dat2$pval <- 2 * with(dat2, pt(abs(tval), df = n1i + n2i - 2, lower.tail=FALSE))
dat2$sign <- with(dat2, ifelse(m1i > m2i, 1, -1))
dat2[,c("dval","tval","pval")] <- round(dat2[,c("dval","tval","pval")], 2)
dat2[c(1,7),c(4,5,7,8,10,11,12)] <- NA
dat2[c(5,8),c(4,5,7,8, 9,11,12)] <- NA
dat2[c(2,9),c(4,5,7,8, 9,10)] <- NA
dat2[c(3,4,6),c(9:12)] <- NA
dat2
  study             source n1i m1i sd1i n2i m2i sd2i  dval  tval pval sign
1     1          Edinburgh 155  NA   NA 156  NA   NA -0.36    NA   NA   NA
2     2     Orpington-Mild  31  NA   NA  32  NA   NA    NA    NA 0.17   -1
3     3 Orpington-Moderate  75  64   17  71 119   29    NA    NA   NA   NA
4     4   Orpington-Severe  18  66   20  18 137   48    NA    NA   NA   NA
5     5      Montreal-Home   8  NA   NA  13  NA   NA    NA -0.89   NA   NA
6     6  Montreal-Transfer  57  19    7  52  18    4    NA    NA   NA   NA
7     7          Newcastle  34  NA   NA  33  NA   NA  0.28    NA   NA   NA
8     8               Umea 110  NA   NA 183  NA   NA    NA -3.53   NA   NA
9     9            Uppsala  60  NA   NA  52  NA   NA    NA    NA 0.13    1
In particular, in studies 1 and 7, the authors reported only the standardized mean difference (i.e., the Cohen's d value). In studies 5 and 8, the authors reported only the t-statistic from an independent samples t-test comparing the two groups, and in studies 2 and 9, the authors reported only the (two-sided) p-value corresponding to that t-test. For all studies, the group sizes are known.1)
Given only this information, it is possible to reconstruct the full dataset for the meta-analysis (after each step, I would recommend examining the contents of the dat2 object to better appreciate what the code is doing).
To start, we can use the escalc() function to obtain the (bias-corrected) standardized mean differences and corresponding sampling variances (for those studies where the means and standard deviations are available) with:

dat2 <- escalc(measure="SMD", m1i=m1i, sd1i=sd1i, n1i=n1i, m2i=m2i, sd2i=sd2i, n2i=n2i, data=dat2)
Next, for the studies that only report the (two-sided) p-value of the t-test, we can back-calculate the corresponding t-statistic with:

dat2$tval <- replmiss(tval, sign * qt(pval/2, df=n1i+n2i-2, lower.tail=FALSE), data=dat2)
Note that the two-sided p-values need to be turned into one-sided p-values first (i.e., by dividing by 2) before using the qt() function. Also, the p-values do not contain information on the sign of the t-test values (i.e., whether the first or the second group had the lower mean), so that information needs to be available as well (as encoded in the sign variable). The replmiss() function is a little helper function in the metafor package that can be used to replace only the missing values in a variable.
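As a quick toy illustration of how replmiss() works (the values here are arbitrary and not part of the dataset above):

replmiss(c(1, NA, 3), c(-7, -8, -9))
# [1]  1 -8  3   (only the missing second element is replaced)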
Next, we can convert the t-statistics into the corresponding standardized mean differences (i.e., d-values) with:

dat2$dval <- replmiss(dval, tval * sqrt(1/n1i + 1/n2i), data=dat2)
This conversion is based on the well-known relationship between Cohen's d values and the test statistic of the independent samples t-test. Again, only missing values are replaced.
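More specifically, for the pooled-variance t-test we have $t = d / \sqrt{1/n_{1i} + 1/n_{2i}}$, so multiplying the t-values by $\sqrt{1/n_{1i} + 1/n_{2i}}$ recovers the corresponding d-values.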
The missing values in the yi variable can now be replaced with the bias-corrected values with:

dat2$yi <- replmiss(yi, (1 - 3/(4*(n1i+n2i-2) - 1)) * dval, data=dat2)
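The multiplicative term in the code above is the usual approximate small-sample bias correction, that is, $y_i = \left(1 - \frac{3}{4(n_{1i} + n_{2i} - 2) - 1}\right) d_i$, which turns the Cohen's d values into bias-corrected values (i.e., Hedges' g).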
Similarly, the missing values in the vi variable (the sampling variances) can now be replaced with:

dat2$vi <- replmiss(vi, 1/n1i + 1/n2i + yi^2/(2*(n1i+n2i)), data=dat2)
The code above uses the usual equation for computing (or rather: estimating) the large-sample sampling variance of standardized mean differences.
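In symbols, this is $v_i = \frac{1}{n_{1i}} + \frac{1}{n_{2i}} + \frac{y_i^2}{2(n_{1i} + n_{2i})}$.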
We can now examine the contents of the dataset:
dat2
  study n1i m1i sd1i n2i m2i sd2i    dval    tval pval sign      yi     vi
1     1 155  NA   NA 156  NA   NA -0.3600      NA   NA   NA -0.3591 0.0131
2     2  31  NA   NA  32  NA   NA -0.3499 -1.3886 0.17   -1 -0.3456 0.0645
3     3  75  64   17  71 119   29      NA      NA   NA   NA -2.3176 0.0458
4     4  18  66   20  18 137   48      NA      NA   NA   NA -1.8880 0.1606
5     5   8  NA   NA  13  NA   NA -0.3999 -0.8900   NA   NA -0.3839 0.2054
6     6  57  19    7  52  18    4      NA      NA   NA   NA  0.1721 0.0369
7     7  34  NA   NA  33  NA   NA  0.2800      NA   NA   NA  0.2768 0.0603
8     8 110  NA   NA 183  NA   NA -0.4259 -3.5300   NA   NA -0.4248 0.0149
9     9  60  NA   NA  52  NA   NA  0.2890  1.5255 0.13    1  0.2871 0.0363
Any differences in the yi and vi variables compared to dat1 are purely a result of the fact that rounded values were reported for the dval, tval, and pval variables. However, as we can see, the differences are negligible in this example.
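If desired, the discrepancies can be inspected directly, for example with:

round(dat1$yi - dat2$yi, 4)
round(dat1$vi - dat2$vi, 4)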
We can then fit a random-effects model to these data with:
res2 <- rma(yi, vi, data=dat2)
res2
Random-Effects Model (k = 9; tau^2 estimator: REML)

tau^2 (estimated amount of total heterogeneity): 0.7912 (SE = 0.4283)
tau (square root of estimated tau^2 value):      0.8895
I^2 (total heterogeneity / total variability):   95.50%
H^2 (total variability / sampling variability):  22.20

Test for Heterogeneity:
Q(df = 8) = 123.7121, p-val < .0001

Model Results:

estimate      se     zval    pval    ci.lb   ci.ub
 -0.5371  0.3087  -1.7398  0.0819  -1.1422  0.0680  .

---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The results are essentially the same as obtained earlier.
Some final notes:
Some of the steps above could be further simplified by tricking the escalc() function into computing the (bias-corrected) standardized mean differences and corresponding sampling variances from reported Cohen's d values directly. Let's put together the dataset one more time:
dat3 <- dat.normand1999
dat3$dval <- with(dat3, (m1i - m2i) / sqrt(((n1i-1)*sd1i^2 + (n2i-1)*sd2i^2)/(n1i + n2i - 2)))
dat3$tval <- with(dat3, dval / sqrt(1/n1i + 1/n2i))
dat3$pval <- 2 * with(dat3, pt(abs(tval), df = n1i + n2i - 2, lower.tail=FALSE))
dat3$sign <- with(dat3, ifelse(m1i > m2i, 1, -1))
dat3[,c("dval","tval","pval")] <- round(dat3[,c("dval","tval","pval")], 2)
dat3[c(1,7),c(4,5,7,8,10,11,12)] <- NA
dat3[c(5,8),c(4,5,7,8, 9,11,12)] <- NA
dat3[c(2,9),c(4,5,7,8, 9,10)] <- NA
dat3[c(3,4,6),c(9:12)] <- NA
First, as above, we transform the p-values to corresponding t-values with:
dat3$tval <- replmiss(tval, sign * qt(pval/2, df=n1i+n2i-2, lower.tail=FALSE), data=dat3)
And we again transform the t-values to d-values with:
dat3$dval <- replmiss(dval, tval * sqrt(1/n1i + 1/n2i), data=dat3)
Now, for the studies for which we only have d-values, the trick is to set variable m1i to these d-values, set m2i to 0, and set sd1i and sd2i to 1 (the pooled standard deviation is then equal to 1, so the standardized mean difference computed by escalc() is simply (d - 0) / 1 = d):
dat3$m1i  <- replmiss(m1i, dval, data=dat3)
dat3$m2i  <- replmiss(dat3$m2i, 0)
dat3$sd1i <- replmiss(dat3$sd1i, 1)
dat3$sd2i <- replmiss(dat3$sd2i, 1)
The dataset now looks like this:
  study             source n1i     m1i sd1i n2i m2i sd2i    dval    tval pval sign
1     1          Edinburgh 155 -0.3600    1 156   0    1 -0.3600      NA   NA   NA
2     2     Orpington-Mild  31 -0.3499    1  32   0    1 -0.3499 -1.3886 0.17   -1
3     3 Orpington-Moderate  75 64.0000   17  71 119   29      NA      NA   NA   NA
4     4   Orpington-Severe  18 66.0000   20  18 137   48      NA      NA   NA   NA
5     5      Montreal-Home   8 -0.3999    1  13   0    1 -0.3999 -0.8900   NA   NA
6     6  Montreal-Transfer  57 19.0000    7  52  18    4      NA      NA   NA   NA
7     7          Newcastle  34  0.2800    1  33   0    1  0.2800      NA   NA   NA
8     8               Umea 110 -0.4259    1 183   0    1 -0.4259 -3.5300   NA   NA
9     9            Uppsala  60  0.2890    1  52   0    1  0.2890  1.5255 0.13    1
Now we can use escalc() again:
dat3 <- escalc(measure="SMD", m1i=m1i, sd1i=sd1i, n1i=n1i, m2i=m2i, sd2i=sd2i, n2i=n2i, data=dat3)
dat3
  study n1i     m1i sd1i n2i m2i sd2i    dval    tval pval sign      yi     vi
1     1 155 -0.3600    1 156   0    1 -0.3600      NA   NA   NA -0.3591 0.0131
2     2  31 -0.3499    1  32   0    1 -0.3499 -1.3886 0.17   -1 -0.3456 0.0645
3     3  75 64.0000   17  71 119   29      NA      NA   NA   NA -2.3176 0.0458
4     4  18 66.0000   20  18 137   48      NA      NA   NA   NA -1.8880 0.1606
5     5   8 -0.3999    1  13   0    1 -0.3999 -0.8900   NA   NA -0.3839 0.2054
6     6  57 19.0000    7  52  18    4      NA      NA   NA   NA  0.1721 0.0369
7     7  34  0.2800    1  33   0    1  0.2800      NA   NA   NA  0.2768 0.0603
8     8 110 -0.4259    1 183   0    1 -0.4259 -3.5300   NA   NA -0.4248 0.0149
9     9  60  0.2890    1  52   0    1  0.2890  1.5255 0.13    1  0.2871 0.0363
As we can see, the yi and vi values are exactly identical to the ones from dat2 above, which in turn are nearly identical to those from dat1.
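Again, this can be checked directly, for example with round(dat2$yi - dat3$yi, 4) and round(dat2$vi - dat3$vi, 4), which should both return vectors of zeros.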
Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
Normand, S. T. (1999). Meta-analysis: Formulating, evaluating, combining, and reporting. Statistics in Medicine, 18(3), 321–359.
1) Or at least the total sample size of each study is known (i.e., ni = n1i + n2i). If it is not at all possible to obtain information about the group sizes (e.g., by contacting the author(s) of the study), one possibility would be to assume that both groups are of the same size (i.e., n1i = n2i = ni/2). In the present dataset, this would not be far from the truth in most cases.