Suppose the goal of a meta-analysis is to aggregate the results from studies contrasting two groups (e.g., treatment versus control) and each study measured an outcome of interest using some quantitative scale. A commonly used effect size measure used to quantify the size of the group difference is then the standardized mean difference (also commonly known as Cohen's d).
As an example, consider the data reported in Normand (1999) on the length of the hospital stay of stroke patients under specialized care and under conventional/routine non-specialist care:
library(metafor)
dat.normand1999
study source n1i m1i sd1i n2i m2i sd2i
1 1 Edinburgh 155 55 47 156 75 64
2 2 Orpington-Mild 31 27 7 32 29 4
3 3 Orpington-Moderate 75 64 17 71 119 29
4 4 Orpington-Severe 18 66 20 18 137 48
5 5 Montreal-Home 8 14 8 13 18 11
6 6 Montreal-Transfer 57 19 7 52 18 4
7 7 Newcastle 34 52 45 33 41 34
8 8 Umea 110 21 16 183 31 27
9 9 Uppsala 60 30 27 52 23 20
Variables n1i
and n2i
indicate the number of patients under specialized and under routine care, respectively, variables m1i
and m2i
indicate the respective mean length of stay (in days) in the two groups during the acute hospital admission, and sd1i
and sd2i
are the corresponding standard deviations of the length of stay.
With this information, we can compute the standardized mean difference (and corresponding sampling variance) for each study with:
dat1 <- escalc(measure="SMD", m1i=m1i, sd1i=sd1i, n1i=n1i,
m2i=m2i, sd2i=sd2i, n2i=n2i, data=dat.normand1999)
dat1
study source n1i m1i sd1i n2i m2i sd2i yi vi
1 1 Edinburgh 155 55 47 156 75 64 -0.3552 0.0131
2 2 Orpington-Mild 31 27 7 32 29 4 -0.3479 0.0645
3 3 Orpington-Moderate 75 64 17 71 119 29 -2.3176 0.0458
4 4 Orpington-Severe 18 66 20 18 137 48 -1.8880 0.1606
5 5 Montreal-Home 8 14 8 13 18 11 -0.3840 0.2054
6 6 Montreal-Transfer 57 19 7 52 18 4 0.1721 0.0369
7 7 Newcastle 34 52 45 33 41 34 0.2721 0.0603
8 8 Umea 110 21 16 183 31 27 -0.4246 0.0149
9 9 Uppsala 60 30 27 52 23 20 0.2896 0.0363
Finally, a random-effects model can be fitted to these data with:
res1 <- rma(yi, vi, data=dat1)
res1
Random-Effects Model (k = 9; tau^2 estimator: REML)
tau^2 (estimated amount of total heterogeneity): 0.7908 (SE = 0.4281)
tau (square root of estimated tau^2 value): 0.8893
I^2 (total heterogeneity / total variability): 95.49%
H^2 (total variability / sampling variability): 22.20
Test for Heterogeneity:
Q(df = 8) = 123.7293, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
-0.5371 0.3087 -1.7401 0.0818 -1.1421 0.0679 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Therefore, stroke patients receiving specialized care are estimated to spent on average about half of a standard deviation fewer days in the hospital compared to patients receiving routine care ($\hat{\mu} = -0.54$ with 95% CI: $-1.14$ to $0.07$), but there is a considerable amount of heterogeneity in the findings (as indicated by the large estimate of $\tau^2$, the large $I^2$ value, and the significant $Q$-test).
Sidenote: Since the exact same variable (number of days of the hospital stay) was measured in each study, it would also be possible (and probably better) to conduct the meta-analysis with the raw (unstandardized) mean difference as the effect size measure (or possibly with the (log transformed) ratio of means, since the variable was measured on a ratio scale). The data are used here just for illustration purposes.
Now suppose that the means and standard deviations of the length of stay variable are not reported in all studies, but that the following dataset could be assembled based on the information reported in the studies:
dat2 <- dat.normand1999
dat2$dval <- with(dat2, (m1i - m2i) / sqrt(((n1i-1)*sd1i^2 + (n2i-1)*sd2i^2)/(n1i + n2i - 2)))
dat2$tval <- with(dat2, dval / sqrt(1/n1i + 1/n2i))
dat2$pval <- 2 * with(dat2, pt(abs(tval), df = n1i + n2i - 2, lower.tail=FALSE))
dat2$sign <- with(dat2, ifelse(m1i > m2i, 1, -1))
dat2[,c("dval","tval","pval")] <- round(dat2[,c("dval","tval","pval")], 2)
dat2[c(1,7),c(4,5,7,8,10,11,12)] <- NA
dat2[c(5,8),c(4,5,7,8, 9,11,12)] <- NA
dat2[c(2,9),c(4,5,7,8, 9,10)] <- NA
dat2[c(3,4,6),c(9:12)] <- NA
dat2
study source n1i m1i sd1i n2i m2i sd2i dval tval pval sign
1 1 Edinburgh 155 NA NA 156 NA NA -0.36 NA NA NA
2 2 Orpington-Mild 31 NA NA 32 NA NA NA NA 0.17 -1
3 3 Orpington-Moderate 75 64 17 71 119 29 NA NA NA NA
4 4 Orpington-Severe 18 66 20 18 137 48 NA NA NA NA
5 5 Montreal-Home 8 NA NA 13 NA NA NA -0.89 NA NA
6 6 Montreal-Transfer 57 19 7 52 18 4 NA NA NA NA
7 7 Newcastle 34 NA NA 33 NA NA 0.28 NA NA NA
8 8 Umea 110 NA NA 183 NA NA NA -3.53 NA NA
9 9 Uppsala 60 NA NA 52 NA NA NA NA 0.13 1
In particular, in studies 1 and 7, the authors did not report the means and standard deviations, but they directly reported the standardized mean difference (Cohen's d value). In studies 5 and 8, the authors only reported the t-statistic from an independent samples t-test comparing the two groups, and in studies 2 and 9, the authors only reported the (two-sided) p-value corresponding to the t-test. For all studies, the group sizes are known.1) Given only this information, it is possible to reconstruct the full dataset for the meta-analysis.
For standardized mean differences as the outcome measure, the escalc()
function also allows the user to directly specify d values (if they are known) via the di
argument. Moreover, the t-statistics can be converted into the corresponding Cohen's d values with $$d = t \times \sqrt{1/n_1 + 1/n_2}.$$ This conversion is based on the well-known relationship between Cohen's d values and the test statistic of the independent samples t-test. If we specify argument ti
with the variable containing the t-statistics, then the escalc()
function will do this automatically for us.
Finally, the p-values can be back-transformed to t-statistics, which in turn can be transformed into standardized mean differences as described above. For this, one can specify the p-values via the pi
argument. However, since two-sided p-values do not contain information about the sign of the t-statistics and hence the standardized mean differences, that information needs to be available as well (as encoded in the sign
variable). Therefore, one can specify positive or negative values for pi
and the sign of the values is taken to be the sign of the standardized mean differences.
So, we can use the following code:
dat2 <- escalc(measure="SMD", m1i=m1i, sd1i=sd1i, n1i=n1i,
m2i=m2i, sd2i=sd2i, n2i=n2i,
ti=tval, di=dval, pi=sign*pval, data=dat2)
We can now examine the contents of the dataset with:
dat2
study n1i m1i sd1i n2i m2i sd2i dval tval pval sign yi vi
1 1 155 NA NA 156 NA NA -0.3600 NA NA NA -0.3591 0.0131
2 2 31 NA NA 32 NA NA -0.3499 -1.3886 0.17 -1 -0.3456 0.0645
3 3 75 64 17 71 119 29 NA NA NA NA -2.3176 0.0458
4 4 18 66 20 18 137 48 NA NA NA NA -1.8880 0.1606
5 5 8 NA NA 13 NA NA -0.3999 -0.8900 NA NA -0.3839 0.2054
6 6 57 19 7 52 18 4 NA NA NA NA 0.1721 0.0369
7 7 34 NA NA 33 NA NA 0.2800 NA NA NA 0.2768 0.0603
8 8 110 NA NA 183 NA NA -0.4259 -3.5300 NA NA -0.4248 0.0149
9 9 60 NA NA 52 NA NA 0.2890 1.5255 0.13 1 0.2871 0.0363
Any differences in the yi
and vi
variables compared to dat1
are purely a result of the fact that rounded values were reported for variables dval
, tval
, and pval
. However, as we can see, the differences are (in this example) negligible. Note that the function applied the usual bias correction to the values specified via argument dval
, which is why the corresponding values under yi
are slightly lower.
We can now fit a random-effects model to these data with:
res2 <- rma(yi, vi, data=dat2)
res2
Random-Effects Model (k = 9; tau^2 estimator: REML)
tau^2 (estimated amount of total heterogeneity): 0.7912 (SE = 0.4283)
tau (square root of estimated tau^2 value): 0.8895
I^2 (total heterogeneity / total variability): 95.50%
H^2 (total variability / sampling variability): 22.20
Test for Heterogeneity:
Q(df = 8) = 123.7121, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
-0.5371 0.3087 -1.7398 0.0819 -1.1422 0.0680 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The results are essentially the same as obtained earlier.
Some final notes:
The conversion of a p-value to the corresponding t-value as shown above assumes that the exact p-value is reported. If authors only report that the p-value fell below a certain threshold (e.g., $p < .01$ or if authors only state that the test was significant – which typically implies $p < .05$), then a common approach is to use the value of the cutoff reported (e.g., if $p < .01$ is reported, then assume $p = .01$), which is conservative (since the actual p-value was below that assumed value by some unknown amount). The conversion will therefore tend to be less accurate.
The conversion of a p-value to the t-statistic shown above is only applicable if authors actually conducted an independent samples t-test (i.e., it is not appropriate if the p-value was based on a
Mann–Whitney U test or some other non-parametric test). Also, the t-test must have been a Student's t-test (assuming equal population variances in the two groups) and not
Welch's t-test (allowing for unequal variances).
When specifying Cohen's d values via argument di
, then these values are assumed to be uncorrected values (and escalc()
will then apply the usual bias correction to them). If the values are already corrected, one would technically first have to 'uncorrect' them, so that they are not doubly corrected. However, unless the sample sizes are very small, the bias correction is quite negligible anyway, so even a double correction would have no noteworthy impact on the results.