The metafor Package

A Meta-Analysis Package for R

Assembling Data for a Meta-Analysis of (Log) Odds Ratios

Suppose the goal of a meta-analysis is to aggregate the results from studies contrasting two groups (e.g., treatment versus control) and that each study measured a dichotomous outcome of interest (e.g., treatment success versus failure). A commonly used effect size measure for quantifying the size of the group difference (i.e., the size of the treatment effect) is then the odds ratio.

As an example, consider the data reported in Colditz et al. (1994) on the effectiveness of the Bacillus Calmette-Guerin (BCG) vaccine against tuberculosis (for this illustration, we will remove some variables that are not further needed):

library(metafor)
# keep only author, year, and the four 2x2 cell counts
dat.bcg <- dat.bcg[,c(2:7)]
dat.bcg
                 author year tpos  tneg cpos  cneg
1               Aronson 1948    4   119   11   128
2      Ferguson & Simes 1949    6   300   29   274
3       Rosenthal et al 1960    3   228   11   209
4     Hart & Sutherland 1977   62 13536  248 12619
5  Frimodt-Moller et al 1973   33  5036   47  5761
6       Stein & Aronson 1953  180  1361  372  1079
7      Vandiviere et al 1973    8  2537   10   619
8            TPT Madras 1980  505 87886  499 87892
9      Coetzee & Berjak 1968   29  7470   45  7232
10      Rosenthal et al 1961   17  1699   65  1600
11       Comstock et al 1974  186 50448  141 27197
12   Comstock & Webster 1969    5  2493    3  2338
13       Comstock et al 1976   27 16886   29 17825

Variables tpos and tneg indicate the number of TB positive and TB negative cases in the treated (vaccinated) group, while variables cpos and cneg indicate the number of TB positive and TB negative cases in the control (non-vaccinated) group. The data of each study can be arranged in terms of a 2×2 table of the form:

        |  TB+   TB-
--------+------------
Treated |  tpos  tneg
Control |  cpos  cneg

With this information, we can compute the log odds ratio (and corresponding sampling variance) for each study with:

dat1 <- escalc(measure="OR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
dat1
                 author year tpos  tneg cpos  cneg      yi     vi
1               Aronson 1948    4   119   11   128 -0.9387 0.3571
2      Ferguson & Simes 1949    6   300   29   274 -1.6662 0.2081
3       Rosenthal et al 1960    3   228   11   209 -1.3863 0.4334
4     Hart & Sutherland 1977   62 13536  248 12619 -1.4564 0.0203
5  Frimodt-Moller et al 1973   33  5036   47  5761 -0.2191 0.0520
6       Stein & Aronson 1953  180  1361  372  1079 -0.9581 0.0099
7      Vandiviere et al 1973    8  2537   10   619 -1.6338 0.2270
8            TPT Madras 1980  505 87886  499 87892  0.0120 0.0040
9      Coetzee & Berjak 1968   29  7470   45  7232 -0.4717 0.0570
10      Rosenthal et al 1961   17  1699   65  1600 -1.4012 0.0754
11       Comstock et al 1974  186 50448  141 27197 -0.3408 0.0125
12   Comstock & Webster 1969    5  2493    3  2338  0.4466 0.5342
13       Comstock et al 1976   27 16886   29 17825 -0.0173 0.0716

Note that the escalc() function directly computes the log-transformed odds ratios, as these are the values we need for a meta-analysis. A negative log odds ratio indicates that the odds of a TB infection were lower in the treated group compared to the control group in a particular study.
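For the log odds ratio, the computation is simple enough to check by hand. The following sketch uses only base R and the usual large-sample formulas, $y_i = \log\{(a_i d_i) / (b_i c_i)\}$ and $v_i = 1/a_i + 1/b_i + 1/c_i + 1/d_i$, for the first study. It is meant as an illustration of what the yi and vi columns contain, not as a replacement for escalc(); none of the cells are zero here, so no continuity correction is involved.

```r
# 2x2 cell counts for study 1 (Aronson, 1948), copied from the dataset above
ai <- 4    # treated group, TB positive (tpos)
bi <- 119  # treated group, TB negative (tneg)
ci <- 11   # control group, TB positive (cpos)
di <- 128  # control group, TB negative (cneg)

# log odds ratio and its large-sample sampling variance
yi <- log((ai * di) / (bi * ci))
vi <- 1/ai + 1/bi + 1/ci + 1/di

round(c(yi = yi, vi = vi), 4)
```

The two values should agree with the first row of dat1 ($-0.9387$ and $0.3571$).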

A random-effects model can then be fitted to these data with:

res1 <- rma(yi, vi, data=dat1)
res1
Random-Effects Model (k = 13; tau^2 estimator: REML)
 
tau^2 (estimated amount of total heterogeneity): 0.3378 (SE = 0.1784)
tau (square root of estimated tau^2 value):      0.5812
I^2 (total heterogeneity / total variability):   92.07%
H^2 (total variability / sampling variability):  12.61
 
Test for Heterogeneity:
Q(df = 12) = 163.1649, p-val < .0001
 
Model Results:
 
estimate      se     zval    pval    ci.lb    ci.ub
 -0.7452  0.1860  -4.0057  <.0001  -1.1098  -0.3806  ***
 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Therefore, the estimated average log odds ratio is equal to $\hat{\mu} = -0.75$ (with 95% CI: $-1.11$ to $-0.38$). For easier interpretation, we can back-transform the results with:

predict(res1, transf=exp, digits=2)
 pred ci.lb ci.ub pi.lb pi.ub
 0.47  0.33  0.68  0.14  1.57

The odds of a TB infection are therefore estimated to be approximately half as large on average in vaccinated groups (i.e., an odds ratio of $0.47$ with 95% CI: $0.33$ to $0.68$), or put differently, we can say that the odds of infection are on average 53% lower (i.e., $1 - 0.47 = 0.53$) in vaccinated groups. However, there is a considerable amount of heterogeneity in the findings (as indicated by the large estimate of $\tau^2$, the wide prediction interval, the large $I^2$ value, and the significant $Q$-test).
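The back-transformation itself can also be traced by hand. The following base R sketch takes the rounded estimate, standard error, and $\tau^2$ value from the model output above and assumes the default normal-based intervals, with the prediction interval using $\sqrt{SE^2 + \hat{\tau}^2}$ as its standard error. Because the inputs are rounded, agreement with predict() should only be expected to about two decimals.

```r
# values copied from the model output above (log odds ratio scale)
est  <- -0.7452  # estimated average log odds ratio
se   <-  0.1860  # standard error of the estimate
tau2 <-  0.3378  # estimated amount of heterogeneity (tau^2)

crit <- qnorm(0.975)  # critical value for 95% intervals

# confidence interval for the average log odds ratio
ci_int <- est + c(-1, 1) * crit * se

# prediction interval: additionally accounts for between-study variance
pi_int <- est + c(-1, 1) * crit * sqrt(se^2 + tau2)

# back-transform everything to the odds ratio scale
or_scale <- exp(c(pred = est, ci.lb = ci_int[1], ci.ub = ci_int[2],
                  pi.lb = pi_int[1], pi.ub = pi_int[2]))
round(or_scale, 2)
```

The prediction interval is much wider than the confidence interval because $\hat{\tau}^2$ dominates $SE^2$ here, which is another way of seeing the heterogeneity noted above.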

Now suppose that the 2×2 table data are not reported in all studies, but that the following dataset could be assembled based on information reported in the studies:

dat2 <- data.frame(summary(dat1))
dat2[c("yi", "ci.lb", "ci.ub")] <- data.frame(summary(dat1, transf=exp))[c("yi", "ci.lb", "ci.ub")]
names(dat2)[which(names(dat2) == "yi")] <- "or"
dat2[,c("or","ci.lb","ci.ub","pval")] <- round(dat2[,c("or","ci.lb","ci.ub","pval")], digits=2)
dat2$vi <- dat2$sei <- dat2$zi <- NULL
dat2$ntot <- with(dat2, tpos + tneg + cpos + cneg)
dat2[c(1,12),c(3:6,9:10)] <- NA
dat2[c(4,9), c(3:6,8)] <- NA
dat2[c(2:3,5:8,10:11,13),c(7:10)] <- NA
dat2
                 author year tpos  tneg cpos  cneg   or pval ci.lb ci.ub   ntot
1               Aronson 1948   NA    NA   NA    NA 0.39 0.12    NA    NA    262
2      Ferguson & Simes 1949    6   300   29   274   NA   NA    NA    NA    609
3       Rosenthal et al 1960    3   228   11   209   NA   NA    NA    NA    451
4     Hart & Sutherland 1977   NA    NA   NA    NA 0.23   NA  0.18  0.31  26465
5  Frimodt-Moller et al 1973   33  5036   47  5761   NA   NA    NA    NA  10877
6       Stein & Aronson 1953  180  1361  372  1079   NA   NA    NA    NA   2992
7      Vandiviere et al 1973    8  2537   10   619   NA   NA    NA    NA   3174
8            TPT Madras 1980  505 87886  499 87892   NA   NA    NA    NA 176782
9      Coetzee & Berjak 1968   NA    NA   NA    NA 0.62   NA  0.39  1.00  14776
10      Rosenthal et al 1961   17  1699   65  1600   NA   NA    NA    NA   3381
11       Comstock et al 1974  186 50448  141 27197   NA   NA    NA    NA  77972
12   Comstock & Webster 1969   NA    NA   NA    NA 1.56 0.54    NA    NA   4839
13       Comstock et al 1976   27 16886   29 17825   NA   NA    NA    NA  34767

In particular, in studies 1 and 12, the authors reported only the odds ratio and the corresponding p-value (based on a Wald-type test of whether the log odds ratio differs significantly from 0), and in studies 4 and 9, the authors reported only the odds ratio and the corresponding 95% Wald-type confidence interval bounds. Given only this information, it is possible to reconstruct the full dataset for the meta-analysis.

First, we use the escalc() function as before.

dat2 <- escalc(measure="OR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat2)
dat2
    .  tpos   tneg  cpos   cneg    or  pval  ci.lb  ci.ub    ntot       yi      vi
1   .    NA     NA    NA     NA  0.39  0.12     NA     NA     262       NA      NA
2   .     6    300    29    274    NA    NA     NA     NA     609  -1.6662  0.2081
3   .     3    228    11    209    NA    NA     NA     NA     451  -1.3863  0.4334
4   .    NA     NA    NA     NA  0.23    NA   0.18   0.31   26465       NA      NA
5   .    33   5036    47   5761    NA    NA     NA     NA   10877  -0.2191  0.0520
6   .   180   1361   372   1079    NA    NA     NA     NA    2992  -0.9581  0.0099
7   .     8   2537    10    619    NA    NA     NA     NA    3174  -1.6338  0.2270
8   .   505  87886   499  87892    NA    NA     NA     NA  176782   0.0120  0.0040
9   .    NA     NA    NA     NA  0.62    NA   0.39   1.00   14776       NA      NA
10  .    17   1699    65   1600    NA    NA     NA     NA    3381  -1.4012  0.0754
11  .   186  50448   141  27197    NA    NA     NA     NA   77972  -0.3408  0.0125
12  .    NA     NA    NA     NA  1.56  0.54     NA     NA    4839       NA      NA
13  .    27  16886    29  17825    NA    NA     NA     NA   34767  -0.0173  0.0716

As we can see above, this will calculate the log odds ratios and corresponding sampling variances based on the 2×2 table data where possible. For studies not reporting 2×2 data (studies 1, 4, 9, and 12), the values for the yi and vi variables are missing.

For the studies that directly report the odds ratios, it is trivial to convert these values to the log odds ratios. What is a bit more tricky is the computation of the corresponding sampling variances. However, the p-values from the Wald-type tests and the Wald-type confidence intervals provide sufficient information to reconstruct the sampling variances of the log odds ratios. For this, we can use the conv.wald() function as follows.

dat2 <- conv.wald(out=or, ci.lb=ci.lb, ci.ub=ci.ub, pval=pval, n=ntot, data=dat2, transf=log)
dat2
    .  tpos   tneg  cpos   cneg    or  pval  ci.lb  ci.ub    ntot       yi      vi
1   .    NA     NA    NA     NA  0.39  0.12     NA     NA     262  -0.9416  0.3668
2   .     6    300    29    274    NA    NA     NA     NA     609  -1.6662  0.2081
3   .     3    228    11    209    NA    NA     NA     NA     451  -1.3863  0.4334
4   .    NA     NA    NA     NA  0.23    NA   0.18   0.31   26465  -1.4697  0.0192
5   .    33   5036    47   5761    NA    NA     NA     NA   10877  -0.2191  0.0520
6   .   180   1361   372   1079    NA    NA     NA     NA    2992  -0.9581  0.0099
7   .     8   2537    10    619    NA    NA     NA     NA    3174  -1.6338  0.2270
8   .   505  87886   499  87892    NA    NA     NA     NA  176782   0.0120  0.0040
9   .    NA     NA    NA     NA  0.62    NA   0.39   1.00   14776  -0.4780  0.0577
10  .    17   1699    65   1600    NA    NA     NA     NA    3381  -1.4012  0.0754
11  .   186  50448   141  27197    NA    NA     NA     NA   77972  -0.3408  0.0125
12  .    NA     NA    NA     NA  1.56  0.54     NA     NA    4839   0.4447  0.5266
13  .    27  16886    29  17825    NA    NA     NA     NA   34767  -0.0173  0.0716

We now have a complete dataset. Any differences compared to dat1 are purely a result of the rounding of the or, ci.lb, ci.ub, and pval variables. However, the differences are negligible.
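To put a number on "negligible", the following base R sketch compares the yi and vi values of the four reconstructed studies, as printed (to four decimals) in the dat1 and dat2 outputs above. The vector names are chosen for illustration only.

```r
# yi and vi for studies 1, 4, 9, and 12, copied from the two printouts
study   <- c(1, 4, 9, 12)
yi_dat1 <- c(-0.9387, -1.4564, -0.4717, 0.4466)
vi_dat1 <- c( 0.3571,  0.0203,  0.0570, 0.5342)
yi_dat2 <- c(-0.9416, -1.4697, -0.4780, 0.4447)
vi_dat2 <- c( 0.3668,  0.0192,  0.0577, 0.5266)

# differences between the reconstructed and the original values
diffs <- data.frame(study,
                    yi_diff = round(yi_dat2 - yi_dat1, 4),
                    vi_diff = round(vi_dat2 - vi_dat1, 4))
diffs
```

The largest absolute difference is about $0.013$ for the log odds ratios (study 4) and about $0.010$ for the sampling variances (study 1).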

Sidenote: The n argument was used above to supply the total sample sizes of the studies to the function. This has no bearing on the calculations done by conv.wald(), but some other functions may use this information (e.g., the funnel() function, when the yaxis argument is set to one of the options that puts the sample sizes, or some transformation thereof, on the y-axis).
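To make the reconstruction less of a black box, the following base R sketch applies the standard Wald-type relationships by hand to the four studies. It assumes two-sided p-values, 95% confidence intervals that are symmetric on the log scale, and normal critical values; that conv.wald() proceeds in exactly this way is an assumption here, although the results can be checked against the output above.

```r
crit <- qnorm(0.975)  # critical value for a 95% Wald-type interval

# studies 4 and 9: odds ratio plus 95% CI bounds
or_ci <- c(0.23, 0.62)
lb    <- c(0.18, 0.39)
ub    <- c(0.31, 1.00)
yi_ci <- log(or_ci)
# the CI spans 2 * crit standard errors on the log scale
vi_ci <- ((log(ub) - log(lb)) / (2 * crit))^2

# studies 1 and 12: odds ratio plus two-sided p-value
or_p <- c(0.39, 1.56)
pval <- c(0.12, 0.54)
yi_p <- log(or_p)
zi   <- qnorm(pval / 2, lower.tail = FALSE)  # |z| implied by the p-value
vi_p <- (abs(yi_p) / zi)^2                   # since |z| = |yi| / SE

round(data.frame(study = c(4, 9, 1, 12),
                 yi = c(yi_ci, yi_p),
                 vi = c(vi_ci, vi_p)), 4)
```

Up to rounding, these values should match the yi and vi entries that conv.wald() filled in for studies 1, 4, 9, and 12.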

We can then fit a random-effects model to these data with:

res2 <- rma(yi, vi, data=dat2)
res2
Random-Effects Model (k = 13; tau^2 estimator: REML)
 
tau^2 (estimated amount of total heterogeneity): 0.3408 (SE = 0.1798)
tau (square root of estimated tau^2 value):      0.5838
I^2 (total heterogeneity / total variability):   92.18%
H^2 (total variability / sampling variability):  12.80
 
Test for Heterogeneity:
Q(df = 12) = 167.4513, p-val < .0001
 
Model Results:
 
estimate      se     zval    pval    ci.lb    ci.ub
 -0.7472  0.1867  -4.0015  <.0001  -1.1132  -0.3812  ***
 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

predict(res2, transf=exp, digits=2)
 pred ci.lb ci.ub pi.lb pi.ub
 0.47  0.33  0.68  0.14  1.57

These results are essentially the same as the ones we obtained earlier.

References

Colditz, G. A., Brewer, T. F., Berkey, C. S., Wilson, M. E., Burdick, E., Fineberg, H. V., & Mosteller, F. (1994). Efficacy of BCG vaccine in the prevention of tuberculosis: Meta-analysis of the published literature. Journal of the American Medical Association, 271(9), 698–702.

tips/assembling_data_or.txt · Last modified: 2022/11/27 19:01 by Wolfgang Viechtbauer