Convergence Problems with the rma.mv() Function
Model fitting with the rma.mv() function makes use of iterative methods that attempt to maximize the log likelihood function over the model parameters. The optimization techniques used for this purpose sometimes do not converge, in which case the function will inform the user that the optimizer did not achieve convergence. Below, I illustrate this problem with an example and discuss some remedies.
Example
Consider the following dataset:
library(metafor)

dat <- structure(list(study = c(1L, 2L, 2L, 2L, 2L, 2L, 3L, 4L, 5L, 6L, 6L,
   6L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 11L,
   12L, 12L, 12L, 12L, 13L, 14L, 14L, 15L, 15L, 16L, 17L, 17L, 17L, 17L, 18L,
   19L, 19L, 20L), id = 1:46,
   yij = c(0.686, 0.672, 0.742, 0.914, 0.861, 0.676, 0.711, 0.688, 0.355,
   0.774, 1.067, 0.869, 1.121, 0.328, 0.559, 0.62, 0.442, 0.299, 0.387,
   0.376, 0.44, 0.414, 0.414, 0.39, 0.061, 0.334, -0.015, 0.508, 0.958,
   0.256, 0.539, 0.941, 0.57, 0.308, 0.274, 0.834, 0.633, 0.195, 0.504,
   0.216, 0.545, 0.069, 0.097, 0.662, 0.633, 0.317),
   vij = c(0.085, 0.007, 0.085, 0.026, 0.076, 0.028, 0.074, 0.024, 0.085,
   0.027, 0.053, 0.054, 0.055, 0.045, 0.014, 0.078, 0.095, 0.028, 0.029,
   0.084, 0.017, 0.064, 0.084, 0.05, 0.067, 0.059, 0.092, 0.027, 0.082,
   0.096, 0.086, 0.049, 0.056, 0.042, 0.016, 0.068, 0.006, 0.085, 0.041,
   0.048, 0.073, 0.048, 0.066, 0.009, 0.016, 0.085)),
   class = "data.frame", row.names = c(NA, -46L))

dat
   study id   yij   vij
1      1  1 0.686 0.085
2      2  2 0.672 0.007
3      2  3 0.742 0.085
4      2  4 0.914 0.026
5      2  5 0.861 0.076
6      2  6 0.676 0.028
7      3  7 0.711 0.074
8      4  8 0.688 0.024
9      5  9 0.355 0.085
10     6 10 0.774 0.027
11     6 11 1.067 0.053
12     6 12 0.869 0.054
13     .  .     .     .
Variable yij contains the effect sizes and variable vij the corresponding sampling variances. The dataset has a multilevel structure, with some studies providing multiple effect size estimates (e.g., study 2 provides 5 estimates). Leaving aside further details, a sensible model for these data may be a multilevel random-effects model, which we can fit with:
res <- rma.mv(yij, vij, random = ~ 1 | study / id, data=dat)
Unfortunately, this yields the error:
Error : Optimizer (nlminb) did not achieve convergence (convergence = 1).
Hence, the algorithm used for the model fitting (by default the nlminb() function, which uses a quasi-Newton method for the optimization) did not converge with the default settings.
By setting verbose=TRUE, we can obtain information on the progress of the optimization algorithm:
res <- rma.mv(yij, vij, random = ~ 1 | study / id, data=dat, verbose=TRUE)
Iteration 0     ll = -0.6174   sigma2 = 0.0107  0.0107
Iteration 1     ll = -0.6174   sigma2 = 0.0107  0.0107
...
Iteration 7     ll =  0.7454   sigma2 = 0.0224  0.0054
...
Iteration 14    ll =  1.3502   sigma2 = 0.0283  0.0015
...
Iteration 21    ll =  1.4727   sigma2 = 0.0222  0.0008
...
Iteration 56    ll =  1.6225   sigma2 = 0.0252  0.0000
...
Iteration 124   ll =  1.6284   sigma2 = 0.0252  0.0000
Iteration 125   ll =  1.6285   sigma2 = 0.0252  0.0000
Iteration 126   ll =  1.6285   sigma2 = 0.0252  0.0000
...
Iteration 306   ll =  1.6285   sigma2 = 0.0252  0.0000
Starting around iteration 126, there is no further change in the log likelihood when rounded to 4 decimal places (nor in the two variance components that are estimated), but nlminb() uses a much smaller threshold for determining when convergence has occurred.
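To get a feel for what such a relative convergence criterion does, consider this simplified sketch (the exact criterion used internally by nlminb() is more involved, and the two log likelihood values below are made up for illustration):

# hypothetical log likelihood values at two successive iterations
ll.old <- 1.62846520
ll.new <- 1.62846521

# relative change in the objective between iterations
abs(ll.new - ll.old) / abs(ll.old)   # about 6e-09

# this is larger than the default rel.tol of 1e-10, so the optimizer keeps
# going, but smaller than 1e-08, under which convergence would be declared

So even when the printed values look identical to 4 decimal places, the optimizer may still consider the changes 'too large' under a very strict threshold.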
Via the control argument, one can adjust the settings of the optimizer that is used. For example, reading help(nlminb) reveals that one can adjust the number of iterations via the iter.max control argument (which by default is 150). So let's try increasing this to something larger:
res <- rma.mv(yij, vij, random = ~ 1 | study / id, data=dat, verbose=TRUE, control=list(iter.max=1000))
The output (not shown) is the same as before, so this didn't really help with achieving convergence. The problem in this case is the rather small threshold that is used by default for determining whether the change in the log likelihood is small enough to declare that convergence has occurred. As we see above, convergence actually seems to occur, but is not detected by the algorithm. Let's try a slightly less strict threshold by adjusting rel.tol (which by default is 1e-10):
res <- rma.mv(yij, vij, random = ~ 1 | study / id, data=dat,
              verbose=TRUE, control=list(rel.tol=1e-8))
res
Now the error disappears and the output shows that convergence was reached after 226 iterations. We can then proceed to examine the results:
Multivariate Meta-Analysis Model (k = 46; method: REML)

Variance Components:

            estim    sqrt  nlvls  fixed    factor
sigma^2.1  0.0252  0.1588     20     no     study
sigma^2.2  0.0000  0.0000     46     no  study/id

Test for Heterogeneity:
Q(df = 45) = 68.9392, p-val = 0.0124

Model Results:

estimate      se     zval    pval   ci.lb   ci.ub
  0.5245  0.0488  10.7373  <.0001  0.4288  0.6203  ***

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We see that the second variance component ($\sigma^2_2$), which provides an estimate of the amount of within-study heterogeneity, converged to (essentially) 0. There is nothing inherently wrong with this (it could very well be the case that all of the heterogeneity in these data is due to differences between studies and none due to differences in the effect sizes within studies), but whenever parameter estimates drift towards bounds like this, it makes life more difficult for the optimization routines.
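When an estimate drifts towards a bound like this, one pragmatic remedy is to fix the offending component at the bound. Here is a sketch of that approach, using the sigma2 argument of rma.mv() (NA means the component is estimated, 0 fixes it at zero); the remaining estimates should match the fit above up to rounding:

# fix the within-study variance component to 0 and estimate only the
# between-study component; this sidesteps the boundary problem entirely
res0 <- rma.mv(yij, vij, random = ~ 1 | study / id, data=dat, sigma2=c(NA, 0))
res0

Of course, this should only be done when fixing the component to zero is defensible on substantive or empirical grounds.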
The figure below shows the (restricted) log likelihood surface for these data and this model as a function of the two variance components. The black line indicates the path that the optimizer takes over the surface, eventually reaching the maximum at the red dot (for an animated version of this figure, click here).
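For readers who want to recreate such a surface for their own data, one can evaluate the restricted log likelihood over a grid of values by fixing both variance components via the sigma2 argument (a rough sketch; the grid ranges below are ad hoc choices for this example, and the code refits the model many times, so it takes a moment to run):

# evaluate the restricted log likelihood over a grid of values for the
# two variance components, fixing both components in each fit
s1 <- seq(0.005, 0.06, length=25)
s2 <- seq(0,     0.03, length=25)
ll <- outer(s1, s2, Vectorize(function(v1, v2)
   as.numeric(logLik(rma.mv(yij, vij, random = ~ 1 | study / id,
                            data=dat, sigma2=c(v1, v2))))))

# draw the surface
persp(s1, s2, ll, theta=45, phi=30, xlab="sigma^2.1", ylab="sigma^2.2",
      zlab="restricted log likelihood")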
Instead of adjusting the settings of the default optimizer, we can also try switching to a different optimization routine. The rma.mv() function actually provides access to more than a dozen different optimizers, which can be chosen via the optimizer control argument. Below, I fit the same model as above, trying out 12 other optimizers (besides the default one). Where necessary, I adjusted some other settings to achieve convergence with all of the optimizers.
res <- list(
   rma.mv(yij, vij, random = ~ 1 | study / id, data=dat, control=list(optimizer="nlminb", rel.tol=1e-8)),
   rma.mv(yij, vij, random = ~ 1 | study / id, data=dat, control=list(optimizer="Nelder-Mead")),
   rma.mv(yij, vij, random = ~ 1 | study / id, data=dat, control=list(optimizer="BFGS")),
   rma.mv(yij, vij, random = ~ 1 | study / id, data=dat, control=list(optimizer="bobyqa")),
   rma.mv(yij, vij, random = ~ 1 | study / id, data=dat, control=list(optimizer="nloptr", maxeval=1000)),
   rma.mv(yij, vij, random = ~ 1 | study / id, data=dat, control=list(optimizer="nlm")),
   rma.mv(yij, vij, random = ~ 1 | study / id, data=dat, control=list(optimizer="hjk")),
   rma.mv(yij, vij, random = ~ 1 | study / id, data=dat, control=list(optimizer="nmk")),
   rma.mv(yij, vij, random = ~ 1 | study / id, data=dat, control=list(optimizer="mads")),
   rma.mv(yij, vij, random = ~ 1 | study / id, data=dat, control=list(optimizer="ucminf")),
   rma.mv(yij, vij, random = ~ 1 | study / id, data=dat, control=list(optimizer="lbfgsb3c")),
   rma.mv(yij, vij, random = ~ 1 | study / id, data=dat, control=list(optimizer="subplex")),
   rma.mv(yij, vij, random = ~ 1 | study / id, data=dat, control=list(optimizer="BBoptim")))
We can then extract the log likelihood and estimates of the two variance components and put everything into a data frame as follows:
tab <- lapply(res, function(x) data.frame(optimizer=x$control$optimizer, ll=logLik(x),
                                          sigma2.1=x$sigma2[1], sigma2.2=x$sigma2[2]))
tab <- do.call(rbind, tab)
dfround(tab, digits=5)
     optimizer      ll sigma2.1 sigma2.2
1       nlminb 1.62847  0.02522        0
2  Nelder-Mead 1.62847  0.02522        0
3         BFGS 1.62830  0.02522        0
4       bobyqa 1.62847  0.02522        0
5       nloptr 1.62841  0.02522        0
6          nlm 1.62847  0.02522        0
7          hjk 1.62847  0.02522        0
8          nmk 1.62847  0.02523        0
9         mads 1.62847  0.02522        0
10      ucminf 1.62847  0.02522        0
11    lbfgsb3c 1.62847  0.02522        0
12     subplex 1.62847  0.02522        0
13     BBoptim 1.62847  0.02522        0
We can see minor discrepancies between some of the log likelihood values, but the results are essentially identical across all optimizers.
Given that some other optimizers converged without any adjustments to their default settings, one might wonder why nlminb() (which did require some tweaking to achieve convergence in this example) is the default optimizer in rma.mv(). The example above was constructed specifically to lead to non-convergence with nlminb() under its default settings. For many other datasets, nlminb() will converge without difficulties, while some of the other optimizers might struggle. There is simply no single algorithm that works flawlessly in all cases. In my experience, nlminb() is actually a quite reasonable and robust default choice and typically works without problems.
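Finally, if convergence failures do occur in practice, one does not have to switch optimizers by hand. Here is a sketch of a simple fallback strategy (the fit_with_fallback() helper is hypothetical, not part of metafor), which traps the convergence error with tryCatch() and moves on to the next optimizer:

# hypothetical helper: try a series of optimizers until one converges
fit_with_fallback <- function(optimizers=c("nlminb", "Nelder-Mead", "bobyqa")) {
   for (opt in optimizers) {
      fit <- tryCatch(rma.mv(yij, vij, random = ~ 1 | study / id, data=dat,
                             control=list(optimizer=opt)),
                      error=function(e) NULL)
      if (!is.null(fit))
         return(fit)
   }
   stop("None of the optimizers achieved convergence.")
}

res <- fit_with_fallback()

For the example dataset above, nlminb() fails under its default settings, so the helper would fall back to Nelder-Mead, which (as the table shows) converges to essentially the same estimates.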