Convergence Problems with the rma() Function

Some routines within the rma() function are not based on closed-form solutions, but require numerical (iterative) methods. In particular, when using the ML (maximum likelihood), REML (restricted maximum likelihood), and EB (empirical Bayes) estimators for $\tau^2$, the function makes use of the Fisher scoring algorithm, which is robust to poor starting values and usually converges quickly (Harville, 1977; Jennrich & Sampson, 1976). By default, the starting value is set equal to the value of the Hedges (HE) estimator and the algorithm terminates when the change in the estimated value of $\tau^2$ is smaller than $10^{-5}$ from one iteration to the next. The maximum number of iterations is 100 by default (which should be sufficient in most cases). Information on the progress of the algorithm can be obtained by setting verbose=TRUE or with control=list(verbose=TRUE).

Example 1: Convergence with Default Settings

Let's examine how the algorithm works in one particular example. First, we load the package and then create a vector with the effect size estimates and corresponding sampling variances:

Sidenote: For all of the examples given on this page, I will increase the number of decimals shown to 5 digits, but this only affects the displayed output, not the accuracy of the underlying algorithms.

So, the starting value for the algorithm is the one obtained with the HE estimator. After 12 iterations, the algorithm has converged to an estimate of 0.19878, as the change in the estimate from one iteration to the next is sufficiently small at this point (i.e., less than .00001). We can also visualize the progress of the algorithm by adding (red) points corresponding to the estimates at the various iterations to a profile plot of the restricted log-likelihood as a function of $\tau^2$.

Example 2: Convergence with Increased Number of Iterations

A different starting value, threshold, and maximum number of iterations can be specified via the control argument by setting control=list(tau2.init=value, threshold=value, maxiter=value). The step length of the Fisher scoring algorithm can also be manually adjusted by a desired factor with control=list(stepadj=value) (values below 1 will reduce the step length).

These things can become useful when the Fisher scoring algorithm does not converge with the default settings. Let's consider an example.

It took a rather large number of iterations, but eventually convergence was achieved. An animation illustrating the progress of the algorithm is shown below.

Interestingly, the algorithm keeps overshooting the peak, cycling back and forth, but due to the decreasing amplitude of the oscillation, the peak (i.e., the REML estimate) is eventually found. This behavior suggests that the default step length is a bit too large in this case. When we adjust the step length by a constant factor below 1 (e.g., 0.5), convergence is rapidly achieved within a few iterations.

Not even 10000 iterations are not enough to get the algorithm to converge. The animation below shows what the problem is.

Note how the algorithm keeps cycling through a series of values, repeatedly overshooting the target, and just never converges on the peak (I only went up to 500 iterations above, but the same thing continues to happen indefinitely). This again indicates that the step length is just too large here. Reducing the step length easily solves this problem.

Using rma.mv() for Model Fitting

As explained elsewhere on this site, the rma.mv() function can also be used to fit the same models as the rma() function. The underlying algorithms for the model fitting are a bit different though and make use of other optimization functions available in R (choices include optim(), nlminb(), and a bunch of others). So, in case rma() does not converge, another solution may be to switch to the rma.mv() function to see if this one works better. Let's try this out with the data from the previous example. Note that we need to explicitly add the random effects for each study when we use the rma.mv() function. So, the following code will fit the same model as rma() did earlier.

The REML estimate of the variance component is the same as the one obtained earlier with the Fisher scoring algorithm after applying the step length adjustment. Note that minor numerical differences are possible when using different optimization routines. We can also see this when switching to a different optimizer.

Maximum Likelihood Estimation

The examples above all involved the use of restricted maximum-likelihood (REML) estimation, which is the default for models fitted with the rma() and rma.mv() functions. Regular maximum likelihood (ML) estimation can be used with method="ML". Let's try this out with the data from the previous example.

No convergence problems with the default settings (of course, the estimate of $\tau^2$ is different). So, how well the algorithm works in any particular case can also depend on which estimator of $\tau^2$ is chosen.

Empirical Bayes / Paule-Mandel Estimator

The empirical Bayes (EB) estimator (Berkey et al., 1995; Morris, 1983) can also only be obtained iteratively. For the same data as used above, the Fisher scoring algorithm again has difficulties converging with the default settings, but this is easily remedied by adjusting the step length.

The empirical Bayes estimator can actually be shown to be identical to the estimator described by Paule and Mandel (1982). However, while the EB estimator is obtained via the Fisher scoring algorithm, the Paule-Mandel (PM) estimator is obtained in a different way, making use of the uniroot() function. By default, the desired accuracy and maximum number of iterations are set as described above. The upper bound of the interval searched is set to 100 (which should be large enough for most cases). The desired accuracy (threshold), maximum number of iterations (maxiter), and upper bound (tau2.max) can be adjusted with control=list(threshold=value, maxiter=value, tau2.max=value). But let's try obtaining the estimate with the default settings.

Afterword

Computing the values of the various estimators described above requires the use of optimization techniques or root-finding algorithms. While the specific application of such numerical methods considered here is relatively simple, it is still difficult to find a method that always works regardless of the data you throw at it.

When I first started to develop the routines that now form the basis of the rma() function (those routines go back further than the metafor package and its predecessor, the mima() function – see here), the ML, REML, and EB estimators were computed at the time via relatively simple iterative schemes (e.g., National Research Council, 1992; Thompson & Sharp, 1999) that seemed to lack a solid theoretical foundation and also could exhibit cyclical non-convergence behavior (Brockwell & Gordon, 2001). So, I programmed various optimization techniques from scratch, including the Newton-Raphson technique and the Fisher scoring algorithm and found that the latter worked quite well in most cases. As part of my dissertation research, I also showed that the Fisher scoring algorithm in this specific context can be rewritten such that it corresponds to the iterative schemes described in some of those early papers.

In the end, this implies that the algorithm, despite its long history and positive representation in the literature as a generally robust and effective numerical procedure for computing ML/REML estimates of variance components (e.g., Harville, 1977; Jennrich & Sampson, 1976), also inherited the potential for the less than desirable behavior (for certain meta-analytic models) as illustrated in some of the examples above. However, in pretty much all of the cases where this issue arises, the most effective solution appears to be a reduction of the step length. Of course, I could consider making this adjustment by default, but that could lead to backwards incompatibility. And since the solution is easy to apply (and there is an easy alternative by means of the rma.mv() function, at least for ML and REML estimation), I see little need to make any changes to the underlying code.

Two things are important to note though:

First of all, non-convergence of the algorithm is still a relatively rare occurrence. In practice, this is rarely going to affect the user. On the other hand, simulation studies that make use of the metafor package – where the same model is applied thousands of times to simulated data – may require setting up 'contingency plans' when non-convergence is encountered (the try() function is your friend here).

Second, non-convergence is not a deficiency of the estimators themselves. The ML, REML, and EB/PM estimators all have a solid theoretical basis – it's just that they cannot be computed with a simple equation, so numerical methods must be used. And any specific numerical method can sometimes fail. But such problems can usually be overcome with simple adjustments to the algorithm or switching to another method. So, please keep this in mind when you read criticisms of iterative estimators (such as ML, REML, or EB/PM) related to non-convergence: The problem does not lie with the estimators themselves but with the numerical methods used to compute them.