Based on some code I wrote as part of my dissertation research, I developed a function called mima() which provided the basic functionality for fitting fixed- and random/mixed-effects (meta-regression) models. Around 2006, I placed the function on my website (together with a short tutorial) and it was picked up by several researchers who used the function in several meta-analyses. However, while the mima() function provided the basic functionality for fitting standard meta-analytic models and conducting meta-regression analyses, the metafor package was written in response to several requests to expand the function into a full package for conducting meta-analyses with additional options and support functions. The mima() function is therefore now obsolete and has been removed from my website.

Various attempts have been made to validate the functions in the metafor package. First of all, when corresponding analyses could be carried out, I have compared the results provided by the metafor package with those provided by other software packages for several data sets. In particular, results have been compared with those provided by the metan, metareg, metabias, and metatrim commands in Stata (for more details on these commands, see Sterne, 2009). Results were also compared with those provided by SAS using the proc mixed command (for more details, see van Houwelingen, Arends, & Stijnen, 2002), by SPSS using the macros developed by David Wilson (Lipsey & Wilson, 2001), by the meta(CRAN Link) and rmeta(CRAN Link) packages in R, and by Comprehensive Meta-Analysis, MetaWin, and the Review Manager of the Cochrane Collaboration. Results either agreed completely or fell within a margin of error expected when using numerical methods.

Second, results provided by the metafor package have been compared with published results described in articles and books (the assumption being that those results are in fact correct). On this website, I provide a number of such analysis examples that you can examine yourself. All of these examples (and some more) are also encapsulated in automated tests using the testthat package, so that any changes to the code that would lead to these examples becoming non-reproducible are automatically detected.

Third, I have conducted extensive simulation studies for many of the methods implemented in the package to ensure that their statistical properties are as one would expect based on the underlying theory. To give a simple example, under the assumptions of an equal-effects model (i.e., homogeneous true effects, normally distributed effect size estimates, known sampling variances), the empirical rejection rate of $H_0: \theta = 0$ must be nominal (within the margin of error one would expect when randomly simulating such data). This is in fact the case, providing support that the rma() function is working appropriately for this scenario. Similar tests have been conducted for the more intricate methods in the package.

It may also be useful to note that there is now an appreciable user base of the metafor package (the Viechtbauer (2010) article describing the package has been cited in over 3000 articles, many of which are applied meta-analyses and/or methodological/statistical papers that have used the metafor package as part of the research). This increases the chances that any bugs would be detected, reported, and corrected.

For the most part, the development of the package has been funded through my own precious time. Through some collaborative work on the 'Open Meta-Analyst' software from the Center for Evidence-Based Medicine at Brown University, I have received some funding as part of a subcontract on a grant. Also, Sandra Wilson and Mark Lipsey of the Peabody Research Institute at Vanderbilt University have provided funding for making the rma.mv() more efficient and for adding multicore capabilities to the profile.rma.mv() function. However, further developments of the package could proceed much quicker if additional funding was available. If you are aware of any funding possibilities, feel free to let me know!

There are actually many R packages available for conducting meta-analyses. Fortunately, there is now a Task View for Meta-Analysis, which provides a pretty thorough overview of the different packages and their capabilities.

Technical Questions

Standard meta-analytic models (as can be fitted with the rma() function) assume that the sampling variances are known. On the other hand, the models fitted by the lm() and lme() functions assume that the sampling variances are known only up to a proportionality constant. So these are different models than typically used in meta-analyses. For more details, I have written up more comprehensive comparison of the rma() and the lm() and lme() functions.

For random-effects models, the $I^2$ statistic is computed with $$I^2 = 100\% \times \frac{\hat{\tau}^2}{\hat{\tau}^2 + s^2},$$ where $\hat{\tau}^2$ is the estimated value of $\tau^2$ and $$s^2 = \frac{(k-1) \sum w_i}{(\sum w_i)^2 - \sum w_i^2},$$ where $w_i$ is the inverse of the sampling variance of the $i$th study ($s^2$ is equation 9 in Higgins & Thompson, 2002, and can be regarded as the 'typical' within-study variance of the observed effect sizes or outcomes). The $H^2$ statistic is computed with $$H^2 = \frac{\hat{\tau}^2 + s^2}{s^2}.$$ Analogous equations are used for mixed-effects models.

Therefore, depending on the estimator of $\tau^2$ used, the values of $I^2$ and $H^2$ will change. For random-effects models, $I^2$ and $H^2$ are often computed in practice with $I^2 = 100\% \times (Q-(k-1))/Q$ and $H^2 = Q/(k-1)$, where $Q$ denotes the statistic for the test of heterogeneity and $k$ the number of studies (i.e., observed effects or outcomes) included in the meta-analysis. The equations used in the metafor package to compute these statistics are based on more general definitions and have the advantage that the values of $I^2$ and $H^2$ will be consistent with the estimated value of $\tau^2$ (i.e., if $\hat{\tau}^2 = 0$, then $I^2 = 0$ and $H^2 = 1$ and if $\hat{\tau}^2 > 0$, then $I^2 > 0$ and $H^2 > 1$).

These two sets of equations for $I^2$ and $H^2$ actually coincide when using the DerSimonian-Laird estimator of $\tau^2$ (i.e., the commonly used equations are actually special cases of the more general definitions given above). Therefore, if you prefer the more conventional definitions of these statistics, use method="DL" when fitting the random/mixed-effects model with the rma() function.

The pseudo $R^2$ statistic (Raudenbush, 2009) is computed with $$R^2 = \frac{\hat{\tau}_{RE}^2 - \hat{\tau}_{ME}^2}{\hat{\tau}_{RE}^2},$$ where $\hat{\tau}_{RE}^2$ denotes the estimated value of $\tau^2$ based on the random-effects model (i.e., the total amount of heterogeneity) and $\hat{\tau}_{ME}^2$ denotes the estimated value of $\tau^2$ based on the mixed-effects model (i.e., the residual amount of heterogeneity). It can happen that $\hat{\tau}_{RE}^2 < \hat{\tau}_{ME}^2$, in which case $R^2$ is set to zero. Again, the value of $R^2$ will change depending on the estimator of $\tau^2$ used. Also note that this statistic is only computed when the mixed-effects model includes an intercept (so that the random-effects model is clearly nested within the mixed-effects model). You can also use the anova.rma.uni() function to compute $R^2$ for any two models that are known to be nested.

By default, the interval is computed with $$\hat{\mu} \pm z_{1-\alpha/2} \sqrt{\mbox{SE}[\hat{\mu}]^2 + \hat{\tau}^2},$$ where $\hat{\mu}$ is the estimated average true outcome, $z_{1-\alpha/2}$ is the $100 \times (1-\alpha/2)$th percentile of a standard normal distribution (e.g., $1.96$ for $\alpha = .05$), $\mbox{SE}[\hat{\mu}]$ is the standard error of $\hat{\mu}$, and $\hat{\tau}^2$ is the estimated amount of heterogeneity (i.e., the variance in the true outcomes across studies). If the model was fitted with the Knapp and Hartung (2003) method (i.e., with test="knha" in rma()), then instead of $z_{1-\alpha/2}$, the $100 \times (1-\alpha/2)$th percentile of a t-distribution with $k-1$ degrees of freedom is used.

Note that this differs from Riley et al. (2001), who suggest to use a t-distribution with $k-2$ degrees of freedom for constructing the interval. Neither a normal, nor a t-distribution with $k-1$ or $k-2$ degrees of freedom is correct; all of these are approximations. The computations in metafor are done in the way described above, so that the credibility/prediction interval is identical to the confidence interval for $\mu$ when $\hat{\tau}^2 = 0$, which could be argued is the logical thing that should happen.

The escalc() and rma() functions offer the possibility to transform raw proportions and incidence rates with the Freeman-Tukey transformation (Freeman & Tukey, 1950). For proportions, this is also sometimes called the 'Freeman-Tukey double arcsine transformation'.

For proportions, the transformation (measure="PFT") is computed with the equation $y_i = 1/2 \times (\mbox{asin}(\sqrt{x_i/(n_i+1)}) + \mbox{asin}(\sqrt{(x_i+1)/(n_i+1)}))$, where $x_i$ denotes the number of individuals experiencing the event of interest and $n_i$ denotes the total number of individuals (i.e., sample size). The variance of $y_i$ is then computed with $v_i = 1/(4n_i + 2)$.

For incidence rates, the transformation (measure="IRFT") is computed with the equation $y_i = 1/2 \times (\sqrt{x_i/t_i} + \sqrt{(x_i+1)/t_i})$, where $x_i$ denotes the total number of events that occurred and $t_i$ denotes the total person-time at risk. The variance of $y_i$ is then computed with $v_i = 1/(4t_i)$.

One can also find definitions of these transformations without the multiplicative constant $1/2$ (the equations for the variance should then be multiplied by $4$). Since the $1/2$ is just a constant, it does not matter which definition one uses (as long as one uses the correct equation for the sampling variance). The metafor package uses the definitions given above, so that values obtained from the arcsine square-root (angular) transformation (measure="PAS") and from the Freeman-Tukey double arcsine transformation (measure="PFT") are approximately of the same magnitude (without the $1/2$ multiplier, PFT values would be about twice as large). The same applies to square-root transformed incidence rates (measure="IRS") and Freeman-Tukey transformed rates (measure="IRFT").

When used with the default settings, the rma.mh() function in metafor can indeed provide results that differ from those obtained with other meta-analytic software, such as the metan function in Stata, the Review Manager (RevMan) from the Cochrane Collaboration, or Comprehensive Meta-Analysis (CMA). By default, metafor does not apply any adjustments to the cell counts in studies with zero cases in either group when applying the Mantel-Haenszel method, while other software may do this automatically. For more details, take a look at the comparison of the Mantel-Haenszel method in different software and what settings to use to make metafor provide the exact same results as other software.