
Miller (1978)

The Purpose and Data

When meta-analyzing proportions, it is usually advantageous to first transform the proportions into a measure that has better statistical properties (i.e., a sampling distribution that is closer to a normal distribution and whose sampling variance can be better approximated). An often recommended transformation in this context is the logit transformation (i.e., the log odds). While the sampling distribution of a logit transformed proportion is indeed better approximated by a normal distribution, the equation used to compute the corresponding sampling variance can still be quite inaccurate, especially when sample sizes are small. A transformation that works particularly well for normalizing and variance-stabilizing the sampling distribution of proportions is the Freeman-Tukey (double arcsine) transformation (Freeman & Tukey, 1950). The corresponding back-transformation equation was derived by Miller (1978).

The data used by Miller (1978) to illustrate the transformation and its inversion can be re-created with:
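A minimal base-R sketch of this step follows. The event counts xi and sample sizes ni are assumptions: they are not stated in the text, but they reproduce the four proportions reported in the next section. The transformation and its sampling variance are written out by hand (including the constant $1/2$) rather than through escalc(), so no package is needed.

```r
# Assumed event counts (xi) and sample sizes (ni); not given in the text,
# but consistent with the proportions 3/11, 6/17, 10/21, and 1/6
dat <- data.frame(xi = c(3, 6, 10, 1), ni = c(11, 17, 21, 6))

# Raw proportions
dat$prop <- dat$xi / dat$ni

# Freeman-Tukey (double arcsine) transformation, including the constant 1/2
dat$yi <- with(dat, 0.5 * (asin(sqrt(xi / (ni + 1))) +
                           asin(sqrt((xi + 1) / (ni + 1)))))

# Sampling variance that goes with this version of the transformation
dat$vi <- 1 / (4 * dat$ni + 2)

print(round(dat, 4))
```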

The yi values are the Freeman-Tukey (double arcsine) transformed proportions, while the vi values are the corresponding sampling variances.

Note that one can find two different definitions of the Freeman-Tukey transformation in the literature that differ only by the multiplicative constant $1/2$. The escalc() function includes the multiplicative constant $1/2$, while Miller (1978) leaves it out. Therefore, the transformed values given in the table by Miller are twice as large as the ones given above. Whether one includes the multiplicative constant is irrelevant, as long as one uses the matching equation for the sampling variance. For more details, see the question "How is the Freeman-Tukey transformation of proportions and incidence rates computed?" in the FAQ section.
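For reference, the two definitions can be written side by side. The notation is an assumption made here for illustration: $x$ is the number of events, $n$ the sample size, $y$ the transformed value with the constant $1/2$, and $t$ the transformed value without it.

$$
y = \frac{1}{2}\left(\arcsin\sqrt{\frac{x}{n+1}} + \arcsin\sqrt{\frac{x+1}{n+1}}\right), \qquad \operatorname{Var}(y) \approx \frac{1}{4n+2},
$$

$$
t = 2y = \arcsin\sqrt{\frac{x}{n+1}} + \arcsin\sqrt{\frac{x+1}{n+1}}, \qquad \operatorname{Var}(t) \approx \frac{1}{n + 1/2}.
$$

Since $t = 2y$, the variance of $t$ is four times that of $y$, which is why the choice of constant does not matter as long as the matching variance is used.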

Back-Transformation of Individual Values

We can check whether the back-transformation works for individual values with:

transf.ipft(dat$yi, dat$ni)

[1] 0.2727273 0.3529412 0.4761905 0.1666667

Those are indeed the individual proportions. Note that, due to the nature of the Freeman-Tukey transformation, the back-transformation requires information about the sample sizes of the individual studies. The relevance of this will become apparent in a moment.
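To make the role of the sample size explicit, the inversion can be written out in base R. This is a sketch of Miller's (1978) back-transformation equation, not the package source; the function name inv_ft and the counts are assumptions introduced for illustration.

```r
# Assumed counts and sample sizes (consistent with the proportions above)
xi <- c(3, 6, 10, 1)
ni <- c(11, 17, 21, 6)

# Freeman-Tukey transformation, including the constant 1/2
yi <- 0.5 * (asin(sqrt(xi / (ni + 1))) + asin(sqrt((xi + 1) / (ni + 1))))

# Back-transformation following Miller (1978); tt = 2 * y is the
# transformed value on Miller's scale (without the constant 1/2)
inv_ft <- function(y, n) {
  tt <- 2 * y
  z  <- sin(tt) + (sin(tt) - 1 / sin(tt)) / n
  0.5 * (1 - sign(cos(tt)) * sqrt(pmax(0, 1 - z^2)))
}

# With the correct sample sizes, the raw proportions are recovered
p_back <- inv_ft(yi, ni)
print(p_back)

# With a different sample size, the same transformed value maps
# to a different proportion
print(inv_ft(yi[1], 50))
```

The second call shows why the sample size must be supplied: the same transformed value corresponds to different proportions for different values of n.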

Meta-Analysis of Transformed Values

As described by Miller (1978), we can aggregate the transformed values by computing either an unweighted or a weighted mean (with inverse-variance weights). The unweighted mean can be obtained with:
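A base-R sketch of this computation is given below. It assumes the same counts as before and back-transforms the mean and its 95% CI bounds with the harmonic mean of the sample sizes standing in for n; the helper inv_ft is a hypothetical name for Miller's inversion.

```r
xi <- c(3, 6, 10, 1)      # assumed event counts
ni <- c(11, 17, 21, 6)    # assumed sample sizes

# Freeman-Tukey transformed proportions and their sampling variances
yi <- 0.5 * (asin(sqrt(xi / (ni + 1))) + asin(sqrt((xi + 1) / (ni + 1))))
vi <- 1 / (4 * ni + 2)

# Unweighted mean of the transformed values, its standard error, and 95% CI
k   <- length(yi)
est <- mean(yi)
se  <- sqrt(sum(vi)) / k
ci  <- est + c(-1, 1) * qnorm(0.975) * se

# Miller's (1978) inversion (tt = 2 * y is the value on Miller's scale)
inv_ft <- function(y, n) {
  tt <- 2 * y
  z  <- sin(tt) + (sin(tt) - 1 / sin(tt)) / n
  0.5 * (1 - sign(cos(tt)) * sqrt(pmax(0, 1 - z^2)))
}

# Harmonic mean of the sample sizes, used as the n for the average
n_hm <- 1 / mean(1 / ni)

# Back-transformed estimate, lower bound, and upper bound
unweighted <- inv_ft(c(est, ci), n_hm)
print(round(unweighted, 2))
```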

Therefore, the estimated true proportion based on the 4 studies is $.32$ (with 95% CI: $.18$ to $.47$).

Since the true proportions appear to be homogeneous (e.g., $Q(3) = 2.18$, $p=.54$), a more efficient estimate of the true proportion can be obtained by using inverse-variance weights. For this, we first synthesize the transformed values with:
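The weighted analysis can be sketched in base R in the same way. The counts are again assumed, inv_ft is a hypothetical helper for Miller's inversion, and the harmonic mean of the sample sizes is used for the back-transformation of the average.

```r
xi <- c(3, 6, 10, 1)      # assumed event counts
ni <- c(11, 17, 21, 6)    # assumed sample sizes

# Freeman-Tukey transformed proportions and their sampling variances
yi <- 0.5 * (asin(sqrt(xi / (ni + 1))) + asin(sqrt((xi + 1) / (ni + 1))))
vi <- 1 / (4 * ni + 2)

# Inverse-variance weighted mean, its standard error, and 95% CI
wi  <- 1 / vi
est <- sum(wi * yi) / sum(wi)
se  <- sqrt(1 / sum(wi))
ci  <- est + c(-1, 1) * qnorm(0.975) * se

# Q-test for heterogeneity
Q  <- sum(wi * (yi - est)^2)
pQ <- pchisq(Q, df = length(yi) - 1, lower.tail = FALSE)
print(round(c(Q = Q, p = pQ), 2))

# Miller's (1978) inversion (tt = 2 * y is the value on Miller's scale)
inv_ft <- function(y, n) {
  tt <- 2 * y
  z  <- sin(tt) + (sin(tt) - 1 / sin(tt)) / n
  0.5 * (1 - sign(cos(tt)) * sqrt(pmax(0, 1 - z^2)))
}

# Back-transform with the harmonic mean of the sample sizes
n_hm <- 1 / mean(1 / ni)
weighted <- inv_ft(c(est, ci), n_hm)
print(round(weighted, 2))
```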

Therefore, the estimated true proportion is now equal to $.36$ (with 95% CI: $.23$ to $.50$).

Proportions Equal to 0 or 1

When the event of interest is very rare or very common, the dataset may include proportions that are equal to 0 or 1. Such data can yield difficulties when meta-analyzing raw proportions (i.e., the sampling variance of such proportions will then be equal to 0) or when meta-analyzing logit transformed proportions (i.e., the log odds will then be either equal to minus or plus infinity). In practice, such cases are handled by adding a small constant (typically ½) to the data. This approach may be acceptable when there are only a few such cases, but when meta-analyzing rare or common outcomes (where proportions equal to 0 or 1 are common), it is better to switch to a transformation that handles such proportions more gracefully, such as the Freeman-Tukey transformation. It does not require making any adjustments to the observed data, even when there are proportions equal to 0 or 1. As an example:
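The following base-R sketch illustrates the point with hypothetical counts (chosen here only for illustration) in which the first proportion is 0 and the last is 1. The raw variance and the logit break down for these two studies, while the Freeman-Tukey values stay finite and can be inverted exactly.

```r
# Hypothetical counts, including proportions of exactly 0 and 1
xi <- c(0, 4, 9, 16, 20)
ni <- c(10, 10, 15, 20, 20)
prop <- xi / ni

# Raw proportions: the usual sampling variance is 0 when prop is 0 or 1
v_raw <- prop * (1 - prop) / ni
print(v_raw)

# Logit transformation: minus or plus infinity when prop is 0 or 1
logit <- qlogis(prop)
print(logit)

# Freeman-Tukey transformation (with the constant 1/2): finite throughout
yi <- 0.5 * (asin(sqrt(xi / (ni + 1))) + asin(sqrt((xi + 1) / (ni + 1))))
vi <- 1 / (4 * ni + 2)
print(round(cbind(yi, vi), 4))

# Miller's (1978) inversion recovers the proportions, including 0 and 1
inv_ft <- function(y, n) {
  tt <- 2 * y
  z  <- sin(tt) + (sin(tt) - 1 / sin(tt)) / n
  0.5 * (1 - sign(cos(tt)) * sqrt(pmax(0, 1 - z^2)))
}
print(round(inv_ft(yi, ni), 4))
```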

If the same back-transformation is applied directly when drawing the forest plot, then the resulting forest plot also uses the harmonic mean of the sample sizes for the back-transformation of the individual transformed proportions, which is not correct (resulting forest plot not shown).

We therefore need to first obtain the CI bounds of the individual studies with:
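A base-R sketch of this step is shown below, using hypothetical counts that include proportions of 0 and 1. The helper names ft and inv_ft are assumptions, and truncating the CI bounds to the attainable range of the transformation before inverting them is a choice made in this sketch, not necessarily what any particular package does.

```r
# Hypothetical counts, including proportions of exactly 0 and 1
xi <- c(0, 4, 9, 16, 20)
ni <- c(10, 10, 15, 20, 20)

# Freeman-Tukey transformation (with the constant 1/2)
ft <- function(x, n) {
  0.5 * (asin(sqrt(x / (n + 1))) + asin(sqrt((x + 1) / (n + 1))))
}

# Miller's (1978) inversion (tt = 2 * y is the value on Miller's scale)
inv_ft <- function(y, n) {
  tt <- 2 * y
  z  <- sin(tt) + (sin(tt) - 1 / sin(tt)) / n
  0.5 * (1 - sign(cos(tt)) * sqrt(pmax(0, 1 - z^2)))
}

yi   <- ft(xi, ni)
sei  <- sqrt(1 / (4 * ni + 2))
crit <- qnorm(0.975)

# 95% CI bounds on the transformed scale, truncated to the attainable
# range [ft(0, n), ft(n, n)] so that the inversion stays within [0, 1]
lb <- pmax(yi - crit * sei, ft(0, ni))
ub <- pmin(yi + crit * sei, ft(ni, ni))

# Back-transform each value with its own study-specific sample size
back <- data.frame(prop  = inv_ft(yi, ni),
                   ci.lb = inv_ft(lb, ni),
                   ci.ub = inv_ft(ub, ni))
print(round(back, 3))
```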

Now the back-transformation is applied to each transformed proportion with the study-specific sample sizes. The yi values are now the back-transformed values (i.e., the raw proportions) and the ci.lb and ci.ub values are the back-transformed 95% CI bounds.

Finally, we can create the forest plot by directly passing the observed outcomes (i.e., proportions) and the CI bounds to the forest() function. Then the back-transformed average with the corresponding CI bounds obtained earlier can be added to the plot with the addpoly() function. We add a couple of tweaks to make the final forest plot look nice: