In Part 1 of this seven-part overview of meta-analysis, I introduced Conn, Hafdahl, Cooper, Brown, and Lusk’s (2009) quantitative review of workplace exercise interventions and discussed extracting effect-size (ES) estimates. Building on that material, in this second part I’ll address obtaining info about an ES’s sampling error, which plays a critical role in most modern meta-analytic methods. (Part 1 of this overview lists topics in the subsequent five posts.)

Task 2: Obtain Information about Effect-Size Sampling Error

Most modern meta-analytic methods use info about each ES estimate’s sampling error or uncertainty. ES estimates with less sampling error (e.g., from larger samples) tend to be nearer their ES parameters, and we’d usually like these more precise ES estimates to influence our meta-analytic results more. Also, precision of ES estimates influences the precision and power of meta-analytic results, such as estimates or tests of ES parameters’ mean or variance. Considering the simple example of an ES estimate from each of two independent samples, we’d expect combinations or comparisons of these estimates, such as their mean or difference, to be more precise when either ES estimate is more precise. Below I describe two types of info about sampling error: sample size and precision. In Part 5 of this overview I’ll describe how these are used in meta-analytic techniques.

An aside about sampling in meta-analysis: It’s often useful to view ES estimates as arising from sampling of both studies and subjects (within a study), which leads to two sources of variation, error, or uncertainty.

Studies: A study is sampled from a universe of studies in a given research domain; essentially, this sampling is due to numerous aspects of how the study is conducted, which in turn determine its ES parameter (θi). Variation of ES parameters among studies is often called between-studies or interstudy variation.

Subjects within studies: Conducting a given study with a sample of subjects yields the data used to compute an ES estimate (yi); using different subjects would yield a different ES estimate. Variation of ES estimates among samples for a given study is often called within-study or intrastudy variation.

According to this view, two studies’ ES estimates differ in part because they’re from different studies and in part because they’re from different samples of subjects. (Situations where some studies contain multiple ES parameters might involve 3 or more stages of sampling.) In this section I’ll focus on within-study variation. In this overview’s fifth part, on analyzing data, I’ll say more about the core meta-analytic notion of between-studies variation. (end of aside)

Sample Size and Precision

In meta-analysis we usually express within-study variation in terms of an ES estimator’s sampling distribution: the distribution of a given study’s ES estimates from all possible subject samples of a specific size. Hence, an ES estimate’s sampling error concerns deviation of yi from E(Yi) ≈ θi. A fundamental element of an ES estimate’s sampling error is sample size, ni. Any sensible ES estimator’s variation over subject samples is smaller with a larger sample size. The statistical model underlying most meta-analytic techniques treats sample sizes as fixed, known quantities, so they’re essentially inherent characteristics of each study that don’t contribute random error and needn’t be estimated. (Let’s ignore for now the rare meta-analytic methods that treat sample sizes as random.) Some meta-analytic techniques use sample size as the main quantity representing different ES estimates’ relative sampling error, in that they weight each ES estimate by its sample size(s).
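To make sample-size weighting concrete, here’s a minimal sketch (with made-up numbers, not from Conn et al.) of a sample-size-weighted mean of ES estimates:

```python
# Sample-size weighting: each ES estimate contributes in proportion to its n.
# The ES values and sample sizes below are made up for illustration.
es = [0.30, 0.10, 0.50]   # ES estimates y_i from three studies
n = [200, 50, 20]         # their sample sizes n_i

# Weighted mean: sum(n_i * y_i) / sum(n_i)
weighted_mean = sum(ni * yi for ni, yi in zip(n, es)) / sum(n)

# For comparison, the unweighted mean; the n = 200 study pulls the
# weighted result toward its estimate of 0.30.
unweighted_mean = sum(es) / len(es)
```

Precision weighting works the same way, with a precision estimate wi in place of ni.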

Many meta-analytic techniques quantify sampling error using an ES estimator’s precision instead of sample size.F1 Some authors collectively call these methods precision-weighted meta-analysis. For a univariate ES, precision is the reciprocal of the ES estimator’s sampling variance—its squared standard error (SE). (Some authors use “precision” to mean the SE’s reciprocal.) I’ll often refer to Yi’s sampling variance as its conditional variance (CV), given θi, to distinguish it from the unconditional or marginal variance that applies when θi is also sampled. Whereas Yi’s CV represents only within-study variation, its unconditional variance additionally represents between-studies variation. Similarly, we could distinguish conditional from unconditional precision. Here’s some relevant notation for conditional quantities (ignoring their unconditional counterparts for now):

σi2: the CV of Yi, given Study i’s ES parameter and (known) sample size, whose definition we can write as σi2 ≡ Var(Yi | θi; ni); for generality ni can be one or more values (e.g., when the ES compares 2 or more groups)

vi: an estimate of σi2

wi: an estimate of 1 / σi2, Yi’s (conditional) precision, such as 1 / vi; used as a weight in certain meta-analytic procedures

Distinguishing between σi2 and its estimate vi might seem fussy but can be valuable. Many meta-analytic methods entail using vi but treating it as known, like σi2. When σi2 depends on θi’s unknown value, what we substitute for θi in the estimates vi and wi can influence subsequent meta-analytic results. For example, suppose Study i’s ES estimate, ri, is the Pearson correlation for a sample of size ni from a bivariate-normal distribution with correlation parameter ρi: The corresponding estimator, Ri, has a sampling distribution whose CV we can approximate by σi2 ≈ (1 − ρi2)2 / (ni − 1). Because this approximation depends on the unknown ρi, computing vi requires substituting something for ρi, such as the sample correlation ri.F2
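As a small illustration of these estimates (with made-up numbers), here’s how vi and wi might be computed for a single study’s correlation, using the common large-sample approximation (1 − ρ2)2 / (n − 1) with the sample correlation substituted for ρi; note that the denominator convention (ni vs. ni − 1 vs. ni − 2) varies by author:

```python
# Estimated conditional variance (CV) of a Pearson correlation, using the
# common large-sample approximation (1 - rho^2)^2 / (n - 1), with the sample
# correlation r substituted for the unknown rho. Numbers are made up.
r_i = 0.40   # observed correlation in Study i
n_i = 100    # sample size

v_i = (1 - r_i**2) ** 2 / (n_i - 1)  # estimate of sigma_i^2
w_i = 1 / v_i                        # estimated (conditional) precision

# A larger sample or a correlation nearer +/-1 yields a smaller v_i,
# hence a larger weight w_i in precision-weighted procedures.
```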

Complicating Issues

As with ES estimates, it’s difficult to make general statements about obtaining sample sizes or CVs. Below are issues that arise commonly.

Correct sample size(s): Using the appropriate sample size(s) for a meta-analytic procedure can be complicated by what’s reported in a study or by how formulas are expressed. Usually the sample size relevant to an ES estimate’s sampling error pertains to the subjects who contributed data to that estimate, but study authors sometimes report other sample sizes (e.g., at other occasions, before selection or attrition, ignoring missing data). Some formulas use a total sample size, others require sub-sample sizes (e.g., from each of 2 groups, cell or marginal frequencies in a 2 × 2 contingency table), and still others involve different sample sizes associated with different parts of an ES estimate (e.g., means and standard deviations [SDs] from different samples).

Missing data: Ideally each study would report the required sample size(s) or other results for a CV; this ideal is met for many ESs estimated by standard methods, because often the CV requires only sample size(s) in addition to the same info needed to estimate the ES. Some CVs for ESs estimated by non-standard methods, however, require info that’s not typically reported, such as an intraclass correlation required when a between-subjects standardized mean difference (SMD) is estimated using certain results from analyses that involve a within-subject factor. Even some standard ES estimates require rarely reported results for their CVs; for instance, the CV for a within-subject SMD—such as from a repeated-measures or crossover design—requires the correlation between repeated measurements, and the CV for a within-subject risk difference (e.g., success rate before vs. after intervention) requires the numbers of switched cases (e.g., succeed-then-fail or fail-then-succeed). Also, as alluded to in the previous issue, missing data from some subjects can influence ES estimates and their CVs.

CV tied to ES estimation: To represent an ES estimate’s sampling error accurately, its associated CV should reflect how that estimate was obtained. Some authors and software packages seem to overlook this. As discussed in Part 1 of this overview, some ESs can be estimated in different ways using different reported results. For example, a SMD can be a within-subject or between-subjects comparison, each of which can be estimated using various types of reported info. Although some alternative ES computations are equivalent, in that they yield identical ES estimates for any sample of subjects, others aren’t. Non-equivalent ES estimators don’t in general have the same CV, and a given ES estimator’s sampling error can be distorted by using a non-equivalent estimator’s CV. For instance, a Pearson correlation will have a smaller CV if estimated from continuous scores than from dichotomized scores (e.g., phi coefficient, tetrachoric correlation), because dichotomization discards statistical information about the correlation parameter. A more subtle example is that bias adjustments tend to change an ES estimator’s CV. Furthermore, some ES estimators may not have a readily available CV formula, such as if the estimate is obtained by an unconventional method or a study’s data were processed or analyzed unusually.

Alternative CV formulas: For some ES estimators more than one formula for a CV is available. This is mainly because most such formulas rely on certain assumptions about subjects’ scores and involve an asymptotic approximation to the ES estimator’s sampling distribution that’s more accurate with more subjects. Different assumptions or approximations can yield different formulas. For example, to compute the CV for a Pearson correlation based on continuous scores, different authors use ni, ni − 1, or ni − 2 in the denominator, which matters more for small ni. As another example, consider a SMD between two independent groups’ means, with their pooled SD as the standardizer: If we assume subjects’ scores are normal with homogeneous variance (between groups), at least five different CV formulas have been proposed for this ES estimator (supposing we know the ES parameter), and with small samples some of these yield markedly different values. For instance, given a true SMD of 1.0 and samples of sizes 5 and 10, five different CV formulas yield 0.266, 0.295, 0.300, 0.306, and 0.361; the largest is probably most accurate and is about 35% larger than the smallest, so using the smallest is roughly tantamount to overstating the sample size by 35%!
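For the curious, here’s a sketch of two widely used CV formulas for the bias-adjusted two-group SMD (Hedges’ g): a common large-sample formula, and the “exact” variance implied by the noncentral-t distribution of the unadjusted SMD. With this example’s numbers they reproduce two of the five values quoted above (0.295 and 0.361), though I can’t be sure these are exactly the formulas behind those figures:

```python
# Two standard CV formulas for a bias-adjusted two-group SMD (Hedges' g),
# evaluated at the example in the text: true SMD delta = 1.0, n1 = 5, n2 = 10.
n1, n2, delta = 5, 10, 1.0
N = n1 + n2
m = N - 2                      # degrees of freedom of the pooled SD
J = 1 - 3 / (4 * m - 1)        # usual approximation to the bias factor

# (a) Common large-sample formula for Var(g):
v_large = J**2 * (1/n1 + 1/n2 + delta**2 / (2 * N))

# (b) "Exact" variance of g, from the noncentral-t distribution of the
#     unadjusted SMD d:
#     Var(d) = (1/n_tilde) * m * (1 + n_tilde*delta^2)/(m - 2) - delta^2/J^2,
#     then Var(g) = J^2 * Var(d).
n_tilde = n1 * n2 / N
var_d = (1/n_tilde) * m * (1 + n_tilde * delta**2) / (m - 2) - delta**2 / J**2
v_exact = J**2 * var_d

# round(v_large, 3) -> 0.295; round(v_exact, 3) -> 0.361
```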

A multivariate ES estimate (yi) presents additional challenges besides those above. Often such an estimate’s elements are dependent, which influences certain results from meta-analytically comparing or combining two or more elements of the vector. In terms of sampling error, we can quantify this dependence by a (conditional) sampling covariance or correlation between each pair of elements.F3 To represent these covariances in notation, we can replace σi2, vi, and wi (for Yi) with their matrix counterparts for Yi: Σi, Vi, and Wi = Vi-1.

As a bivariate example, a pair of multiple-treatment SMDs, each of which compares one of two treatment groups’ means to one control group’s mean, is correlated due to the shared control group; we can express this pair’s sampling error as the conditional covariance matrix (CCM) containing each ES’s CV and their conditional covariance; how to compute these quantities depends on how the ESs are estimated (e.g., which groups’ SDs are pooled) and certain assumptions (e.g., homogeneity of variance), but the required info is usually available. By contrast, the CCM for multiple-endpoint SMDs, which arise when two groups are compared on two or more variables (e.g., outcomes, conditions, occasions), requires each group’s correlation between each pair of variables, which is rarely reported. Similar issues arise when the dependent ESs are multiple-treatment or -endpoint comparisons of proportions, more complex two-group comparisons or multiple-group contrasts, or correlation matrices. Conditional covariance formulas aren’t readily available for some multivariate ESs, such as multiple-treatment-and-endpoint comparisons (i.e., from comparing 1 control to each of 2 or more treatment groups on 2 or more endpoints) or multiple-treatment or -endpoint pre-post comparisons.

Example—Workplace Exercise: As described in Part 1 of this overview, Conn et al. (2009) estimated three types of SMD between treatment (i.e., exercise intervention) and control groups/conditions on each of several outcome variables. For each SMD estimate they (well, we) computed a CV to use in later meta-analyses, using more accurate “exact” formulas when feasible. To reduce sampling error in a given study’s CV, they replaced that study’s SMD parameter in the CV formula with a shrinkage estimate based on preliminary estimates of the ES parameters’ between-studies mean and variance—similar to an empirical Bayes posterior mean. I’ll briefly mention some issues they encountered when computing CVs for each type of SMD; technical details are beyond this overview’s scope.

Two-group posttest: For studies in which the standardizer was a pre-intervention SD based on a different sample size than the post-intervention means, they modified the usual CV formula to reflect the standardizer’s degrees of freedom. For studies that reported a success rate for each group (e.g., on a dichotomized outcome), they used a method due to Cox—instead of the usual SMD formulas—to approximate the SMD and compute its CV. They handled multiple-treatment comparisons from a given study in two ways, depending on the meta-analytic procedure: as independent SMD estimates, and as a dependent set of SMDs from samples with homogeneous variances (e.g., SD pooled over all samples as shared standardizer, appropriate conditional covariances); for multiple-treatment success rates, they obtained the conditional correlation by parametric bootstrap.
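One common presentation of Cox’s approach converts the two groups’ success rates to an approximate SMD via the log odds ratio; here’s a hedged sketch with made-up cell counts (I can’t vouch that this matches Conn et al.’s exact implementation):

```python
import math

# Cox's approximation, as commonly presented: d ~ ln(OR)/1.65, with
# CV ~ Var(ln OR)/1.65^2, where Var(ln OR) is the usual sum of reciprocal
# cell counts. Cell counts below are made up for illustration.
s_t, f_t = 30, 20    # treatment group: successes, failures
s_c, f_c = 18, 32    # control group: successes, failures

log_or = math.log((s_t * f_c) / (f_t * s_c))    # log odds ratio
var_log_or = 1/s_t + 1/f_t + 1/s_c + 1/f_c      # its large-sample variance

d_cox = log_or / 1.65          # approximate SMD
v_cox = var_log_or / 1.65**2   # its approximate CV
```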

Treatment pre-post: CVs for this type of SMD require the pre-post correlation, which was not available. As a sensitivity analysis, for one set of analyses they computed CVs assuming this correlation was .0; for another set, .8. They omitted the small fraction (less than 2%) of these pre-post comparisons for which pre- and post-intervention success rates were reported, because an appropriate CV formula was not readily available. They neglected any hierarchical dependence due to multiple treatment samples per study; because these comparisons don’t involve control samples, there’s no sampling dependence (i.e., non-zero conditional covariance/correlation) between different treatment samples’ SMDs.
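A common large-sample CV formula for a standardized mean change, 2(1 − ρ)/n + δ2/(2n), shows why this sensitivity analysis matters; with made-up numbers:

```python
# Why the assumed pre-post correlation matters: a common large-sample
# formula for the CV of a standardized mean change (pre-post SMD) is
# 2*(1 - rho)/n + delta^2/(2n). The n and delta below are made up.
n, delta = 25, 0.5

def cv_prepost(rho):
    return 2 * (1 - rho) / n + delta**2 / (2 * n)

cv_low = cv_prepost(0.0)   # pre-post correlation assumed .0
cv_high = cv_prepost(0.8)  # pre-post correlation assumed .8

# cv_low is about 4 times cv_high here, so precision-based weights (and
# hence meta-analytic results) can shift substantially between the two sets.
```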

Two-group pre-post: Because this SMD is just the difference between two independent pre-post SMDs—for treatment and control groups—its CV is just the sum of those two SMDs’ CVs. Consequently, issues that affect pre-post SMDs (e.g., missing pre-post correlation) also influence these SMDs. They treated these SMDs as independent, which neglects both sampling dependence (due to the shared control sample) and hierarchical dependence.

Some of these issues were handled differently in a later article, Conn et al. (2011), which was based on the same project but focused on physical activity as the outcome and was not restricted to workplace settings.

For instance, Conn et al. (2011) obtained pre-post correlations from several authors and used these to empirically impute correlations when computing CVs for pre-post SMDs. Also, they approximated pre-post SMDs from pre- and post-intervention success rates using a difference between probits; this estimator’s CV was derived using the delta method and relied on an imputed pre-post correlation.
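Here’s my own delta-method sketch of a CV for such a probit-difference SMD (not necessarily Conn et al.’s derivation), with made-up numbers and an imputed pre-post correlation:

```python
from statistics import NormalDist

# Delta-method sketch (my own, hedged) for the CV of a pre-post SMD
# approximated as a difference between probits, D = probit(p_post) -
# probit(p_pre). An imputed pre-post correlation induces a covariance
# between the two proportions. All numbers are made up.
nd = NormalDist()
n = 60
p_pre, p_post = 0.35, 0.55
rho = 0.5                     # imputed pre-post correlation

z1, z2 = nd.inv_cdf(p_pre), nd.inv_cdf(p_post)
D = z2 - z1                   # probit-difference SMD estimate

# Delta method: Var(probit(p_hat)) ~ [p(1-p)/n] / phi(z)^2, where phi is
# the standard-normal density; the covariance term reflects rho.
phi1, phi2 = nd.pdf(z1), nd.pdf(z2)
var1 = p_pre * (1 - p_pre) / n
var2 = p_post * (1 - p_post) / n
cov12 = rho * (var1 * var2) ** 0.5

v_D = var2 / phi2**2 + var1 / phi1**2 - 2 * cov12 / (phi1 * phi2)
```

A positive imputed correlation shrinks the CV relative to treating the two proportions as independent, paralleling the within-subject SMD case.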

That wraps up this segment about ES sampling error. Again, I’ve only superficially introduced several issues relevant to this aspect of meta-analysis. I plan to delve more deeply into some of these issues after completing this seven-part overview.

Footnotes

1. The distinction between sample size and precision in meta-analysis is blurry. Methods that explicitly use sample size often implicitly involve ES estimates’ precision, such as in inferential results (e.g., tests, confidence intervals). Many methods that explicitly use precision effectively require only ES estimates and sample sizes, because often precision is computed from these.

2. A more familiar example of estimating a sampling variance involves the mean of a simple random sample, whose SE depends on the distribution’s variance parameter (i.e., “population” variance) and must be estimated when this variance is unknown. To see parallels with meta-analysis, consider a one-way ANOVA and imagine estimating the SE for one cell’s sample mean using data from other cells—especially under heterogeneity of variance.

3. For instance, consider two dependent ES estimators, Y1 and Y2, with conditional covariance σ12 ≡ Cov(Y1, Y2 | θ1, θ2). By the usual rules for variances of linear combinations, their mean M = (Y1 + Y2) / 2 has CV [Var(Y1 | θ1) + Var(Y2 | θ2) + 2σ12] / 4, and their difference D = Y1 − Y2 has CV Var(Y1 | θ1) + Var(Y2 | θ2) − 2σ12. So, as the conditional covariance σ12 increases (i.e., the correlation between Y1 and Y2 increases), the mean’s CV increases and the difference’s CV decreases. We can also express the variance of a linear combination of more than two ES estimators or the covariance between different linear combinations, such as Cov(M, D), but these are easier using matrix notation.
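A quick numerical check of this monotonicity, using the standard linear-combination variance formulas with made-up variances:

```python
# CVs of the mean M = (Y1 + Y2)/2 and difference D = Y1 - Y2 of two
# dependent ES estimators, as the conditional covariance s12 varies.
# The two conditional variances below are made up.
s1_sq, s2_sq = 0.04, 0.09

def var_mean(s12):
    return (s1_sq + s2_sq + 2 * s12) / 4

def var_diff(s12):
    return s1_sq + s2_sq - 2 * s12

# Increasing the covariance raises Var(M) and lowers Var(D):
vm0, vm1 = var_mean(0.0), var_mean(0.03)
vd0, vd1 = var_diff(0.0), var_diff(0.03)
```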