
Explaining Causal Findings Without Bias: Detecting and Assessing Direct Effects*

Avidit Acharya, Matthew Blackwell, and Maya Sen

April 9, 2015

Abstract

Scholars trying to establish causal relationships frequently control for variables on the purported causal pathway, checking whether the original treatment effect then disappears. Unfortunately, a large literature shows that this common approach may introduce serious bias. In this paper, we show that researchers can avoid this bias by focusing on an intuitive quantity of interest, the controlled direct effect. Under certain conditions, the controlled direct effect can allow scholars to rule out competing explanations, an important objective for political scientists. To estimate the controlled direct effect without bias, we describe an easy-to-implement estimation strategy from the biostatistics literature. We further derive a consistent variance estimator and demonstrate how to conduct a sensitivity analysis. Two examples, one on ethnic fractionalization's effect on civil war and one on historical plough use's impact on contemporary female political participation, illustrate the framework and methodology.

* Thanks to Adam Cohon, Justin Esarey, Adam Glynn, Robin Harding, Gary King, Macartan Humphreys, Kosuke Imai, Bethany Lacina, Jacob Montgomery, Alexandra Pagano, Judea Pearl, Dustin Tingley, Teppei Yamamoto, and conference or workshop participants at Harvard, Princeton, the Midwest Political Science Association meeting, and the Society for Political Methodology meeting for helpful discussions and comments. Any remaining errors are our own. Comments and suggestions welcome.

Author affiliations, in order: Assistant Professor of Political Science, Stanford University. Assistant Professor of Government, Harvard University. Assistant Professor of Public Policy, Harvard University.

1 Introduction

Rigorous exploration of causal effects has become a key part of social science inquiry. No longer is it sufficient for researchers to claim a causal finding without additional theorizing and evidence of how the effect came to be. For many scholars, this quest usually involves ruling out possible explanations. For example, does colonial status affect development even among countries with similar kinds of institutions? Does the incumbency advantage in American politics operate even when challengers have the same level of quality? Does a country's natural resource wealth affect democracy even among countries with identical levels of state repression? These important debates illustrate that disparate literatures share a similar concern: does a causal finding remain even after controlling for factors that are realized after, and possibly due to, the treatment in question?

Scholars interested in estimating these kinds of direct effects often simply add variables realized after the treatment to regression models, checking to see whether the effect persists. However, statisticians have long warned that this common practice can introduce serious bias. Furthermore, a large literature has shown that, when causal effects can vary from unit to unit, the identification and estimation of direct effects is more complicated than the traditional path analysis approach would imply (Robins and Greenland, 1992). This leaves researchers in a quandary: how can they effectively rule out other, potentially competing explanations for their findings without introducing serious bias or making misleading inferences?

In this paper, we show that researchers can still estimate direct effects free of bias in a wide variety of empirical settings. To do so, we focus on an intuitive quantity of interest: the controlled direct effect (CDE). The CDE represents the causal effect of a treatment when the mediator is fixed at a particular level.
This allows researchers to rule out whether rival explanations or theories are the exclusive drivers of their findings, often an important goal for social scientists. In addition, the CDE is the quantity of interest identified from an experimental design where both the treatment and mediator are set to particular levels. Designs of this variety, such as

traditional factorial designs and conjoint analyses, are playing an increasing role in our understanding of politics. While this makes the CDE well suited to answering policy and program evaluation questions, we show below that it can also speak to causal mechanisms under certain assumptions. Thus, the CDE is an important part of the applied researcher's causal toolkit.

Unfortunately, a problem associated with the estimation of direct effects, controlled or otherwise, is what we call intermediate variable bias, which is attributable to intermediate confounders, or variables that are affected by the treatment and affect both the mediator and outcome. To estimate the controlled direct effect without such bias, we introduce a method from the biostatistics literature that addresses these challenges and is well suited for continuous treatments and mediators. The method is implemented by way of a simple two-stage regression estimator, the sequential g-estimator (Vansteelandt, 2009; Joffe and Greene, 2009). This approach transforms (or demediates) the dependent variable by removing from it the effect of the mediator and then estimates the effect of the treatment on this demediated outcome. Under certain assumptions, this is a consistent estimate of the controlled direct effect. To round out the method, we further derive a consistent variance estimator and describe how to conduct a sensitivity analysis for the key identification assumption. The methodology is easy to use, intuitive, and straightforward to implement with existing statistical software.

We note that the controlled direct effect is part of a growing family of causal quantities that center on questions of direct and indirect effects. Another quantity, the natural direct effect (NDE), has been widely studied in both statistics (Pearl, 2001) and political science (Imai et al., 2011). The CDE and the NDE answer similar, yet distinct, causal questions and each has different properties.
For instance, the total causal effect (the average treatment effect) can be decomposed into two quantities: the natural indirect effect and the natural direct effect, making the NDE useful for evaluating causal mechanisms. As we show below, the decomposition of the total effect using the controlled direct effect is more complicated. Below we discuss these differences and highlight how the CDE and NDE approaches complement one another, noting when each has advantages in applied work. Together, these quantities provide a robust way for researchers to investigate and explain their causal findings.

We proceed as follows. In Section 2, we explain direct effects and controlled direct effects, with Section 2.5 showing how controlled direct effects speak to causal mechanisms. Section 3 addresses the problems that result from the inclusion of intermediate confounders, while Section 4 presents sequential g-estimation (with our sensitivity analysis in Section 4.4). Section 5 illustrates both the framework and method via two empirical examples, which we use throughout for illustrative purposes. First, we replicate Fearon and Laitin (2003) to explore whether ethnic fractionalization affects civil war onset exclusively via its impact on political instability. Using sequential g-estimation, we show that ethnic fractionalization has a separate effect on civil war onset that does not operate through political instability. Second, we replicate Alesina, Giuliano and Nunn (2013), who show that historical plough use has an effect on modern-day female political participation. This effect exists only after conditioning on current-day income levels, which could introduce bias. We use sequential g-estimation to estimate the direct effect of the plough not through income, finding an effect that is even stronger than in their original analyses. Section 6 addresses differences between the CDE and the NDE. We briefly conclude in Section 7 and provide additional technical material, including a discussion of our consistent variance estimator and replication R code, in the Appendix.

2 What Are Direct Effects?

Informally, a direct effect of a treatment is the effect of the treatment for a fixed value of the mediator (though how one fixes the mediator matters a great deal).
Our goal in this paper is to estimate the controlled direct effect (CDE) of the treatment, which is the direct effect of the treatment when the mediator is fixed at the same value for all units.¹ An indirect effect, on the other hand, is the portion of the total effect of treatment due to the treatment's effect on the mediator and the mediator's subsequent effect on the outcome.

We illustrate these concepts via a recurring example from comparative politics: Does ethnic fractionalization affect the onset of civil wars (Fearon and Laitin, 2003)? Specifically, does ethnic fractionalization increase the probability that a country will have a civil war primarily because it leads to greater political instability? Or does ethnic fractionalization increase the probability of civil war onset independently of greater instability? Our question of interest is whether ethnic fractionalization continues to have an independent effect on civil war onset holding political instability fixed at some value for all countries.

2.1 Potential outcomes

To develop the framework, we rely on the potential outcomes model of causal inference (Rubin, 1974; Holland, 1986; Neyman, 1923), a counterfactual-based framework that we summarize only briefly here. The advantage here, in contrast to a more traditional structural equation modeling approach, is that the potential outcomes framework allows us to incorporate heterogeneous causal effects easily and decouples the definition of a causal effect from its estimation.

Let $A_i$ be the treatment of interest, taking values $a \in \mathcal{A}$, where $\mathcal{A}$ is the set of possible treatment values. We define $M_i$ as the mediator, taking values $m \in \mathcal{M}$. Throughout the paper, we assume both are continuous. Using our example, $A_i$ represents ethnic fractionalization in a given country, while $M_i$ represents the mediator, the country's political instability.

Studies of direct effects generally involve two sets of covariates. The first, which we call pretreatment confounders ($X_i$), are variables that affect the treatment and the outcome.
The second, which we call intermediate confounders ($Z_i$), are those that are a consequence of the treatment and also affect the mediator and the outcome. These are intermediate because they causally come between the treatment and mediator, as shown in Figure 1. For example, a country's average income is probably an intermediate confounder because it is (1) affected by ethnic fractionalization and (2) confounds the relationship between political instability and civil conflict. As we show below, such intermediate confounders cause major problems for the estimation of direct effects.

¹ Note that while the informal definition of the direct effect and our description of the CDE appear very similar, alternate definitions of direct effects exist and require different assumptions for estimation, as we discuss below.

[Figure 1: Directed acyclic graph showing the causal relationships present in analyzing causal mechanisms, with nodes for the pretreatment confounders (X), intermediate confounders (Z), treatment (A), mediator (M), and outcome (Y). Red lines represent the controlled direct effect of the treatment not through the mediator. Unobserved errors are omitted.]

2.2 Total effects

Let $Y_i$ be the observed outcome for unit $i$ and $Y_i(a)$ be its potential outcome, the value that the outcome for unit $i$ would take if we set (and only set) the treatment to $a$. For instance, if the outcome were the country-level number of battle deaths due to civil conflict, then $Y_i(a)$ would be the number of battle deaths if ethnic fractionalization were set to $a$. The potential outcomes connect to the observed outcome by the consistency assumption, $Y_i = Y_i(A_i)$, under which we observe the potential outcome for the observed treatment level. In the potential outcomes framework, a causal effect is the difference between two potential outcomes, $\tau_i(a, a') = Y_i(a) - Y_i(a')$. This is the difference in outcomes if we were to switch unit $i$ from treatment level $a'$ to $a$. Under the consistency

assumption, we only observe one of these potential outcomes for any unit, a problem known as the fundamental problem of causal inference. To circumvent this, we typically estimate the average of treatment effects. We define the average treatment effect or average total effect (ATE, or $\tau$) to be the difference in means between two different potential outcomes:

$$\mathrm{ATE}(a, a') = E[Y_i(a) - Y_i(a')], \tag{1}$$

where $E[\cdot]$ denotes the expectation over units in the population of interest. This is just the average effect if we were to change ethnic fractionalization from $a'$ to $a$ in all countries.

2.3 Controlled direct effects

To define controlled direct effects, we imagine intervening on both the treatment and the mediator at once. We define $Y_i(a, m)$ to be the value that the outcome for unit $i$ would take if we set its treatment to $a$ and the mediator to $m$. For the country-level number of battle deaths due to civil conflict, for example, $Y_i(a, m)$ would represent battle deaths if ethnic fractionalization were set to $a$ and political instability to $m$. Under the same consistency assumption, $Y_i = Y_i(A_i, M_i)$. The mediator also has potential values, $M_i(a)$, defined similarly to the potential outcomes; that is, $M_i(a)$ refers to the potential value that the mediator would take on under treatment level $a$. Applying the consistency assumption again, we have $M_i = M_i(A_i)$. Note that the potential outcome when only setting $a$ is the composition of these two potential outcomes: $Y_i(a) = Y_i(a, M_i(a))$.

The controlled direct effect (CDE) is the effect of changing the treatment while fixing the value of the mediator at some level $m$ (Pearl, 2001; Robins, 2003):

$$\mathrm{CDE}_i(a, a', m) = Y_i(a, m) - Y_i(a', m). \tag{2}$$

As with total effects, it is difficult to identify individual-level effects and so we focus on the average CDE or ACDE:

$$\mathrm{ACDE}(a, a', m) = E[Y_i(a, m) - Y_i(a', m)]. \tag{3}$$

In other words, while a direct effect in general fixes the value of the mediator, the ACDE more closely corresponds to a ceteris paribus definition of a direct effect; that is, the direct effect with the mediator fixed at some value for all units in the population. It is the effect of a direct intervention on both $A_i$ and $M_i$ for all units. In our example, this quantity of interest represents the effect of changing ethnic fractionalization if we were to fix the amount of political instability in a country at some level. The controlled direct effect is what is implicitly or explicitly estimated in experiments where units are randomized to receive more than one treatment, which are increasingly common in political science.

2.4 Natural direct and indirect effects

We note other important estimands for direct effects (Robins and Greenland, 1992; Pearl, 2001), which are common in the causal mediation literature. One quantity of interest is the natural direct effect (NDE), which is the effect of changing treatment when fixing the mediator to its unit-specific potential value under a particular treatment level:

$$\mathrm{NDE}_i(a, a') = Y_i(a, M_i(a')) - Y_i(a', M_i(a')). \tag{4}$$

The second term after the equality is simply the potential outcome $Y_i(a')$ under the treatment level $a'$. The natural direct effect represents the effect of a modified treatment that does not affect the mediator, but continues to directly affect the outcome. In our example this would be the effect of moving from complete ethnic homogeneity to some value $a$ of ethnic fractionalization, holding political instability at what it would be under ethnic homogeneity. Note that the crucial difference between the CDE and the NDE is how one fixes the value of the mediator.

A related quantity of interest is the natural indirect effect (NIE), which fixes the treatment and quantifies how the outcome changes only in response to treatment-induced changes in the mediator:

$$\mathrm{NIE}_i(a, a') = Y_i(a, M_i(a)) - Y_i(a, M_i(a')). \tag{5}$$

The first term after the equality is, again, the potential outcome under $a$. The natural direct and indirect effects decompose the total effect for a single unit:

$$\tau_i(a, a') = \mathrm{NIE}_i(a, a') + \mathrm{NDE}_i(a, a'). \tag{6}$$

This decomposition also holds when we replace the individual-level quantities with their averages, $\mathrm{ATE}(a, a') = \mathrm{ANIE}(a, a') + \mathrm{ANDE}(a, a')$, where ANIE and ANDE are the averages of the NIE and NDE, respectively. We discuss the relative advantages of the ACDE and ANDE in Section 6.

2.5 How controlled direct effects speak to causal mechanisms

As we discuss below, the statistical literature on direct effects has mostly discussed the advantages of the ACDE in the context of experiments and policy evaluation. While these goals are very useful for scholars, in this section we show how the ACDE can also speak to causal mechanisms in two distinct ways. Note that the ANDE is the more straightforward estimand for evaluating mechanisms, but, as we discuss below, it is often not identified in applied settings, leaving the ACDE as a scholar's only option. Thus, it is important to understand what information the ACDE can provide about mechanisms.

1) Ruling out alternative mechanisms. First, as VanderWeele (2011) notes, if the effect of a treatment is completely mediated by a mediator $M_i$ and another set of potential mediators, $W_i$, then a non-zero ACDE for $M_i$ implies that there must be an indirect effect that works through the set $W_i$.² Thus, showing that there is a non-zero ACDE implies that the effect of treatment is not due to the $M_i$ mechanism exclusively. In Appendix B, we provide a formal proof of this result.
For example, if our goal was to show that ethnic fractionalization had some effect on civil war deaths other than through political instability, we would estimate the ACDE and check its proximity to zero, taking into account uncertainty through a confidence interval or hypothesis test. A non-zero ACDE would suggest that there does exist an effect of ethnic fractionalization that does not operate exclusively through political instability. This is an intuitive approach for many applied researchers, who frequently want to rule out alternative explanations as being sole determinants of their findings.

² An effect is completely mediated by $M_i$ and $W_i$ if $Y_i(a, m, w)$ does not vary in $a$, where $w$ denotes a fixed value for the tuple of mediators $W_i$.

2) Support for a preferred mechanism. Second, the difference between the ATE and ACDE summarizes the role of the mediator in a causal mechanism for the effect of $A_i$, allowing us to estimate support for a preferred mechanism. To see this, decomposing the total effect into three components is useful. VanderWeele (2014) and VanderWeele and Tchetgen Tchetgen (2014) show that, with a binary mediator $M_i$, we can decompose the overall effect into the following:³

$$\tau(a, a') = \underbrace{\mathrm{ACDE}(a, a', 0)}_{\text{direct effect}} + \underbrace{\mathrm{ANIE}(a, a')}_{\text{indirect effect}} + \underbrace{E\big[M_i(a')\,[\mathrm{CDE}_i(a, a', 1) - \mathrm{CDE}_i(a, a', 0)]\big]}_{\text{interaction effect}}. \tag{7}$$

This decomposition shows that the difference between the ATE and ACDE is a combination of (1) the average natural indirect effect and (2) an interaction effect that captures how much the direct effect of $A_i$ depends (causally) on $M_i$ at the individual level. (Note that this interaction effect is distinct from the more typical effect modification, which is a non-causal statement about how average effects vary as a function of potentially non-manipulated variables.) Under the following constant interactions assumption,

$$\mathrm{CDE}_i(a, a', m) - \mathrm{CDE}_i(a, a', m') = d(m - m'), \tag{8}$$

the interaction effect in (7) simplifies to $d \cdot E[M_i(a')]$ (whether the mediator is binary or continuous) and is identified under our assumptions in Section 4.1. Imai and Yamamoto (2013) explored this assumption to compare the indirect effects of two competing mechanisms. This result also implies that we can recenter $M_i$ such that $E[M_i(a')] = 0$, and the total effect decomposes into the ACDE and the ANIE. Thus, under the constant interactions assumption and the identification assumptions below, the difference between the ATE and the ACDE is a measure of the indirect effect, provided we recenter $M_i$ as $\tilde{M}_i = M_i - E[M_i(a')]$.

³ The decomposition can be easily generalized to the case of a continuous mediator. In that case the interaction effect is $\int_{m \in \mathcal{M}} E[\mathrm{CDE}_i(a, a', m) - \mathrm{CDE}_i(a, a', 0) \mid M_i(a') = m] \, dF_{M_i(a')}(m)$.

Without the constant interactions assumption, it will be infeasible to separate out the indirect effect from the interaction effect in (7). Even in this situation, the difference between the ATE and ACDE might still provide some information on how the causal finding came to be if we take a broader definition of causal mechanisms than is currently used in the statistics and biostatistics literature. Both the indirect effect and the interaction effect measure the impact of the mediator on how the treatment affects the outcome. The indirect effect measures how strong a particular causal pathway is, while the interaction effect tells us how much the mediator influences the direct effect of the treatment. The difference between the ATE and ACDE, then, will provide an aggregation of these two effects, which can be seen as a summary of how $M_i$ participates in a causal mechanism for the effect of $A_i$ on $Y_i$.

This definition of a causal mechanism, however, is more expansive than previous definitions. For example, Imai, Keele and Yamamoto (2010) define a causal mechanism in terms of indirect effects. Thus, in their framework, the difference between the ATE and the ANDE is the relevant measure of the strength of a mechanism. Referencing (7), this is equivalent to combining the ACDE and the interaction effect to create the ANDE. VanderWeele (2009), on the other hand, defines mechanisms in terms of the sufficient cause framework and shows that while the existence of an indirect effect through $M_i$ implies that $M_i$ participates in a mechanism, the reverse is not true. That is, there can be variables that participate in a mechanism that are unaffected by the treatment.
For applied researchers, the empirical application will determine the utility of these approaches. In some cases, indirect effects are of particular interest while in others the combination of indirect effects and interaction is sufficient. We discuss these distinctions further in Section 6.
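As a small numerical check of the decompositions in (6) and (7), the following sketch builds a hypothetical population of four units with a binary treatment and a binary mediator. All potential outcomes and potential mediator values are made up for illustration and are not drawn from either empirical example; the helper names are likewise assumptions of this sketch.

```python
# Hypothetical population: for each unit, Y[(a, m)] is the potential outcome
# Y_i(a, m) and M[a] is the potential mediator value M_i(a). All numbers are
# invented for illustration only.
units = [
    {"Y": {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 6.0}, "M": {0: 0, 1: 1}},
    {"Y": {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 1.5}, "M": {0: 1, 1: 1}},
    {"Y": {(0, 0): 2.0, (0, 1): 2.0, (1, 0): 2.5, (1, 1): 4.0}, "M": {0: 0, 1: 0}},
    {"Y": {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 2.0, (1, 1): 5.0}, "M": {0: 1, 1: 0}},
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def y(u, a):
    """Y_i(a) = Y_i(a, M_i(a)): outcome when only the treatment is set."""
    return u["Y"][(a, u["M"][a])]


def cde(u, m):
    """CDE_i(1, 0, m) = Y_i(1, m) - Y_i(0, m), as in equation (2)."""
    return u["Y"][(1, m)] - u["Y"][(0, m)]


def nde(u):
    """NDE_i(1, 0) = Y_i(1, M_i(0)) - Y_i(0, M_i(0)), as in equation (4)."""
    return u["Y"][(1, u["M"][0])] - u["Y"][(0, u["M"][0])]


def nie(u):
    """NIE_i(1, 0) = Y_i(1, M_i(1)) - Y_i(1, M_i(0)), as in equation (5)."""
    return u["Y"][(1, u["M"][1])] - u["Y"][(1, u["M"][0])]


ate = mean(y(u, 1) - y(u, 0) for u in units)
ande = mean(nde(u) for u in units)
anie = mean(nie(u) for u in units)
acde0 = mean(cde(u, 0) for u in units)
# Interaction term of equation (7): E[M_i(a') * (CDE_i(.,.,1) - CDE_i(.,.,0))]
interaction = mean(u["M"][0] * (cde(u, 1) - cde(u, 0)) for u in units)

print("ATE:", ate)
print("ANIE + ANDE:", anie + ande)
print("ACDE(0) + ANIE + interaction:", acde0 + anie + interaction)
```

Because every potential outcome is specified here, both identities can be checked directly. With real data only one potential outcome per unit is observed, so these quantities have to be identified under assumptions such as those in Section 4.1.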

3 Intermediate Variable Bias

All direct effect quantities of interest, including both the ACDE and ANDE, raise the possibility of intermediate confounders. These intermediate confounders ($Z_i$ in Figure 1) in turn raise the possibility of intermediate variable bias, a type of posttreatment bias. The intuition is as follows: conditioning on a mediator results in selection bias unless all of the intermediate confounders are included as well (sometimes called M-bias), but including them means including posttreatment variables (posttreatment bias). For instance, a researcher studying whether ethnic fractionalization affects civil war net of political instability should be concerned about whether there exist variables, measured or unmeasured, that (1) are affected by ethnic fractionalization and also (2) affect political instability and civil war onset. Such variables are confounders for the mediator, but are also affected by treatment and thus are intermediate confounders. Unfortunately, dozens of such variables probably exist in our example, such as religious fractionalization, average income, racism, and so on.

Most previous approaches to causal mediation confront this problem by assuming that no intermediate confounders exist. Unfortunately, this may be an unrealistic assumption for many researchers,⁴ with the vast majority of mediators in the social sciences certainly violating this no intermediate confounders assumption. As Imai, Keele and Yamamoto (2010) point out, the no intermediate confounders assumption is an important limitation, since assuming the absence of post-treatment confounders may not be credible in many applied settings (pg. 55). Moreover, the ANDE and the ANIE are unidentified in the presence of intermediate confounders without strong individual-level homogeneity assumptions (as we discuss in Section 4.1). This makes these quantities of interest less attractive for most applied researchers.
The ACDE is, by contrast, identified in the face of intermediate confounders (also discussed in Section 4.1). Even so, the possible presence of intermediate confounders does raise the possibility of intermediate variable bias.

⁴ Imai and Yamamoto (2013) present a method for causal mediation in the face of intermediate confounders, which they refer to as multiple causal mechanisms. This approach, however, requires these intermediate confounders to be themselves unconfounded, a similarly strong assumption.

To show this, we first assume a correctly specified linear model with constant treatment effects and no omitted variables for $A_i$. Under these assumptions, we can estimate the causal effect of $A_i$ in a regression of the outcome on the treatment and the pretreatment confounders,

$$Y_i = \beta_0 + \beta_1 A_i + X_i^T \beta_2 + \varepsilon_i, \tag{9}$$

where $\beta_1$ is the (total) effect of $A_i$ on $Y_i$. A common way to gauge the strength of some mechanism is to include a mediator, $M_i$, as an additional regressor in the model,

$$Y_i = \beta_0 + \beta_1 A_i + \beta_2 M_i + X_i^T \beta_3 + \varepsilon_i, \tag{10}$$

and to interpret the coefficient $\beta_1$ in (10) as a direct effect. Unfortunately, this interpretation is only correct under the assumption of no intermediate confounders. When these confounders, $Z_i$, are present, $\beta_1$ in (10) will equal neither the ACDE nor the ANDE, even under constant effects. This is because conditioning on a posttreatment variable can induce spurious relationships between the treatment and the intermediate confounders and, thus, the outcome (Rosenbaum, 1984).

To illustrate the ill effects of conditioning on a posttreatment variable, we perform a simple simulation, based on the following setup:

$$A_i \rightarrow M_i \leftarrow Z_i \rightarrow Y_i$$

Neither $A_i$ nor $M_i$ has an effect (direct or indirect) on the outcome. The mediator, $M_i$, is correlated with the outcome through their common cause, $Z_i$. We generate data from this process and plot the relationship between $A_i$ and both the intermediate confounder, $Z_i$, and the outcome. Figure 2 shows the results of one such draw. Looking at the overall relationships, including the grey and black dots, we see that there is no effect of $A_i$ on $Z_i$ or $Y_i$, as expected. When we condition on $M_i$, however, we see a much different picture. The black dots in Figure 2 are observations that

have a value of $M_i$ in a certain range. Clearly, we see that conditioning on certain values of $M_i$ induces strong correlations between $A_i$ and both the confounder and the outcome, even though we designed the study to have no effect. This is selection bias that is induced by conditioning on $M_i$. Indeed, as the statistical literature has long pointed out, this bias is also not conservative: here we are inducing relationships where none exist. Of course, in our own samples we will have no idea as to the direction of such bias.

[Figure 2: Simulated data that shows the dangers of conditioning on a posttreatment variable, $M_i$. Two panels plot the intermediate confounder (Z) and the outcome (Y) against the treatment (A). The black points and line are those when conditioning on $M_i$ being between 60 and 70, and the lighter grey points are all other points.]

In order to avoid this bias, we might decide to include the intermediate confounders in the regression:

$$Y_i = \alpha_0 + \alpha_1 A_i + \alpha_2 M_i + X_i^T \alpha_3 + Z_i^T \alpha_4 + \varepsilon_i. \tag{11}$$

Unfortunately, here too the coefficient on the treatment, $\alpha_1$, will not be equal to

the ACDE. This is because conditioning on $Z_i$ blocks the causal pathway $A \rightarrow Z \rightarrow Y$ in Figure 1, which is part of the controlled direct effect. Thus, both omitting and conditioning on the intermediate confounders lead to a biased estimate of the ACDE.

4 Sequential g-estimation

In this section, we present sequential g-estimation, a method for estimating controlled direct effects in the face of intermediate confounders. Described by Vansteelandt (2009) and Joffe and Greene (2009), it is a type of structural nested mean model (Robins, 1994, 1997) that is tailored to estimating direct effects. We present (1) the assumptions that underlie this method, (2) basic identification results, (3) implementation details, and (4) an approach to sensitivity analysis.

4.1 Assumptions

As pointed out by Robins (1997), the ACDE is nonparametrically identified under what we call sequential unconfoundedness.

Assumption 1 (Sequential Unconfoundedness).

$$\{Y_i(a, m), M_i(a)\} \perp\!\!\!\perp A_i \mid X_i = x \tag{12}$$

$$Y_i(a, m) \perp\!\!\!\perp M_i \mid A_i = a, X_i = x, Z_i = z, \tag{13}$$

for all possible treatment values $a \in \mathcal{A}$, mediator values $m \in \mathcal{M}$, and covariate values $x \in \mathcal{X}$ and $z \in \mathcal{Z}$. In addition, we assume for all the above values:

$$P(A_i = a \mid X_i = x) > 0 \tag{14}$$

$$P(M_i = m \mid A_i = a, X_i = x, Z_i = z) > 0. \tag{15}$$

This assumption represents two separate no omitted variables assumptions. First, no omitted variables for the effect of treatment on the outcome, conditional on the pretreatment confounders. Second, no omitted variables for the effect of

the mediator on the outcome, conditional on the treatment, pretreatment confounders, and intermediate confounders. Thus, this represents a selection-on-the-observables assumption for each analysis. Because such assumptions can be unrealistic in observational studies, below we show how to weaken this assumption through a sensitivity analysis.

Sequential unconfoundedness is not sufficient to identify the ANDE, however. To do so requires a stronger version of this assumption that omits $Z_i$ entirely (Imai, Keele and Yamamoto, 2010). In addition, identifying the ANDE requires either that potential outcomes from different counterfactual worlds be independent (Imai, Keele and Yamamoto, 2010) or individual-level no-interaction or constant-interaction assumptions (Robins, 2003; Imai and Yamamoto, 2013). Thus, one advantage of the ACDE is that it is identified under far weaker assumptions than the ANDE. Of course, the ANDE helps partition the total effect at a finer level and calculate the strength of a given causal pathway versus all other pathways. However, this additional information comes at the cost of these additional strong assumptions.

Even though ACDEs are identified under Assumption 1, the effects depend on the distribution of the intermediate confounders (Robins, 1997). To implement the simplest version of sequential g-estimation, we need the effect of the mediator on the outcome to be independent of the intermediate confounders.

Assumption 2 (No Intermediate Interactions).

$$E[Y_i(a, m) - Y_i(a, m') \mid X_i = x, A_i = a, Z_i = z] = E[Y_i(a, m) - Y_i(a, m') \mid X_i = x, A_i = a], \tag{16}$$

for all values $a \in \mathcal{A}$, $m, m' \in \mathcal{M}$, $z \in \mathcal{Z}$, and $x \in \mathcal{X}$.

This assumption has several notable features. First, this assumption is not required for the nonparametric identification of the ACDE. Nonparametric identification only relies on sequential unconfoundedness, and this no intermediate interactions assumption only serves to make estimation simpler.
In fact, we can relax this assumption to the extent that we are willing to model the distribution of $Z_i$ conditional on $A_i$ and $X_i$. Second, even if this assumption is false, the estimated effects

will be weighted averages of ACDEs within levels of the intermediate confounders (Vansteelandt and Joffe, 2014, p. 718). Thus, this assumption is similar to omitting an interaction term from a regression model. Third, this assumption does not rule out important interactions between the treatment and the mediator (see, for example, (24) below) nor interactions with the baseline covariates. Finally, this assumption is weaker than other no interaction assumptions (see, e.g., Robins, 2003) that restrict the controlled direct effect at the individual level or other assumptions that rule out intermediate confounders entirely.

4.2 Identification

In order to tune the estimator to estimating direct effects, we introduce the following function, which we call the demediation function:⁵

$$\gamma(a, m, x) = E[Y_i(a, m) - Y_i(a, 0) \mid X_i = x]. \tag{17}$$

This function is the effect of switching from some level of the mediator to 0 and does not depend on the value of the intermediate confounders due to Assumption 2. We call this the demediation function because when subtracted from the observed outcome, $Y_i - \gamma(A_i, M_i, X_i)$, it removes the variation in the outcome due to the causal effect of the mediator:

$$E[Y_i - \gamma(a, M_i, x) \mid A_i = a, X_i = x] = E[Y_i(a, 0) \mid X_i = x]. \tag{18}$$

This property of the demediation function follows easily from Assumptions 1 and 2 (Robins, 1994; Vansteelandt, 2009). Based on this, the ACDE conditional on $X_i$, $E[Y_i(a, 0) - Y_i(0, 0) \mid X_i = x]$, is nonparametrically identified as the difference in means of the demediated outcome:

$$E[Y_i - \gamma(a, M_i, x) \mid A_i = a, X_i = x] - E[Y_i - \gamma(0, M_i, x) \mid A_i = 0, X_i = x]. \tag{19}$$

⁵ This is a specific instance of a larger class of functions that Robins (1997) calls blip-down functions.
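For readers who want to see why the demediation property in (18) follows from Assumptions 1 and 2, here is a sketch of one way to arrange the argument. The ordering of steps is our own reading of the assumptions as stated above, not a reproduction of the proofs in Robins (1994) or Vansteelandt (2009).

```latex
% Step 1: for fixed values (m, z), consistency and (13) give
\begin{aligned}
E[Y_i \mid A_i = a, M_i = m, X_i = x, Z_i = z]
  &= E[Y_i(a, m) \mid A_i = a, M_i = m, X_i = x, Z_i = z] \\
  &= E[Y_i(a, m) \mid A_i = a, X_i = x, Z_i = z].
\end{aligned}

% Step 2: by (12) and then (16), the demediation function can be
% conditioned on A_i = a and Z_i = z without changing its value:
\begin{aligned}
\gamma(a, m, x)
  &= E[Y_i(a, m) - Y_i(a, 0) \mid X_i = x] \\
  &= E[Y_i(a, m) - Y_i(a, 0) \mid A_i = a, X_i = x] \\
  &= E[Y_i(a, m) - Y_i(a, 0) \mid A_i = a, X_i = x, Z_i = z].
\end{aligned}

% Step 3: subtracting Step 2 from Step 1 removes the dependence on m:
E[Y_i \mid A_i = a, M_i = m, X_i = x, Z_i = z] - \gamma(a, m, x)
  = E[Y_i(a, 0) \mid A_i = a, X_i = x, Z_i = z].

% Step 4: average over (M_i, Z_i) given A_i = a, X_i = x, then apply (12):
\begin{aligned}
E[Y_i - \gamma(a, M_i, x) \mid A_i = a, X_i = x]
  &= E[Y_i(a, 0) \mid A_i = a, X_i = x] \\
  &= E[Y_i(a, 0) \mid X_i = x].
\end{aligned}
```

The positivity conditions (14) and (15) are what make the conditional expectations in Steps 1 and 4 well defined for every value being conditioned on.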

The key intuition here is that after demediating the outcome, the remaining covariation with $A_i$ is due to the direct effect of $A_i$. Of course, this result requires us to know the demediation function, which is unrealistic. Instead, we will almost always estimate it from data. Robins (1994) showed that, under sequential unconfoundedness, the causal difference $\gamma$ is nonparametrically identified from the data and is equal to the difference-in-means estimator conditional on all the previous variables:

$$\gamma(a, m, x) = E[Y_i \mid A_i = a, M_i = m, X_i = x, Z_i = z] - E[Y_i \mid A_i = a, M_i = 0, X_i = x, Z_i = z]. \tag{20}$$

This follows from the simple fact that sequential unconfoundedness implies that the effect of the mediator on the outcome is identified. Furthermore, the identification of the ACDE as (19) holds when we replace $\gamma$ with its estimate, $\hat{\gamma}$.

4.3 Implementation

When the treatment and mediator are binary or take on only a few values, nonparametric or semiparametric approaches exist for estimating the ACDE, reducing the need for parametric models (Robins, Hernán and Brumback, 2000) [6]. With a continuous treatment and a continuous mediator, it is generally impossible to nonparametrically estimate the differences in means in equations (20) and (19), and the semiparametric approaches referenced above have poor properties (Vansteelandt, 2009). Sequential g-estimation brings in parametric models to help estimate ACDEs in this context.

Sequential g-estimation proceeds in two simple steps. First, we regress the outcome on the mediator, treatment, and covariates (pretreatment and intermediate) to get an estimate of the demediation function. Second, we use the first stage to demediate the outcome and run a regression of this demediated outcome on the treatment and the pretreatment covariates. The marginal effect of the treatment in this second-stage regression will be the estimate of the ACDE. We describe these steps in further detail.

[6] See Blackwell (2013) for a political science application of such an approach.
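The two steps just outlined can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes a linear outcome model with no mediator interactions, and the helper names (`ols`, `sequential_g`, `simulate`) and the simulated data-generating process are hypothetical. In the simulation the direct effect of the treatment is 1.0 and the path through the intermediate confounder adds 1.5 × 0.8 = 1.2, so the ACDE with the mediator fixed is 2.2 by construction.

```python
import numpy as np


def ols(design, outcome):
    """Return least squares coefficients for outcome ~ design."""
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef


def sequential_g(y, a, m, x, z):
    """Two-stage sequential g-estimate of the ACDE (linear, no interactions)."""
    ones = np.ones(len(y))
    # Stage 1: regress the outcome on treatment, mediator, pretreatment
    # covariates x, and intermediate covariates z.
    stage1 = ols(np.column_stack([ones, a, m, x, z]), y)
    alpha_m = stage1[2]  # coefficient on the mediator (third column)
    # Demediate: remove the estimated effect of the mediator from the outcome.
    y_tilde = y - alpha_m * m
    # Stage 2: regress the demediated outcome on treatment and pretreatment
    # covariates only; the intermediate covariates are left out.
    stage2 = ols(np.column_stack([ones, a, x]), y_tilde)
    return stage2[1]  # coefficient on the treatment


def simulate(n, seed):
    """Hypothetical data in which the treatment affects an intermediate confounder."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                              # pretreatment confounder
    a = 0.5 * x + rng.normal(size=n)                    # treatment
    z = 0.8 * a + 0.3 * x + rng.normal(size=n)          # intermediate confounder
    m = 0.6 * a + 0.7 * z + 0.2 * x + rng.normal(size=n)  # mediator
    y = 1.0 * a + 2.0 * m + 1.5 * z + 0.4 * x + rng.normal(size=n)
    return y, a, m, x, z


if __name__ == "__main__":
    y, a, m, x, z = simulate(20000, seed=0)
    ones = np.ones(len(y))
    # Single regression that conditions on the mediator and z.
    naive = ols(np.column_stack([ones, a, m, x, z]), y)[1]
    print("single regression with m and z:", round(naive, 3))
    print("sequential g-estimate of ACDE: ", round(sequential_g(y, a, m, x, z), 3))
```

In this simulated setup the single regression that conditions on both the mediator and the intermediate confounder targets only the coefficient 1.0, because it blocks the part of the direct effect that runs through the intermediate confounder. The two-stage estimate targets 2.2. The standard errors from the second-stage fit are not computed here.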

4.3.1 First stage

The first stage of sequential g-estimation involves estimating the effect of $M_i$ on $Y_i$, conditional on all other variables. The components of this model that involve $M_i$ will be the parameterization of the demediation function, $\gamma$. For instance, we might use the following regression function:

$$E[Y_i \mid A_i, M_i, X_i, Z_i] = \alpha_0 + \alpha_1 A_i + \alpha_2 M_i + X_i^T \alpha_3 + Z_i^T \alpha_4. \tag{21}$$

This parametric model implies a parametric model for the demediation function, which is

$$\gamma(a, m, x; \alpha) = \gamma(m; \alpha_2) = \alpha_2 m. \tag{22}$$

We could augment this baseline regression model with interactions between the mediator and the treatment or the pretreatment confounders (but not with the intermediate confounders, due to Assumption 2):

$$E[Y_i \mid A_i, M_i, X_i, Z_i] = \alpha_0 + \alpha_1 A_i + \alpha_2 M_i + X_i^T \alpha_3 + Z_i^T \alpha_4 + \alpha_5 M_i A_i + \alpha_6 M_i X_{ik}, \tag{23}$$

where $X_{ik}$ is one variable in the matrix of pretreatment confounders. In this case, the model would imply a different demediation function:

$$\gamma(a, m, x; \alpha) = \alpha_2 m + \alpha_5 m a + \alpha_6 m x_k. \tag{24}$$

This more general demediation function allows the effect of the mediator to vary by the values of the pretreatment confounders and the treatment. However we model the conditional mean of the outcome, we can obtain estimates of the parameters of the demediation function, $\hat{\alpha}$, from a least squares regression of the outcome on the treatment, mediator, pretreatment confounders, and intermediate confounders. Then, we can calculate the sample version of the demediation function,

$$\hat{\gamma}(A_i, M_i, X_i; \hat{\alpha}) = \hat{\alpha}_2 M_i + \hat{\alpha}_5 M_i A_i + \hat{\alpha}_6 M_i X_{ik}. \tag{25}$$

The validity of this approach will depend on the validity of the modeling assumptions in (23). If this model for the conditional mean of the outcome is correct and Assumption 1 holds, then the least squares estimates will be unbiased for the parameters of the demediation function. To weaken the model dependence, one could extend this methodology to handle matching on pretreatment confounders.

4.3.2 Second stage

With an estimate of the demediation function in hand, we can estimate the ACDE from the second-stage model. First, we demediate the outcome,

$$\tilde{Y}_i = Y_i - \hat{\gamma}(A_i, M_i, X_i; \hat{\alpha}), \tag{26}$$

which in our running example would be:

$$\tilde{Y}_i = Y_i - \hat{\alpha}_2 M_i - \hat{\alpha}_5 M_i A_i - \hat{\alpha}_6 M_i X_{ik}. \tag{27}$$

Given the results in Section 4.2, we can then estimate the ACDE of treatment by regressing this demediated outcome on the treatment and the pretreatment confounders,

$$E[\tilde{Y}_i \mid A_i, X_i] = \beta_0 + \beta_1 A_i + X_i^T \beta_2, \tag{28}$$

where $\beta_1$ is the ACDE. The least squares estimator $\hat{\beta}_1$ will be a consistent estimate of the ACDE and avoids any intermediate variable bias by not conditioning on either $M_i$ or $Z_i$. Note that the standard errors on $\hat{\beta}_1$ from this regression will be biased because they ignore the first-stage estimation of $\gamma$. In Appendix C we develop a consistent estimator for the variance of $\hat{\beta}$ for linear models. One can also use the nonparametric bootstrap, performing both stages of the estimation in each bootstrap replication. In simulations, these two approaches produced very similar results, though our variance estimator is far more computationally efficient.

4.4 Sensitivity Analysis

To assess violations of sequential unconfoundedness, we provide a bias formula and sensitivity analysis in parametric models for the ACDE. Specifically, we derive the bias due to unmeasured intermediate confounders, which is a violation of (13)
