Introduction to treatment effects in Stata: Part 1

This post was written jointly with David Drukker, Director of Econometrics, StataCorp.

The topic for today is the treatment-effects features in Stata.

Treatment-effects estimators estimate the causal effect of a treatment on an outcome based on observational data.

In today’s posting, we will discuss four treatment-effects estimators:

RA: Regression adjustment

IPW: Inverse probability weighting

IPWRA: Inverse probability weighting with regression adjustment

AIPW: Augmented inverse probability weighting

We’ll save the matching estimators for part 2.

We should note that nothing about treatment-effects estimators magically extracts causal relationships. As with any regression analysis of observational data, the causal interpretation must be based on a reasonable underlying scientific rationale.

Introduction

We are going to discuss treatments and outcomes.

A treatment could be a new drug and the outcome blood pressure or cholesterol levels. A treatment could be a surgical procedure and the outcome patient mobility. A treatment could be a job training program and the outcome employment or wages. A treatment could even be an ad campaign designed to increase the sales of a product.

Consider whether a mother’s smoking affects the weight of her baby at birth. Questions like this one can only be answered using observational data. Experiments would be unethical.

The problem with observational data is that the subjects choose whether to get the treatment. For example, a mother decides to smoke or not to smoke. The subjects are said to have self-selected into the treated and untreated groups.

In an ideal world, we would design an experiment to test cause-and-effect and treatment-and-outcome relationships. We would randomly assign subjects to the treated or untreated groups. Randomly assigning the treatment guarantees that the treatment is independent of the outcome, which greatly simplifies the analysis.

Causal inference requires the estimation of the unconditional means of the outcomes for each treatment level. We only observe the outcome of each subject conditional on the received treatment regardless of whether the data are observational or experimental. For experimental data, random assignment of the treatment guarantees that the treatment is independent of the outcome; so averages of the outcomes conditional on observed treatment estimate the unconditional means of interest. For observational data, we model the treatment assignment process. If our model is correct, the treatment assignment process is considered as good as random conditional on the covariates in our model.

Let’s consider an example. Figure 1 is a scatterplot of observational data similar to those used by Cattaneo (2010). The treatment variable is the mother’s smoking status during pregnancy, and the outcome is the birthweight of her baby.

The red points represent the mothers who smoked during pregnancy, while the green points represent the mothers who did not. The mothers themselves chose whether to smoke, and that complicates the analysis.

We cannot estimate the effect of smoking on birthweight by comparing the mean birthweights of babies of mothers who did and did not smoke. Why not? Look again at our graph. Older mothers tend to have heavier babies regardless of whether they smoked while pregnant. In these data, older mothers were also more likely to be smokers. Thus, mother’s age is related to both treatment status and outcome. So how should we proceed?

RA: The regression adjustment estimator

RA estimators model the outcome to account for the nonrandom treatment assignment.

We might ask, “How would the outcomes have changed had the mothers who smoked chosen not to smoke?” or “How would the outcomes have changed had the mothers who didn’t smoke chosen to smoke?”. If we knew the answers to these counterfactual questions, analysis would be easy: we would just subtract the observed outcomes from the counterfactual outcomes.

The counterfactual outcomes are called unobserved potential outcomes in the treatment-effects literature. Sometimes the word unobserved is dropped.

We can construct measurements of these unobserved potential outcomes, and our data might look like this:

In figure 2, the observed data are shown using solid points and the unobserved potential outcomes are shown using hollow points. The hollow red points represent the potential outcomes for the smokers had they not smoked. The hollow green points represent the potential outcomes for the nonsmokers had they smoked.

We can estimate the unobserved potential outcomes then by fitting separate linear regression models with the observed data (solid points) to the two treatment groups.

In figure 3, we have one regression line for nonsmokers (the green line) and a separate regression line for smokers (the red line).

Let’s understand what the two lines mean:

The green point on the left in figure 4, labeled Observed, is an observation for a mother who did not smoke. The point labeled E(y0) on the green regression line is the expected birthweight of the baby given the mother’s age and that she didn’t smoke. The point labeled E(y1) on the red regression line is the expected birthweight of the baby for the same mother had she smoked.

The difference between these expectations estimates the covariate-specific treatment effect for those who did not get the treatment.

Now, let’s look at the other counterfactual question.

The red point on the right in figure 4, labeled Observed in red, is an observation for a mother who smoked during pregnancy. The points on the green and red regression lines again represent the expected birthweights — the potential outcomes — of the mother’s baby under the two treatment conditions.

The difference between these expectations estimates the covariate-specific treatment effect for those who got the treatment.

Note that we estimate an average treatment effect (ATE), conditional on covariate values, for each subject. Furthermore, we estimate this effect for each subject, regardless of which treatment was actually received. Averages of these effects over all the subjects in the data estimate the ATE.

We could also use figure 4 to motivate a prediction of the outcome that each subject would obtain for each treatment level, regardless of the treatment recieved. The story is analogous to the one above. Averages of these predictions over all the subjects in the data estimate the potential-outcome means (POMs) for each treatment level.

It is reassuring that differences in the estimated POMs is the same estimate of the ATE discussed above.

The ATE on the treated (ATET) is like the ATE, but it uses only the subjects who were observed in the treatment group. This approach to calculating treatment effects is called regression adjustment (RA).

We specify the outcome model in the first set of parentheses with the outcome variable followed by its covariates. In this example, the outcome variable is bweight and the only covariate is mage.

We specify the treatment model — simply the treatment variable — in the second set of parentheses. In this example, we specify only the treatment variable mbsmoke. We’ll talk about covariates in the next section.

The output reports that the average birthweight would be 3,132 grams if all mothers smoked and 3,409 grams if no mother smoked.

We can estimate the ATE of smoking on birthweight by subtracting the POMs: 3132.374 – 3409.435 = -277.061. Or we can reissue our teffects ra command with the ate option and get standard errors and confidence intervals:

The output reports the same ATE we calculated by hand: -277.061. The ATE is the average of the differences between the birthweights when each mother smokes and the birthweights when no mother smokes.

We can also estimate the ATET by using the teffects ra command with option atet, but we will not do so here.

IPW: The inverse probability weighting estimator

RA estimators model the outcome to account for the nonrandom treatment assignment. Some researchers prefer to model the treatment assignment process and not specify a model for the outcome.

We know that smokers tend to be older than nonsmokers in our data. We also hypothesize that mother’s age directly affects birthweight. We observed this in figure 1, which we show again below.

This figure shows that treatment assignment depends on mother’s age. We would like to have a method of adjusting for this dependence. In particular, we wish we had more upper-age green points and lower-age red points. If we did, the mean birthweight for each group would change. We don’t know how that would affect the difference in means, but we do know it would be a better estimate of the difference.

To achieve a similar result, we are going to weight smokers in the lower-age range and nonsmokers in the upper-age range more heavily, and weight smokers in the upper-age range and nonsmokers in the lower-age range less heavily.

We will fit a probit or logit model of the form

Pr(woman smokes) = F(a + b*age)

teffects uses logit by default, but we will specify the probit option for illustration.

Once we have fit that model, we can obtain the prediction Pr(woman smokes) for each observation in the data; we’ll call this pi. Then, in making our POMs calculations — which is just a mean calculation — we will use those probabilities to weight the observations. We will weight observations on smokers by 1/pi so that weights will be large when the probability of being a smoker is small. We will weight observations on nonsmokers by 1/(1-pi) so that weights will be large when the probability of being a nonsmoker is small.

That results in the following graph replacing figure 1:

In figure 5, larger circles indicate larger weights.

To estimate the POMs with this IPW estimator, we can type

. teffects ipw (bweight) (mbsmoke mage, probit), pomeans

The first set of parentheses specifies the outcome model, which is simply the outcome variable in this case; there are no covariates. The second set of parentheses specifies the treatment model, which includes the outcome variable (mbsmoke) followed by covariates (in this case, just mage) and the kind of model (probit).

Our output reports that the average birthweight would be 3,133 grams if all the mothers smoked and 3,409 grams if none of the mothers smoked.

This time, the ATE is -275.5, and if we typed

. teffects ipw (bweight) (mbsmoke mage, probit), ate
(Output omitted)

we would learn that the standard error is 22.68 and the 95% confidence interval is [-319.9,231.0].

Just as with teffects ra, if we wanted ATET, we could specify the teffects ipw command with the atet option.

IPWRA: The IPW with regression adjustment estimator

RA estimators model the outcome to account for the nonrandom treatment assignment. IPW estimators model the treatment to account for the nonrandom treatment assignment. IPWRA estimators model both the outcome and the treatment to account for the nonrandom treatment assignment.

The covariates in the outcome model and the treatment model do not have to be the same, and they often are not because the variables that influence a subject’s selection of treatment group are often different from the variables associated with the outcome. The IPWRA estimator has the double-robust property, which means that the estimates of the effects will be consistent if either the treatment model or the outcome model — but not both — are misspecified.

Let’s consider a situation with more complex outcome and treatment models but still using our low-birthweight data.

The outcome model will include

mage: the mother’s age

prenatal1: an indicator for prenatal visit during the first trimester

mmarried: an indicator for marital status of the mother

fbaby: an indicator for being first born

The treatment model will include

all the covariates of the outcome model

mage^2

medu: years of maternal education

We will also specify the aequations option to report the coefficients of the outcome and treatment models.

The POmeans section of the output displays the POMs for the two treatment groups. The ATE is now calculated to be 3173.369 – 3403.336 = -229.967.

The OME0 and OME1 sections display the RA coefficients for the untreated and treated groups, respectively.

The TME1 section of the output displays the coefficients for the probit treatment model.

Just as in the two previous cases, if we wanted the ATE with standard errors, etc., we would specify the ate option. If we wanted ATET, we would specify the atet option.

AIPW: The augmented IPW estimator

IPWRA estimators model both the outcome and the treatment to account for the nonrandom treatment assignment. So do AIPW estimators.

The AIPW estimator adds a bias-correction term to the IPW estimator. If the treatment model is correctly specified, the bias-correction term is 0 and the model is reduced to the IPW estimator. If the treatment model is misspecified but the outcome model is correctly specified, the bias-correction term corrects the estimator. Thus, the bias-correction term gives the AIPW estimator the same double-robust property as the IPWRA estimator.

The syntax and output for the AIPW estimator is almost identical to that for the IPWRA estimator.

The example above used a continuous outcome: birthweight. teffects can also be used with binary, count, and nonnegative continuous outcomes.

The estimators also allow multiple treatment categories.

An entire manual is devoted to the treatment-effects features in Stata 13, and it includes a basic introduction, advanced discussion, and worked examples. If you would like to learn more, you can download the [TE] Treatment-effects Reference Manual from the Stata website.

Thanks for this post. Cpuld you also post the syntax to create the example figures? Possibly on the Stata list.

Aaron Mamula

I realize this post is over a year old but it seems like it might need some editing. I have loaded the data from cattaneo2.dta and ran a basic logistic regression to estimate the probability of observing mbsmoke==’smoker’ conditional on age and found the coefficient on mbage to be significantly negative. This estimated results agree with a basic visual inspection of the data. Older women in this data set appear less likely to smoke during pregnancy. This observation is at odds with what is stated in the blog post (namely that problems arise when estimating the impact of smoking on birthweight because older women tended to give birth ot heavier babies regardless of the smoker/non smoker characteristic and that older mothers were more likely to smoke during pregnancy). I realize that this claim was made for illustrative purposes but it is very confusing to try and follow along with a post that 1) suggests we load the cattaneo2.dta data set then 2) proceeds to claim the existence of patterns that do not appear in that data set.