Bounds and sensitivity analysis when estimating average treatment effects with imputation and double robust estimators

Genbäck, Minna

Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.

de Luna, Xavier

Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.

(English) Manuscript (preprint) (Other academic)

Abstract [en]

When estimating average causal effects of treatments with observational data, scientists often rely on the assumption of unconfoundedness. We propose a sensitivity analysis for imputation estimators and doubly robust estimators, based on bounds (defining an identification interval) for the causal effect of interest, which allow for unobserved confounders. The bounds are derived from the bias of the estimators, expressed as a function of a sensitivity parameter. We describe how such bounds can take into account sampling variation, thereby yielding an uncertainty interval. We are also able to contrast the size of the potential bias due to violation of the unconfoundedness assumption with that of the bias due to misspecification of the models used to explain the outcome with the observed covariates. While the latter bias can in principle be made arbitrarily small with increasing sample size (by increasing the flexibility of the models used), the bias due to unobserved confounding does not disappear with increasing sample size. Through numerical experiments we illustrate the relative size of the biases due to unobserved confounders and model misspecification, as well as the empirical coverage of the uncertainty interval on which the proposed sensitivity analysis is based.
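The claim that bias from unobserved confounding persists as the sample grows can be illustrated with a small simulation. The sketch below is not the authors' experimental design: the data-generating model, the parameter names `gamma` (strength of the unobserved confounder) and `tau` (true effect), and all function names are assumptions made for illustration. It uses a simple regression imputation estimator with a linear outcome model fitted separately in each treatment arm.

```python
import numpy as np


def imputation_ate(y, t, x):
    """Regression imputation estimate of the average treatment effect.

    A linear outcome model in x is fitted by least squares in each arm,
    and both potential outcomes are imputed for every unit.
    """
    design = np.column_stack([np.ones_like(x), x])
    beta1, *_ = np.linalg.lstsq(design[t == 1], y[t == 1], rcond=None)
    beta0, *_ = np.linalg.lstsq(design[t == 0], y[t == 0], rcond=None)
    return float(np.mean(design @ beta1 - design @ beta0))


def simulate(n, gamma, rng, tau=1.0):
    """Draw one sample from a hypothetical model (an assumption, not the paper's).

    x is an observed covariate and u an unobserved confounder; gamma controls
    how strongly u drives both treatment assignment and the outcome.
    """
    x = rng.normal(size=n)
    u = rng.normal(size=n)
    prob_treated = 1.0 / (1.0 + np.exp(-(0.5 * x + gamma * u)))
    t = (rng.uniform(size=n) < prob_treated).astype(int)
    y = tau * t + x + gamma * u + rng.normal(size=n)
    return y, t, x


def mean_bias(n, gamma, reps=50, seed=0, tau=1.0):
    """Monte Carlo estimate of the estimator's bias for sample size n."""
    rng = np.random.default_rng(seed)
    estimates = [imputation_ate(*simulate(n, gamma, rng, tau)) for _ in range(reps)]
    return float(np.mean(estimates) - tau)


if __name__ == "__main__":
    for n in (500, 5000):
        print(n, "gamma=0:", round(mean_bias(n, 0.0), 3),
              "gamma=1:", round(mean_bias(n, 1.0), 3))
```

With `gamma = 0` unconfoundedness holds given `x` and the estimated bias should be close to zero; with `gamma = 1` the bias should stay at roughly the same level when `n` increases tenfold, which is the behaviour a sensitivity analysis is meant to account for.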

In thesis

Genbäck, Minna

Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.

2016 (English) Doctoral thesis, comprehensive summary (Other academic)

Abstract [en]

In this thesis we develop methods for dealing with missing data in a univariate response variable when estimating regression parameters. Missing outcome data is a problem in a number of applications, one of which is follow-up studies. In follow-up studies data is collected on two (or more) occasions, and it is common that only some of the initial participants return on the second occasion. This is the case in Paper II, where we investigate predictors of decline in self-reported health in older populations in Sweden, the Netherlands and Italy. In that study, around 50% of the study participants drop out. It is common that researchers rely on the assumption that the missingness is independent of the outcome given some observed covariates. This assumption is called missing at random (MAR), or an ignorable missingness mechanism. However, MAR cannot be tested from the data, and if it does not hold, the estimators based on this assumption are biased. In the study of Paper II, we suspect that some of the individuals drop out due to bad health. If this is the case, the data is not MAR. One alternative to assuming MAR, which we pursue, is to incorporate the uncertainty due to missing data by using interval estimates instead of point estimates, and uncertainty intervals instead of confidence intervals. An uncertainty interval is the analog of a confidence interval, but wider due to a relaxation of assumptions on the missing data. These intervals can be used to visualize the consequences that deviations from MAR have for the conclusions of the study. That is, they can be used to perform a sensitivity analysis of MAR.

The thesis covers different types of linear regression. In Papers I and III we have a continuous outcome, in Paper II a binary outcome, and in Paper IV we allow for mixed effects with a continuous outcome. In Paper III we estimate the effect of a treatment, which can be seen as an example of missing outcome data.