The workshop will have three sections. The first will present basic concepts of causal inference and the challenge of assessing causal effects from data. It will emphasize the close and fundamental connection to the development of randomized experiments. The presentation will trace the flow of ideas from the 1970s to current work, including approaches that rely on powerful modern computing.

Part two will look at the statistical analysis of observational data, with causal inference linked to the idea of embedding the data within a hypothetical randomized experiment. This framework is essential for the validity of frequentist summary statements, such as p-values and confidence intervals. The resulting multistage effort includes thought-provoking tasks, especially in the first stage, which is purely conceptual. Later stages can often be implemented efficiently with modern computing, but the first stage demands careful scientific argumentation to make the embedding plausible to thoughtful readers of the proffered statistical analysis. Somewhat paradoxically, these conceptual tasks, which are usually omitted from publications, would often be the most interesting to consumers of the analyses. These points will be illustrated using the analysis of an observational data set addressing the causal effect of parental smoking on children's lung function. The presentation may appear provocative, but it is intended to encourage applied researchers, especially those working on problems with policy implications, to focus on important conceptual issues rather than on minor technical ones.
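The link between a (hypothetical) randomized assignment mechanism and frequentist summaries can be made concrete with a Fisher randomization test: under the sharp null of no effect for any unit, every re-randomization of treatment is equally likely, and the p-value is just the fraction of re-randomizations giving a statistic at least as extreme as the one observed. The sketch below uses entirely hypothetical data and a difference-in-means statistic; it is an illustration of the logic, not of the workshop's own analysis.

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical data: outcomes y and treatment indicator z (1 = treated)
y = [7.2, 6.8, 8.1, 5.9, 6.4, 7.7, 5.5, 6.1]
z = [1, 1, 1, 1, 0, 0, 0, 0]

def diff_in_means(y, z):
    treated = [yi for yi, zi in zip(y, z) if zi == 1]
    control = [yi for yi, zi in zip(y, z) if zi == 0]
    return mean(treated) - mean(control)

obs_stat = diff_in_means(y, z)

# Under the sharp null, the outcomes are fixed and only the labels z are
# random, so we approximate the randomization distribution by shuffling z.
n_draws = 10_000
count = 0
for _ in range(n_draws):
    zp = z[:]
    random.shuffle(zp)
    if abs(diff_in_means(y, zp)) >= abs(obs_stat):
        count += 1
p_value = count / n_draws
```

The same logic underlies the embedding of observational data: the p-value is only meaningful to the extent that the assumed assignment mechanism is plausible.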

Part three is built around the analysis of a randomized controlled trial to assess the effects of a job-training program on employment and wages. It will address analysis in the presence of three post-treatment complications: missing outcome data; non-compliance with assigned treatment; and partially defined outcomes (here, hourly wage, which is undefined, 0/0, for the unemployed). The latter two issues are of substantive importance, whereas the first is a nuisance. The analysis exploits mixture models, direct likelihood methods, and the EM algorithm. It uses, and checks robustness to, the "missing at random" assumption. To address the complications it uses "principal stratification", a broad generalization of the method of instrumental variables used by econometricians.
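The mixture-model/EM machinery can be sketched in its simplest form: a two-component normal mixture fit by EM, where the latent component labels play the role of unobserved principal strata. Everything below is a hypothetical toy (simulated data, unit variances held fixed for brevity), not the workshop's actual model.

```python
import math
import random

random.seed(1)

# Simulated outcomes from two latent strata with unknown membership,
# standing in for the mixture structure induced by principal strata.
y = ([random.gauss(0.0, 1.0) for _ in range(300)] +
     [random.gauss(3.0, 1.0) for _ in range(200)])

def normal_density(x, mu):
    # Unit-variance normal density kernel (constants cancel in the E-step).
    return math.exp(-0.5 * (x - mu) ** 2)

pi, mu0, mu1 = 0.5, -1.0, 1.0   # initial guesses
for _ in range(200):
    # E-step: posterior probability that each unit belongs to component 1
    w = []
    for yi in y:
        a = (1 - pi) * normal_density(yi, mu0)
        b = pi * normal_density(yi, mu1)
        w.append(b / (a + b))
    # M-step: update the mixing weight and the component means
    pi = sum(w) / len(w)
    mu0 = sum((1 - wi) * yi for wi, yi in zip(w, y)) / sum(1 - wi for wi in w)
    mu1 = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
```

In the trial analysis the strata are defined by potential compliance behavior rather than left free, but the E-step/M-step alternation over latent memberships is the same.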