\newcommand{\indep}{\mbox{$\perp\!\!\!\perp$}}
```{r global-options, include=FALSE, purl=FALSE}
knitr::opts_chunk$set(fig.width=7, fig.height=4, fig.path='figs/', fig.align='center')
```
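What is selection on observables? Technically it is $$\{Y_{0i}, Y_{1i}\} \indep D_i | X_i = x$$ This means that among units that share the same value of $X_i$, which we denote as $x$, treatment is as good as randomly assigned: within each such group we effectively have a randomized experiment.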
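To see why this assumption is useful, here is a short derivation. It is a sketch that additionally assumes overlap, $0 < \Pr(D_i = 1 | X_i = x) < 1$, and the usual switching equation $Y_i = D_i Y_{1i} + (1 - D_i) Y_{0i}$, neither of which is stated explicitly above. The second equality is where conditional independence is used:

$$\begin{aligned}
E[Y_i | D_i = 1, X_i = x] - E[Y_i | D_i = 0, X_i = x] &= E[Y_{1i} | D_i = 1, X_i = x] - E[Y_{0i} | D_i = 0, X_i = x] \\
&= E[Y_{1i} | X_i = x] - E[Y_{0i} | X_i = x] \\
&= E[Y_{1i} - Y_{0i} | X_i = x]
\end{aligned}$$

So under these assumptions, the observed difference in means within a value of $X_i$ equals the average treatment effect for that value.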
Let's look at this via a simulated example. Let's pretend that we can see all of the potential outcomes:
```{r}
## Create data and some "invisible variable" x
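n <- 1000  # assumed sample size; the original value is not recoverable
x <- sample(1:10, n, replace = TRUE)  # assumed distribution for x
## Treatment is more likely when x > 5
prTreat <- ifelse(x > 5, 0.8, 0.3)
plot(x, prTreat)
d <- rbinom(n, 1, prTreat)
## Assumed potential outcomes: y0 depends on x, constant treatment effect of 2
y0 <- x + rnorm(n)
y1 <- y0 + 2
y <- ifelse(d == 1, y1, y0)  # observed outcome
ate <- mean(y1 - y0)  # true ATE, computable only in a simulation
```
Here the probability of treatment is constant when $x > 5$ and when $x \leq 5$. Then we know that $$\{Y_{0i}, Y_{1i}\} \indep D_i | X_i = x$$ is true. Of course, in reality you would not know the true data generating process; you would have to argue for this assumption and defend it.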
Now, we can use the estimators Chad talked about in the SOO section. For now, let's use the subclassification estimator, $$\sum_{j=1}^M \{\overline{Y_{1j}} -\overline{Y_{0j}}\} \frac{n_j}{n}$$ where $M$ is the number of groups, $\overline{Y_{dj}}$ is the average outcome for observations in group $j$ with treatment status $d$, $n_j$ is the number of observations in group $j$, and $n$ is the total number of observations. Here there are two groups: $x > 5$ and $x \leq 5$.
```{r}
dim_subclass <- (mean(y[d==1 & x > 5]) - mean(y[d==0 & x > 5])) * (sum(x > 5) / n) +
(mean(y[d==1 & x <= 5]) - mean(y[d==0 & x <= 5])) * (sum(x <= 5) / n)
c("ATE" = ate, "DIM Subclass" = dim_subclass)
```
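The two-group calculation above extends to any number of groups. Below is a minimal sketch, assuming a hypothetical helper `subclass_ate()` that is not part of the original material. It also simulates its own data, with `_sim` names and an assumed data generating process (constant effect of 2), so that the chunk runs on its own and does not overwrite the objects used elsewhere:

```{r}
## Hypothetical helper: subclassification estimator for any grouping vector g
subclass_ate <- function(y, d, g) {
  groups <- unique(g)
  ## Within-group difference in means, one value per group
  diffs <- sapply(groups, function(j) {
    mean(y[d == 1 & g == j]) - mean(y[d == 0 & g == j])
  })
  ## Group shares n_j / n
  shares <- sapply(groups, function(j) mean(g == j))
  sum(diffs * shares)
}

## Illustrative data (assumed, not the simulation used in the text)
set.seed(1)
n_sim <- 5000
x_sim <- sample(1:10, n_sim, replace = TRUE)
d_sim <- rbinom(n_sim, 1, ifelse(x_sim > 5, 0.8, 0.3))
y0_sim <- x_sim + rnorm(n_sim)
y1_sim <- y0_sim + 2
y_sim <- ifelse(d_sim == 1, y1_sim, y0_sim)

est_sim <- subclass_ate(y_sim, d_sim, x_sim > 5)
est_sim
```

In this assumed setup the estimate should land close to the true effect of 2. Note that the helper returns `NaN` if some group has no treated or no control units, which is the overlap problem showing up in the sample.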
Pretty good! So conditioning on $X$ allows us to recover the ATE because selection on observables holds here. Let's see that explicitly. Note, we can only do this because we are simulating an example where we see all potential outcomes. Unconditionally, is there a relationship between the potential outcomes and the treatment?
```{r}
cor(y0, d)
cor(y1, d)
```
It looks like `y0` is not independent of `d`: unconditionally, the two are correlated. What happens if we instead check independence within each group defined by $x$?
```{r}
cor(y0[x > 5], d[x > 5])
cor(y0[x <= 5], d[x <= 5])
```
Of course, we expect these correlations to be close to zero (they are zero in expectation, not exactly zero in any one sample) because the probability of treatment is constant within each of those subgroups.
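To see what the unconditional dependence costs us, we can compare the naive difference in means with the true ATE. This is a sketch under an assumed data generating process (treatment probabilities 0.8 and 0.3, `y0` increasing in `x`, constant effect of 2), using `_chk` names so that nothing defined elsewhere is overwritten:

```{r}
## Illustrative data (assumed, not the simulation used in the text)
set.seed(2)
n_chk <- 5000
x_chk <- sample(1:10, n_chk, replace = TRUE)
d_chk <- rbinom(n_chk, 1, ifelse(x_chk > 5, 0.8, 0.3))
y0_chk <- x_chk + rnorm(n_chk)
y1_chk <- y0_chk + 2
y_chk <- ifelse(d_chk == 1, y1_chk, y0_chk)

## Naive estimator: ignores x entirely
naive_chk <- mean(y_chk[d_chk == 1]) - mean(y_chk[d_chk == 0])
ate_chk <- mean(y1_chk - y0_chk)
c("True ATE" = ate_chk, "Naive DIM" = naive_chk)
```

Because treated units tend to have larger `x`, and larger `x` means larger `y0`, the naive difference in means overstates the effect in this setup.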
What if our probability of treatment directly corresponded to our $x$ variable? For example:
```{r}
## Higher probability of treatment as x increases
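## NOTE: an earlier assignment to prTreat is garbled in the source; only the rule below survives
prTreat <- ifelse(x > 5, 0.8, 0.2)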
plot(x, prTreat)
d