We have all heard the phrase “correlation does not equal causation.” What, then, does equal causation? This course aims to answer that question and more!
Over a period of 5 weeks, you will learn how causal effects are defined, what assumptions about your data and models are necessary, and how to implement and interpret some popular statistical methods. Learners will have the opportunity to apply these methods to example data in R (a free statistical software environment).
At the end of the course, learners should be able to:
1. Define causal effects using potential outcomes
2. Describe the difference between association and causation
3. Express assumptions with causal graphs
4. Implement several types of causal inference methods (e.g. matching, instrumental variables, inverse probability of treatment weighting)
5. Identify which causal assumptions are necessary for each type of statistical method
So join us... and discover for yourself why modern statistical methods for estimating causal effects are indispensable in so many fields of study!

Taught by

Jason A. Roy, Ph.D.

Professor of Biostatistics

Transcript

This video is on doubly robust estimation, also known as augmented inverse probability of treatment weighting. We're going to focus a lot on the intuition behind these kinds of estimators and talk a little bit about some of their properties. As background, let's first revisit inverse probability of treatment weighted estimation. Here we'll think about estimating the expected value of the potential outcome under treatment, meaning the mean outcome if everybody had been treated. We're just going to focus on this one potential outcome for illustration; if you wanted to estimate the other potential outcome, the equation would look very similar except the weights would be different. So as a reminder, if we wanted to estimate this mean of the potential outcomes if everybody had been treated, we could do it as follows. First, I want to note that the denominator involves the propensity score, because we weight by one over the probability of that group's treatment, and that group happens to be the treated group. And I wrote it as a function of X, pi(X), just to remind you that the propensity score depends on the X's. A here is just an indicator variable for treatment, which we're thinking of as binary: A equals 1 if treated and 0 otherwise. Putting that right next to the Y guarantees that the sum is only going to include values from treated people; control people have A equal to 0, so their values don't get counted. And remember, we're trying to estimate the mean of Y if everybody had been treated. So we sum over all n subjects, but we pick off only the treated, and then we weight by the inverse of the propensity score. Essentially, you could think of this as a sample mean of Y in the pseudo-population.
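As a concrete illustration, here is a minimal sketch of this IPTW estimate, (1/n) Σ A_i Y_i / π̂(X_i). The course itself uses R; this Python translation, the data-generating setup with a single binary confounder, and all variable names are my own assumptions, not the course materials.

```python
import random

random.seed(0)

# Hypothetical simulated data with one binary confounder X:
# treatment A is more likely when X = 1, and Y also increases with X,
# so the naive mean among the treated is confounded.
n = 100_000
true_pi = {0: 0.3, 1: 0.7}      # true P(A = 1 | X)
data = []
for _ in range(n):
    x = int(random.random() < 0.5)
    a = int(random.random() < true_pi[x])
    y1 = 2.0 + 1.0 * x + random.gauss(0.0, 1.0)  # potential outcome under treatment
    y0 = 1.0 + 1.0 * x + random.gauss(0.0, 1.0)  # potential outcome under control
    y = y1 if a == 1 else y0                     # consistency: observed outcome
    data.append((x, a, y))

# Estimate the propensity score pi(X) within each level of X.
pi_hat = {lev: sum(a for x, a, _ in data if x == lev) /
               sum(1 for x, _, _ in data if x == lev)
          for lev in (0, 1)}

# IPTW estimate of E[Y^1]: pick off treated subjects via A, weight by
# 1 / pi_hat(X).  The true value in this simulation is 2.0 + 0.5 = 2.5.
iptw = sum(a * y / pi_hat[x] for x, a, y in data) / n

# Naive comparison: the plain mean of Y among the treated, biased upward
# here because treated subjects tend to have the higher-outcome X = 1.
naive = sum(a * y for x, a, y in data) / sum(a for _, a, _ in data)
print(iptw, naive)
```

With this setup the naive treated-group mean lands near 2.7 while the weighted estimate recovers a value near the true 2.5, which is the pseudo-population idea from the transcript in action.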
So this is a weighted population in which there's no confounding. And that's the standard IPTW estimator for the mean of the potential outcome under treatment. If the propensity score is correctly specified, then this estimator is unbiased. Correctly specified means we got this model right, so that the true probability of treatment given X is actually equal to pi(X). Okay, so that's what we mean by correctly specified: if we get the model right, then this is an unbiased estimator. Now let's imagine a different approach to estimating this mean of the potential outcome. Again, we're going to focus on estimating the mean of this one potential outcome, but the same kind of idea would apply if you were going to estimate the other one. So you could use some kind of outcome regression model. We haven't actually done that yet in this course, but it's something you could do. What we'll do is specify a model that we'll call m1(X). The 1 here is just indicating that it's among treated subjects. So it's an outcome model restricted to the subset of patients who were actually treated. You'll notice that this model looks like a standard kind of regression model: it's the expected value of Y conditional on A = 1, so among treated subjects, and also on your confounders X. So this is just some model for the mean given A and X. But if we actually wanted the mean of the potential outcomes, what you would have to do is take this conditional mean m1(X) and average over the distribution of the confounders; you'd have to integrate out the X's. In this case, what we're doing is averaging over the empirical distribution. So here's an estimator of the expected value of Y1, and I'll give you some intuition for it. The first part of this says: if you are in the treated group, then use your value of Y.
So we're taking a sum over n subjects, and we're going to take an average, so we're dividing by n. If you're in the treated group, we're just going to use your value of Y. Because remember, by our consistency assumption, if you're treated, Y is actually equal to Y1, the potential outcome under treatment. So if A = 1, we just use your value of Y. However, if you were in the control group, then this 1 - A here is going to pick off those control group members, right? Because if they're in the control group, A is equal to 0, which means 1 - A will be equal to 1. So it's identifying those individuals. And then what we have here is the regression model, this mean model for Y conditional on X. But remember, what I have here is m1(X). So these are people who were not actually treated, right? Because I've picked them off here. But now I'm going to apply this regression model, fit on those who were treated, to this other population. What I'm trying to do, essentially, is predict what their value of Y would have been had they been in the treatment group. So if you combine these two: if you're in the treatment group, we use your value of Y; if you're in the control group, we use the value of Y we think you would have had, our best guess at it, the mean from this model, if you had actually been treated, contrary to fact. Add those up, divide by n, and that's a valid estimate of the mean potential outcome as long as you have unconfoundedness given X, that is, you've controlled for confounding. So if the outcome model is correctly specified, this is an unbiased estimator. The outcome model being correctly specified means that the expected value of Y given A = 1 and X is actually equal to whatever our model m1(X) is. So we have some model there, maybe a regression model or something.
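This outcome-regression estimator, (1/n) Σ { A_i Y_i + (1 − A_i) m̂1(X_i) }, can also be sketched in a few lines. Again this is a hedged Python illustration rather than the course's R, on hypothetical simulated data with one binary confounder; m1 is fit here as a stratified mean among the treated, which is a saturated (and hence correctly specified) outcome model in this toy setup.

```python
import random

random.seed(1)

# Hypothetical setup: one binary confounder X; true E[Y^1] = 2.5.
n = 100_000
true_pi = {0: 0.3, 1: 0.7}      # true P(A = 1 | X)
data = []
for _ in range(n):
    x = int(random.random() < 0.5)
    a = int(random.random() < true_pi[x])
    y1 = 2.0 + 1.0 * x + random.gauss(0.0, 1.0)
    y0 = 1.0 + 1.0 * x + random.gauss(0.0, 1.0)
    data.append((x, a, y1 if a == 1 else y0))   # consistency

# Fit m1(X) = E[Y | A = 1, X]: the mean of Y among treated subjects
# at each level of X.
m1_hat = {}
for lev in (0, 1):
    ys = [y for x, a, y in data if x == lev and a == 1]
    m1_hat[lev] = sum(ys) / len(ys)

# Outcome-regression estimator of E[Y^1]: keep the observed Y for the
# treated; for controls, plug in the model's prediction m1_hat(X) of
# what Y would have been had they been treated, contrary to fact.
reg_est = sum(y if a == 1 else m1_hat[x] for x, a, y in data) / n
print(reg_est)
```

Averaging the predictions over everyone is exactly the "average over the empirical distribution of X" step described above, and the estimate again lands near the true value of 2.5.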
We have to get that model right, but if we do, then this is a valid estimator. So we've seen two different ways you could estimate this mean potential outcome: one using a regression model, where you then average over the distribution of the X's, and one using inverse probability of treatment weighting. Doubly robust estimators are, essentially, going to try to use both of those. And I should note that the next couple of slides will be a little more technical than most of the rest of the material in this course. But we're going to try to get the main ideas across, since these kinds of estimators are becoming more popular. So the goal is really to introduce the main ideas and get at the concepts; that will make it easier, if you want to implement these in practice, to take the next step and learn more about them. A doubly robust estimator is an estimator that is unbiased if either the propensity score model is correct or the outcome regression model is correct. You don't actually have to get both of them right; you can get one wrong, and it doesn't matter which one, as long as one of them is right. So I'll show you one example of a doubly robust estimator. What we'll see is that there's one part that looks like the standard IPTW estimator, and then there's a part that I'm going to call an augmentation, which involves a regression-type model, plus some other pieces we need to make it work, and I'll show you how it works in a minute. So you see there's this regression part on the right-hand side and this IPTW part on the left-hand side. If you put this all together, you'll end up with something that has this double robustness property, and we're going to explore that in a minute and see how it works. So let's imagine first that the propensity score is correctly specified, but the outcome model is not.
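The slide formula itself isn't reproduced in the transcript. Written out, a reconstruction from the verbal description (using the π and m1 notation above, so treat the exact form as an inference rather than a quote of the slide) is:

```latex
% AIPTW (doubly robust) estimator of E[Y^1]: an IPTW part plus an
% augmentation term built from the outcome regression model m1.
\hat{E}[Y^1]
  = \frac{1}{n}\sum_{i=1}^{n}
    \left\{
      \frac{A_i Y_i}{\hat{\pi}(X_i)}
      \;-\;
      \frac{A_i - \hat{\pi}(X_i)}{\hat{\pi}(X_i)}\,\hat{m}_1(X_i)
    \right\}
% Rearranging terms with a little algebra (the equivalent form the
% video turns to later) gives:
  = \frac{1}{n}\sum_{i=1}^{n}
    \left\{
      \frac{A_i\bigl(Y_i - \hat{m}_1(X_i)\bigr)}{\hat{\pi}(X_i)}
      \;+\;
      \hat{m}_1(X_i)
    \right\}
```

In the first form the left-hand term is the standard IPTW estimator and the right-hand term is the augmentation; the second form is term-by-term equivalent, obtained by expanding the augmentation and cancelling.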
So our outcome model is wrong; this m1 is wrong. By wrong, what we mean is that the expected value of Y given A = 1 and X does not equal m1(X). So m1(X) is some model, we got it wrong, and the expectation doesn't line up with m1(X). But the propensity score is correctly specified, we'll assume for now, which means that the expected value of A given X, or in other words the probability that A equals 1 given X, is equal to pi(X). And this is one version of a doubly robust estimator. So we got the propensity score right, and what I'm going to do now is walk through the intuition. This is not a formal proof; I'm just trying to give you the intuition for why this actually works. So what is this estimating? We have a sum over n, and what it's estimating is really the expected value of the stuff that's inside these curly brackets. Out here we have a sum divided by n, so that's a sample average, and as the sample size grows, that becomes an expectation. So what we really are interested in is: is the expectation of the inside equal to the thing we want? And remember, the thing we want is the expected value of Y1. Okay, so what we want to know is whether the stuff inside the curly brackets has expectation equal to the expected value of Y1. Well, what I'm noting first is that the expectation of this one part here, the expectation of A given X, is equal to the propensity score. Which means you could think of this whole part here as having expectation 0. So essentially, the part that I'm putting in brackets here should go away in expectation. If you imagine averaging this over a large sample, the expected value of A should be equal to pi, and so you would expect that difference, if you averaged it, to get very small and, in fact, become 0.
So you expect this part on the right to go away if the propensity score model is correctly specified. Let me say one more thing about that. If that goes away, what are you left with? Well, you're left with this part. And if the propensity score model was right, we already said that that part is a valid estimator; that's just our standard IPTW. So if we get the regression model wrong, we're still going to be fine as far as this estimator goes, because this part will go away, that'll get small, that'll be 0, and this part is a valid estimator of the expected value of Y1. So now let's flip things around and say: what if the propensity score model was wrong, but the outcome model was correct? Remember, we have models, which means we're doing the best we can to get them right, but we don't really know if either of them is right or if both are right. So it would be nice if we had robustness, where we could potentially get one of them wrong. So here, we're going to imagine the propensity score is wrong and the outcome model is correct. If the propensity score model was wrong, what that means is that the expected value of A given X is not going to equal pi(X). But if the outcome model is correct, what that means is that if you take the expectation of Y conditional on A = 1 and X, that should equal m1(X). That's just background. What I'm doing first, then, is you'll see this equal sign here going from this step to this step: I just rearranged some terms with some algebra, and that will make things easier to see. So from the top line to the bottom line here, I rearranged some terms, but otherwise it's equivalent. Now I want to look at this lower equation. I rearranged things for a reason, because it makes it a little easier to see. The first thing to note is that now, if the outcome model is correctly specified, then the part that I bracketed here should go to 0, right? Because the expected value of Y, conditional on A = 1 and X, should be m1(X).
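Both misspecification scenarios can be checked numerically. Below is a hedged Python sketch (the course uses R; the simulated data and the deliberately broken models are my own assumptions) of a doubly robust estimator in the rearranged form m̂1(X_i) + A_i (Y_i − m̂1(X_i)) / π̂(X_i), fed once with a correct propensity score and a wrong outcome model, and once the other way around.

```python
import random

random.seed(2)

# Hypothetical setup: one binary confounder X; true E[Y^1] = 2.5.
n = 200_000
true_pi = {0: 0.3, 1: 0.7}      # true P(A = 1 | X)
data = []
for _ in range(n):
    x = int(random.random() < 0.5)
    a = int(random.random() < true_pi[x])
    y1 = 2.0 + 1.0 * x + random.gauss(0.0, 1.0)
    y0 = 1.0 + 1.0 * x + random.gauss(0.0, 1.0)
    data.append((x, a, y1 if a == 1 else y0))   # consistency

def dr_estimate(pi, m1):
    """AIPTW estimate of E[Y^1] in the rearranged form:
    average of m1(X) + A * (Y - m1(X)) / pi(X)."""
    return sum(m1[x] + a * (y - m1[x]) / pi[x]
               for x, a, y in data) / len(data)

# Correctly specified models, fit from the data by stratifying on X.
pi_good = {lev: sum(a for x, a, _ in data if x == lev) /
                sum(1 for x, _, _ in data if x == lev)
           for lev in (0, 1)}
m1_good = {}
for lev in (0, 1):
    ys = [y for x, a, y in data if x == lev and a == 1]
    m1_good[lev] = sum(ys) / len(ys)

# Deliberately misspecified models.
pi_bad = {0: 0.5, 1: 0.5}    # ignores the confounder entirely
m1_bad = {0: 5.0, 1: 5.0}    # a constant, nowhere near the truth

# Either model can be wrong and the estimate still lands near 2.5.
print(dr_estimate(pi_good, m1_bad))   # PS right, outcome model wrong
print(dr_estimate(pi_bad, m1_good))   # PS wrong, outcome model right
```

In the first call the augmentation term averages away, leaving the valid IPTW part; in the second, the weighted residual term averages away, leaving the valid regression part, which is exactly the two-sided argument sketched in the transcript.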
So that difference should go to 0. Now, it's being multiplied by something, and there's something in the denominator, but those things are just going to converge to some constant, essentially; they're not going to blow up and keep the product from going to 0. Again, this is not a formal proof, it's just the intuition. So if we got the outcome model right, the part in brackets there should go to 0. What are you left with then? You're just left with this part, and this part here is just an average of the regression model over all n subjects. So this is just the expected value of Y given A = 1 and X, averaged over the distribution of X. Marginalizing out X like that is exactly what the expected value of Y1 is. So if we get the regression model right, then this estimator should be fine as well; as I said, this part goes to the expected value of Y1. These kinds of estimators are also known as augmented IPTW estimators: a standard inverse probability of treatment weighted approach with an augmentation term that typically involves an outcome model. So you might see the terms augmented IPTW or AIPTW, and the estimator I just showed you is an example of one. And there are a lot of these kinds of estimators. A lot of this comes from semiparametric theory, and you can use that theory to identify the best of these estimators, meaning the most efficient ones; there's theory that says what the most efficient versions would be. That's beyond the scope of this video, but just to make you aware. And in general, besides having this doubly robust property, which is obviously a nice property because you get to specify two models and only have to get one right, they also tend to be more efficient than regular IPTW estimators. So they give you an extra bonus: they tend to be more efficient, meaning they have a smaller variance associated with them.
So these are a little more complicated to implement in practice, but they tend to perform better.