What struck me about the discussion is that the future is now. According to a recent article in the Washington Post, the FDA has been lobbied to approve a new drug for Duchenne muscular dystrophy based on a twelve-person study. According to the article, the drug may help about 2,000 boys alive in the US today and approximately 1 in 30,000 boys born.

How large the trial needs to be to satisfy the FDA depends on how effective the drug is, but if, say, 500 boys are needed, the trial would have to accrue at least 25% of the disease population.

This may be the future. Drugs designed to target very specific disease populations will necessarily aim at very small disease populations, populations so small that it may be impossible to design large enough studies to test the drug's effectiveness. How does the FDA approve such drugs?

1. The spike was due to environmental factors such as the arsenic in the drinking water or the exposure to the heavy metal tungsten.

2. The spike was related to the fact that the town had a large number of Navy personnel or some other set of unknown characteristics of the town's population.

3. The spike was a statistical fluke.

If we are interested in determining whether the leukemia spike was due to environmental factors, then we can think of the relationship between the environment and leukemia as either a "causal" relationship or a "spurious" relationship.

Let X represent the environment of Fallon, Y the number of leukemia cases, and U some unobserved cause of both a family's location in Fallon and leukemia. It could be that X is directly determining Y or that U is determining both X and Y.

Causal relationship

Spurious relationship

How could we distinguish between the two possibilities? In both cases we will see that a family's location choice and the family's likelihood of having a child with leukemia are correlated.

Judea Pearl argues we should conduct an experiment. That is, we should introduce a policy to purposely change X. If we move families out of Fallon or assign them to other locations and see a reduction in leukemia cases among families not located in Fallon, then we know that Fallon is the cause of the spike. If we don't see any change in the likelihood that children in these families get leukemia, then we know the relationship between Fallon's environment and leukemia cases is spurious. That is, we can rule out (1) and conclude that the spike is due to some other cause (2) or is a statistical fluke (3).
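To make the logic of the experiment concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the probabilities are made up (and hugely exaggerated relative to real leukemia rates), X, U and Y are reduced to yes/no variables, and the function name `simulate` is hypothetical. It compares the leukemia rate for families in and out of "Fallon" when location is chosen by the families and when it is assigned by coin flip.

```python
import random

def simulate(causal, intervene, n=20000, seed=0):
    """Return (case rate when X=1, case rate when X=0) in a toy population.

    causal    -- True: X drives Y.  False: U drives both X and Y (spurious).
    intervene -- True: X is assigned by coin flip.  False: U influences X.
    All probabilities are invented and exaggerated for illustration.
    """
    rng = random.Random(seed)
    cases = {0: 0, 1: 0}
    families = {0: 0, 1: 0}
    for _ in range(n):
        u = rng.random() < 0.5                      # unobserved family trait
        if intervene:
            x = rng.random() < 0.5                  # location set by the experimenter
        else:
            x = rng.random() < (0.8 if u else 0.2)  # families choose, influenced by U
        driver = x if causal else u                 # what actually raises the risk
        y = rng.random() < (0.30 if driver else 0.10)
        families[int(x)] += 1
        cases[int(x)] += int(y)
    return cases[1] / families[1], cases[0] / families[0]

for causal in (True, False):
    for intervene in (False, True):
        in_town, elsewhere = simulate(causal, intervene)
        print(f"causal={causal!s:5} intervene={intervene!s:5} "
              f"gap in case rate = {in_town - elsewhere:+.3f}")
```

In this toy setup the observational gap is positive in both worlds, which is why correlation alone cannot separate them. Once location is assigned at random, the gap survives only in the causal world.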

What if there is both a causal relationship and a spurious relationship? That is, what if there is something in the environment of Fallon that is leading to increases in leukemia, but the magnitude and direction of the effect are being mediated by some unobserved characteristic such as a family's propensity to be in the Navy? In this case Pearl's experiment still determines whether there is a directed arrow from X to Y, but we learn nothing about how that relationship is being mediated by U.

If we were able to randomly assign families to Fallon, NV, then we could determine whether something in Fallon's environment is increasing the likelihood of a child in the family having leukemia. What we don't learn is whether there are other factors that either mitigate or propagate the effect of Fallon's environment on the propensity to get childhood leukemia.

Pearl's experiment allows us to determine whether the relationship is causal or spurious. It does not provide information on the appropriate policy response to the problem.

Tuesday, April 22, 2014

You will never see the above heading in a newspaper article on a new breakthrough drug. This may be unfortunate, because it is one of the more accurate headlines you will read.

In a comment to this post, Bill provides an example of a drug that decreases survival for 99% of patients by 1 month and increases survival for 1% of patients by 200 months.

If you punch that into Google you get that the average treatment effect (Rubin's typical causal effect) is an increase in survival of 1 month (approximately). So while 99% of patients are made worse off by the drug, the 1% of patients do so well that they pull up the average so that Bill's "crappy" drug comes out smelling like roses.
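For readers who would rather not use Google as a calculator, the arithmetic is a weighted average. The sketch below uses only the numbers from Bill's example; the function names are my own.

```python
# (share of patients, change in survival in months), taken from Bill's example
groups = [(0.99, -1.0), (0.01, 200.0)]

def average_treatment_effect(groups):
    """Average of the individual effects, weighted by the share of patients."""
    return sum(share * effect for share, effect in groups)

def share_harmed(groups):
    """Fraction of patients whose survival falls under the drug."""
    return sum(share for share, effect in groups if effect < 0)

ate = average_treatment_effect(groups)
print(f"average treatment effect: {ate:.2f} months")
print(f"share of patients harmed: {share_harmed(groups):.0%}")
```

The average comes out at about 1.01 months even though 99% of patients are harmed, which is exactly the gap between the average effect and the individual effects.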

While Bill's example is an extreme, the point is more general. The average treatment effect does not provide evidence on how the treatment will affect individual patients. If a particular treatment increases survival by 50 percentage points for half the patients and decreases survival by 10 percentage points for the other half, we will find that the average increase in survival is 20 percentage points. A treatment that increases survival by 20 percentage points is a major breakthrough in cancer research.

In their analysis of 5-Fu as an adjuvant treatment for Stage III colon cancer, Moertel et al. (1990) find that 5-Fu is associated with a 20 percentage point increase in the probability of survival at 4 years. The authors recommend that the treatment be provided to all patients. As we see in this post, the treatment effect for older patients is vastly different from that for younger patients. Older patients are associated with a large treatment effect, while younger patients see a small or non-existent treatment effect.

Thursday, April 10, 2014

Australian soldiers playing the two-coin toss game "two up".
Two up requires balance prior to randomization.

According to a 2010 report by the National Research Council (National Academies), the "primary benefit from randomizing clinical trial participants into treatment and control groups comes from balancing the distributions of known and unknown characteristics among these groups prior to study treatments."

It is a simple matter to see that this is not true and that randomization does not imply balancing.

Remember back to gym class and those times where the teams were chosen at random and how mad you got because Bobby and Sue were on the other team and you only got Steve and how unfair that was (and there is no way Jesse was as good as Sue). Or the time where the gym teacher made an effort to balance the two teams by carefully pairing people of equal ability and assigning them to different teams.
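The gym class can be replayed on a computer. The sketch below is an illustration under made-up assumptions: a hypothetical 20-patient trial with ages drawn uniformly between 40 and 80, split at random into two arms of ten, many times over. The 5-year threshold for a "lopsided" trial is arbitrary.

```python
import random

def age_gap_after_randomization(ages, rng):
    """Randomly split patients into two equal arms; return the gap in mean age."""
    shuffled = ages[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    treatment, control = shuffled[:half], shuffled[half:]
    return sum(treatment) / len(treatment) - sum(control) / len(control)

rng = random.Random(1)
ages = [rng.randint(40, 80) for _ in range(20)]  # hypothetical patients

gaps = [age_gap_after_randomization(ages, rng) for _ in range(10000)]
mean_gap = sum(gaps) / len(gaps)
share_lopsided = sum(abs(g) > 5 for g in gaps) / len(gaps)

print(f"average age gap over all randomizations: {mean_gap:+.2f} years")
print(f"share of randomizations with a gap above 5 years: {share_lopsided:.0%}")
```

Averaged over many randomizations the gap is close to zero, which is the "probabilistic" sense of balance. Any single randomization, which is all a real trial gets, can be well off.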

Of course the authors of the report, who are highly respected statisticians, know this. The authors are more careful in some other parts of the report, putting "probabilistically" in parentheses before "balances" in another similar paragraph. Moreover, when the authors do discuss the technical reason for randomized control trials, they cite the arguments by Rubin (discussed in an earlier post).

The problem with perpetuating the myth that balancing is implied by randomization is that lay people and regulators may look askance at unbalanced studies, mistaking "unbalanced" for "non-random". Worse, we may see reporting bias and publication bias because studies with unbalanced populations or unbalanced treatment arms are held back. Worse still, we may see (or not see) efforts to "balance" the trial through non-random assignment of patients to treatment arms.

So next time you see a well-balanced study, beware. It may not be random and the results may be biased.

Tuesday, April 8, 2014

Figure 1: Survival probabilities at each point in time for patients that were 61 years old and over. It shows that patients who were in the 5-FU trial arm had an average survival probability that was twenty percentage points higher than patients in the other two trial arms at the 3,000 day point.

Figure 2: Survival probabilities at each point in time for patients that were 60 years or younger. It shows that patients in each of the three trial arms had similar survival probabilities.

These graphs were produced using R and the "colon" data in the "survival" package. The patients were split into two equal groups by age.
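The graphs themselves came from R, but the idea behind a survival curve is simple enough to sketch by hand. The Python below is not the code used for the figures; it is a minimal Kaplan-Meier estimator applied to a handful of made-up patient records (not the colon data), split at age 60 in the same spirit. In a real analysis the estimator would be run separately for each trial arm within each age group.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (time, survival probability) at each death time.

    times  -- follow-up time for each patient, in days
    events -- 1 if the patient died at that time, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve

# Made-up records for illustration: (age, days of follow-up, died?)
patients = [
    (72, 400, 1), (65, 900, 1), (68, 1500, 0), (75, 2100, 1),
    (45, 300, 1), (58, 800, 0), (52, 1200, 1), (60, 2500, 0),
]

for label, keep in (("61 and over", lambda age: age > 60),
                    ("60 and under", lambda age: age <= 60)):
    group = [p for p in patients if keep(p[0])]
    curve = kaplan_meier([p[1] for p in group], [p[2] for p in group])
    print(label, [(t, round(s, 3)) for t, s in curve])
```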

The data is from the Moertel et al. (1990) paper. While the authors discuss some subset analyses, they do not produce these graphs or note the variation in survival by age.

The main results presented in the original study are discussed in this post.

Monday, April 7, 2014

Angus Deaton presents a lecture discussing various issues and concerns with the use of RCTs in economics and more broadly in other fields including epidemiology. The lecture was given in honor of John Snow.

In any longitudinal study (a study that occurs over time), like a randomized control trial analyzing the effect of cancer treatments on survival, there is going to be attrition from the study. Over time, people will leave the study for many different reasons. Some reasons people leave a study have no effect on statistical inference. For example, a patient or a patient's spouse may get a work transfer to a different location without access to a study center. However, there are some reasons why people leave a study that may have a large impact on statistical inference. For example, a patient may leave the study simply because they feel the treatment is not working. It is this second kind of reason for leaving that is associated with "attrition bias."

The report makes a number of very good points. It has some relatively simple and easy to implement suggestions for how to adjust trial design to reduce or better account for attrition bias. It makes it clear that if the attrition is "non-random" then any assumptions that the researcher or statistician makes about how the data is missing cannot be tested or verified. I was also pleasantly surprised to see that the report discussed a number of ideas that have been developed in economics including "Heckman selection" models, instrumental variables, and local average treatment effects.

Even so, there were two recommendations I didn't see, but would have liked to have:

1. Present bounds. Econometrician Charles Manski and biostatistician James Robins (in his paper on the treatment effect of AZT on AIDS patients) introduced the idea of bounding the average treatment effect when faced with variables "missing not at random" in the late 1980s. It would have been nice to see this idea mentioned as a possible solution.
2. Discuss the implications. If there is concern about bias, that concern should be raised by the researchers. The researchers should discuss the implications of the results and their policy recommendations.
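To give a flavor of the first recommendation, here is a minimal sketch of worst-case bounds for a yes/no outcome such as survival at four years. It is the simplest version of the idea, not necessarily the exact construction in the Manski or Robins papers, and the patient counts are made up. The logic: a patient who left the study either survived or did not, so assume the worst and then the best for every dropout.

```python
def outcome_bounds(n_survived, n_died, n_missing):
    """Worst-case bounds on the survival probability in one trial arm."""
    n = n_survived + n_died + n_missing
    lower = n_survived / n                # every dropout died
    upper = (n_survived + n_missing) / n  # every dropout survived
    return lower, upper

def ate_bounds(treated, control):
    """Bounds on the difference in survival probability, treated minus control."""
    t_low, t_high = outcome_bounds(*treated)
    c_low, c_high = outcome_bounds(*control)
    return t_low - c_high, t_high - c_low

# Hypothetical arms: (survived, died, left the study)
treated = (60, 30, 10)
control = (45, 45, 10)

low, high = ate_bounds(treated, control)
print(f"average treatment effect lies between {low:+.2f} and {high:+.2f}")
```

The bounds need no assumption about why patients left, and their width grows directly with the amount of attrition, which makes the cost of missing data visible to the reader.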

Wednesday, April 2, 2014

Unconfoundedness is a silly name for a thing. Particularly a thing that is so important. A thing that leads us to spend billions of dollars on randomized control trials every year, while perfectly good observational data lies forlorn and unloved in the databases of CMS, hospitals and insurance companies.

So what is this "unconfoundedness"?

Unconfoundedness is the state of not being confounded.

Obviously.

To understand unconfoundedness, it is necessary to understand confoundedness. Consider the graph to the right. We are interested in the causal effect of X on Y. Here X may represent the colon cancer treatment FolFox (5-Fu and oxaliplatin) and Y the survival of colon cancer patients. We would like to know how much of an increase in survival colon cancer patients get when they are given FolFox versus 5-Fu alone. If we observed data from Medicare patients, as in this paper, we might think that FolFox has a big effect on survival. The problem is that patients are not choosing randomly between 5-Fu and FolFox. Patients and their doctors may have information about their own characteristics (U) and that information may be determining their choice of treatment (the arrow from U to X).

So when we see that patients on FolFox do much better than patients on 5-Fu, it may be because patients who are older or frailer, and their doctors, are choosing to forgo the oxaliplatin and its associated side effects. The observed difference in outcomes may be due to the treatment, or it may be due to the characteristics of the patients that are choosing each of the treatments and have nothing at all to do with the effect of the treatment itself.

Unconfounded Graph

How do we get rid of this confounding effect?

One way is to randomize patient assignment to treatment. This is what is done in randomized control trials. This act of random assignment removes the arrow from U to X (see graph to the left). Treatment choice (X) is no longer decided by unobserved patient characteristics (U).
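A small simulation shows what removing that arrow buys us. This is a sketch under invented assumptions: frailty stands in for U, the survival probabilities and the 5 percentage point "true" benefit of adding oxaliplatin are made up, and the function name `survival_gap` is hypothetical.

```python
import random

TRUE_EFFECT = 0.05  # assumed gain in survival probability from adding oxaliplatin

def survival_gap(randomized, n=40000, seed=0):
    """Survival rate on FolFox minus survival rate on 5-Fu in a toy population."""
    rng = random.Random(seed)
    survived = {True: 0, False: 0}
    treated = {True: 0, False: 0}
    for _ in range(n):
        frail = rng.random() < 0.5                           # U, unobserved
        if randomized:
            folfox = rng.random() < 0.5                      # coin flip: no arrow from U to X
        else:
            folfox = rng.random() < (0.2 if frail else 0.8)  # frail patients avoid FolFox
        p_survive = (0.4 if frail else 0.7) + (TRUE_EFFECT if folfox else 0.0)
        alive = rng.random() < p_survive
        treated[folfox] += 1
        survived[folfox] += int(alive)
    return survived[True] / treated[True] - survived[False] / treated[False]

print(f"observed gap when patients choose: {survival_gap(randomized=False):+.3f}")
print(f"observed gap when randomized:      {survival_gap(randomized=True):+.3f}")
print(f"assumed true effect:               {TRUE_EFFECT:+.3f}")
```

When patients choose, the gap mixes the effect of the drug with the effect of being less frail and comes out far larger than the assumed benefit. With random assignment the gap lands close to the assumed benefit.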

Is randomization the only way to get unconfoundedness?

No. There are many ways for a data set to have the unconfoundedness property. The important thing is that the choice of treatment is unrelated to unobserved characteristics of the patients that may be associated with different observable outcomes. For example, prior to oxaliplatin getting FDA approval, there was very little use of the drug by colon cancer patients. Economists call observational data that satisfies unconfoundedness "natural experiments."

Why is unconfoundedness good?

Technically, unconfoundedness allows the researcher to measure the "marginal" distribution of outcomes conditional on the treatment. The observed distribution of outcomes conditional on treatment choice is an unbiased estimate of the marginal distribution of outcomes conditional on treatment.

The study looked at the effect of the drug SRT1720 on life expectancy of mice and was recently reported in CELL. 400 mice were allocated to 4 treatment arms - standard diet, standard diet plus SRT1720, high-fat diet, and high-fat diet with SRT1720.

As I said, all the mice do the right thing and die, and so we know the mean effect of SRT1720 on survival, or we would if the authors had reported it. The authors do report that the average effect is "significant" for mice on both diets. I don't know if they mean it is statistically significant or medically significant. We also learn that SRT1720 is associated with an 8% increase in survival for the standard-diet (SD) mice and a 22% increase in survival for the high-fat-diet (HD) mice. We aren't told whether these are statistically significantly different from zero.

While it is not discussed, we see from the picture that at about 85 weeks, the average increase in survival probability is twenty percentage points for the HD mice and ten percentage points for the SD mice. At 140 weeks, however, the average increase in the probability of survival due to the drug is approximately zero.

The fact that the curves come together at the end suggests that the drug affects different mice differently. Again, this is not discussed by the authors, but we can infer from the figure that for at least 30% of mice on a high-fat diet the drug increases survival (see discussion in this post). However, for some mice (those that live a long time) the drug has no effect on survival.

As for diet, we see that it has a very large effect. For the non-drugged mice, switching from HD to SD increases the probability of survival at 85 weeks by about 40 percentage points. At 140 weeks, it increases the probability of survival by between 5 and 10 percentage points.