Confounding Factors and Experimental Design

As a general rule, when you are designing an experiment, you want to have an experimental group and a control group.

These two groups should be identical except that one group (the experimental group) should receive some form of treatment (what we call the experimental factor or experimental variable) while the other should either receive nothing or receive a placebo (depending on the exact type of study being conducted).

In other words, the two groups should be totally identical except for the experimental variable.

When that condition is met, you can then infer that differences between the two groups are being caused by the experimental variable.

But the problem is that in reality, having two identical groups is almost never possible. This is where the topic of confounding factors comes in: a confounding factor is a third variable that differs between your groups and might itself have a causal effect on the outcome, thus preventing you from being able to assign causation.

Imagine you designed a product that is intended to prolong the life of a car. Thus, you get two groups of new cars, and pour the product (the experimental variable) into the engine of one group (the experimental group), while giving the other group an equivalent volume of regular engine oil (the control group).

You then hire people to drive them around a racetrack, at high speeds, for weeks until they eventually die, and you record how long each car lasts (your response variable).

Suppose that at the end of the experiment, you find that, on average, the experimental group lasted significantly longer than the control group.

Thus, you conclude that the product works. There is one problem, however. All of the experimental cars were Toyotas, and all of the control cars were Nissans.

This is what we would call a fully confounded experiment, because there is a third variable (car brand) that is also completely different between your groups.

Thus, it would be impossible to use this experiment to say that the product works, because you have no way of knowing if the product actually worked or if Toyotas simply last longer than Nissans.

Because of that third variable, there is no way to confidently assign causation.

That is obviously an extreme example, but this type of thing happens all the time in real experiments, and it can occur in very subtle ways.

Let’s say that you are testing a drug on rats, and you have your rack of cages with control rats on one side of the room and your rack of cages with the experimental rats on the other side. That may not sound like a problem, but it can be.

Imagine, for example, that there is a draft in your lab, and as a result, one rack experiences a different temperature than the other. Temperature can affect metabolism and a host of other biological processes, so that would confound your experiment.

Similarly, perhaps people walk through one half of your lab more than the other. That could stress the rats, and stress also affects many biological processes.

All of that may seem minor, but it really can make big differences in your results, and when you have a fully confounded experiment like that, you simply can’t assign causation.

My rat example is also a bit extreme, because it is, once again, a fully confounded design, meaning that the confounding factor (cage position) is totally different between the two groups; however, there are also many cases where experiments are partially confounded, and they can be just as problematic.

Let’s say, for example, that you are testing a drug, and your control group is made up of 90% men and 10% women, whereas your experimental group is made up of 20% men and 80% women. That is actually a problem, because women and men have biochemical differences and they often respond differently.

As a result, any differences that you see could be driven by male/female differences rather than control/treatment differences.

Thus, sex is a confounding factor, which makes it very difficult to assign causation.
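To make the arithmetic concrete, here is a minimal sketch of the 90/10 versus 20/80 split described above. The response values (10 for men, 20 for women) are hypothetical numbers chosen only for illustration, and the drug is assumed to do nothing at all.

```python
# Hypothetical responses, made up for illustration: the drug has NO effect,
# but men and women differ in their baseline response.
RESPONSE_BY_SEX = {"men": 10.0, "women": 20.0}

def group_mean(share_men):
    """Mean response of a group, given the proportion of men in it."""
    share_women = 1.0 - share_men
    return (share_men * RESPONSE_BY_SEX["men"]
            + share_women * RESPONSE_BY_SEX["women"])

control_mean = group_mean(0.90)       # 90% men, 10% women
experimental_mean = group_mean(0.20)  # 20% men, 80% women

# The groups differ even though the treatment does nothing.
apparent_effect = experimental_mean - control_mean
print(control_mean, experimental_mean, apparent_effect)
```

Under these assumed numbers the experimental group scores roughly 7 units higher than the control group, and every bit of that difference comes from the male/female imbalance rather than from the drug.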

How do scientists deal with confounding factors?

Recall that our experimental group was entirely Toyotas and the control group was entirely Nissans. If you actually did that experiment, you would be screwed, because there is no statistical test on the planet that could tease out the effects of Toyotas vs the effects of the treatment.
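One way to see why no analysis can rescue that design is to write down two competing explanations and compare what they predict. The sketch below is only an illustration: the baseline of 100 hours, the 20-hour effects, and the function name `predicted_lifespan` are all invented for this example.

```python
# Hypothetical version of the design above: every treated car is a Toyota
# and every control car is a Nissan.
cars = (
    [{"brand": "Toyota", "treated": True} for _ in range(5)]
    + [{"brand": "Nissan", "treated": False} for _ in range(5)]
)

def predicted_lifespan(car, product_effect, toyota_effect, baseline=100.0):
    """Predicted lifespan in hours under one possible explanation."""
    lifespan = baseline
    if car["treated"]:
        lifespan += product_effect
    if car["brand"] == "Toyota":
        lifespan += toyota_effect
    return lifespan

# Explanation A: the product adds 20 hours and brand does nothing.
product_works = [predicted_lifespan(c, 20.0, 0.0) for c in cars]
# Explanation B: the product does nothing and Toyotas last 20 hours longer.
toyotas_last_longer = [predicted_lifespan(c, 0.0, 20.0) for c in cars]

# The treatment column and the brand column contain identical information,
# so the two explanations predict the same lifespan for every single car.
treated_column = [c["treated"] for c in cars]
toyota_column = [c["brand"] == "Toyota" for c in cars]
print(treated_column == toyota_column)
print(product_works == toyotas_last_longer)
```

Both comparisons come out `True`. Because the two explanations make identical predictions for every car in the experiment, the data cannot favor one over the other.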

Fortunately, most scientists aren’t that brainless. We think about confounding factors before doing our experiments, and try to minimize them.

For example, we could use only one brand and one model of car for both groups (thus eliminating brand as a factor). Further, we would want to control the year of the car, the factory that produced it, etc.

If you can remove a variable from your experiment, then you should do so. Remember, ideally you want your groups to be totally, 100% identical. You need to eliminate as many confounding factors as you possibly can.

Even after you have controlled every confounding factor that you can think of, there will almost certainly still be some slight variation that you aren’t aware of. For example, there may be slight inconsistencies in the manufacturing process, the steel that was used, etc. Because you don’t know what those differences are, you can’t eliminate them, but you can compensate for them by randomizing.

Take your pool of cars and randomly select which ones go into each group. Thus, any variation gets randomly dispersed into your two groups, rather than falling disproportionately into one group. This tool should be used whenever possible.
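Random assignment is simple enough to sketch in a few lines. The car labels, the pool size of 20, and the helper name `randomize` below are assumptions made for illustration; the fixed seed is there only so the example is reproducible.

```python
import random

def randomize(units, seed=None):
    """Shuffle the pool and split it into two equal-sized groups."""
    rng = random.Random(seed)   # a seed makes the assignment reproducible
    shuffled = list(units)      # copy so the original pool is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical pool of 20 otherwise-identical cars.
cars = [f"car_{i:02d}" for i in range(20)]
experimental, control = randomize(cars, seed=42)
print("experimental:", sorted(experimental))
print("control:     ", sorted(control))
```

The point of letting a random number generator choose is that you cannot, even unconsciously, steer the "better looking" cars into one group.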

A confounding variable that you know about, but can’t get rid of

In many cases there are practical reasons why it is impossible to get rid of confounding factors, and in other cases there are scientific reasons. For example, maybe you want to know if there is an interaction between the product and car brand (i.e., does it work better on Toyotas than Nissans?), or you may simply want your results to be as broadly applicable as possible. After all, if you only test it on Toyotas, then all that you have actually shown is that it works on Toyotas (assuming it works at all), and you are making an assumption when you apply that result to other car brands. That assumption is probably reasonable, but it would be better to actually test it.

This is where blocking and measuring your confounding factors come in. Let’s say that you want to test this product on Toyotas, Nissans, and Fords. So, you select one model and year of each brand and control for confounding factors within car brand as much as possible, just like before. Having done that, one option would be to simply pool all three car brands and randomly select your experimental cars and control cars from that. There is nothing technically wrong with that (assuming you still include car brand as a factor in your analyses, see Note 3), but it’s not the most powerful design available to you. A much better design would be to block or group your experiment.

You could, for example, have 30 of each brand in each group, in which case brand would be a blocking variable. To be clear, you still must randomize, but the randomization would take place within blocks. In other words, you would take your 60 Toyotas, and randomly assign half to each group, then you would take your Nissans and randomly assign half to each group, etc. Then, when you do your statistics at the end of the test, you would include brand as a variable in your statistical analyses, and this would be a very robust design (I won’t go into the details of why this design is so powerful here, but if you want to learn more, looking into two-factor ANOVAs is a good place to start).
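Randomizing within blocks can be sketched as follows, using the 60 cars per brand from the example above. The function name `randomize_within_blocks`, the way cars are represented as `(brand, number)` pairs, and the seed are all assumptions for illustration, and the sketch covers only the assignment step, not the statistical analysis.

```python
import random
from collections import Counter, defaultdict

def randomize_within_blocks(units, block_of, seed=None):
    """Randomly assign half of each block to each group."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for unit in units:
        blocks[block_of(unit)].append(unit)

    assignment = {}
    for block in sorted(blocks):          # fixed order keeps runs reproducible
        members = blocks[block]
        rng.shuffle(members)              # randomization happens inside the block
        half = len(members) // 2
        for unit in members[:half]:
            assignment[unit] = "experimental"
        for unit in members[half:]:
            assignment[unit] = "control"
    return assignment

# Hypothetical pool: 60 cars of each brand, identified by (brand, number).
cars = [(brand, i) for brand in ("Toyota", "Nissan", "Ford") for i in range(60)]
assignment = randomize_within_blocks(cars, block_of=lambda car: car[0], seed=1)

# Every brand ends up with exactly 30 cars in each group.
counts = Counter((car[0], group) for car, group in assignment.items())
for key in sorted(counts):
    print(key, counts[key])
```

Compared with pooling all 180 cars and splitting them at random, this guarantees that brand is perfectly balanced between the groups instead of only balanced on average.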

You can also build on this design by including additional blocks. For example, you could have several models of car within each brand (this would then introduce yet another concept known as nesting, which I won’t go into here).

Alternatively, perhaps you are interested in how the product works in heavy duty vehicles like SUVs vs standard cars. In that case, you could have one car model and one SUV model from each car brand and include vehicle class (SUV vs car) as an additional blocking variable (again you would want to randomize within each block). You could even go one step further and have several car models and several SUV models within each brand, at which point you would have three blocks (brand, class, and model) as well as nesting. As you can see, this all becomes very complicated very quickly, and I don’t expect you to be thinking at the three-block stage right now, but I want you to be aware that blocking is a very powerful tool that lets you make sense of complex experimental designs that may, at first, appear to have serious issues with confounding factors.

Finally, let’s imagine that for some reason you can’t block your experiment. In other words, there is some confounding factor that you know about, but for one reason or another you can’t block against it. For example, perhaps all of your cars are used, rather than brand new. That would obviously greatly increase the variation in your data and would likely force you to greatly increase your sample size, but even with a larger sample size and randomization, you would still need some mechanism for dealing with the fact that some cars had been driven more than others prior to your experiment. The solution is actually quite simple: you record the pre-existing mileage on each car and include those data in your analyses as a factor known as a covariate. The idea is basically that covariates explain some of the variation in your data, so by including them in the model, you account for the variation caused by the confounding factor rather than letting it distort the comparison between groups. As a general rule, you should do this anytime that you have some measurable variation that can’t be eliminated or blocked (see Note 4): measure it, then include those measurements in the analysis.
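Here is a rough sketch of what including mileage as a covariate buys you. The data are made up and deliberately noise-free: each car is assumed to lose 5 hours of life per thousand km already driven, the product is assumed to add 50 hours, and the treated cars happen to have higher pre-existing mileage. The adjustment shown assumes the mileage slope is the same in both groups; a real analysis would use a statistics package rather than this hand calculation.

```python
from statistics import mean

# Made-up data: (pre-existing mileage in thousands of km, lifespan in hours).
control = [(10, 950), (20, 900), (30, 850), (40, 800)]
treated = [(30, 900), (40, 850), (50, 800), (60, 750)]

def pooled_within_slope(*groups):
    """Slope of lifespan on mileage, estimated within groups and pooled."""
    sxy = 0.0
    sxx = 0.0
    for group in groups:
        x_bar = mean(x for x, _ in group)
        y_bar = mean(y for _, y in group)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in group)
        sxx += sum((x - x_bar) ** 2 for x, _ in group)
    return sxy / sxx

# Naive comparison that ignores mileage.
raw_difference = mean(y for _, y in treated) - mean(y for _, y in control)

# Covariate-adjusted comparison: remove the part of the difference that is
# explained by the groups' different average mileage.
slope = pooled_within_slope(control, treated)
mileage_gap = mean(x for x, _ in treated) - mean(x for x, _ in control)
adjusted_effect = raw_difference - slope * mileage_gap

print("raw difference:  ", raw_difference)
print("mileage slope:   ", slope)
print("adjusted effect: ", adjusted_effect)
```

With these invented numbers the naive comparison makes the product look harmful (the treated cars last 50 hours less on average), while the mileage-adjusted comparison recovers the 50-hour benefit that was built into the data.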

At this point, you may be wondering what on earth the point of all of this is. After all, most of my readers aren’t scientists who are going to be designing experiments, and if you are, you should be consulting a statistician or good stats book, not reading my blog. Nevertheless, if you have read this far, then I am going to assume that you are interested in science and understanding how the world works, and I’m going to assume that this interest will invariably lead you to read some scientific literature.

That is where this comes in, because not all scientists know what they are doing, and we are all prone to mistakes, so when you read a scientific paper, you should look for things like this. See if they eliminated as many confounding factors as possible, see if they blocked the experiment and included those blocks in the analyses, see if they randomized correctly, and make sure that they included measurable variation in the analyses. If they didn’t do these things, then you should be dubious of their results. If you see confounding factors that they didn’t account for, or they didn’t randomize, etc., you should think twice before accepting their conclusions (for example, see my analysis of a rat/Roundup study that was not done correctly). I should also clarify here that although I have been talking specifically about randomized controlled studies in this post, what I have said applies to other designs such as cohort studies as well. The techniques that I have laid out are extremely important for dealing with confounding factors, and you should make sure that they are being used correctly when you read a study.

Learning Standards

Objectives: 2016 Massachusetts Science and Technology/Engineering Standards
Students will be able to:
* plan and conduct an investigation, including deciding on the types, amount, and accuracy of data needed to produce reliable measurements, and consider limitations on the precision of the data
* apply scientific reasoning, theory, and/or models to link evidence to the claims and assess the extent to which the reasoning and data support the explanation or conclusion;
* respectfully provide and/or receive critiques on scientific arguments by probing reasoning and evidence and challenging ideas and conclusions, and determining what additional information is required to solve contradictions
* evaluate the validity and reliability of and/or synthesize multiple claims, methods, and/or designs that appear in scientific and technical texts or media, verifying the data when possible.

A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas (2012)
Implementation: Curriculum, Instruction, Teacher Development, and Assessment
“Through discussion and reflection, students can come to realize that scientific inquiry embodies a set of values. These values include respect for the importance of logical thinking, precision, open-mindedness, objectivity, skepticism, and a requirement for transparent research procedures and honest reporting of findings.”

Next Generation Science Standards: Science & Engineering Practices
● Ask questions that arise from careful observation of phenomena, or unexpected results, to clarify and/or seek additional information.
● Ask questions that arise from examining models or a theory, to clarify and/or seek additional information and relationships.
● Ask questions to determine relationships, including quantitative relationships, between independent and dependent variables.
● Ask questions to clarify and refine a model, an explanation, or an engineering problem.
● Evaluate a question to determine if it is testable and relevant.
● Ask questions that can be investigated within the scope of the school laboratory, research facilities, or field (e.g., outdoor environment) with available resources and, when appropriate, frame a hypothesis based on a model or theory.
● Ask and/or evaluate questions that challenge the premise(s) of an argument, the interpretation of a data set, or the suitability of the design