In Part Two of our series on designing and implementing a successful marketing experiment, we’re going to explain the importance of control groups and how to effectively use them in your own marketing experiments.

In our previous post, we discussed how to construct a well-defined hypothesis. In our hypothetical marketing department, we’ve decided to conduct an experiment to determine whether sending a discount email to our customers is an effective way to increase sales. After a few clarifications and refinements, the hypothesis we settled on was this:

Emailing customers a 20% discount increases the likelihood that they will make a purchase in the following week.

Imagine that we are eager to test our hypothesis and see our results. We decide to send emails to our entire customer base, and then observe open and click rates on the emails. After that, we perform a funnel analysis, and find that 50% of our customers opened the email, 10% clicked on the embedded link, and 2% made a purchase. We show our results to our skeptical boss, who looks at us and says, “And how can we tell that some of these people weren’t going to make purchases anyway, even without the discount email?” The answer is: we can’t.

The problem with a funnel analysis is that it implies a degree of causality that may not really be accurate. In order to build an experiment that produces useful, meaningful results, we have to have a clear picture of what we’re comparing our results to. This is why we need to make a control group.

Let’s return to our hypothesis for a moment. Pay particular attention to the phrase: “increases the likelihood that [customers] will make a purchase.” What this means is that we expect a customer to be more likely to make a purchase if we send them a discount email than if we don’t. However, quantum physics aside, we can’t simultaneously send and not send a discount to the same customer. That means we need to find another way to quantify the change we’re looking for. There are a few good ways to go about doing this, and quite a few bad ways. We’ll begin with some of the more common experimental design fallacies we see.

One (bad) way that we might attempt to include some sort of control is by comparing the purchase rate during the week following the email to the purchase rate during the previous week, when we did not send an email. Unfortunately, this ignores the inherent week-to-week variability in sales, which for many retailers tends to be quite high. We could therefore call our email campaign a success or a failure when the change was really driven by a larger seasonal trend.

Another (also bad) strategy would be to try our experiment with a certain subset of users and then compare our results to another subset. For example, we might send our discount email to all of our international customers, and then compare their purchase rates to those of our US customers over the same period. While this may appear to control for week-to-week variability, it does not do so completely, since that variability is not consistent between countries. The two groups may also differ in ways that have nothing to do with the email.

What we need, then, is a way to control for natural variability over time and between groups. Our solution will require two steps. First, we’ll create two groups. One group, which we’ll designate our control group, receives the “status quo.” In our case, we’ll say that we’re not currently sending an email of any kind, which means our control group will receive no email. If, however, you were already sending out a weekly newsletter and wanted to test the effect of including a discount in the newsletter, then your control group would receive the regular newsletter while your experimental group would receive the newsletter with the discount.

Now that we’ve defined our control and experimental groups, we’re going to assign customers to each group at random. Random assignment is a powerful control against natural variability: seasonal swings and differences between customer segments affect both groups roughly equally, so they cancel out when we compare the two. This allows us to compare the groups directly and isolate the effect of our treatment (the discount email). This type of study is referred to as a randomized controlled study (also known as a randomized controlled trial).
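To make the random assignment concrete, here is a minimal sketch in Python. It assumes customers are identified by simple integer IDs and that an even split is wanted. The function name `assign_groups` and the fixed seed are illustrative choices, not part of any particular tool.

```python
import random


def assign_groups(customer_ids, seed=42):
    """Shuffle customers and split them evenly into two groups.

    Returns (control, experimental). The seed is fixed only so the
    assignment can be reproduced later; any value works.
    """
    rng = random.Random(seed)
    shuffled = list(customer_ids)
    rng.shuffle(shuffled)  # random order removes any systematic grouping
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical customer base of 20,000 IDs (0 through 19,999).
control, experimental = assign_groups(range(20_000))
print(len(control), len(experimental))
```

Because every customer has the same chance of landing in either group, neither group should be systematically different from the other before the email goes out.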

For our marketing experiment, let’s take 20,000 customers and divide them evenly between our control and experimental groups. We find that our control group has a 2% conversion rate while the experimental group has a 2.5% conversion rate. With randomized control groups we can quantify the difference between the two responses and figure out how likely we are to see similar results going forward.
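As a rough illustration of what quantifying the difference looks like, the sketch below works through the arithmetic for the numbers above. The purchase counts (200 and 250) are derived from the stated rates and an even split of 20,000 customers; the variable and function names are our own.

```python
def conversion_rate(purchases, group_size):
    """Fraction of a group that made a purchase."""
    return purchases / group_size


# Derived from the example: 20,000 customers split evenly.
group_size = 10_000
control_purchases = 200        # 2% of 10,000
experimental_purchases = 250   # 2.5% of 10,000

control_rate = conversion_rate(control_purchases, group_size)
experimental_rate = conversion_rate(experimental_purchases, group_size)

# Absolute lift: difference in conversion rates (in percentage points).
absolute_lift = experimental_rate - control_rate
# Relative lift: that difference as a fraction of the control rate.
relative_lift = absolute_lift / control_rate

print(f"Absolute lift: {absolute_lift * 100:.2f} percentage points")
print(f"Relative lift: {relative_lift:.1%}")
```

In this example the discount email corresponds to a lift of about half a percentage point, or roughly 25% relative to the control group. Whether a difference of that size is likely to hold up is a question for a statistical test, which this sketch does not attempt.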

With our randomized control groups, we’re able to control effectively for things like natural variability in sales over time and between different groups. However, we’re going to need additional control groups to determine exactly what component of our discount email caused the increase in conversion rates. We’ll also need statistical tests to figure out if our results are likely to hold in the future. We’ll be addressing both of these issues in future blog posts.