A/B Testing in MailChimp: 7 Years of Successful Experiments

Since its release in late 2007, MailChimp’s A/B testing tool has been a huge success with our customers. In fact, over 200,000 split tests were sent in 2013 alone. Using subscribers as lab rats is still both fun and educational, but we realize that some of you might not know how to get started with the tool. To help you out, we put together a quick refresher on getting the most out of A/B testing. We’ll also share some of our research on the best ways to test.

What is an A/B test?

A/B testing allows you to send two different versions of your campaign to part of your list, and then select the version that gets the most opens or clicks to send to the rest of your list. MailChimp can run the tests automatically for you, so all you have to do is tell us what to test and what percentage of your list you want to test it on. By changing aspects of the campaign between the two different groups, you can find out what your subscribers respond to. What can you test this way?

Subject line – Try different wordings to see what gets the most attention

From name – See if your subscribers prefer mail from a person or an organization

Delivery date/time – Figure out when your subscribers like to open/click

We recommend testing only one difference between the A and B groups. When there are several differences between test groups, it’s difficult to figure out which change impacted your engagement. If you make only one change, you’ll know exactly what caused the difference.

What kind of results can I expect?

We examined A/B tests sent in the past year to measure their impact. For subject lines and from names, we calculated a Levenshtein Distance to measure the difference between test values. For send time, we looked at the absolute difference in hours. Percentages were calculated as a percent of opens or clicks, and not as a percent of emails sent, so a 10% increase on 20 clicks would bring you up to 22.
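For readers curious about the metric, Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. A minimal sketch (not MailChimp's actual implementation, and the subject lines are just illustrative):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

# Two hypothetical test subject lines:
print(levenshtein("Meet the Confidant", "Store Opens in 7 Days"))
```

A distance of 32 between two subject lines, as in the average below, means the two versions differ by dozens of character edits, i.e. they are substantially different texts rather than minor rewordings.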

| Content Tested | Tests Run in the Past Year | Avg. Open Improvement | Avg. Click Improvement | Avg. Difference |
| --- | --- | --- | --- | --- |
| Subject | 209,824 | 9.0% | 11.0% | 32 Character Changes |
| Send Time | 25,861 | 9.3% | 22.6% | 10 Hours |
| From Name | 6,203 | 12.0% | 15.3% | 12 Character Changes |

The majority of tests were on subject lines, and the average Levenshtein distance of 32 suggests that most tests use very different subject lines. Although subject line is the most popular test, send time and from name tests are powerful too. Testing send time and from name resulted in 22% and 15% increases in clicks on average. If those results pique your interest, you might like our Send Time Optimization tool.

Getting started

So how do you build an A/B test? Let’s walk through the steps using a recent campaign from MailChimp customer Baron Fig.

When creating a new campaign, you’ll see A/B Split Campaign as an option. For free accounts, this feature will only be available after sending a few campaigns and tracking results.

After selecting A/B split campaign, you’ll tell MailChimp what you want to test. Campaign content is not listed here because it’s controlled using the A/B merge tags. You’ll also determine your test segment size, how to choose a winner (clicks/opens/manual), and how long to wait before choosing a winner.

We recommend a test segment of 20%-50%. Smaller lists benefit from higher percentages, which test on more subscribers. Larger test percentages have a lower overall impact, however, because the winning campaign will be sent to fewer people.
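To make the trade-off concrete, here is a quick sketch of how a test percentage splits a list. The numbers are hypothetical, and this is just arithmetic, not MailChimp's internal logic:

```python
def split_sizes(list_size: int, test_pct: float):
    """Return (group_a, group_b, remainder) for an A/B split.

    test_pct is the fraction of the list used for testing; the two
    test groups split it evenly, and the winner goes to the rest.
    """
    test = int(list_size * test_pct)
    group = test // 2
    return group, group, list_size - 2 * group

# Hypothetical 10,000-subscriber list with a 30% test segment:
print(split_sizes(10_000, 0.30))  # (1500, 1500, 7000)
```

Each test group gets 1,500 subscribers, and the winning campaign goes to the remaining 7,000. Raise the percentage and the test groups get more reliable, but fewer people receive the proven winner.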

How do you choose between clicks and opens? If your campaign has clickable links, all of the items we mentioned can impact your click rate. Campaign content never impacts your open rate, though. We think testing on clicks makes the most sense for people who have links or want to test campaign content. Higher open rates also typically lead to higher click rates. It’s also worth noting that changing the number of links in your campaign can impact your click rate and might confound your results. If you want to have complete control over which campaign is chosen, you can tell MailChimp to wait until you manually select a winner.

| Content Tested | Clicks | Opens |
| --- | --- | --- |
| Subject | ✓ | ✓ |
| From Name | ✓ | ✓ |
| Send Time/Date | ✓ | ✓ |
| Campaign Content | ✓ | — |

We also recommend waiting at least a couple of hours before choosing a winner. If you only wait one hour, you won’t give everybody in your test group time to open your email and click on links. The winner after one hour might not be the winner after several hours.

Providing your test values

Once you’ve set up your test and selected a list to send to, you’ll need to provide your test values. You can choose to test very slight differences, like punctuation, or very substantial differences, like completely different subject lines or campaign layouts. If you’re trying out some subject lines, you can even use the subject line researcher or review some of our latest research on what makes a good subject line. Baron Fig decided to test two very different subject lines for their campaign.

Once this is done, all you need to do is build the rest of your campaign, send it out, and wait for the results!

For Baron Fig’s campaign, group A won by a statistically significant margin. To interpret this result, you need to figure out what the difference between the campaigns boils down to. Group A received a bold announcement of new information (Meet the Confidant), which is something that they can act on now. Group B received an announcement of a future event that they can’t really act on yet (Store Opens in 7 Days). The subscribers may have been more compelled to click on subject A because they knew they would immediately get something out of it. With that big of a difference in performance, we’re certainly glad the rest of the list saw the subject line that group A did.
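If you want to check for yourself whether a gap like this is statistically meaningful, a standard two-proportion z-test works. This is a hedged sketch with made-up numbers, not a calculation MailChimp performs for you:

```python
from math import erf, sqrt

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """Two-sided two-proportion z-test; returns (z, p_value)."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical test groups of 1,500 recipients each:
z, p = two_proportion_z(180, 1500, 120, 1500)
print(z, p)
```

If the p-value comes out below 0.05 (roughly, |z| above 1.96), the difference between the groups is unlikely to be random noise.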

What happens if your open or click rates aren’t that different? It could mean your subscribers were indifferent to the change you made, but it never hurts to test again. Repeating a test is a great way to make sure the change has a consistent impact. Each time you test, you’ll get a new randomly selected test group to use as lab rats!

Go forth and test

Now that you know the basics of A/B testing, the only thing left to do is dive in. As you continue to test, you’ll start to get a good grasp of what your subscribers respond to, and what doesn’t work for them. This might change as your list grows, though, so it is always a good idea to keep testing.


Thank you for this very welcome post, especially the part about content splitting. It really wasn't obvious how to do it; I even thought MailChimp wasn't able to, since the option is limited to subject, date, and from.

Thank you for a good introduction to A/B testing! I will definitely use a larger test segment from now on, since I have relatively small lists.

One thing that always confuses me a bit while doing an A/B test is that Group A is always green(ish) and Group B is red(ish). It always makes me think that Group A is the winning group. It took me a while to figure out the colour didn't have anything to do with the winner/loser.

Thanks for the reply. I'd like to point out that by doing it this way, the winner decision is made based on your primary a/b selection (from, subject, daypart) rather than the content being tested. The existing tools are great for optimizing open rates.

I was thinking more along the lines of a/b testing based on content, and being able to make winner decisions based on click through rate of two competing content groups.

I agree that click through rate testing means less if you’re under-optimized in open rate, but I do believe that CTR testing would be a very valuable addition for more advanced email marketers.

Adam,
If you select one of the other testing options (from, subject, date/time) and then choose the same values for both A and B, you can then use the A/B merge tags to test only changes in your content. The winner is currently chosen by measuring clicks across the entire campaign rather than a specific link, but you can manually select a winner based on a specific link or set of a/b links if you would like to test those links alone. The A/B results page includes link-specific stats.

Hi Ben,
Although the A/B test and RSS are two different types of campaigns, it is possible to insert a feed item into a normal or a/b test campaign using the FEED merge tag. This will result in a one-time test, but you can then implement the results into your RSS campaigns. The following link contains info about that tag as well as a video demonstration.

Hi Jeremy,
A/B testing can be combined with an RSS feed using the FEED merge tag. This will be a one-time send, but the results that you get can be incorporated into your RSS campaign. The following link contains a rundown of the FEED tag as well as a video demonstration.

Hi Allison, I’m glad you enjoyed the post! We do not currently display statistical significance, although the thought has crossed our mind. We also don’t consider the significance of the results when sending because a result has to be picked by the time the user has specified, regardless of significance. It’s great to hear that you are excited about making the most of your own A/B testing, though. The best way to take your A/B testing to the next level is to understand the complications of how it works and then combine your knowledge and ideas with what our tool offers to carry out your own advanced tests. Good luck!

When selecting clicks to determine the winner of the A/B test what number decides the winner? There are 4 different click variables in the results – recipients who clicked, unique URL/recipient clicks, clicks/unique open, or total clicks.
I am just curious because (and I could be wrong in my thinking) if you have one individual in group A who clicks the link several times, they could possibly increase the value of total clicks even though group B had more unique clicks.

Ryan,
In A/B testing, we use the number of recipients that clicked at least once divided by the number of recipients. That way, a single person who clicks on multiple links shouldn’t skew the results. Here is a more detailed rundown of how we calculate click and open rates.
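The rule described here, counting each recipient at most once, can be sketched like this. The event data and names are hypothetical, not MailChimp's internals:

```python
def click_rate(click_events, recipients):
    """Fraction of recipients who clicked at least once.

    click_events is a list of recipient ids, one entry per click;
    a recipient who clicks five times still counts only once.
    """
    return len(set(click_events) & set(recipients)) / len(recipients)

# Group A: one enthusiastic clicker; Group B: three distinct clickers.
group = ["ann", "bob", "cam", "dee", "eve"]
print(click_rate(["ann"] * 5, group))            # 1 of 5 -> 0.2
print(click_rate(["bob", "cam", "dee"], group))  # 3 of 5 -> 0.6
```

Under this counting, Group B wins even though Group A generated more total click events, which is exactly the skew the unique-recipient rule prevents.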

I can't really find anything on tracking actual sales. Is it possible with MailChimp to do an A/B test and also track sales results via our own servers by using a URL parameter, such as "newsletterXY-A" and "newsletterXY-B"?

Quick question: if I’m sending my a/b test on a Friday and I want the remainder of the list to automatically receive the e-mail on Monday, should I indicate that I want the e-mail sent 2 days after the test? It’s unclear to me if MailChimp counts the test day (in this case Friday).

Hi Sam! The time that you select for your A/B test starts counting down once your original test campaigns are sent. If you select one day, your full campaign will send out one day after you first send the test campaigns. If you start a test on Friday and want the rest to send out on Monday, you would want to select three days for your wait time when setting up the test.

Hi Miriam,
We have not included those features in the app at this point, but it is something we have given a lot of thought to. For now, we are focusing on building a feature that is clear and easy for our users to understand. We'll be sure to let everybody know if we decide to incorporate statistical tests into our split testing feature.