6 A/B Testing Myths: How This Misinformation Is Messing with Your Results

If you’re doing A/B testing wrong, though, you may still be wasting a ton of time and resources.

Even with the increasing ubiquity of A/B testing, there are still many myths around the subject, some quite common. To really derive value from any technique, it’s important to understand it for what it is, including both where it’s limited and where it’s powerful.

This article will outline the top myths I’ve seen spouted time and time again in blogs and by consultants.

1. A/B Testing and optimization are the same thing

This may seem a bit finicky, but A/B testing itself doesn’t increase conversions. Many articles say something to the effect of “do A/B testing to increase conversions,” but this is semantically inaccurate.

A/B testing, otherwise known as an “online controlled experiment,” is a summative research method that tells you, with hard data, how the changes you make to an interface are affecting key metrics.

What does that mean in non-academic terms? A/B testing is a part of optimization, but optimization encompasses a broader swath of techniques than just the experimentation aspect.
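Mechanically, an online controlled experiment boils down to randomly splitting traffic between variants and comparing a key metric between the groups. A minimal sketch of the assignment step, using deterministic hashing of a user id so each visitor always sees the same variant (illustrative only, not any particular tool’s implementation):

```python
import hashlib

def assign_variant(user_id, variants=("control", "treatment")):
    """Deterministically bucket a user into a variant.

    Hashing a stable user id (rather than picking randomly on each
    page load) guarantees a returning visitor always sees the same
    version of the page.
    """
    digest = hashlib.md5(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

# Same user id always maps to the same bucket:
print(assign_variant("user-42"))
print(assign_variant("user-42"))
```

The experiment part is just this split plus a comparison of conversion rates afterward; everything else (research, prioritization, design) is the broader optimization process the article describes.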

As Justin Rondeau, Director of Optimization at Digital Marketer, put it, “Conversion rate optimization is a process that uses data analysis and research to improve the customer experience and squeeze the most conversions out of your website.”

Everyone can look at a site and see dozens of random things they could change if they wanted to (whether informed by data or not). But where’s the efficiency in that?

At best, you’re wasting traffic on things that don’t matter, and you’ll consistently get inconclusive results if you do this (good luck getting continued support from stakeholders if that’s the case).

Whatever the case, you face a tremendous opportunity cost: by wasting time and resources on things that don’t matter, you forgo implementing changes that fundamentally alter and improve the user experience. The things that make a real difference (and make real money).

3. Everybody should A/B test

A/B testing is incredibly powerful and useful. No one is going to (intelligently) argue against that.

But that doesn’t mean everyone should do it.

Roughly speaking, if you have fewer than 1,000 transactions (purchases, signups, leads, etc.) per month, you’re going to be better off putting your effort into other things. You might get away with running tests at around 500 transactions per month, but you’ll need some big lifts to detect an effect.

A lot of micro-businesses, startups, and small businesses just don’t have that transaction volume (yet).

You have to keep in mind costs, as well. All of them, not just the cost of optimization software like Optimizely. Things like:

Conversion research. You have to figure out what to test (as mentioned above).

Designing the treatment (wireframing, prototyping, etc.).

Coding up the test.

QAing the test.

Now, let’s say you get an 8% lift, and it’s a valid winner. You had 125 leads per week, and now you have 135 / week. Is the ROI there? Maybe — it depends on your lead value. But you have to account for time, resources, and most importantly, the opportunity costs of your actions.

So, when you calculate your needed sample sizes before you run the test, do the math on the ROI as well. What would be the value of X% lift in actual dollars?

Time is a precious resource. It might be better spent elsewhere than A/B testing when you’re still small — because of math.

4. Only change one element per A/B test

This is probably the most commonly repeated myth out there. The intention is good, but the premise is flawed.

Here’s the advice: Only make one change per test, so you know what is actually making a difference.

For example, if you change your headline, add some social proof, and change your call-to-action text and color, and you get a 25% lift, how can you tell what caused the change?

It’s true; you really can’t. But let me also ask (and this is especially pointed at those without the luxury of high traffic sites), do you really care?

In an ideal world, one of iterative changes that build on each other, yes: testing one thing at a time limits the noise in a test and lets you understand exactly what caused the change.

Also, you have to define your Smallest Meaningful Unit (SMU), and this is where things get a bit tricky. Matt Gershoff, CEO of Conductrics, put it well, telling me:

“To take the logic to an extreme, you could argue that changing a headline is making multiple changes since you are changing more than one word at a time.

So it depends on what you want to do. Do you care about the wording of your CTA and really want to know whether it caused a change or not? Are you radically changing your page? Your site?”

The SMU depends on your goals, and trust me, in the real world, no analyst or optimization specialist is shouting, “only one change per test!”

As Mr. Rondeau pointed out in this post, what one thing would you change on this site (pictured below – this is an old version of the site by the way)?

Let’s even assume this site has a ton of traffic, and you can run something like eight valid tests per month. If you’re testing one element at a time, where do you start? It would take you forever to work through the background image, the font color, the font size, the logo at the top, the navigation thumbnails (their location, size, order, and copy), the body copy, the moving salesmen, and so on.

My point here is this: Don’t be afraid to bundle multiple changes in the same test.

5. A/B Tests are better (or worse) than bandits/MVT/etc

You see articles pop up from time to time advocating that you should “avoid multivariate testing (MVT)” because it’s complicated and doesn’t produce wins, or that bandits are inefficient compared to A/B tests — or that they’re more efficient — or whatever.

A good rule of thumb in life is if you’re dealing with a dichotomy, a this vs. that situation, you’re probably being set up. It’s probably a false dichotomy.

Truth is, A/B testing is the better choice in some situations, while MVT is the best choice in others. The same goes for bandits and adaptive algorithms.

It’s a shame this myth is so widespread; statistical knowledge in the marketing world is surprisingly limited.

6. Stop the test as soon as it reaches statistical significance

It’s a common occurrence that your testing tool will tell you you’ve reached significance too early. So don’t put all your faith in that 95% significance figure.

First, pre-calculate your sample size and test duration, then run the test for that long. Test in full weeks (if you start on a Monday, end on a Monday), and run the test through multiple business cycles to account for non-stationary data (data that doesn’t stay the same over time). A big sale one week or a PR spike could throw your data off by quite a bit. Different days often have different conversion rates, too: maybe you convert at 3% on Tuesdays but 1.5% on Saturdays, and that difference could throw off your post-test analysis.

So test for full weeks to account for these ebbs and flows. At ConversionXL, we recommend running a test for 3-4 weeks.

And require a statistical significance level of at least 95% before you call a winner.
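The sample-size pre-calculation can be sketched with the standard two-proportion formula. The z-values below correspond to 95% significance (two-sided) and 80% power; the 3% baseline rate and 10% relative lift are assumptions for illustration:

```python
import math

def sample_size_per_variant(baseline_rate, min_detectable_lift,
                            z_alpha=1.96, z_beta=0.84):
    """Visitors needed in EACH variant to detect a relative lift.

    Standard two-proportion sample-size estimate; z_alpha = 1.96 is the
    95% two-sided significance constant, z_beta = 0.84 gives 80% power.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + min_detectable_lift)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

# Detecting a 10% relative lift on a 3% baseline conversion rate
# takes tens of thousands of visitors per variant:
n = sample_size_per_variant(baseline_rate=0.03, min_detectable_lift=0.10)
print(n)
```

Run the number through your actual weekly traffic to get the test duration, then hold to it. This is also the math behind myth #3: small detectable lifts on low-traffic sites demand sample sizes that can take months to accumulate.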

Conclusion

A/B testing is incredibly powerful. It’s a powerful deterrent to gut-based decision making and shows you what data says you should do instead.

But if you’re falling for any of the myths above, you’re limiting (or worse, destroying) the potential A/B testing has for your business growth.

I’m probably missing a few, and some of the above you’ll probably disagree with, so let me know in the comments.

About the author

Alex Birkett is a Growth Marketing Manager at ConversionXL. When he’s not optimizing websites, you can find him lifting heavy weights, singing karaoke, or eating too many breakfast tacos. He lives in Austin, Texas.