Out of the cesspool and into the sewer: A/B testing trap

Your A/B tests are trapped in a cesspool when they should be in the sewer.

Do you really care why A/B testing is analogous to unwanted liquids? Not yet, so I’d better get right to the point.

On the rare occasion that it rains in Austin we get these deep puddles in the backyard. Of course it would be better if the water would flow out into the street and into the sewer, but that’s not how gravity works.

Water “seeks” the lowest point in the yard, but it’s narrow-minded. It doesn’t survey the environment, locate the lowest area, and head there. Rather, at each point along its path it chooses whatever direction is lowest in the immediate vicinity. Water doesn’t “know” that if only it made the effort to hop over the fence, it could get much lower, like in the sewer.

In mathematical terms, water doesn’t “globally optimize” for getting to the lowest possible point, but rather “locally optimizes” at each step. If you enjoy clichés, water misses the forest for the trees.
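If it helps to see the water's narrow-mindedness spelled out, here is a minimal sketch of that kind of greedy, step-by-step descent. The terrain heights and the function name flow_downhill are made up for illustration; the only rule is "move to the lower neighbor until no neighbor is lower."

```python
def flow_downhill(heights, start):
    """Greedy local descent: step to the lowest neighbor until none is lower."""
    pos = start
    while True:
        # Only the immediate vicinity is visible: the cells to the left and right.
        neighbors = [i for i in (pos - 1, pos + 1) if 0 <= i < len(heights)]
        lowest = min(neighbors, key=lambda i: heights[i])
        if heights[lowest] >= heights[pos]:
            return pos  # stuck: every neighbor is at least as high
        pos = lowest


# Hypothetical yard: a dip at index 2, a fence at index 4, the sewer at index 6.
terrain = [5, 3, 2, 4, 9, 1, 0]

puddle = flow_downhill(terrain, start=0)                    # where the water ends up
sewer = min(range(len(terrain)), key=lambda i: terrain[i])  # the true lowest point

print("water settles at index", puddle, "but the lowest point is index", sewer)
```

Starting from the left edge, the water settles in the dip and never reaches the sewer, because getting there would mean climbing the fence first.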

Maybe your A/B tests are missing the forest for the trees too.

A typical A/B test looks like this: You start with a baseline, then you make a change. Maybe the title changes from “Sour Cream Getting you Down?” to “Don’t know when Sour Cream Goes Bad?” You test that for a while and one wins, and then you try another variation: “Is this Sour Cream Good or Bad?”

And so on, inching your way through incremental improvements. A little here, a little there, and — you believe — soon it adds up to real money.

Except, often it doesn’t.

Often you reach a point where small changes stop doing anything. This effect can be hard to recognize, which is why you need to (horrors!) use math to decide empirically whether anything’s actually happening.
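One common way to do that math (not the only one, and not necessarily the one your testing tool uses) is a pooled two-proportion z-test. The sketch below uses only the standard library. The function name and the visitor counts are invented for illustration, and the usual 0.05 cutoff is a convention, not a law.

```python
from math import erf, sqrt


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference in conversion rates (pooled z-test)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (p_b - p_a) / se
    # Standard normal CDF, computed from the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return 2 * (1 - cdf)


# Made-up numbers: 2,000 visitors per variation.
p_small = two_proportion_p_value(100, 2000, 104, 2000)  # 5.0% vs 5.2%
p_large = two_proportion_p_value(100, 2000, 160, 2000)  # 5.0% vs 8.0%

print("tiny tweak p-value:", round(p_small, 3))
print("big change p-value:", round(p_large, 5))
```

A large p-value on the tiny tweak says the difference could easily be noise, which is what "small changes aren't doing anything" looks like in numbers.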

At this point you might be tempted to give up, but that’s wrong too.

What’s happened is that you’ve found what mathematicians call a “local minimum” and what I just called a “cesspool” (and what more tasteful writers call a “watershed”). Your test is the water in the backyard: you’ve flowed into the lowest point, but you’re still in the backyard!

In fact, because looking in completely new places has the potential to yield far more results than incremental improvement, you need to be looking for discontinuous results from the start.

The best idea is to do both: Instead of just running A versus incremental-change A2, also run a B version that’s radically different from A. Thus you reap the straightforward benefits of incremental improvements while also searching for something that could radically improve your revenue.
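As a rough sketch of running all three at once, here is one way to split visitors among A, the incremental A2, and the radical B. Everything here is an assumption for illustration: the variant labels, the assign_variant helper, and the choice of hashing a visitor ID so the same person always sees the same version.

```python
import hashlib

# Hypothetical labels: baseline, incremental tweak, radically different version.
VARIANTS = ["A", "A2", "B"]


def assign_variant(visitor_id, variants=VARIANTS):
    """Deterministically map a visitor ID to one variant via a hash."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


# Simulate 3,000 made-up visitors and count how many land in each bucket.
counts = {v: 0 for v in VARIANTS}
for n in range(3000):
    counts[assign_variant(f"visitor-{n}")] += 1

print(counts)
```

An even three-way split is just the simplest choice; you might prefer to send less traffic to the radical version until it proves itself.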

Better still, if a radically different message gets you massively better results, perhaps all your messaging should change accordingly. Maybe your idea of what the market wants should shift. Maybe your entire business can change for the better.

Why poop along with minor variations when you could be toying with new ideas and new identities?

Play!

What strategies do you use for tests? Leave a comment and join the conversation.