If you’re Picasso, don’t A/B test, but for the rest of us it’s humbling to evaluate our ideas…

Compare intuition with data to establish causality

Let’s say you launched an improvement to a feature or a new campaign one week and metrics increased the next. Does this mean the change was a success? Without a controlled experiment, it’s impossible to know whether the metrics moved because of your change or because of something else that happened in the world around your product at the same time. We can only learn when we simulate the world without the change (A) and test it against a treatment that simulates the world with the change (B).

This is the obvious case for experimentation: establishing causality about the changes we make to our products or campaigns so we can continually improve them, matching the travel needs of the right audience with the products we offer at Skyscanner. Comparing our intuition to data helps us understand how users behave with our products and make decisions based on evidence.

“Getting causality right, is like striking gold” says Arjan Haring.

Let users kill ideas, not opinion

If you ask a designer or a marketer how we can improve a product or campaign, they’ll share a wide range of ideas. What often happens after the ideation process is a stakeholder review: inevitably one option is chosen and the others are rejected, usually based on opinion. Without testing the options, we learn nothing about whether that choice was right.

Opinion shouldn’t be the end of a concept; we need to test with real users. With A/B testing, treatments that usually end up on the cutting room floor get a chance. Instead of being killed off on whiteboards or in presentations, more ideas can reach real users for empirical evaluation. A/B testing gives us the opportunity to challenge assumptions, with no battles of opinion required.

Guard against bias, especially when using data

When making data-driven decisions, we have to recognise and guard against the biases that can lead us to interpret data poorly and make worse choices than if we hadn’t used data at all.

“Running a bad experiment is worse than not running an experiment at all” says Uri Gneezy.

We agree: experiments aren’t useful if bias distorts the interpretation of the results.

Confirmation bias is our tendency to focus on data that confirms our hypothesis while overlooking data that contradicts it. It’s surprisingly easy to find data that backs your idea if that’s your goal, which leads to another problem: HARKing (Hypothesising After the Results are Known), where we find some data and then change the hypothesis to explain it.

The goal is not to find a winning metric; it’s to learn. If we want to learn, we have to state our predictions up front, collect trustworthy data from an A/B test, and analyse that data objectively: assume we’re wrong, and conclude there is an effect only when the change is statistically significant. Simply running A/B tests or using data will not in itself improve our decision making; running trustworthy experiments is what counts.
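To make "assume we’re wrong" concrete, here is a minimal sketch of one common way to check significance for a conversion-rate test: a two-sided, two-proportion z-test. It is an illustration only, not a description of how Skyscanner analyses experiments. The function name, the visitor and conversion counts, and the 0.05 threshold are all assumptions made for the example.

```python
import math

# Significance threshold: 0.05 is a common convention, assumed here for illustration.
ALPHA = 0.05


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between A and B.

    The null hypothesis is "we're wrong": A and B convert at the same rate.
    Returns (z, p_value).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Under the null hypothesis both groups share one pooled conversion rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("No variation in the data; the test is undefined.")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: 10,000 users per group, 1,000 vs 1,100 conversions.
z, p_value = two_proportion_z_test(1000, 10_000, 1100, 10_000)
if p_value < ALPHA:
    print(f"z = {z:.2f}, p = {p_value:.3f}: unlikely to be chance alone")
else:
    print(f"z = {z:.2f}, p = {p_value:.3f}: no detectable effect")
```

The hypothesis, the metric, and the threshold should all be fixed before the data comes in. Choosing them afterwards is exactly the HARKing described above.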

Experiments keep us humble

If you’re Picasso, don’t A/B test, but for the rest of us it’s humbling to evaluate our ideas…

When we expose ideas to real users in A/B tests, we often find the results surprising. Many experiments, no matter how much we want them to, will not deliver the win we expect.

Perhaps we shouldn’t be surprised that our users are often indifferent to the things we love. Dan Ariely and others demonstrated the Ikea Effect: when labors of love are involved, we overvalue the things we made ourselves and expect others to share our opinion.

In A/B testing, you have to be ready to learn about the impact of a change and to iterate. If you love your idea more than you love learning which changes work for your users, you might find A/B testing agonising. We need to “Design Like We’re Right, Test Like We’re Wrong”. Shipping without experiments misses the opportunity to learn something new.

Colin McFarland heads up Experimentation for Skyscanner in the UK, leading the development of our internal products for A/B testing at large scale and fostering a culture of experimentation across Skyscanner.