Simulating How Long To Run Your Test

How much time is enough for the true performance of your variations to come through the noise?

In this video, we’ll see a simulation of an A/A/B/C/D test as it moves from an initial state dominated by chance towards a state of equilibrium. Along the way, we observe how the measured performance of variations can change over time due to chance alone, and what sorts of intermediate outcomes we can expect. How does a false positive tend to behave over time? What is a true +10% winner likely to do halfway into the test? Answering these questions helps me interpret real tests.

To speed things up, this simulation is based on a 20% baseline conversion rate and 1,000 visitor hits per day. The duration of 10 days is just an example. In your real tests, the conversion rate might be as low as 1%, which means it would take far longer to get to a similar equilibrium.
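For readers who want to reproduce something similar, here is a minimal Python sketch of such a simulation. It is not the code behind the video. The true lifts assigned to B, C and D, the even split of the 1,000 daily visitors across the five variations, and the function name `simulate` are all assumptions made for illustration.

```python
import random

def simulate(true_rates, visitors_per_day, days, seed=0):
    """Return a list of (day, {variation: cumulative observed rate})."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    # Assumption: daily traffic is split evenly across all variations.
    per_arm = visitors_per_day // len(true_rates)
    visitors = 0
    conversions = {name: 0 for name in true_rates}
    history = []
    for day in range(1, days + 1):
        visitors += per_arm
        for name, rate in true_rates.items():
            # Each visitor converts independently with the arm's true rate.
            conversions[name] += sum(rng.random() < rate for _ in range(per_arm))
        observed = {name: conversions[name] / visitors for name in true_rates}
        history.append((day, observed))
    return history

baseline = 0.20
# Assumption: hypothetical true effects; only the +10% winner is named in the post.
true_rates = {
    "A1": baseline,
    "A2": baseline,          # identical to A1, so any gap is a false signal
    "B": baseline * 1.10,    # true +10% winner
    "C": baseline * 0.95,    # hypothetical true -5% loser
    "D": baseline * 1.05,    # hypothetical true +5% winner
}

history = simulate(true_rates, visitors_per_day=1000, days=10)
for day, observed in history:
    control = observed["A1"]
    if control == 0:
        continue  # relative lift is undefined without control conversions
    lifts = " ".join(
        f"{name}:{observed[name] / control - 1:+.1%}"
        for name in observed if name != "A1"
    )
    print(f"day {day:2d}  {lifts}")
```

Changing `seed` gives a different run. Watching how far A2 drifts from A1 in the first few days gives a feel for how large a difference chance alone can produce.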

Exercise:

Use Evan Miller’s Sample Size Calculator to calculate the sample size needed to detect a 10% relative lift over a 20% baseline (answer is at the bottom of this post) – leave the power and significance on default.

Rewatch the simulation video and see how the test behaves as it approaches this sample size target.

Consider: How accurate is the relative performance of each variation at this point? What sort of outcomes are still possible by chance alone that would obscure the true performance of the variations? Based on this simulation, would you run your test for a longer or shorter time than this target?
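If you would like to cross-check the calculator by hand, the sketch below uses a standard normal-approximation formula for comparing two proportions. It assumes a two-sided significance level of 5% and 80% power as the defaults. The calculator may use a slightly different variant of the formula, so treat the result as a ballpark figure rather than an exact match. The function name `sample_size_per_variation` is made up for this example.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variation(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variation to detect a relative lift."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    p_bar = (p1 + p2) / 2                           # pooled rate under "no difference"
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# The exercise: a 10% relative lift over a 20% baseline.
n = sample_size_per_variation(0.20, 0.10)
print(n)

# The same relative lift over a 1% baseline needs far more visitors.
print(sample_size_per_variation(0.01, 0.10))
```

Note that the figure is per variation, so a test with five variations needs roughly five times as many visitors in total.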

I just built a small Ruby program that simulates this, and the results are pretty mind-blowing.
It takes several thousand samples before you get an accurate result with split A/B testing, and even more if you’re using something like an epsilon-greedy algorithm.

Hey Gavin. Awesomeness. These simulations are nice in that they visualize false positives and false negatives. Agreed. And yes, it usually takes thousands of visitors for the patterns to stabilize. I think the key factors are: sample size, baseline conversion rate, and the magnitude of effect.

Unfortunately, I don’t know Ruby to run your script 🙁
Cheers,
J


