Earlier this year I published a blog post about a Bayesian decision rule (now a dead link, but VWO now uses this approach for A/B testing and the tech docs are here) for choosing between two variations, each with a potentially different conversion rate. The basic idea of the decision rule is as follows.

Choose a "threshold of caring" - if A and B differ by less than this threshold, you don't care which one you choose.

Choose a prior on the distribution of conversion rates of A and B.

Compute a posterior, and use it to estimate whether the expected losses you'd make by choosing A (or B) are below the threshold of caring. If so, stop the test.
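As a concrete sketch of these three steps, here is a minimal Monte Carlo version in Python. It assumes independent Beta priors on each conversion rate (so the posteriors are also Beta), and estimates the expected losses by sampling; the function names, the uniform `Beta(1, 1)` default prior, and the example threshold are my own choices for illustration, not part of the original method's tooling.

```python
import numpy as np

def expected_loss(clicks_a, displays_a, clicks_b, displays_b,
                  alpha=1.0, beta=1.0, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the expected loss from choosing A, and from
    choosing B, under independent Beta(alpha, beta) priors on each rate."""
    rng = np.random.default_rng(seed)
    # With a Beta(alpha, beta) prior, the posterior after observing
    # `clicks` successes out of `displays` trials is
    # Beta(alpha + clicks, beta + displays - clicks).
    a = rng.beta(alpha + clicks_a, beta + displays_a - clicks_a, n_samples)
    b = rng.beta(alpha + clicks_b, beta + displays_b - clicks_b, n_samples)
    # The loss from choosing A is how much worse A is than B, counted only
    # in the scenarios where B is actually better (and symmetrically for B).
    loss_a = np.mean(np.maximum(b - a, 0.0))
    loss_b = np.mean(np.maximum(a - b, 0.0))
    return loss_a, loss_b

def should_stop(clicks_a, displays_a, clicks_b, displays_b, threshold=0.001):
    """Stop the test once picking the better-looking variant costs us,
    in expectation, less than the threshold of caring."""
    loss_a, loss_b = expected_loss(clicks_a, displays_a, clicks_b, displays_b)
    return min(loss_a, loss_b) < threshold
```

With a clear winner (say 500/1000 versus 100/1000) the expected loss of the leader collapses toward zero and the test stops; with two small, identical samples both losses stay well above a tight threshold and the test keeps running.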

This A/B testing procedure has two main advantages over the standard Student's t-test. The first is that, unlike the t-test, you can stop the test early if there is a clear winner, or run it for longer if you need more samples. The second is that, as a Bayesian test, its outputs are easily interpreted quantities - for example, the probability that version A is better than version B, or your expected loss from choosing the wrong one.
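To illustrate the first of those interpretable outputs, here is a small Monte Carlo estimate of the probability that A's conversion rate exceeds B's. As above, this is a sketch under the assumption of independent Beta posteriors (uniform `Beta(1, 1)` prior by default); the function name and defaults are mine, not from the original post.

```python
import numpy as np

def prob_a_beats_b(clicks_a, displays_a, clicks_b, displays_b,
                   alpha=1.0, beta=1.0, n_samples=100_000, seed=0):
    """Estimate P(conversion rate of A > conversion rate of B) by drawing
    joint samples from the two independent Beta posteriors."""
    rng = np.random.default_rng(seed)
    a = rng.beta(alpha + clicks_a, beta + displays_a - clicks_a, n_samples)
    b = rng.beta(alpha + clicks_b, beta + displays_b - clicks_b, n_samples)
    return np.mean(a > b)
```

A number like `prob_a_beats_b(60, 1000, 40, 1000)` reads directly as "the probability A is better than B", which is the kind of quantity stakeholders actually ask for; a p-value from a t-test admits no such reading.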

I won't repeat the details of the method, instead referring the reader to the original post. The crucial part of the test is determining when to stop. Suppose version A has a higher empirical mean than version B, i.e. $@ \textrm{clicks on A} / \textrm{displays of A} > \textrm{clicks on B} / \textrm{displays of B} @$. Then the test is stopped when: