Non-Inferiority Designs in A/B Testing

Category: A/B Testing

Published: Sep 12, 2017

Author: Georgi Z. Georgiev

Most, if not all, of the current statistical literature on online randomized controlled experiments (commonly referred to as “A/B tests”) focuses on superiority designs. That is, the error of the first kind is formulated as incorrectly rejecting a composite null hypothesis that the treatment has no effect or a negative effect. This error is then controlled via a statistical significance threshold or confidence intervals in frequentist approaches, or via posterior probabilities and credible intervals in Bayesian approaches.

However, there is no reason to limit all A/B testing practice to tests for superiority. The current paper argues that there are many cases where testing for non-inferiority is both more appropriate and more powerful in the statistical sense, resulting in better decision-making and, in some cases, significantly faster tests. Non-inferiority tests are appropriate when one cares about the treatment being at least as good as the current solution, with “as good as” defined by a specified non-inferiority margin (sometimes referred to as an “equivalence margin”). Certain non-inferiority designs can result in faster testing than a comparable superiority test.
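To make the distinction concrete, the two designs can be written as one-sided hypothesis pairs. The notation here is an assumption introduced for illustration and is not taken from the paper: $\delta$ denotes the true difference between treatment and control (treatment minus control, with larger values being better) and $m > 0$ denotes the non-inferiority margin.

```latex
% Superiority design: reject H_0 only if the treatment is better than control
H_0^{\text{sup}}: \delta \le 0 \qquad \text{vs.} \qquad H_1^{\text{sup}}: \delta > 0

% Non-inferiority design: reject H_0 if the treatment is not worse than control by m or more
H_0^{\text{ni}}: \delta \le -m \qquad \text{vs.} \qquad H_1^{\text{ni}}: \delta > -m
```

Under this formulation the null boundary moves from $0$ to $-m$, so a treatment with a true effect of zero, or a small positive one, lies strictly inside the alternative of the non-inferiority test.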

The paper introduces two separate approaches for designing non-inferiority A/B tests: tests planned for a true difference of zero or more, and tests planned for a positive true difference. It provides several examples of applying both approaches to cases from conversion rate optimization. Sample size calculations are provided for both approaches, and the two are compared with each other and with superiority tests.
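The sample size comparison described above can be sketched with the textbook normal approximation for a one-sided, fixed-sample test on the absolute difference of two proportions. This is a rough illustration under stated assumptions, not necessarily the calculation used in the paper: the function name `sample_size_per_arm`, its parameters, and the example inputs (10% baseline conversion rate, 1 percentage point margin and effect) are all hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(p_control, true_diff, margin=0.0, alpha=0.05, power=0.80):
    """Approximate users per arm for a one-sided two-proportion z-test.

    Assumed setup (illustrative, not from the paper):
      p_control : baseline conversion rate
      true_diff : absolute difference the test is planned for (treatment - control)
      margin    : non-inferiority margin; 0.0 gives a superiority test
    """
    p_treatment = p_control + true_diff
    # Distance between the planned true difference and the null boundary (-margin).
    distance = true_diff + margin
    if distance <= 0:
        raise ValueError("true_diff + margin must be positive")

    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided significance threshold
    z_beta = NormalDist().inv_cdf(power)       # quantile for the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return ceil((z_alpha + z_beta) ** 2 * variance / distance ** 2)


# Hypothetical inputs: 10% baseline, 1 percentage point margin / planned effect.
superiority = sample_size_per_arm(0.10, true_diff=0.01)
ni_zero_diff = sample_size_per_arm(0.10, true_diff=0.0, margin=0.01)
ni_positive_diff = sample_size_per_arm(0.10, true_diff=0.01, margin=0.01)

print("superiority:", superiority)
print("non-inferiority, planned for zero difference:", ni_zero_diff)
print("non-inferiority, planned for positive difference:", ni_positive_diff)
```

With these inputs, the design planned for a positive true difference needs far fewer users than the superiority test because the planned effect sits twice as far from the null boundary, while the design planned for zero difference needs a roughly similar number. The exact figures depend on the approximation and on the chosen margin, significance threshold, and power.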

Finally, drawbacks specific to non-inferiority tests are discussed, with guidance on how to limit or control them in practice.