A/B Testing: Simultaneous Tests

Everything we have discussed this week assumes you are testing one aspect of your product or service using an A/B or Multi-Armed Bandit Test. In the real world, I’m sure you have dozens of different features you would like to test every day! Running simultaneous tests is possible, but dangerous.

The worst-case scenario, which happens all the time, is that the same customer is exposed to multiple tests, with each interaction testing a different feature. When the same customer experiences several different tests, it is hard to discern which feature affected that customer’s behavior. Was it Option A of Test 1 that increased purchases, or was it Option B of Test 2? The more tests you run at once, the harder it becomes to tell the difference.

There are some cases where running multiple A/B Tests at the same time should not cause you any problems:

If the groups of customers exposed to each test are mutually exclusive, so that no customers participating in Test 1 are also participating in Test 2.

If the overlap between customers in Test 1 and Test 2 is very small (say, 1% of all customers), so that any error introduced should be minor.

If the tests are of features so distinct and different that they cannot influence the same customer behavior(s).
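The first two conditions can be checked in code. Here is a minimal sketch, assuming each customer has a stable ID and that you control how customers are assigned to tests. The function names, the salt string, and the customer IDs are hypothetical and only for illustration; this is one common way to do it, not the only one. Hashing the customer ID puts each customer into exactly one test, and a small helper measures how much two test groups overlap.

```python
import hashlib


def bucket(customer_id, salt, buckets=100):
    """Map a customer to a stable bucket in [0, buckets) using a salted hash."""
    digest = hashlib.sha256(f"{salt}:{customer_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def assign_exclusive(customer_id, tests, salt="exclusive-layer"):
    """Return the single test this customer participates in.

    Every customer lands in exactly one test, so the groups are
    mutually exclusive by construction.
    """
    return tests[bucket(customer_id, salt, len(tests))]


def overlap_fraction(test1_customers, test2_customers, all_customers):
    """Share of all customers who appear in both tests."""
    shared = set(test1_customers) & set(test2_customers)
    return len(shared) / len(set(all_customers))


# Hypothetical customer IDs, for illustration only.
customers = [f"cust-{i}" for i in range(1000)]
assignment = {c: assign_exclusive(c, ["test_1", "test_2"]) for c in customers}
test1 = [c for c, t in assignment.items() if t == "test_1"]
test2 = [c for c, t in assignment.items() if t == "test_2"]

print(overlap_fraction(test1, test2, customers))  # 0.0
```

If your tests were assigned independently instead, you could pass the two participant lists to overlap_fraction and compare the result against whatever threshold you are comfortable with, such as the 1% mentioned above.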

If you need to run multiple tests but cannot meet any of those criteria, you will need to use your judgement. Experts differ on whether it’s a good idea to run overlapping, competing tests. In my experience the arguments for both sides are:

Yes, Run Simultaneous Tests. Accepting some error and bias in your test results is better than having no data at all to inform the decision.

No, Don’t Run Simultaneous Tests. There is no point in running a test if you cannot clearly rely on the results.

Personally, I think it comes down to whether the test is an input to your decision or whether it makes the decision. In the former case it is fine to have some uncertainty because you will make the final call. If the test itself is making the decision (such as in a Multi-Armed Bandit Test), you need to isolate your testing, because the algorithm will automatically make the choice without considering the potential bias introduced.

Testing is a skill that improves with use, so the more you use it the better you will get. Time to start testing!

Quote of the Day: “Testing leads to failure, and failure leads to understanding” – Burt Rutan