What does "Interaction Effects" mean?

Definition of Interaction Effects in the context of A/B testing (online controlled experiments).

What is Interaction Effects?

An interaction effect is a particular kind of threat to the external validity (generalizability) of the outcome of an A/B test which is present only if one runs more than one A/B test at a time. The extent of the time overlap matters, but is not of crucial importance: even a tiny overlap may be sufficient in extreme cases.

In general, an interaction effect occurs when a test variant from one test (T1) influences the outcome of another test (T2). Most interaction effects are small and have a negligible effect on the outcome of either of the involved tests. However, if there is a strong interaction of opposite sign in two of the four possible combinations (in a scenario with two concurrent simple A/B tests), we are virtually guaranteed to get the result of one of the two concurrent A/B tests wrong. The probability of such an interaction increases with the number of concurrent tests and the number of tested treatments in them, but it is unknown and particular to the set of experiments running at any point in time.
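To make this concrete, here is a minimal Python sketch with hypothetical conversion rates for the four combinations of two concurrent tests (all rates, arm names, and the 50/50 split are invented for illustration). The marginal analysis of T1, averaged over T2's traffic, shows variant B1 winning, yet once T2's winner (B2) is rolled out, B1 underperforms its control:

```python
# Hypothetical true conversion rates for each (T1 arm, T2 arm) cell.
# All values are invented to illustrate a strong negative interaction
# between B1 and B2.
rates = {
    ("A1", "A2"): 0.10,
    ("B1", "A2"): 0.15,
    ("A1", "B2"): 0.14,
    ("B1", "B2"): 0.11,  # B1 and B2 interact badly when combined
}

# Each test splits traffic 50/50, so the marginal rate of a T1 arm
# averages over the two T2 arms it is crossed with.
def marginal_t1(arm):
    return (rates[(arm, "A2")] + rates[(arm, "B2")]) / 2

lift_b1_during_test = marginal_t1("B1") - marginal_t1("A1")
print(f"B1 lift while T2 runs: {lift_b1_during_test:+.3f}")  # positive: B1 "wins"

# T2's winner is B2 (marginal 0.125 vs 0.115 for A2). After B2 is
# rolled out to everyone, the population has changed:
lift_b1_after_b2_ships = rates[("B1", "B2")] - rates[("A1", "B2")]
print(f"B1 lift once B2 ships: {lift_b1_after_b2_ships:+.3f}")  # negative: B1 now loses
```

The test result for B1 was statistically valid on the population it was measured on; it simply stops generalizing once the interacting treatment ships.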

Interaction effects are not resolved like other confounding factors, that is, through randomization, simply because they are in effect population changes: once the tests are over, one of the interacting treatments may in fact lose to its control, and its effect will therefore no longer be present. If it was running alongside a different test with which it interacted strongly, its result will be statistically valid, but will generalize poorly. Given this effect on the predictive value of A/B tests, it is no wonder that many practitioners examine the issue and take measures to alleviate it.

A good measure is to make use of the efficiency of running concurrent tests and, upon completing each test, to examine its performance segmented by the test groups of the other tests. Interaction effects should be easily detectable if they are large enough to influence the results.
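As a sketch of such a segmented check, the snippet below takes hypothetical per-cell results from two finished tests and computes the lift of T1's variant separately within each T2 segment. A difference-in-differences z-score far from zero flags a likely interaction; all counts and the |z| > 3 threshold are illustrative assumptions, not a prescribed procedure:

```python
import math

# Hypothetical post-test results: (conversions, users) per
# (T1 arm, T2 arm) cell. All counts are invented for illustration.
cells = {
    ("A1", "A2"): (1000, 10000),
    ("B1", "A2"): (1150, 10000),
    ("A1", "B2"): (1200, 10000),
    ("B1", "B2"): (1010, 10000),
}

def rate(cell):
    conversions, users = cells[cell]
    return conversions / users

# Lift of B1 over A1, computed separately within each T2 segment.
lift_in_a2 = rate(("B1", "A2")) - rate(("A1", "A2"))
lift_in_b2 = rate(("B1", "B2")) - rate(("A1", "B2"))

# Difference-in-differences: zero if B1's effect is the same in both
# segments, i.e. if there is no interaction.
did = lift_in_b2 - lift_in_a2

# Crude standard error of the diff-in-diff from the four binomial cells.
se = math.sqrt(sum(rate(c) * (1 - rate(c)) / cells[c][1] for c in cells))
z = did / se

print(f"lift within A2: {lift_in_a2:+.3f}")
print(f"lift within B2: {lift_in_b2:+.3f}")
print(f"diff-in-diff z: {z:+.2f}")

# |z| well above ~3 suggests the segments disagree more than noise allows.
if abs(z) > 3:
    print("Likely interaction between the two tests")
```

With these invented counts the lift is positive in one segment and negative in the other, so the z-score is large and the check fires; with no interaction the two segment lifts would agree up to sampling noise.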

Bad measures include resorting to serial testing: the loss of efficiency is only justified if there is a really strong prior expectation of an interaction effect. Even then it might not be worth it, since a positive interaction would remain unexamined under serial testing. Another common poor measure against interaction effects is to run A/B tests in isolated "lanes" or "silos". With very high probability this leads to releasing entirely untested interactions to production: an interaction effect could be present, yet a practitioner using this approach would have zero data points that allow its detection.