What does "Multiple Testing" mean?

Definition of Multiple Testing in the context of A/B testing (online controlled experiments).

What is Multiple Testing?

There is no strict definition of multiple testing in the statistics literature. It sometimes refers to comparing multiple groups with one another or against a shared control group, while in other cases it refers to comparing only two groups on multiple characteristics. It can also refer to repeated significance tests, such as those done in sequential testing (or peeking). In A/B testing, multiple testing most often refers to comparisons based on multiple key performance indicators and to repeated testing over time.

Regardless of the precise definition, whenever several statistical comparisons are made and any one of them is enough to reject the null hypothesis, one needs to control the Family-Wise Error Rate (FWER), where "family" refers to a set of logically connected significance tests and "error rate" refers to the type I error rate. The FWER is the probability of committing at least one type I error across the whole family of tests.
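
To illustrate why the FWER needs controlling, the following minimal sketch computes how it grows with the number of comparisons when each test is run at an unadjusted significance threshold. It assumes the tests are independent, which real KPIs often are not, and the function name family_wise_error_rate is chosen here for illustration only.

```python
def family_wise_error_rate(alpha: float, num_tests: int) -> float:
    """Probability of at least one false positive among `num_tests`
    independent tests, each run at significance threshold `alpha`.

    Assumption: the tests are independent. With correlated tests the
    true FWER differs from this value.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    # Show how quickly the FWER inflates at an unadjusted alpha of 0.05.
    for m in (1, 3, 5, 10):
        fwer = family_wise_error_rate(0.05, m)
        print(f"{m:>2} tests at alpha=0.05 -> FWER = {fwer:.3f}")
```

Under these assumptions, ten unadjusted tests at a 0.05 threshold carry roughly a 40% chance of at least one false positive, far above the nominal 5%.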

When one performs multiple testing in the above sense, the Sidak Correction is the most powerful (as in statistical power) single-step procedure one can use for p-value adjustment when the tests are independent; step-down variants such as Holm-Sidak can be more powerful still. Using a FWER-correcting procedure also has consequences during the planning stage, when the statistical design is decided on (with or without a risk-reward analysis). Sample size calculations need to take into account the multiple comparisons correction which will be applied after the data is gathered, otherwise one is likely to end up with an underpowered test.
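
The sketch below shows the Sidak adjustment and its effect on sample size planning. It rests on several assumptions not stated above: independent tests, a two-sided two-proportion z-test with the usual normal approximation for the sample size, and illustrative conversion rates of 10% versus 12%. All function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sidak_alpha(alpha: float, num_tests: int) -> float:
    """Per-comparison threshold that keeps the FWER at `alpha`
    across `num_tests` independent tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / num_tests)


def sidak_adjust(p_values: list[float]) -> list[float]:
    """Sidak-adjusted p-values: compare each one to the original alpha."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


def sample_size_per_group(p_base: float, p_variant: float,
                          alpha: float, power: float = 0.8) -> int:
    """Approximate per-group sample size for a two-sided two-proportion
    z-test (normal approximation; an assumption of this sketch)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)   # critical value for alpha
    z_beta = z.inv_cdf(power)                # quantile for desired power
    variance = p_base * (1 - p_base) + p_variant * (1 - p_variant)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_base - p_variant) ** 2)


if __name__ == "__main__":
    alpha, num_kpis = 0.05, 3

    # Adjust observed p-values from three KPIs (made-up example values).
    print(sidak_adjust([0.010, 0.040, 0.200]))

    # Plan the test with and without the correction.
    n_naive = sample_size_per_group(0.10, 0.12, alpha)
    n_sidak = sample_size_per_group(0.10, 0.12, sidak_alpha(alpha, num_kpis))
    print(f"per-group n, unadjusted alpha: {n_naive}")
    print(f"per-group n, Sidak-adjusted alpha: {n_sidak}")
```

The adjusted threshold is stricter than the nominal one, so the required sample size rises. Planning with the unadjusted threshold and applying the correction afterwards would therefore leave the test underpowered.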