What does "p-value Adjustment" mean?

Definition of p-value Adjustment in the context of A/B testing (online controlled experiments).

What is a p-value Adjustment?

A p-value adjustment is a modification of the p-value of a single significance test within an A/B test so that it conforms to the rejection region of an overall null hypothesis spanning a set of logically related significance tests. It is performed when the Family-Wise Error Rate (FWER) needs to be controlled (see the examples below). Adjusting the p-value and adjusting the critical region are equivalent in their effect on the type I error rate, so which of the two is performed is a matter of convenience and technical difficulty.
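The equivalence between adjusting the p-value and adjusting the critical region can be checked numerically. The sketch below assumes the Sidak correction (one of the corrections named in this entry) applied to a family of m tests; the function names are hypothetical and chosen for illustration only. For any raw p-value, comparing the adjusted p-value to alpha gives the same decision as comparing the raw p-value to the adjusted per-test threshold.

```python
import random


def sidak_adjusted_p(p: float, m: int) -> float:
    """Adjust the p-value of one test out of a family of m tests (Sidak)."""
    return 1.0 - (1.0 - p) ** m


def sidak_threshold(alpha: float, m: int) -> float:
    """Per-test critical p-value that keeps the family-wise rate at alpha (Sidak)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def decisions_agree(p: float, alpha: float, m: int) -> bool:
    """True when adjusting p and adjusting the critical region give the same decision."""
    reject_via_adjusted_p = sidak_adjusted_p(p, m) <= alpha
    reject_via_threshold = p <= sidak_threshold(alpha, m)
    return reject_via_adjusted_p == reject_via_threshold


if __name__ == "__main__":
    rng = random.Random(42)
    m, alpha = 3, 0.05
    print(f"per-test threshold: {sidak_threshold(alpha, m):.5f}")
    # Draw raw p-values around the threshold and confirm both routes agree.
    print(all(decisions_agree(rng.random() * 0.1, alpha, m) for _ in range(100_000)))
```

Which route to take is then purely practical: an adjusted p-value is convenient for reporting, while an adjusted threshold is convenient when designing the test.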

A p-value adjustment is necessary when one performs multiple comparisons, or multiple testing in a more general sense: performing several significance tests where a single significant result is enough to reject an overall null hypothesis. A multivariate test (A/B/n test) is one such example: if one simply performs pairwise tests of each variant against the control and is willing to abandon the control based on a single statistically significant result, then an adjustment is necessary (in this case Dunnett's correction). Another situation that calls for an adjustment is when there are multiple primary KPIs and the null hypothesis would be rejected if any one of them is found to be statistically significant. In such cases the Sidak correction is the most powerful single-step correction that still guarantees the type I error rate of the overall null hypothesis (for independent tests).
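The multiple-KPI case can be illustrated with a small Monte Carlo sketch. It assumes that the KPIs are independent and that, under the overall null hypothesis, each KPI's p-value is uniformly distributed on (0, 1); the function name and defaults are hypothetical. It estimates how often at least one KPI comes out significant, with and without the Sidak per-test threshold.

```python
import random


def simulated_fwer(m: int, alpha: float, adjust: bool,
                   n_sims: int = 50_000, seed: int = 1) -> float:
    """Estimate the family-wise error rate for m independent KPIs under the null."""
    rng = random.Random(seed)
    # Sidak per-test threshold if adjusting, otherwise the nominal alpha.
    per_test_alpha = 1.0 - (1.0 - alpha) ** (1.0 / m) if adjust else alpha
    false_rejections = 0
    for _ in range(n_sims):
        # Assumption: each KPI's p-value is Uniform(0, 1) under the null.
        p_values = [rng.random() for _ in range(m)]
        # The overall null is rejected if any single KPI is significant.
        if min(p_values) <= per_test_alpha:
            false_rejections += 1
    return false_rejections / n_sims


if __name__ == "__main__":
    print("unadjusted:", simulated_fwer(5, 0.05, adjust=False))
    print("Sidak-adjusted:", simulated_fwer(5, 0.05, adjust=True))
```

Without adjustment the estimate should sit near 1 - 0.95^5 (about 0.23) for five KPIs, whereas the Sidak threshold brings it back to roughly 0.05.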

P-value adjustments are also necessary in sequential testing, in which significance calculations are performed a number of times during the A/B test until a decision boundary (e.g. an efficacy boundary) is reached. Usually such adjustments can only be made after a decision boundary has been crossed and the A/B test has been stopped.
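The need for adjustment in sequential testing can be seen by simulating repeated looks at the same data under the null hypothesis. The sketch below assumes equally spaced looks, normally distributed data with known variance, and a two-sided z-test at each look; it works on the critical-region side and calibrates a constant (Pocock-style) boundary by Monte Carlo. This is only one of several possible boundary designs, and the function names are hypothetical. Computing the adjusted p-value itself after stopping depends on the specific design and is not shown.

```python
import math
import random
from statistics import NormalDist


def simulate_max_abs_z(looks: int, n_sims: int, seed: int = 0) -> list[float]:
    """Largest |z| seen across `looks` interim analyses, per simulated null test."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_sims):
        cumulative = 0.0
        running_max = 0.0
        for k in range(1, looks + 1):
            # One new, equally sized batch of data per look (standardised).
            cumulative += rng.gauss(0.0, 1.0)
            z_k = cumulative / math.sqrt(k)  # z-statistic on all data so far
            running_max = max(running_max, abs(z_k))
        maxima.append(running_max)
    return maxima


def crossing_rate(maxima: list[float], boundary: float) -> float:
    """Share of null tests that cross the boundary at any look (type I error)."""
    return sum(m > boundary for m in maxima) / len(maxima)


def constant_boundary(maxima: list[float], alpha: float) -> float:
    """Constant |z| boundary whose overall crossing rate is at most alpha."""
    ordered = sorted(maxima)
    return ordered[math.ceil((1 - alpha) * len(ordered)) - 1]


if __name__ == "__main__":
    maxima = simulate_max_abs_z(looks=5, n_sims=50_000, seed=7)
    naive = NormalDist().inv_cdf(0.975)  # fixed-sample two-sided 0.05 boundary
    print("type I error with naive peeking:", crossing_rate(maxima, naive))
    c = constant_boundary(maxima, 0.05)
    print("calibrated boundary:", round(c, 3), "rate:", crossing_rate(maxima, c))
```

With five looks the naive boundary of 1.96 should yield an overall error rate well above the nominal 0.05, while the calibrated boundary should land close to the tabulated Pocock value of about 2.41.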

Failure to adjust the p-value leads to a discrepancy between the nominal p-value and the actual p-value, since the statistical model that describes the null hypothesis is no longer valid. This means that the statistical estimates are compromised to the extent to which the adjustment was needed. That extent can sometimes be established (e.g. in an MVT), but in other cases it cannot (e.g. when the need for adjustment stems from unaccounted peeking with intent).
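For an MVT, the gap between the nominal and the actual error rate can be estimated by simulation. The sketch below assumes equal sample sizes per arm, normally distributed arm means with a known common variance, and unadjusted two-sided z-tests of each variant against a shared control; the function name is hypothetical. The shared control makes the comparisons positively correlated, so the actual rate is somewhat lower than the 1 - (1 - alpha)^k figure for independent tests, but still far above alpha.

```python
import math
import random
from statistics import NormalDist


def unadjusted_mvt_error_rate(variants: int, alpha: float = 0.05,
                              n_sims: int = 50_000, seed: int = 3) -> float:
    """Actual type I error of unadjusted variant-vs-control tests under the null."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sims):
        # Standardised sample mean of each arm under the null: N(0, 1).
        control = rng.gauss(0.0, 1.0)
        # Difference of two independent N(0, 1) means has variance 2,
        # so dividing by sqrt(2) yields a z-statistic for each comparison.
        if any(abs(rng.gauss(0.0, 1.0) - control) / math.sqrt(2) > z_crit
               for _ in range(variants)):
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    for k in (1, 2, 4):
        print(k, "variant(s):", unadjusted_mvt_error_rate(k))
```

With a single variant the estimate matches the nominal 0.05; as variants are added, the estimate shows how much larger the actual error rate becomes when no adjustment is made.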