What does "Sample Size" mean?

Definition of Sample Size in the context of A/B testing (online controlled experiments).

What is Sample Size?

The sample size of an online controlled experiment of any kind is the number of units (users, sessions, pageviews, emails, etc.) that are planned to participate or have already participated in it. It can be given as sample size per group or as total sample size: the sum of the sample sizes of all test groups.

The sample size of an A/B test is influenced by four factors: the significance threshold (or confidence level), the required statistical power, the magnitude of the minimum effect of interest (MEI), and the variance of the data at hand. The higher the required evidential threshold or estimation accuracy, the larger the sample size. Higher statistical power or higher variance also results in longer tests and larger sample sizes. The MEI is inversely related: increasing its magnitude results in a smaller sample size for a test.
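The relationship between these four factors can be illustrated with a minimal sketch. It assumes a conversion-rate metric, a one-sided two-proportion z-test, two groups of equal size, and the usual normal approximation; the function name sample_size_per_group and its parameters are hypothetical and chosen for illustration only.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(baseline_rate, mei_abs, alpha=0.05, power=0.80):
    """Approximate users per group for a one-sided two-proportion z-test.

    baseline_rate: conversion rate of the control group (e.g. 0.10)
    mei_abs:       minimum effect of interest as an absolute lift (e.g. 0.02)
    alpha:         significance threshold (confidence level = 1 - alpha)
    power:         required statistical power
    """
    if not 0 < baseline_rate < 1 or not 0 < baseline_rate + mei_abs < 1:
        raise ValueError("rates must lie strictly between 0 and 1")
    if mei_abs == 0:
        raise ValueError("the minimum effect of interest must be non-zero")

    p1 = baseline_rate
    p2 = baseline_rate + mei_abs

    # Quantiles of the standard normal distribution for alpha and power.
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)

    # Variance of the difference in proportions, per observation pair.
    variance = p1 * (1 - p1) + p2 * (1 - p2)

    return ceil((z_alpha + z_beta) ** 2 * variance / mei_abs ** 2)


if __name__ == "__main__":
    n = sample_size_per_group(baseline_rate=0.10, mei_abs=0.02)
    print("per group:", n, "total for two groups:", 2 * n)
```

Raising the power, lowering alpha, or moving the baseline rate toward 0.5 (which increases the variance) each increases the returned value, while a larger MEI reduces it.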

Sample size considerations enter during the planning phase of an A/B test, since the sample size is part of the statistical model. Many simple statistical significance and confidence interval calculations assume that the sample size has been fixed in advance. They lose meaning (become invalid, inaccurate, non-interpretable) unless a single calculation is performed at a pre-specified point in time. Evaluating the data repeatedly and acting on interim results without accounting for the extra looks is called "peeking" and has severe consequences, chiefly an inflated false positive rate.
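The effect of peeking can be sketched with a small simulation of A/A tests, in which there is no true difference between the groups. The sketch assumes normally distributed observations with known unit variance and a two-sided z-test applied unchanged at every look; the function name false_positive_rate and all its parameters are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def false_positive_rate(n_looks, batch_size=100, alpha=0.05,
                        n_sims=20000, seed=1):
    """Share of simulated A/A tests declared significant at any look.

    Each look adds batch_size observations per group. Observations are
    assumed to be N(0, 1) in both groups, so the difference between the
    two groups' batch sums is distributed as N(0, 2 * batch_size).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # fixed-sample threshold
    batch_sd = sqrt(2 * batch_size)
    false_positives = 0

    for _ in range(n_sims):
        diff_sum = 0.0  # running difference of group sums
        n = 0           # observations per group so far
        for _ in range(n_looks):
            diff_sum += rng.gauss(0.0, batch_sd)
            n += batch_size
            z = diff_sum / sqrt(2 * n)
            if abs(z) > z_crit:
                # Stop at the first "significant" look, as a peeker would.
                false_positives += 1
                break

    return false_positives / n_sims


if __name__ == "__main__":
    print("single planned analysis:", false_positive_rate(n_looks=1))
    print("peeking at 10 looks:    ", false_positive_rate(n_looks=10))
```

With a single analysis the simulated rate stays close to the nominal alpha, while ten unadjusted looks push it well above that level.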

In contrast, sequential testing allows multiple evaluations at different points in time, but these still need to be specified in advance, to an extent. For example, in an AGILE A/B test a maximum sample size is calculated and an approximate number of interim analyses is specified. Despite the method's flexibility, significant departures from these specifications can lead to the test ending without a definite conclusion.
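As a rough illustration of why the maximum sample size and the number of interim analyses matter, the sketch below computes how much of the overall significance threshold may be "spent" by each analysis under an O'Brien-Fleming-type alpha spending function of the Lan-DeMets form. This is a generic textbook technique chosen for illustration and is not claimed to be the spending function used by AGILE; the function names and the equally spaced analyses are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def obf_alpha_spent(information_fraction, alpha=0.05):
    """Cumulative alpha spent at a given fraction of the maximum sample size.

    Uses the O'Brien-Fleming-type spending function
    2 * (1 - Phi(z_{alpha/2} / sqrt(t))), which equals alpha at t = 1.
    """
    if not 0 < information_fraction <= 1:
        raise ValueError("information_fraction must be in (0, 1]")
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha / 2)
    return 2 * (1 - std_normal.cdf(z / sqrt(information_fraction)))


def spending_schedule(max_sample_size, n_analyses, alpha=0.05):
    """(sample size, cumulative alpha spent) for equally spaced analyses.

    Assumes max_sample_size is at least as large as n_analyses.
    """
    schedule = []
    for k in range(1, n_analyses + 1):
        n_k = round(max_sample_size * k / n_analyses)
        spent = obf_alpha_spent(n_k / max_sample_size, alpha)
        schedule.append((n_k, spent))
    return schedule


if __name__ == "__main__":
    for n_k, spent in spending_schedule(max_sample_size=10000, n_analyses=5):
        print(f"n = {n_k:>6}: cumulative alpha spent = {spent:.5f}")
```

Because the schedule is expressed as a fraction of the maximum sample size, analyses that take place at very different points than planned change how the error budget is distributed, which is one reason the plan needs to be fixed, at least approximately, in advance.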