Sequential analysis

In statistics, sequential analysis or sequential hypothesis testing is statistical analysis in which the sample size is not fixed in advance. Instead, data are evaluated as they are collected, and further sampling is stopped in accordance with a pre-defined stopping rule as soon as significant results are observed. A conclusion may therefore be reached at a much earlier stage than would be possible with classical hypothesis testing or estimation, at a correspondingly lower financial and/or human cost.

A similar approach was independently developed around the same time by Alan Turing, as part of the Banburismus technique used at Bletchley Park, to test hypotheses about whether different messages coded by German Enigma machines should be connected and analysed together. This work remained secret until the early 1980s.[4]

In a randomized trial with two treatment groups, group sequential testing may, for example, be conducted in the following manner. After n subjects in each group (a total of 2n subjects) are available, an interim analysis is conducted: a statistical test is performed to compare the two groups, and if the null hypothesis is rejected, the trial is terminated. Otherwise, the trial continues, another n subjects per group are recruited, and the statistical test is performed again, now including all 4n subjects. If the null hypothesis is rejected, the trial is terminated; otherwise, it continues with periodic evaluations until a maximum number of interim analyses has been performed. At that point, the last statistical test is conducted, and the trial is discontinued.[5]
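The stopping scheme above can be sketched in a short simulation. The critical value 2.361 used below is an illustrative Pocock-style constant boundary for four looks at an overall two-sided significance level of 0.05; the function name, the known-variance z-test, and all parameter values are assumptions made for this sketch, not part of any standard library.

```python
import numpy as np

def group_sequential_trial(effect, n_per_stage=50, max_stages=4,
                           crit=2.361, rng=None):
    """Simulate one two-arm trial with interim analyses.

    After each batch of n_per_stage subjects per group, a two-sample
    z statistic (unit variance assumed known, for simplicity) is
    compared against a constant Pocock-style boundary `crit`.
    Returns (rejected, stage_at_which_the_trial_stopped).
    """
    rng = np.random.default_rng(rng)
    a = np.empty(0)  # control group observations
    b = np.empty(0)  # treatment group observations
    for stage in range(1, max_stages + 1):
        a = np.concatenate([a, rng.normal(0.0, 1.0, n_per_stage)])
        b = np.concatenate([b, rng.normal(effect, 1.0, n_per_stage)])
        z = (b.mean() - a.mean()) / np.sqrt(2.0 / len(a))
        if abs(z) > crit:
            return True, stage       # null rejected: stop the trial early
    return False, max_stages         # trial runs to the final analysis

rejected, stage = group_sequential_trial(effect=0.5, rng=1)
```

With a real effect present, the trial often stops at an early interim look; under the null it usually runs to the final analysis.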

Sequential analysis also has a connection to the problem of gambler's ruin, studied by, among others, Christiaan Huygens in 1657.[6]

Step detection is the process of finding abrupt changes in the mean level of a time series or signal. It is usually treated as a special case of the statistical problem known as change point detection. Often the step is small and the time series is corrupted by some kind of noise, which makes the problem challenging because the step may be hidden by the noise. Statistical and/or signal processing algorithms are therefore often required. When the algorithms are run online as the data arrive, especially with the aim of producing an alert, this is an application of sequential analysis.
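A classic sequential tool for this online setting is the one-sided CUSUM statistic, which accumulates evidence of an upward shift in the mean and raises an alarm once that evidence crosses a threshold. The sketch below is illustrative; the function name and parameter values are assumptions, not a reference implementation.

```python
def cusum_detector(stream, target_mean=0.0, drift=0.5, threshold=5.0):
    """One-sided CUSUM change detector, run online over `stream`.

    `drift` is the slack parameter (often set to half the smallest
    mean shift one wants to detect); `threshold` trades detection
    delay against false-alarm rate. Returns the index at which the
    alarm fires, or None if no alarm is raised.
    """
    s = 0.0
    for i, x in enumerate(stream):
        # Accumulate excess over the target mean, clipped at zero so
        # that evidence cannot go negative.
        s = max(0.0, s + (x - target_mean) - drift)
        if s > threshold:
            return i
    return None

# A noiseless example: the mean jumps from 0 to 2 at index 10.
data = [0.0] * 10 + [2.0] * 10
alarm = cusum_detector(data)  # fires at index 13, a few samples after the step
```

The alarm lags the true change point by a few samples: this detection delay is the price paid for robustness to noise, and it shrinks as the threshold is lowered (at the cost of more false alarms).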

The statistics of a trial that is stopped early after only n samples differ from those of a similar trial run to a predetermined sample size, even if the two end up collecting the same number of samples. If this is not accounted for when interpreting a sequential trial, the results will be biased. It is therefore important that proper methodology be followed to avoid false conclusions. See [7] for a discussion.
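The distortion can be illustrated by simulation: repeatedly "peeking" at accumulating data and applying the fixed-sample 5% critical value at every look inflates the overall false-positive rate well above the nominal level. The function name and parameters below are illustrative assumptions for this sketch.

```python
import numpy as np

def naive_peeking_rejects(rng, looks=5, n_per_look=20, crit=1.96):
    """One simulated trial under H0 (no true effect between groups):
    a two-sample z-test is applied at the fixed-sample 5% critical
    value after every batch, stopping at the first 'significant'
    result. Returns True if the null was (falsely) rejected."""
    a = np.empty(0)
    b = np.empty(0)
    for _ in range(looks):
        a = np.concatenate([a, rng.normal(0.0, 1.0, n_per_look)])
        b = np.concatenate([b, rng.normal(0.0, 1.0, n_per_look)])
        z = (b.mean() - a.mean()) / np.sqrt(2.0 / len(a))
        if abs(z) > crit:
            return True  # a false positive, since H0 is true here
    return False

rng = np.random.default_rng(0)
rate = np.mean([naive_peeking_rejects(rng) for _ in range(2000)])
# `rate` comes out well above the nominal 0.05 (roughly 0.14 for
# five equally spaced looks, per classical results on repeated testing)
```

Sequential designs avoid this by adjusting the per-look critical values (e.g., Pocock or O'Brien-Fleming boundaries) so that the overall error rate stays at the nominal level.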