Sunday, December 11, 2016

In clinical trials, a multiple testing (multiplicity) issue arises when
more than one hypothesis test is built into the same study and we want to
claim trial success if any one of those tests is significant. For example,
in an osteoporosis/breast cancer trial, there may be two endpoints:

Endpoint 1: Incidence of vertebral fractures

Endpoint 2: Incidence of breast cancer

We would like to claim success if at least one endpoint is significant.
Similarly, in a trial with a low dose group, a high dose group, and a
placebo control, we may want to claim success if either the low dose versus
placebo comparison or the high dose versus placebo comparison is
statistically significant. In both of these situations, adjustment for
multiplicity must be employed.

On the other hand, not every study with more than one hypothesis test
needs an adjustment for multiplicity. Taking Alzheimer’s disease trials as
an example, FDA guidance requires two endpoints

Endpoint 1: Cognition endpoint (ADAS-Cog)

Endpoint 2: Clinical global scale (CIBIC plus)

and requires that both endpoints be significant in order to claim success.
In this case, both hypotheses are tested at a significance level of 0.05
and no adjustment for multiplicity is needed.

In late-phase clinical trials, if a multiplicity issue exists, the
adjustment for multiplicity must be built into the statistical analysis
plan to avoid inflation of the family-wise type 1 error rate (usually
controlled at 0.05 or 5%).
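
To see why the inflation matters, here is a minimal sketch of how the family-wise error rate grows with the number of tests. It assumes that all null hypotheses are true and that the tests are independent, which is a simplification (endpoints in a real trial are usually correlated); the function name is made up for illustration.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive when `num_tests`
    independent true-null hypotheses are each tested at level `alpha`
    without any adjustment. Independence is an assumption of this sketch."""
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    # Print the unadjusted family-wise error rate for a few family sizes.
    for m in (1, 2, 3, 5):
        print(m, round(familywise_error_rate(m), 4))
```

Under these assumptions, two unadjusted tests at 0.05 already give a family-wise error rate of about 0.0975, nearly double the nominal level.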

· Single-step methods are less powerful than stepwise methods and not
often used in practice

· Accounting for correlations leads to more powerful procedures, but
correlations are not always known

· Simes-based methods are more powerful than Bonferroni-based methods, but
control the FWER only under certain dependence structures

· In practice, we select the procedure that is not only powerful from a
statistical perspective, but also appropriate from a clinical perspective

For a specific clinical trial with a multiplicity issue, the choice of
procedure for multiplicity adjustment depends on the study design, on
whether the hypothesis tests can be ordered by clinical importance, and
sometimes on whether there is prior evidence that one hypothesis test is
more likely to be significant. For example, for a dose-response study, the
Dunnett procedure or the step-down Dunnett procedure may be preferred. If a
clinical trial has multiple sources of multiplicity (for example, multiple
endpoints combined with different types of tests, such as superiority and
non-inferiority), then a gatekeeping procedure may be preferred.

In industry clinical trials, some procedures are more commonly used than
others because they are more powerful, that is, more likely to declare
statistical significance. It is often the case that the clinical trial
sponsor (the pharmaceutical/biotech company) would like to choose a more
powerful procedure (such as the Hochberg procedure), while the regulatory
side (such as FDA) would prefer a more conservative procedure (such as the
Bonferroni or Holm’s procedure).

We are still waiting for FDA to issue its formal guidance on multiplicity
issues. In the meantime, some procedures for handling multiplicity are
mentioned in therapeutic-area-specific guidance documents or in
presentations by FDA statisticians. For example, CDRH’s guidance “Clinical
Investigations of Devices Indicated for the Treatment of Urinary
Incontinence” includes the following passage on handling multiplicity when
performing statistical tests for multiple secondary endpoints:

The primary statistical challenge in supporting the
indication for use or device performance in the labeling is in making multiple
assessments of the secondary endpoint data without increasing the type 1 error
above an acceptable level (typically 5%). There are many valid multiplicity
adjustment strategies available for use to maintain the type 1 error rate at or
below the specified level, three of which are listed below:

Bonferroni procedure

Hierarchical closed test procedure

Holm’s step-down procedure

Because each of these multiplicity adjustment strategies
involves balancing different potential advantages and disadvantages, we
recommend you prospectively state the strategy that you intend to use. We
recommend your protocol prospectively state a statistical hypothesis for each
secondary endpoint related to the indication for use or device performance.
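
One common reading of the "hierarchical" strategy named in the guidance above is a fixed-sequence test: the hypotheses are ordered in advance (for example, by clinical importance), each is tested at the full significance level, and testing stops at the first non-significant result. The sketch below illustrates that reading only; the guidance does not spell out the algorithm, and the function name and p-values are hypothetical.

```python
def fixed_sequence(pvals_in_order, alpha=0.05):
    """Fixed-sequence (hierarchical) testing sketch.

    `pvals_in_order` must follow the order prespecified in the protocol.
    Each hypothesis is tested at the full `alpha`, but only while every
    earlier hypothesis in the sequence has been rejected."""
    decisions = []
    still_testing = True
    for p in pvals_in_order:
        # Once one test fails, every later hypothesis is left unrejected.
        still_testing = still_testing and p <= alpha
        decisions.append(still_testing)
    return decisions


if __name__ == "__main__":
    # Hypothetical p-values for four ordered secondary endpoints.
    print(fixed_sequence([0.01, 0.04, 0.20, 0.001]))
```

With these made-up values, the first two endpoints are declared significant, the third fails, and the fourth cannot be claimed even though its p-value is very small. This is the trade-off of the approach: no alpha is split across tests, but the prespecified order matters a great deal.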

EMA has a guideline, “Points to consider on multiplicity issues in
clinical trials”. The document was issued in 2002 and may be due for
revision. It focuses mainly on when adjustment for multiplicity is needed
and when it is not. It does not mention the procedures that could be used
for multiplicity adjustment.

Assume that there are two hypothesis tests and that the left column
indicates the p-values for these two tests. Whether statistical
significance can be claimed depends on which procedure is used for
multiplicity adjustment. In this specific case, the Hochberg step-up
procedure is more powerful than the other multiplicity adjustment
procedures.
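
As a minimal sketch of this kind of comparison, the code below applies the Bonferroni, Holm, and Hochberg procedures to two hypothetical p-values, 0.03 and 0.04. These values are illustrative only and are not taken from the example above; the function names are likewise made up. Hochberg, being Simes-based, controls the FWER only under certain dependence structures.

```python
def bonferroni(pvals, alpha=0.05):
    """Single-step: reject H_i if p_i <= alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm(pvals, alpha=0.05):
    """Step-down: start from the smallest p-value and stop at the first
    one that fails its threshold alpha / (m - rank)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # smallest first
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break
    return reject


def hochberg(pvals, alpha=0.05):
    """Step-up: start from the largest p-value; the first one that meets
    its threshold alpha / (rank + 1) is rejected together with every
    hypothesis that has a smaller p-value."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # largest first
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (rank + 1):
            for j in order[rank:]:
                reject[j] = True
            break
    return reject


if __name__ == "__main__":
    pvals = [0.03, 0.04]  # hypothetical p-values, for illustration only
    for name, procedure in [("Bonferroni", bonferroni),
                            ("Holm", holm),
                            ("Hochberg", hochberg)]:
        print(name, procedure(pvals))
```

With these two values, Bonferroni and Holm reject neither hypothesis, because the smaller p-value (0.03) exceeds 0.05/2 = 0.025. Hochberg rejects both, because the larger p-value (0.04) is already below 0.05.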