While stratified
randomization has its benefits, more stratification factors are not
necessarily better. The more stratification factors we have, the more easily
a randomization error from using the wrong stratum can occur.

It has become common to
use an interactive response technology (IRT) system, such as an interactive voice
response (IVR) or interactive web response (IWR) system, to implement the randomization and treatment
assignments. An IRT system usually goes through extensive quality control (QC) and user
acceptance testing (UAT) before implementation, so randomization errors
can be minimized. Compared with a manual randomization process, the
randomization error rate is lower in studies that use an IRT system to implement the randomization.

However, the use of an IRT
system requires the investigation site staff (pharmacist, investigator, or study
coordinator) to enter the stratification information at the time of randomization. If the site staff enter incorrect stratification information into the IRT system, the treatment assignment will be pulled from the
wrong stratum. A randomization error due to choosing the wrong stratum is
probably the most common randomization error we see in clinical
trials with stratified randomization. The more stratification factors we
have, the more likely it is that an incorrect stratum will be chosen.
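To see how quickly the number of strata grows, here is a minimal Python sketch. The factor names and levels are hypothetical and chosen only for illustration. The number of strata is the product of the number of levels of each factor, so every added factor multiplies the number of places a site can pick from.

```python
from itertools import product
from math import prod

# Hypothetical stratification factors and levels (illustration only).
factors = {
    "region": ["North America", "Europe", "Asia"],
    "disease_severity": ["mild", "moderate", "severe"],
    "prior_treatment": ["yes", "no"],
}


def count_strata(factors):
    """Number of strata = product of the number of levels of each factor."""
    return prod(len(levels) for levels in factors.values())


def list_strata(factors):
    """Enumerate every stratum as a tuple holding one level per factor."""
    return list(product(*factors.values()))


print(count_strata(factors))      # 3 * 3 * 2 strata
print(list_strata(factors)[0])    # first stratum in the enumeration
```

With these three hypothetical factors there are already 18 strata; adding one more two-level factor would double that to 36.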

In addition to the
number of stratification factors, ambiguous description / definition of the
randomization stratum and lack of clarity about source of measurement (for
example, the local lab or central lab results for a lab related stratification
factor) can all contribute to choosing an incorrect stratum for
randomization.

For
example, in a clinical trial in the neurology area, the sponsor planned to have
patients stratified by their use of cholinesterase inhibitors, corticosteroids, and immunosuppressants/immunomodulators.
The following stratification factor was constructed.

Regimen includes only cholinesterase inhibitors

Regimen includes corticosteroid (CS) as the only immunosuppressant/immunomodulator, alone or in combination with other MG medications (e.g., a subject on prednisone plus a cholinesterase inhibitor would be in this stratum)

Without appropriate training, it is likely that the site staff will choose a wrong category for the randomization.

It is also common that the stratification factor is based on one of the laboratory measures. The original laboratory measure is a continuous result and it is then categorized for the stratification purpose. In this case, the protocol must be clear whether or not the stratification will be based on the lab results from the local lab or central lab because the results from local versus central labs can be different.
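As a rough illustration of why the lab source matters, the sketch below derives a stratum from a continuous lab value and refuses to do so when the value comes from a source other than the one the protocol names. The cutoff, the stratum labels, and the function name are all hypothetical.

```python
def derive_stratum(lab_value, cutoff, source, required_source="central"):
    """Categorize a continuous lab result into a stratum.

    Hypothetical rule: values at or above the cutoff go to "high",
    values below go to "low". The result is only accepted when it comes
    from the lab source named in the protocol.
    """
    if source != required_source:
        raise ValueError(
            f"stratification requires {required_source} lab results, got {source}"
        )
    return "high" if lab_value >= cutoff else "low"


# Local and central results for the same sample can fall on opposite
# sides of the cutoff, which would lead to different strata.
print(derive_stratum(10.4, cutoff=10.0, source="central"))
```

If a local result of 9.8 were used for the same subject, the derived stratum would be "low" rather than "high", which is exactly the kind of discrepancy a clear protocol definition prevents.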

When the wrong
stratum is chosen for the randomization (a randomization error occurs), the
natural reaction is to try to fix it. However, with an IRT system, it is not
easy to go back into the system to fix the randomization error. In fact, it is
strongly encouraged not to try to fix the issue.

"...the safest option is to accept the
randomisation errors that do occur and leave the initial randomisation records
unchanged. This approach is consistent with the ITT principle, since it enables
participants to be analysed as randomised, and avoids further problems that can
arise when attempts are made to correct randomisation errors. A potential
disadvantage of accepting randomisation errors is that imbalance could be
introduced between the randomised groups in the number of participants or their
baseline characteristics. However, any imbalance due to randomisation errors is
expected to be minimal unless errors are common. Imbalance can be monitored by
an independent data monitoring committee during the trial and investigated by
the trial statistician at the analysis stage."

It is true that
randomization errors can skew the analyses, especially when
randomization errors are not infrequent. In a paper by Ke et al., "On Errors in Stratified Randomization", the impact
of randomization errors on treatment balance and on the properties of analysis
approaches was evaluated.

If there are a lot of
randomization errors, the study quality and integrity will be questioned. From
the statistical analysis standpoint, the strict intention-to-treat analysis may
not be appropriate. With a significant number of randomization errors involving incorrect treatment
assignment, we may need to analyze the data 'as treated' instead of 'as randomized'. With a significant number of randomization errors due to incorrect
selection of the randomization stratum, we may need to take the stratum
information from the clinical database (assuming it is correctly recorded)
instead of from the information entered in the IRT system.

When randomization
errors are identified during a study, the root cause of the error should be
investigated. Additional training may be needed to prevent the further
occurrence of the randomization error.

Thursday, February 22, 2018

This month, FDA issued five guidance documents for
drug development in five different neurological conditions/diseases
(Alzheimer’s disease, DMD, ALS, migraine, and pediatric epilepsy). These newly
issued guidance documents are intended to ease the drug approval requirements
or to offer clarity on the drug development pathway.

We think that this is a general trend at FDA, and we expect
that similar guidance documents will be issued for other conditions/diseases,
aiming to ease the requirements for drug development, eventually speed up the
drug development process, and make innovative drugs available to patients sooner.

“Today I’m pleased to issue five guidance documents that
benefited from the streamlined approach of this pilot as part of a broader,
programmatic focus on advancing treatments for neurological disorders that
aren’t adequately addressed by available therapies. These guidance documents
provide details on how researchers can best approach drug development for
certain neurological conditions – Duchenne muscular dystrophy (DMD) and closely
related conditions, migraine, epilepsy, AD and ALS. These guidance documents
provide our current thinking and sound regulatory and scientific advice for
product developers so that safe and effective treatments can ultimately be made
available to patients. These documents are each a culmination of thoughtful
scientific collaboration within the agency and incorporate important input from
patients, researchers and advocates. We hope that providing up-to-date, clear
information about our scientific expectations, such as clinical trial design
and ways to measure effectiveness, will save companies time and resources and
ultimately, bring effective new medicines to patients more efficiently.”

Below is a table to summarize the key points from these five
guidance documents:

Emphasizing the difficulties in designing trials of drugs for these
conditions.

Efficacy endpoints, which basically leave it up to individual study
sponsors to discuss with FDA staff the best approach on a case-by-case basis

What the DMD guidance did not do is open a path for approval based
solely on biomarker effects such as dystrophin levels in muscle, although effects
on objective measures such as respiratory and cardiac muscle function can be
used to support approval.

For drugs intended for children age 4 and older with partial onset
seizures, the FDA will no longer require that efficacy trials be conducted in
children. The agency will now consider efficacy data from adult patients to
be sufficient for pediatric approval.

Monday, February 12, 2018

In a previous post, the terms ‘multiple endpoints’ and
‘co-primary endpoints’ were discussed. If a study contains two co-primary
efficacy endpoints, the study is claimed to be successful only if both endpoints are
statistically significant at alpha=0.05 (no adjustment for multiplicity is
necessary). If a study contains multiple (two) primary efficacy endpoints, the
study is claimed to be successful if either endpoint is statistically
significant. However, in the latter situation, an adjustment for multiplicity is
necessary to maintain the overall alpha at 0.05. In other words, for the hypothesis
test of each individual endpoint, the significance level alpha is less than
0.05.

The simplest and most straightforward approach is to apply the Bonferroni
correction. The Bonferroni correction compensates for the increase in the number of
hypothesis tests: each individual hypothesis is tested at a significance level
of alpha/m, where alpha is the desired overall alpha level (usually 0.05) and m is
the number of hypotheses. If there are two hypothesis tests (m=2), each
individual hypothesis will be tested at alpha=0.025.

The Bonferroni method is a single-step procedure that is
commonly used, perhaps because of its simplicity and broad applicability. It is
a conservative test and a finding that survives a Bonferroni adjustment is a
credible trial outcome. The drug is considered to have shown effects for each
endpoint that succeeds on this test. The Holm and Hochberg methods are more
powerful than the Bonferroni method for primary endpoints and are therefore
preferable in many cases. However, for reasons detailed in sections IV.C.2-3,
sponsors may still wish to use the Bonferroni method for primary endpoints in
order to maximize power for secondary endpoints or because the assumptions of
the Hochberg method are not justified. The most common form of the Bonferroni
method divides the available total alpha (typically 0.05) equally among the
chosen endpoints. The method then concludes that a treatment effect is
significant at the alpha level for each one of the m endpoints for which the
endpoint’s p-value is less than α /m. Thus, with two endpoints, the critical
alpha for each endpoint is 0.025, with four endpoints it is 0.0125, and so on.
Therefore, if a trial with four endpoints produces two-sided p values of 0.012,
0.026, 0.016, and 0.055 for its four primary endpoints, the Bonferroni method
would compare each of these p-values to the divided alpha of 0.0125. The method
would conclude that there was a significant treatment effect at level 0.05 for
only the first endpoint, because only the first endpoint has a p-value of less
than 0.0125 (0.012). If two of the p-values were below 0.0125, then the drug
would be considered to have demonstrated effectiveness on both of the specific
health effects evaluated by the two endpoints. The Bonferroni method tends to
be conservative for the study overall Type I error rate if the endpoints are
positively correlated, especially when there are a large number of positively
correlated endpoints. Consider a case in which all of three endpoints give
nominal p-values between 0.025 and 0.05, i.e., all ‘significant’ at the 0.05
level but none significant under the Bonferroni method. Such an outcome seems
intuitively to show effectiveness on all three endpoints, but each would fail
the Bonferroni test. When there are more than two endpoints with, for example,
correlation of 0.6 to 0.8 between them, the true family-wise Type I error rate
may decrease from 0.05 to approximately 0.04 to 0.03, respectively, with
negative impact on the Type II error rate. Because it is difficult to know the
true correlation structure among different endpoints (not simply the observed
correlations within the dataset of the particular study), it is generally not
possible to statistically adjust (relax) the Type I error rate for such
correlations. When a multiple-arm study design is used (e.g., with several
dose-level groups), there are methods that take into account the correlation
arising from comparing each treatment group to a common control group.
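The four-endpoint example in the guidance text can be checked with a few lines of Python. The sketch below applies the equally weighted Bonferroni rule and, for comparison, the Holm step-down procedure that the guidance mentions as more powerful. The function names are my own, and the strict "less than" comparison follows the wording of the guidance text (many references state the Holm rule with "less than or equal to").

```python
def bonferroni(p_values, alpha=0.05):
    """Single-step Bonferroni: reject H_i when p_i < alpha / m."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down: compare sorted p-values to alpha/m, alpha/(m-1), ...

    Testing stops at the first p-value that fails; every hypothesis
    after that point is not rejected.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for step, idx in enumerate(order):
        if p_values[idx] < alpha / (m - step):
            rejected[idx] = True
        else:
            break
    return rejected


# Two-sided p-values from the four-endpoint example in the guidance text.
p_values = [0.012, 0.026, 0.016, 0.055]

print(bonferroni(p_values))
print(holm(p_values))
```

Bonferroni flags only the first endpoint (0.012 < 0.0125), as the guidance states. Under Holm, the endpoint with p=0.016 is also rejected because it is compared to 0.05/3 rather than 0.05/4.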

The guidance also discussed the weighted Bonferroni approach:

The Bonferroni test can also be performed with different weights
assigned to endpoints, with the sum of the relative weights equal to 1.0 (e.g.,
0.4, 0.1, 0.3, and 0.2, for four endpoints). These weights are prespecified in
the design of the trial, taking into consideration the clinical importance of
the endpoints, the likelihood of success, or other factors. There are two ways
to perform the weighted Bonferroni test:

The
unequally weighted Bonferroni method is often applied by dividing the overall
alpha (e.g., 0.05) into unequal portions, prospectively assigning a specific
amount of alpha to each endpoint by multiplying the overall alpha by the
assigned weight factor. The sum of the endpoint-specific alphas will always be
the overall alpha, and each endpoint’s calculated p-value is compared to the
assigned endpoint-specific alpha.

An
alternative approach is to adjust the raw calculated p-value for each endpoint
by the fractional weight assigned to it (i.e., divide each raw p-value by the
endpoint’s weight factor), and then compare the adjusted p-values to the overall
alpha of 0.05.

These two approaches are equivalent.
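The equivalence of the two ways of performing the weighted Bonferroni test can be illustrated with a short Python sketch. The weights are the ones given as an example in the guidance text (0.4, 0.1, 0.3, 0.2); the p-values are reused from the guidance's unweighted example purely for illustration, and the function names are hypothetical.

```python
import math


def endpoint_alphas(weights, alpha=0.05):
    """Approach 1, step 1: split the overall alpha by the prespecified weights."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1.0")
    return [alpha * w for w in weights]


def reject_by_alpha(p_values, weights, alpha=0.05):
    """Approach 1: compare each raw p-value to its endpoint-specific alpha."""
    return [p < a for p, a in zip(p_values, endpoint_alphas(weights, alpha))]


def reject_by_adjusted_p(p_values, weights, alpha=0.05):
    """Approach 2: divide each raw p-value by its weight, compare to overall alpha."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1.0")
    return [p / w < alpha for p, w in zip(p_values, weights)]


weights = [0.4, 0.1, 0.3, 0.2]
p_values = [0.012, 0.026, 0.016, 0.055]  # illustration only

print(endpoint_alphas(weights))
print(reject_by_alpha(p_values, weights))
print(reject_by_adjusted_p(p_values, weights))
```

Since p < alpha × w is the same inequality as p / w < alpha for any positive weight w, both functions return the same decisions. Here only the first endpoint is rejected (0.012 against an endpoint-specific alpha of 0.02).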

The guidance mentioned that the reasons for using the weighted
Bonferroni test are:

Clinical importance of the endpoints

The likelihood of success

Other factors

Other factors could include:

With two primary efficacy endpoints, the expectation for
regulatory approval for one endpoint is greater than another

Sample size calculation indicates that the sample size that is
sufficient for primary efficacy endpoint #1 is overestimated for the primary
efficacy endpoint #2

With the weighted Bonferroni correction, the weights are subjective and essentially arbitrarily selected, which results in a partition of the overall alpha into unequal significance levels for different endpoints.

There are a lot of applications of Bonferroni and weighted Bonferroni in practice. Here are some examples:

The study was to be considered positive if either of the two
coprimary end points, progression-free or overall survival, was significantly
longer with durvalumab than with placebo. Approximately 702 patients were
needed for 2:1 randomization to obtain 458 progression-free survival events for
the primary analysis of progression-free survival and 491 overall survival
events for the primary analysis of overall survival. It was estimated that the
study would have a 95% or greater power to detect a hazard ratio for disease
progression or death of 0.67 and a 85% or greater power to detect a hazard
ratio for death of 0.73, on the basis of a log-rank test with a two-sided
significance level of 2.5% for each coprimary end point.

However, in the original study protocol, the weighted Bonferroni method was used and unequal alpha levels were assigned to OS and PFS.

The two co-primary endpoints of this study are OS and PFS.
The control for type-I error, a significance level of 4.5% will be used for
analysis of OS and a significance level of 0.5% will be used for analysis of
PFS. The study will be considered positive (a success) if either the PFS
analysis results and/or the OS analysis results are statistically significant.

Here, a weight of 0.9 (resulting in an alpha of 0.9 x 0.05 = 0.045) was
given to OS and a weight of 0.1 (resulting in an alpha of 0.1 x 0.05 = 0.005) was
given to PFS.

In the COMPASS-2 study (bosentan
added to sildenafil therapy in patients with pulmonary arterial hypertension),
the original protocol contained two primary efficacy endpoints, and the weighted Bonferroni
method (even though it was not explicitly mentioned in the publication) was used for
multiplicity adjustment. A weight of 0.8 (resulting in an alpha of 0.8 x 0.05 =
0.04) was given to time to first mortality/morbidity event and a weight of 0.2 (resulting
in an alpha of 0.2 x 0.05 = 0.01) was given to the change from baseline to Week 16
in 6MWD.

The initial assumptions for the primary end-point were an
annual rate of 21% on placebo with a risk reduced by 36% (hazard ratio (HR)
0.64) with bosentan and a negligible annual attrition rate. In addition, it was
planned to conduct a single final analysis at 0.04 (two-sided), taking into
account the existence of a co-primary end-point (change in 6MWD at 16 weeks)
planned to be tested at 0.01 (two-sided). Over the course of the study, a
number of amendments were introduced based on the evolution of knowledge in the
field of PAHs, as well as the rate of enrolment and blinded evaluation of the
overall event rate. On implementation of an amendment in 2007, the 6MWD
end-point was change from a co-primary end-point to a secondary endpoint and
the Type I error associated with the single remaining primary end-point was
increased to 0.05 (two-sided).

According to the materials from a Meeting
of the Antimicrobial Drugs Advisory Committee (AMDAC), the sponsor
(Bayer) conducted two pivotal studies: RESPIRE 1 and RESPIRE 2. Each study
contained two hypotheses. Interestingly, for multiplicity adjustment, the
Bonferroni method was used for the RESPIRE 1 study and the weighted Bonferroni
method for the RESPIRE 2 study. We can only guess why weights of 0.02 and 0.98 (resulting
in a partition of alpha into 0.001 and 0.049) were chosen in the RESPIRE 2 study.

Thursday, February 01, 2018

In a previous post, the CDISC data structure for protocol deviations was discussed. The protocol deviation data set (DV domain) is an event data set (just like the one used to record adverse events). The tabulation data set should contain one record per protocol deviation per subject. In other words, each protocol deviation is always tied to an individual subject. In the DV data set, each protocol deviation record should carry a unique subject identifier (usubjid).

There are situations where the protocol deviations are on the site level, not the subject level. For example, many study protocols have a specific requirement for handling the study drugs (or IP - investigational products). The study drug must be stored under the required temperatures. A temperature excursion occurs when a time- and temperature-sensitive pharmaceutical product is exposed to temperatures outside the ranges prescribed for storage. The temperature excursion may result in loss of the study drug's efficacy or cause a safety concern. If there are multiple subjects enrolled at the problematic site, the protocol deviation associated with the temperature excursion will have an impact on all subjects at this site - this is called a site level protocol deviation.

There is no specific discussion about documenting and handling site level protocol deviations in ICH and CDISC guidelines.

According to CDISC SDTM, Protocol Deviations should be captured in DV domain. According to current SDTM standard, all tabulation data sets including DV are designed for subject data (with the only exception of Trial Design info).

Because site level deviations are not associated with any specific subject, they cannot be directly included in the DV data set. There may be two ways to handle the site level protocol deviations:

If any site level deviation has impact on all or
multiple subjects enrolled at that site, the specific deviation can be repeated for each affected subject
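Repeating a site level deviation for each affected subject could look roughly like the sketch below. The input layout (a list of subject dictionaries with "usubjid", "siteid", and "affected" keys), the study identifier, and the function name are assumptions made for illustration; STUDYID, DOMAIN, USUBJID, DVTERM, and DVSTDTC are standard DV variable names.

```python
def expand_site_deviation(term, start_date, site_id, subjects, studyid="STUDY01"):
    """Create one DV-style record per affected subject at the given site.

    DVSEQ is intentionally left out: it would be assigned later, once these
    records are combined with each subject's other deviations.
    """
    records = []
    for subj in subjects:
        if subj["siteid"] != site_id or not subj["affected"]:
            continue
        records.append({
            "STUDYID": studyid,
            "DOMAIN": "DV",
            "USUBJID": subj["usubjid"],
            "DVTERM": term,
            "DVSTDTC": start_date,
        })
    return records


# Hypothetical subjects: two affected at site 101, one not affected, one elsewhere.
subjects = [
    {"usubjid": "STUDY01-101-001", "siteid": "101", "affected": True},
    {"usubjid": "STUDY01-101-002", "siteid": "101", "affected": True},
    {"usubjid": "STUDY01-101-003", "siteid": "101", "affected": False},
    {"usubjid": "STUDY01-102-001", "siteid": "102", "affected": True},
]

dv = expand_site_deviation(
    "Study drug temperature excursion at site", "2018-01-15", "101", subjects
)
print(len(dv))
```

The "affected" flag is where the judgment lies: someone has to decide, ideally while the study is ongoing, which subjects actually received drug from the affected supply.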

It is advisable to pre-specify the instructions for handling the site level protocol deviations so that the site level protocol deviations are recorded appropriately.

Identifying and recording the protocol deviations, including site level protocol deviations, should be an ongoing process during the conduct of the clinical trial. If we wait until the end of the study, we may have
difficulty determining whether a specific site level deviation had an impact on all
subjects or only some of the subjects at that site.

It is important to ensure
GCP compliance and adherence to the study protocol in clinical trials. However,
due to the complexity of clinical trial operations, it is not possible to
conduct a study 100% according to the study protocol. It would be a miracle
to complete a study without any protocol deviations. Therefore, identifying and
documenting the protocol deviations becomes a critical task.

Protocol deviations
also have an impact on the statistical analysis. Protocol deviations
will not result in subjects being excluded from the full analysis set (usually the
intention-to-treat population). However, important protocol deviations may
result in subjects being excluded from the per-protocol population. For pivotal
studies, analyses using the per-protocol population are typically
performed as one type of sensitivity analysis to evaluate the robustness of the
study results.

If
the results from the main analyses (usually based on the intention-to-treat
population) are negative, the analyses on the per-protocol population may not be
very meaningful. If the results from the main analyses are positive, the analyses on the per-protocol population will be important. If there are a
lot of protocol deviations in a study, they may trigger the regulatory reviewer’s scrutiny and
dampen confidence in the study results.

CDISC’s CDASH (the guidelines for case report form design) and SDTM (the guidelines for standardized tabulation data structure) have detailed discussions about
protocol deviations. While violation of inclusion/exclusion criteria (or
study entry criteria or eligibility criteria) may also be considered part of
the protocol deviations, the CDISC discussions cover only protocol deviations
that occur after the study start (subjects randomized into the study and/or
having received the first dose of the study drug). Violations of inclusion/exclusion criteria should be collected separately in the IE form (domain).

CDASH
recommends identifying the protocol deviation (DV domain) through other sources, not from the
case report form. It stated:

5.14.1 Considerations Regarding Usage of a Protocol Deviations CRF

The general recommendation is to avoid the creation of a Protocol Deviations CRF (individual sponsors can determine whether it is needed for their particular company), as this information can usually be determined from other sources or derived from other data. As with all domains, Highly Recommended fields are included only if the domain is used. The DV domain table was developed as a guide that clinical teams could use for designing a Protocol Deviations CRF and study database should they choose to do so.

In
practice, protocol deviations are usually collected and maintained by the clinical
team either in a CTMS (clinical trial management system) or in an Excel spreadsheet, even though there are examples (maybe
the future trend) of collecting protocol deviations through electronic
data capture (EDC).

If
the sponsor decides to use a case report form (paper or electronic) to capture the protocol
deviations, CDASH recommends the following:

If a sponsor decides to use a Protocol Deviations CRF, the sponsor should not rely on this CRF as the only source of protocol deviation information for a study. Rather, they should also utilize monitoring, data review and programming tools to assess whether there were protocol deviations in the study that may affect the usefulness of the datasets for analysis of efficacy and safety.

SDTM requires a protocol deviation (DV) domain for the tabulation data set (dv.xpt) no matter how the original protocol deviation data are collected.

In ADaM, whether or not to create an analysis data set for protocol deviations is up to each sponsor; it is not required. If a summary table for protocol deviations is needed, it can be programmed from the SDTM DV data set and the ADaM ADSL data set.
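A summary of this kind can be sketched in plain Python as follows. The records are made up, and the choice to count subjects with at least one deviation per planned arm (TRT01P) and deviation category (DVCAT) is an assumption about what the table would show; in practice this would be done in whatever language the study programming uses.

```python
from collections import defaultdict


def summarize_deviations(dv_records, adsl_records):
    """Count subjects with at least one deviation, by planned arm and category."""
    arm_by_subject = {row["USUBJID"]: row["TRT01P"] for row in adsl_records}
    subjects = defaultdict(set)
    for row in dv_records:
        arm = arm_by_subject.get(row["USUBJID"])
        if arm is None:
            continue  # subject not in ADSL; simply skipped in this sketch
        subjects[(arm, row["DVCAT"])].add(row["USUBJID"])
    return {key: len(ids) for key, ids in subjects.items()}


# Made-up subject-level (ADSL-like) and deviation (DV-like) records.
adsl = [
    {"USUBJID": "001", "TRT01P": "Active"},
    {"USUBJID": "002", "TRT01P": "Placebo"},
    {"USUBJID": "003", "TRT01P": "Active"},
]
dv_data = [
    {"USUBJID": "001", "DVCAT": "Visit window"},
    {"USUBJID": "001", "DVCAT": "Visit window"},   # same subject, counted once
    {"USUBJID": "003", "DVCAT": "Visit window"},
    {"USUBJID": "002", "DVCAT": "Prohibited medication"},
]

summary = summarize_deviations(dv_data, adsl)
print(summary)
```

Using a set per cell means a subject with several deviations in the same category contributes one count, which is the usual convention for "number of subjects with" tables.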

Thursday, January 11, 2018

In recent news announcements, sponsors had to disclose
errors in statistical analyses. All these errors have consequences for the company’s
value or even the company’s fate. I hope that the study team members who made
these kinds of mistakes still have a job in their company. I did have a friend who ended
up losing a job due to an incorrectly reported p-value.

Here are two examples. In the first example, the p-value was
incorrectly calculated and announced, and later had to be corrected – very embarrassing
for the statistician who made this mistake. In the second example, the mistake
is more on the programming and data management side. Had the initial results
been positive, the sponsor might never have gone back to re-assess the outcomes and
the errors might never have been identified.

Axovant Sciences (NASDAQ:AXON)
today announced a correction to the data related to the Company’s
investigational drug nelotanserin previously reported in its January 8, 2018
press release. In the results of the pilot Phase 2 Visual Hallucination study,
the post-hoc subset analysis of patients with a baseline Scale for the
Assessment of Positive Symptoms - Parkinson's Disease (SAPS-PD) score of
greater than 8.0 was misreported. The previously reported data for this
population (n=19) that nelotanserin treatment at 40 mg for two weeks followed
by 80 mg for two weeks resulted in a 1.21 point improvement (p=0.011,
unadjusted) were incorrect. While nelotanserin treatment at 40 mg for two weeks
followed by 80 mg for two weeks did result in a 1.21 point improvement, the
p-value was actually 0.531, unadjusted. Based on these updated results, the
Company will continue to discuss a larger confirmatory nelotanserin study with
the U.S. Food and Drug Administration (FDA) that is focused on patients with dementia
with Lewy bodies (DLB) with motor function deficits. The Company may further
evaluate nelotanserin for psychotic symptoms in DLB and Parkinson’s disease
dementia (PDD) patients in future clinical studies.

Re-Assessment of Outcomes
Following database lock and unblinding of treatment assignment,
the Applicant performed additional data assessments due to errors identified in
the programming/data entry that impacted identification of PEs. This led to
changes in the final numbers of PEs. Based on discussion the Applicant had with
the PEBAC Chair, it was decided that 10 PEs initially adjudicated by the PEBAC
were to be re-adjudicated by the PEBAC using complete and final subject-level
information. This led to a re-adjudication by the PEBAC who were blinded to
subject ID, site ID, and treatment. Result(s) of prior adjudication were not
provided to the PEBAC.

Efficacy results presented in Section 7.3 reflect the
revised numbers. Further details regarding the reassessment by the PEBAC are
discussed in Section 7.3.6.

7.3.6 Primary Endpoint Changes after Database Lock and
Un-Blinding

Following database lock and treatment assignment
un-blinding, the Applicant performed additional data assessments leading to
changes in the final numbers of PEs. Specifically, per the Applicant, during a
review of the ORBIT-3 and ORBIT-4 data occurring after database locking and
data un-blinding (for persons involved in the data maintenance and analyses),
‘personnel identified errors in the programming done by Accenture Inc. (data
analysis contract research organization (CRO)) and one data entry error that
impacted identification of PEs. Because of the programming errors, the
Applicant states that they chose to conduct a ‘comprehensive audit of all
electronic Case Report Forms (eCRFs) entries for signs, symptoms or laboratory
abnormalities as entered in the PE worksheets for all patients in ARD-3150-1201
and ARD-3150-1202’ (ORBIT-3 and ORBIT-4). From this audit, the Applicant notes
‘that no further programming errors’ were identified but instead 10 PE events
(three from ORBIT-4 and seven from ORBIT-3) were found for which the PE
assessment by the PEBAC was considered potentially incorrect. This was based on
the premise that subject-level data provided to the PEBAC during the original
PE adjudication were updated at the time of the database lock. Reasons provided
are: 1) the clinical site provided update information to the eCRF after
the initial PEBAC review (2 PEs), 2) incorrect information
was supplied to the PEBAC during initial adjudication process (2 PEs), 3)
inconsistency between visit dates and reported signs and symptoms (6 PEs).
After discussion with the PEBAC Chair, it was decided that these 10 PEs
initially deemed PEs by the PEBAC were to be re-assessed by the PEBAC using
complete and final subject-level information. This led to a re-adjudication by
the PEBAC during a closed session on January 25, 2017. This re-adjudication was
coordinated by Synteract (Applicant’s CRO) who provided data to the PEBAC that
were blinded to subject ID, site ID, and treatment. In addition, result(s) of
prior adjudication were not provided. While the PEBAC was provided with subject
profiles for other relevant study visits, the PEBAC focus was only on the
selected visits for which data were updated or corrected.

Because of the identified programming errors and PEBAC
re-adjudication, there were two new first PEs added to the Cipro arm in ORBIT-3
and two new first PEs added to the placebo arm in ORBIT-4. Given these changes,
the log-rank p-value in ORBIT-4 changed from 0.058 to 0.032 (when including sex
and prior PEs strata). The p-value in ORBIT-3 changed from 0.826 to 0.974
remaining insignificant. These changes are summarized in Table 9. Note that
there were no overall changes in the results of the secondary endpoints
analyses from changes in PE status described above.

Mistakes during statistical analysis are inevitable if there are no adequate procedures to prevent them. The following procedures can minimize the chance of making mistakes like those in the examples above.

Independent validation process (double programming): The probability that two independent people make the same mistake is very low.

Dry-run process: using the dirty (not yet cleaned) data, perform the statistical analysis with a dummy randomization schedule, i.e., with the real data but fake treatment assignments. The purpose is to do the programming work and check the data up front so that issues and mistakes can be identified and corrected.
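Both procedures can be sketched in a few lines of Python. The first function builds a dummy randomization schedule for a dry run; the second compares the results of two independently written programs. The function names, the arm labels, the seed, and the tolerance are all assumptions for illustration, not a description of any particular company's process.

```python
import random


def dummy_randomization(subject_ids, arms=("A", "B"), seed=2018):
    """Assign fake treatment arms for a dry run.

    The assignments are random and unrelated to the real schedule, so the
    team stays blinded while the analysis programs are developed and tested.
    """
    rng = random.Random(seed)
    return {sid: rng.choice(arms) for sid in subject_ids}


def compare_outputs(production, validation, tol=1e-8):
    """Return the result names where two independent programs disagree."""
    mismatches = []
    for key in sorted(set(production) | set(validation)):
        a, b = production.get(key), validation.get(key)
        if a is None or b is None or abs(a - b) > tol:
            mismatches.append(key)
    return mismatches


dry_run_arms = dummy_randomization(["001", "002", "003", "004"])

# Hypothetical results from the production and the validation programmer.
production = {"mean_change": 1.21, "p_value": 0.043}
validation = {"mean_change": 1.21, "p_value": 0.430}
print(compare_outputs(production, validation))
```

In this made-up case the two programs agree on the estimate but not on the p-value, so the discrepancy is flagged for resolution before any result leaves the team.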

Tuesday, January 02, 2018

In clinical trials, the most
critical safety information is the adverse event (AE). There are numerous
guidance documents and guidelines regarding AE collection. However, there
is still a lot of confusion. The very basic question is when to start AE
collection and when to stop it. For example, here are some discussions:

It is a very common practice in industry-sponsored clinical
trials that AE record keeping begins after informed consent. Adverse
events will be collected even for those patients who signed the informed consent but subsequently
failed the inclusion/exclusion criteria during the screening period. If we
attend GCP training, it is very likely we will be told that this is how
adverse events are supposed to be collected in order to be compliant with GCP.

However, the AE definition in the
ICH E2A guidance document suggests that adverse events are to be recorded at
or after the first treatment, not from the signing of the informed consent form (ICF).
ICH E2A defines the AE as:

Adverse Event (or Adverse Experience) Any untoward medical
occurrence in a patient or clinical investigation subject administered a
pharmaceutical product and which does not necessarily have to have a causal
relationship with this treatment. An adverse event (AE) can therefore be any
unfavourable and unintended sign (including an abnormal laboratory finding, for
example), symptom, or disease temporally associated with the use of a medicinal
product, whether or not considered related to the medicinal product.

A. Commonly, the study period during which the investigator
must collect and report all AEs and SAEs to the sponsor begins after informed
consent is obtained and continues through the protocol-specified post-treatment
follow-up period. Since the ICH E2A guidance document defines an AE as “any
untoward medical occurrence in a patient or clinical investigation subject
administered a pharmaceutical product…” This definition clearly excludes the
period prior to the IMP’s administration (in this context a placebo comparator
used in a study is considered an IMP. Untoward medical occurrences in subjects
who never receive any study treatment (active or blinded) are not treatment
emergent AEs and would not be included in safety analyses. Typically, the
number of subjects “evaluable for safety” comprises the number of subjects who
received at least one dose of the study treatment. This includes subjects who
were, for whatever reason, excluded from efficacy analyses, but who received at
least one dose of study treatment.

There are situations in which the reporting of untoward
medical events that occur after informed consent but prior to the IMP’s
administration may be mandated by the protocol and/or may be necessary to meet
country-specific regulatory requirements. For example, it is considered good risk
management for sponsors to require the reporting of serious medical events
caused by protocol-imposed screening/diagnostic procedures, and medication
washout or no-treatment run-in periods that precede IMP administration. For
example, during a protocol-mandated washout period, subjects who are taken off
the existing treatments they were receiving before the test article is
administered (such as during crossover trials) may experience withdrawal
symptoms from removal of the treatment and must be monitored closely. If the
severity and/or frequency of AEs occurring during washout periods are
considered unacceptable, the protocol may have to be modified or the study
halted. Some protocols may also require the structured collection of signs and
symptoms associated with the disease under study prior to IMP administration to
establish a baseline against which post-treatment AEs can be compared. In some
countries, regulatory authorities require the expedited reporting of these
events to assess the safety of the human research.

For a specific study, the screening procedures and their
potential to cause injury should be considered when deciding
when to start AE collection. For a study with minimal or routine screening
procedures (such as a phase I clinical pharmacology study in healthy
volunteers at a phase I clinic), it may be acceptable to collect AEs starting from the
first treatment. For a study with
comprehensive or invasive screening procedures, AE collection should start once the subject signs the ICF. For
example, in a study assessing the effect of a thrombolytic agent in ischemic
stroke patients, the screening procedures include a CT scan and an arteriogram to assess the location and size
of the clot, and these procedures can cause
adverse effects or injuries to the study participants. In this situation, it is strongly advised
that AE collection start at ICF signing.

If AEs are collected from ICF signing, then during the
statistical analysis the AEs can be divided into non-treatment-emergent AEs (non-TEAEs) and
treatment-emergent AEs (TEAEs). Non-TEAEs are AEs that occurred prior to the first study treatment, and TEAEs are AEs with an onset date/time at or after the first study treatment. Non-TEAEs and TEAEs are summarized
separately, and the extensive safety analyses are based mainly on the TEAEs.
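
The split between non-TEAEs and TEAEs described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name classify_ae, the AE terms, and all dates are hypothetical, and a real analysis would follow the rules in the statistical analysis plan (for example, how to handle partial or missing onset dates).

```python
from datetime import datetime

def classify_ae(onset, first_dose):
    """Label an AE by comparing its onset with the first study treatment.

    Assumption: first_dose is None for subjects who never received
    study treatment, so none of their events can be treatment emergent.
    """
    if first_dose is None or onset < first_dose:
        return "Non-TEAE"
    # Onset at or after the first study treatment counts as treatment emergent.
    return "TEAE"

# Hypothetical subject: first dose on 10 March 2024 at 09:00.
first_dose = datetime(2024, 3, 10, 9, 0)

# Hypothetical AE records: (reported term, onset date/time).
aes = [
    ("Bruising at venipuncture site", datetime(2024, 3, 3, 10, 30)),
    ("Headache", datetime(2024, 3, 10, 9, 0)),
    ("Nausea", datetime(2024, 3, 12, 14, 15)),
]

labels = {term: classify_ae(onset, first_dose) for term, onset in aes}
for term, label in labels.items():
    print(f"{term}: {label}")
```

In this sketch the bruising event from the screening period is a non-TEAE, while the headache with onset exactly at the first dose and the later nausea are both TEAEs.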

When to Stop the AE collection?

It is even murkier when to stop AE
collection, because the end of the study is harder to define than the start. A study may have
a follow-up period after the completion of the study treatment. A subject may discontinue the
study treatment early but remain in the study to the end.

There is no clear guidance on how long after the last study
treatment AEs need to be collected. In practice, it is common to continue
reporting AEs for a period following the last study treatment, for example
7 days or 30 days after the
last treatment. The length of AE
collection during the follow-up period should be based on the half-life of the
study drug, whether there are AEs of special interest related to the study drug
under investigation, and whether the study population is pediatric or adult.
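
As a rough illustration of such a post-treatment window, the check below tests whether an AE onset date falls between ICF signing and a cutoff a fixed number of days after the last dose. The function name in_collection_window, the dates, and the default of 30 days are assumptions made for this sketch; the actual window, and whether its last day is inclusive, are set by the protocol.

```python
from datetime import date, timedelta

def in_collection_window(onset, icf_date, last_dose, follow_up_days=30):
    """Return True if the onset date lies inside the AE collection window.

    Assumptions for this sketch: collection starts at ICF signing and
    ends follow_up_days after the last study treatment, both inclusive.
    """
    window_end = last_dose + timedelta(days=follow_up_days)
    return icf_date <= onset <= window_end

# Hypothetical subject dates.
icf_date = date(2024, 1, 1)
last_dose = date(2024, 3, 10)

# With a 30-day window the last collection day is 9 April 2024.
print(in_collection_window(date(2024, 4, 9), icf_date, last_dose))
print(in_collection_window(date(2024, 4, 10), icf_date, last_dose))

# With a 7-day window the last collection day is 17 March 2024.
print(in_collection_window(date(2024, 3, 18), icf_date, last_dose, follow_up_days=7))
```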

In oncology clinical trials, it is typical not to collect
adverse events during the long-term follow-up period. Adverse events may
be collected for only a short period after the last treatment, for example 30 days, 3 months, or 6 months following the last study treatment. During the long-term follow-up period, only
the study endpoints (tumor-related events) such as death, tumor progression, or
secondary malignancy are collected.

QUESTION: What are the investigator's responsibilities in
terms of reporting the post-discontinuation adverse events? On one hand, since
the patient discontinued from the study, some think that the investigator has
no right to review the patient's clinical record under HIPAA (authorization
terminated) or informed consent regulations (consent withdrawn) and consequently
has no authority or responsibility to report the adverse events. On the other
hand, there do not appear to be any variances to an investigator's IND
obligations (even when a patient discontinues from the study) with respect to
reporting adverse events according to 21 CFR 312.64. Also, would the
investigator's reporting responsibilities be the same for Situation A and
Situation B?

ANSWER:
FDA has stated that clinical investigators need to capture
information about adverse effects resulting from the use of investigational
products, whether or not they are conclusively linked to the product. The fact
that a subject has voluntarily withdrawn from the study does not preclude FDA's
need for such information. In fact, withdrawal is often due to adverse effects,
some already realized and others beginning and that will later progress. For
your first scenario, that is obviously not a real problem since the
investigator is also the individual's private physician and obviously has this
information. While you are correct to worry about privacy issues in both
scenarios, the public welfare is a larger issue. Failure to capture and report
adverse effects, particularly serious adverse effects, will not only be a
problem for the individual in question but potentially for other actual and
potential study subjects. It is also essential to capture the information so
that the total picture is available to FDA when a marketing decision is
imminent. The individual in question may be one of very few who would evidence
the particular adverse effect, particularly given the limited number of
individuals included in a study. However, this information could have major
ramifications for the potentially large population of users of the drug once
legally marketed. How to best go about collecting the details of the adverse
effect is obviously a different issue.

In summary, AE collection can be depicted as follows, where TEAE stands for treatment-emergent adverse event: