Introduction

The goal of evidence-based medicine (EBM) is to integrate research evidence, clinical judgment, and patient preferences in a way that maximizes benefits and minimizes harms to the individual patient. The foundational, gold-standard research design in EBM is the randomized, parallel group clinical trial. However, the majority of patients may be ineligible for or unable to access such trials.1 In addition, these clinical experiments generate average treatment effects, which may not apply to the individual patient; some patients may derive greater benefit than average from a particular treatment, others less. Patients want to know: which treatment is likely to work better for me? To generate individual treatment effects (ITEs), clinical investigators have taken several tacks, including subgroup analysis, matched pairs designs, and n-of-1 trials. Of these, n-of-1 trials provide the most direct route to estimating the effect of a treatment on the individual.

In this chapter, we introduce n-of-1 trials by providing definitions and a rationale, delineating indications for use, describing key design elements, and addressing major opportunities and challenges.

Defining N-of-1 Trials

N-of-1 trials in clinical medicine are multiple crossover trials, usually randomized and often blinded, conducted in a single patient. As such, n-of-1 trials are part of a family of Single Case Designs that have been widely used in psychology, education, and social work. In the schema of Perdices et al., the Single Case Designs family includes case descriptions, nonrandomized designs, and randomized designs.2 N-of-1 trials are a specific form of randomized or balanced designs characterized by periodic switching from active treatment to placebo or between active treatments ("withdrawal-reversal" designs). N-of-1 trials were introduced to clinicians by Hogben and Sim as early as 1953,3 but it took 30 years for the movement to find an effective evangelist in the person of Gordon Guyatt at McMaster University.4-6 Many of the pioneers of the movement established active n-of-1 trial units in academic centers, only to abandon them once funding was exhausted.7 However, several units are still thriving, and over the past three decades more than 2,000 patients have participated in published n-of-1 trials; fewer than 10 percent of them chose treatments inconsistent with the results.

In contrast to parallel group trials, n-of-1 trials use crossover between treatments to address the problem of patient-by-treatment interaction. This situation arises when characteristics of the individual affect whether treatment A or treatment B (which could be an active treatment, a placebo, or no treatment) delivers superior results. Also, by prescribing multiple episodes of treatment, n-of-1 trials increase precision of measurement and control for treatment-by-time interaction, that is, the possibility that the relative effects of two treatments vary over time.

Rationale for N-of-1 Trials in the Era of Patient-Centered Care

The success of an n-of-1 trial largely depends on the collaboration and commitment of both clinician and patient. Clinicians must explain the process to their patients, collaborate with them in developing outcome measures most appropriate to the individual, monitor patients at regular intervals throughout the trial, evaluate and explain what the results of the trial mean, and work with patients to determine the course of treatment based on trial findings. Patients participating in n-of-1 trials must be involved in selecting therapies for evaluation, recording processes and outcomes (including nonadherence to treatment protocols), and sharing in treatment decisionmaking. As the centerpiece of patient-centered care, patient engagement has been shown to improve health outcomes among patients with chronic illness.8 Nikles et al. reported that patients who had completed an n-of-1 trial had a greater understanding and awareness of their condition and felt a greater sense of control when it came to decisions about their health.9

It is the intent of this User’s Guide to encourage and facilitate broader use of n-of-1 trials as a patient-centered clinical decision support tool. With appropriate infrastructure support, n-of-1 trials can be used by individual practicing clinicians in their daily care of individual patients. While the naturalistic application of n-of-1 trials may involve a single patient-clinician pair, over time there may emerge a multitude of such pairs. As discussed in the section “Statistical Analysis and Feedback for Decisionmaking,” n-of-1 trials can be combined to provide more informative treatment decisions for individual patients by using information from other similar n-of-1 trials. This mimics the way that clinicians learn from their prior clinical experience and from their colleagues’ clinical experience.

At the same time, research studies may use n-of-1 trials to examine decision support, quality improvement, and implementation of improved clinical and organizational procedures. As a research study design, n-of-1 trials are uniquely capable of informing clinical decisions for individual patients. Therefore, the research goal (to produce generalizable knowledge that can be applied to future patients) is compatible with the clinical goal of serving the needs of the individual patients participating in these trials. (The same is often not true for other research study designs, such as the usual parallel group randomized controlled trials [RCTs], in which patients contribute to the research but usually do not benefit directly in terms of their own clinical decisionmaking.) This special feature of n-of-1 trials may facilitate the recruitment and retention of patients and clinicians in research studies.

Beyond their potential for promoting patient-centered care, n-of-1 trials may have additional pragmatic value. With escalating drug costs, health care systems are struggling to provide cost-effective therapies. N-of-1 trials offer an objective way of determining individual response to therapy: if two therapeutic options are shown to have equivalent effectiveness in a given individual, the less costly option could be chosen. This approach to comparative effectiveness could apply to different classes of medications, as well as formal assessment of the bioequivalence of generic and proprietary pharmaceuticals. Considering that n-of-1 trials are particularly suited to chronic conditions, the savings to the health care system could be substantial.

Indications, Contraindications, and Limitations

N-of-1 trials are indicated whenever there is substantial uncertainty regarding the comparative effectiveness of treatments being considered for an individual patient. Uncertainty can arise when evidence is generally lacking (as when no relevant parallel group RCTs have been conducted), when the existing evidence is in conflict, or when the evidence is of questionable relevance to the patient at hand.10 Uncertainty may also result from heterogeneity of treatment effects (HTE) across patients that cannot be easily predicted from available prognostic factors. HTE is the variance of ITEs across patients, where the ITE is the difference in effects (net benefits) between treatment A and treatment B for an individual patient.11 Though the extent of HTE for common conditions and treatments is not well characterized, some analyses suggest it is substantial.12-16

N-of-1 trials are applicable to chronic, stable, or slowly progressive conditions that are either symptomatic or for which a valid biomarker has been identified. Acute conditions offer no opportunity for multiple crossovers. Rapidly progressive conditions (or those prone to sudden, catastrophic outcomes such as stroke or death) are not amenable to the deliberate experimentation of n-of-1 trials. Asymptomatic conditions make outcomes assessment difficult, unless a valid biomarker exists.17 Examples of such biomarkers might include blood pressure or LDL cholesterol in heart disease, sedimentation rate in some chronic autoimmune diseases, or intraocular pressure in glaucoma. Some patient groups (e.g., patients with rare diseases) may be particularly motivated to participate in n-of-1 trials owing to the paucity of other evidence needed to substantiate treatment effect.

For practical reasons, treatments to be assessed in n-of-1 trials should have relatively rapid onset and washout (i.e., few lasting carryover effects). Treatments with a very slow onset of action (e.g., methotrexate in rheumatoid arthritis) could outlast the patience of the average patient and clinician.18 On the other hand, treatments with prolonged carryover effects would require a substantial washout period to distinguish between the effects of the current treatment and the previous treatment.5 In addition, regimens requiring complex dose titration (e.g., loop diuretics in patients with comorbid congestive heart failure and chronic kidney disease) are not well suited for n-of-1 trials.

Major Design Elements of N-of-1 Trials

The major design elements of n-of-1 trials are balanced sequence assignment, blinding, and systematic outcomes measurement. Before introducing these elements, we offer a description of standard clinical practice.

Standard Clinical Practice

In ordinary practice, the clinician prescribes treatment and asks that the patient return for followup. At the followup encounter, the clinician asks the patient if he or she is improving. If the patient responds positively, the treatment is continued. If not, the clinician and patient discuss alternative strategies such as a dose increase, switching to a different treatment, or augmenting with a second treatment. This process continues until both agree that a satisfactory outcome has been achieved, until intolerable side effects occur, or until no further progress seems possible. Although treatments are administered in sequence, there is no systematic repetition of prior treatments (replication), and the treatment assignment sequence is based on physician and patient discretion (not randomized or balanced). Neither clinician nor patient is blinded. Typically, there is no systematic assessment of outcomes. As a result, it is easy for both patient and clinician to be misled about the true effects of a particular therapy.

Take for example Mr. J, who presents to Dr. Alveolus with a nagging dry cough of 2 months’ duration that is worse at night. After ruling out drug effects and infection, Dr. Alveolus posits perennial (vasomotor) rhinitis with postnasal drip as the cause of Mr. J’s cough and prescribes diphenhydramine 25 mg each night. The patient returns in a week and notes that he’s a little better, but the “cough is still there.” Dr. Alveolus increases the diphenhydramine dose to 50 mg, but the patient retreats to the lower dose after 3 days because of intolerable morning drowsiness with the higher dose. He returns complaining of the same symptoms 2 weeks later; the doctor prescribes cetirizine 10 mg (a nonsedating antihistamine). Mr. J fills the prescription but doesn’t return for followup until 6 months later because he feels better. “How did the second pill I prescribed work out for you?” Dr. Alveolus asks. “I think it helped,” Mr. J replies, “but after a while the cough seemed to get better so I stopped taking it. Now it’s worse again, and I need a refill.”

While this typical clinical scenario involves some effort to learn from experience, the approach is rather haphazard and can be improved upon.

N-of-1 Trial Procedures in Contrast to Standard Clinical Practice

What if Mr. J and Dr. Alveolus were to acknowledge their uncertainty and elect to embark on an n-of-1 trial of diphenhydramine versus cetirizine for treatment of chronic cough presumed due to perennial rhinitis? They might agree:

To administer diphenhydramine and cetirizine in a balanced sequence of 7-day treatment intervals for a total of eight treatment periods (four periods on diphenhydramine, four periods on cetirizine, 56 days total), with no washout time between treatment periods

To ask the compounding pharmacist to place the medications in identical capsules

To assess benefits using the average of Mr. J’s rating of overall cough severity (1-5 scale) and Mrs. J’s rating of nighttime cough severity (1-5 scale) and harms using a daytime sleepiness scale. At the end of the trial, tradeoffs between benefits (decreased cough) and harms (increased drowsiness) can be examined either implicitly (through mutual deliberation between clinician and patient) or explicitly (using shared decisionmaking tools that assign specific weights to particular benefits and harms).19
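
The arrangement above can be sketched in code. This is only a minimal illustration: the sequence ABBAABBA is an assumption (the agreement calls for a balanced sequence but does not fix one), A and B are arbitrary labels for diphenhydramine and cetirizine, and the function names are hypothetical.

```python
from statistics import mean

PERIOD_DAYS = 7          # from the agreement: 7-day treatment intervals
SEQUENCE = "ABBAABBA"    # assumption: one possible balanced sequence of eight periods
TREATMENTS = {"A": "diphenhydramine", "B": "cetirizine"}


def build_schedule(sequence=SEQUENCE, period_days=PERIOD_DAYS):
    """Return a list of (day, period, treatment) tuples, with days numbered from 1."""
    schedule = []
    day = 1
    for period, code in enumerate(sequence, start=1):
        for _ in range(period_days):
            schedule.append((day, period, TREATMENTS[code]))
            day += 1
    return schedule


def benefit_score(patient_rating, spouse_rating):
    """Average of the two 1-5 cough severity ratings (lower means less cough)."""
    for rating in (patient_rating, spouse_rating):
        if not 1 <= rating <= 5:
            raise ValueError("ratings must be on the 1-5 scale")
    return mean([patient_rating, spouse_rating])


schedule = build_schedule()
print(len(schedule))        # 56 trial days in total
print(schedule[7])          # (8, 2, 'cetirizine'): first day of the second period
print(benefit_score(4, 3))  # 3.5, using hypothetical ratings for one day
```

The daytime sleepiness scale (the harm measure) would be recorded alongside the benefit score for each day; it is left out here because the text does not specify its range.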

Balanced Sequence Assignment

In parallel group RCTs, randomization serves to maximize the likelihood of equivalence between treatment groups (in terms of both known and unknown prognostic factors). In n-of-1 trials, the aim is to achieve balance in the assignment of treatments over time so that treatment effect estimates are unbiased by time-dependent confounders.

Randomization of treatment periods is one way of achieving such balance, but there are others. For example, the treatment sequence AAAABBBB offers no protection against a confounder whose effect on the outcome is linear with time (e.g., a secular trend). The paired design ABABABAB and the singly counterbalanced design ABBAABBA offer better protection against temporally linear confounders but are still vulnerable to nonlinear confounding. The doubly counterbalanced design ABBABAAB defends against both linear secular trends and nonlinear trends.
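
The comparison of these four sequences can be checked numerically. The sketch below assumes eight equally spaced periods numbered 1 to 8 and a confounder that adds an amount proportional to the period number (linear) or to its square (quadratic) to the outcome. The function name trend_bias is hypothetical. It reports how much of such a trend would leak into the A-minus-B comparison.

```python
def trend_bias(sequence, power):
    """Mean of t**power over A periods minus the same mean over B periods.

    t is the period number (1, 2, ...). A result of zero means a time trend
    proportional to t**power adds equally to both treatments' average
    outcomes and so cancels out of the A-minus-B comparison.
    """
    a_times = [t ** power for t, code in enumerate(sequence, start=1) if code == "A"]
    b_times = [t ** power for t, code in enumerate(sequence, start=1) if code == "B"]
    return sum(a_times) / len(a_times) - sum(b_times) / len(b_times)


for seq in ("AAAABBBB", "ABABABAB", "ABBAABBA", "ABBABAAB"):
    print(seq, trend_bias(seq, 1), trend_bias(seq, 2))
# AAAABBBB -4.0 -36.0
# ABABABAB -1.0 -9.0
# ABBAABBA 0.0 2.0
# ABBABAAB 0.0 0.0
```

Under these assumptions, ABABABAB reduces but does not eliminate the linear bias, ABBAABBA eliminates it but leaves a quadratic residue, and ABBABAAB cancels both. A cubic trend would not cancel even for ABBABAAB (trend_bias("ABBABAAB", 3) is nonzero), so the protection against nonlinear trends is not unlimited.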

Balanced assignment (which may be achieved using randomization) helps control for time-varying clinical and environmental factors that could affect the patient’s outcome.21,22 Some, but not all, of these factors may be known to the patient and clinician in advance. For example, Mr. J might have decided to take diphenhydramine on weekends and cetirizine on weekdays. He might then be less prone to notice daytime sleepiness from diphenhydramine because he tends to sleep in on weekends. This would bias his assessment. Randomization (along with blinding) makes it more difficult to guess which treatment has been assigned.
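
One way to combine randomization with balance is to randomize the order of treatments within each consecutive pair of periods. The following is a minimal sketch of that idea, not a prescribed procedure; the function name and the seed value are hypothetical.

```python
import random


def randomized_pairs(n_pairs, seed=None):
    """Return a sequence such as 'ABBAABBA' built from randomly ordered pairs.

    Each consecutive pair of periods contains one A and one B in random
    order, so the two treatments always receive the same number of periods.
    """
    rng = random.Random(seed)  # a fixed seed makes the sequence reproducible
    pairs = []
    for _ in range(n_pairs):
        pairs.append(rng.choice(["AB", "BA"]))
    return "".join(pairs)


sequence = randomized_pairs(4, seed=2014)  # the seed value is arbitrary
print(sequence)
```

Because neither Mr. J nor Dr. Alveolus chooses the order, a habit such as taking one drug only on weekends cannot line up systematically with treatment. In a blinded trial, the generated sequence would presumably be held by a third party such as the pharmacist.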

Repetition

For a patient interested in selecting the treatment likely to work best for him or her in the long term, the simplest n-of-1 trial design is exposure to one treatment followed by the other (AB or BA). This simple design allows for direct comparison of treatments A and B and protects against several forms of systematic error (e.g., history, testing, regression to the mean).23 However, one-time exposure to AB or BA offers limited protection against other forms of systematic error (particularly maturation and time-by-treatment interactions) and virtually no protection against random error. To defend against random error (the possibility that outcomes are affected by unmeasured, extraneous factors such as diet, social interactions, physical activity, stress, and the tendency of symptoms to wax and wane over time), the treatment sequences need to be repeated (ABAB, ABBA, ABABAB, ABBAABBA, etc.). In this way, repetition is to n-of-1 trials what sample size is to parallel group RCTs.
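
The analogy with sample size can be made concrete under simplifying assumptions: one summary outcome per period, independent period outcomes, and the same standard deviation on both treatments. Under those assumptions (which real trials may violate, for example through autocorrelation), the standard error of the difference in treatment means shrinks with the square root of the number of periods per treatment. The function name below is hypothetical.

```python
import math


def se_of_difference(sd_within, periods_per_treatment):
    """Standard error of mean(A) - mean(B) with one summary outcome per period.

    Assumes independent period outcomes sharing the same standard deviation.
    """
    if periods_per_treatment < 1:
        raise ValueError("need at least one period per treatment")
    return sd_within * math.sqrt(2 / periods_per_treatment)


for k in (1, 2, 4, 8):
    print(k, round(se_of_difference(1.0, k), 3))
# 1 1.414
# 2 1.0
# 4 0.707
# 8 0.5
```

Going from a single AB comparison to four periods per treatment halves the standard error in this idealized setting.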

Washout and Run-in Periods

The importance of a washout period separating active treatment periods in n-of-1 trials has been fiercely debated.24 A washout period is theoretically important whenever lingering effects of the first treatment might influence outcome measurements obtained while on a subsequent treatment. Carryover effects resulting from insufficient washout will often tend to reduce observed differences between treatments for placebo-controlled trials. However, more complex interactions are possible. For example, if the benefits of a particular treatment wash out quickly but the risks of adverse treatment-related harm persist (think aspirin, which reduces pain over a matter of hours but increases risk of bleeding for up to 7 days), the likelihood of detecting net benefit will depend on the order in which the treatments are administered. Similar issues also apply to slow onset of the new treatment. A possible downside of a washout period is that the patient is forced to spend some time completely off treatment, which might be undesirable for patients who already receive some benefit from both treatments. For practical purposes, washout periods may not be necessary when treatment effects (e.g., therapeutic half-lives) are short relative to the length of the treatment periods. Since treatment half-lives are often not well characterized and vary among individuals, the safest course may be to choose treatment lengths long enough to accommodate patients with longer than average treatment half-lives and to take frequent (e.g., daily) outcome measurements. An alternative to the use of a “physical” washout is the use of “analytic washout,” that is, to address the effects of carryover and slow onset analytically. Further discussion is offered in Chapter 4 (Statistics).
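
Two of the ideas above can be illustrated with a short sketch. The first function assumes that a treatment's effect decays exponentially with a known half-life, which is a simplification. The second shows one crude form of analytic washout, discarding the earliest daily measurements in each period; model-based adjustments are another option. Both function names and the example numbers are hypothetical.

```python
def fraction_remaining(days_elapsed, half_life_days):
    """Fraction of the prior treatment's effect left after days_elapsed,
    assuming simple exponential decay with the given half-life."""
    return 0.5 ** (days_elapsed / half_life_days)


def drop_early_days(daily_values, period_days, days_to_drop):
    """Analytic washout: discard the first days_to_drop values of each period."""
    kept = []
    for start in range(0, len(daily_values), period_days):
        kept.extend(daily_values[start + days_to_drop:start + period_days])
    return kept


print(fraction_remaining(7, 1))  # 0.0078125: a 1-day half-life over a 7-day period
print(drop_early_days(list(range(1, 15)), period_days=7, days_to_drop=2))
# [3, 4, 5, 6, 7, 10, 11, 12, 13, 14]
```

With a half-life that is short relative to the period, almost none of the prior effect remains by the end of the period, which is why daily measurements plus longer periods can substitute for a physical washout.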

Some n-of-1 investigators have advocated for the use of run-in periods. In parallel group RCTs, a run-in period is a specified period of time after enrollment and prior to randomization that is allotted to further measure a participant’s eligibility and commitment to a study.25 In n-of-1 trials, a run-in period could also be used to differentiate “responders” from “nonresponders” in an open-label (unblinded) situation or to initiate dose-finding.

Blinding

In parallel group RCTs, blinding of patients, clinicians, and outcomes assessors (“triple blinding”) is considered good research practice. These trials aim to generate generalizable knowledge about the effects of treatment in a population. In drug and device trials, the consensus is that it is critical to separate the biological activity of the treatment from nonspecific (placebo) effects. (For a broader view, see Benedetti et al.26) In n-of-1 trials, the primary aim is usually different. Patients and clinicians participating in n-of-1 trials are likely interested in the net benefits of treatment overall, including both specific and nonspecific effects. Therefore blinding may be less critical in this context. Nevertheless, expert opinion tends to favor blinding in n-of-1 trials whenever feasible.

However, just as in parallel group randomized trials, blinding is not always feasible. For example, in trials of behavioral interventions (e.g., bibliotherapy versus computer-based cognitive behavioral therapy for depression), patients will always know what treatment they are on. Furthermore, even for drug trials, few community practitioners have access to a compounding pharmacy that can safely and securely prepare medications to be compared in matching capsules.

Systematic Outcomes Assessment

Evidence is accumulating that careful, systematic monitoring of clinical progress supports better treatment planning and leads to better outcomes. For example, home blood pressure monitoring results in better blood pressure control,27 and “treat-to-target” approaches based on PHQ-9 scores have worked well in depression.28 In n-of-1 trials, systematic assessment of outcomes may well be the single most important design element. There are two issues to consider: (1) what data to collect and (2) how to collect them.

In designing an n-of-1 trial, participants (patients, clinicians, investigators) must first select outcome domains (specific symptoms, specific dimensions of health status, etc.) and then specific measures tapping those domains. In so doing, they must balance a number of competing interests. For most chronic conditions, there are numerous potentially relevant outcomes. These may be condition specific (e.g., pain intensity in chronic low back pain, diarrhea frequency in inflammatory bowel disease) or generic (e.g., health-related quality of life). Clinicians, patients, and service administrators may assign different priorities to different domains. For example, in chronic musculoskeletal pain, the patient may prioritize control of pain intensity or fatigue, the clinician may prioritize daily functioning, and Drug Enforcement Administration officials may prioritize minimizing opportunities for misuse of opiates. The primary purpose of most n-of-1 trials is to assist with individual treatment decisions. Therefore patient preferences are paramount. However, as prescribers of treatment, clinicians are essential partners, and their buy-in is critical.

Once outcome domains have been identified, participants need to pick specific measures. Though measures known to possess high reliability and validity are preferable, sometimes an appropriate pre-existing measure cannot be found. In this case, n-of-1 participants must choose between measures that are well validated but imprecisely targeted to the patient’s goals or new measures that are incompletely validated but a good fit with patient priorities. An interesting compromise is a validated questionnaire (e.g., Measure Your Medical Outcome Profile, or MYMOP) that uses standardized wording and response options applied to the symptoms and concerns of greatest interest to the patient.29

N-of-1 trials can make use of the entire spectrum of data-collection modalities. Traditional approaches include surveys, diaries, medical records, and administrative data. Recent developments in information technology have opened the door to several new approaches, including ecological momentary assessment (EMA) and remote positional and physiologic monitoring. Mobile-device EMA cues the patient to input data at more frequent intervals (e.g., hourly, daily, or weekly) than is typical using traditional survey modalities. Compliance with such devices is higher than with paper diaries.30 When equipped with GPS or movement detection technology (actigraphy), mobile devices can also track patient movements and activities. Ancillary monitoring devices can be connected to mobile devices to monitor heart rate, blood pressure, blood glucose, galvanic skin response, electroencephalographic activity, degree of social networking, vocal stress, etc. Data on the reliability and validity of these measures are currently scant but are accumulating rapidly.31

Statistical Analysis and Feedback for Decisionmaking

Once data are collected, they need to be analyzed and presented to the relevant decisionmakers in a format that is actionable. In the systematic review by Gabler et al.,32 approximately half of the trials reported using a t-test or other simple statistical criterion (44%), while 52 percent reported using a visual/graphical comparison alone. Of the 60 trials (56%) reporting on more than one individual, 26 trials (43%) reported on a pooled analysis. Of these, 23 percent used Bayesian methodology, while the rest used frequentist approaches to combining the data. Guidance on statistical analytic approaches for n-of-1 trials is provided in Chapter 4 (Statistics).
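
As an illustration of the kind of simple statistical criterion mentioned above, the sketch below computes a paired t statistic from period averages. It assumes a design in which each A period can be matched with an adjacent B period, and the numbers are hypothetical, not data from any trial. Only the Python standard library is used, so no p-value is computed; the t statistic would be compared against a t distribution with the returned degrees of freedom.

```python
import math
from statistics import mean, stdev


def paired_t(a_means, b_means):
    """Return (mean difference, t statistic, degrees of freedom) for paired periods."""
    if len(a_means) != len(b_means) or len(a_means) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [a - b for a, b in zip(a_means, b_means)]
    mean_diff = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))  # fails if all differences are identical
    return mean_diff, mean_diff / se, len(diffs) - 1


# Hypothetical period-average cough scores (1-5 scale, lower is better).
a_periods = [3.0, 3.5, 3.0, 3.5]
b_periods = [2.0, 3.0, 2.5, 2.5]
mean_diff, t_stat, df = paired_t(a_periods, b_periods)
print(mean_diff, round(t_stat, 2), df)
```

With only a handful of periods per treatment, the degrees of freedom are small, which is one reason more elaborate methods are taken up in Chapter 4 (Statistics).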

While n-of-1 trials can promote other goals (e.g., increased patient engagement),9 the primary objective is generally to promote better health care decisionmaking for participating patients. The degree to which decisionmaking can be improved will depend on the quality of the data (as influenced by trial design and measurement instruments) and the clarity with which results are communicated to the end-users, especially the patient participating in the trial. There are three fundamental issues n-of-1 trialists should consider. First, should outcomes data be presented item by item (or scale by scale) or as a composite measure? A patient with asthma may be interested in her ability to climb stairs, sleep through the night, and avoid the emergency room. These outcomes could be presented as three separate statistics, graphs, or figures, or they could be combined into a single composite measure that averages the individual components, possibly weighted to reflect the relative importance of the respective components. The advantage of single measures is that they retain clinical granularity and, in and of themselves, are readily interpretable. The disadvantage is that they can be confusing, especially if multiple outcomes are affected differently by the treatments under study. The advantage of composite measures is that they make individual-level decisionmaking more straightforward. If, for a given patient, the Asthma Improvement Index moves in a more positive direction on treatment A than B, the drug of choice is treatment A. The composite measure directly (if arbitrarily) addresses the tradeoff among the components such as benefits and harm, especially when the components respond to the treatments differently. On the other hand, composite outcomes are harder to interpret and may be driven by the most sensitive component (which is not necessarily the most important).
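
The effect of weighting on a composite measure can be shown with a small sketch. The component names, the 0-10 scores, and the weights below are all hypothetical; the point is only that the choice of weights can change which treatment looks better.

```python
def composite_score(component_scores, weights):
    """Weighted average of component scores; weights need not sum to 1."""
    if set(component_scores) != set(weights):
        raise ValueError("scores and weights must cover the same components")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(component_scores[k] * weights[k] for k in weights) / total_weight


# Hypothetical 0-10 scores (higher is better) and patient-chosen weights.
weights = {"stairs": 2, "sleep": 1, "avoid_er": 1}
on_a = {"stairs": 8, "sleep": 6, "avoid_er": 6}
on_b = {"stairs": 5, "sleep": 8, "avoid_er": 8}
print(composite_score(on_a, weights), composite_score(on_b, weights))  # 7.0 6.5
```

With these weights treatment A scores higher, but with equal weights the ordering reverses (about 6.67 for A versus 7.0 for B). This is the sense in which a composite resolves tradeoffs directly but somewhat arbitrarily.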

The second issue is how to present the data: as graphics, statistics, or both. Simple graphical analysis can transmit results clearly, but not all formats are equally understandable, particularly to low-numeracy populations.33 In addition, graphical analysis can magnify small differences that a proper statistical analysis would show are likely due to chance. A combined approach may work best, employing statistics to test for statistical significance (or, using a Bayesian framework, to estimate post-test probabilities) and graphics to lend clarity to the findings.

The third issue is whether to rely solely on the results of the current n-of-1 trial for decisionmaking or to “borrow from strength” by combining current data with the results of previous n-of-1 trials completed by similar patients. The choice will usually be driven by the availability of relevant data and by the ratio of within-patient versus between-patient variance (see Chapter 4 for details). If a similar series of trials has never been conducted, and if few patients have been enrolled in the current series, then decisionmaking rests by default on the results of the current n-of-1 trial alone. If, on the other hand, large numbers of patients have completed similar n-of-1 trials, and if within-patient variance is larger than between-patient variance, then “borrowing from strength” will enhance the precision of the result. Similar considerations influence the decision whether and how to combine current n-of-1 results with results extracted from the existing population evidence base (RCTs and observational studies). Further discussion is presented in Chapter 4 (Statistics).
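
The role of the variance ratio can be illustrated with a standard precision-weighted average. The sketch below assumes a simple normal model in which the pooled effect and both variances are treated as known; this is an assumption made for illustration, not the method of Chapter 4. The function name and numbers are hypothetical.

```python
def shrunken_estimate(own_effect, own_se, pooled_effect, between_sd):
    """Precision-weighted blend of one patient's estimate with the pooled mean.

    Returns (blended estimate, weight given to the patient's own data).
    """
    own_var = own_se ** 2
    between_var = between_sd ** 2
    if own_var + between_var == 0:
        raise ValueError("at least one variance must be positive")
    weight_own = between_var / (between_var + own_var)
    blended = weight_own * own_effect + (1 - weight_own) * pooled_effect
    return blended, weight_own


# Hypothetical values: a noisy individual estimate, fairly similar patients.
estimate, weight = shrunken_estimate(
    own_effect=1.0, own_se=1.0, pooled_effect=0.5, between_sd=0.5
)
print(round(estimate, 3), round(weight, 3))  # 0.6 0.2
```

When the patient's own estimate is noisy relative to the spread between patients, most of the weight shifts to the pooled result. When patients differ widely from one another, the weight shifts back toward the individual's own trial.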

Opportunities and Challenges

In addition to their potential for enhancing therapeutic precision, n-of-1 trials may offer three broader benefits. First, they may help patients and clinicians recognize ineffective therapies, thus reducing polypharmacy, minimizing adverse effects, and conserving health care resources. For example, if the marginal benefits of a new therapy were shown to be small, patient and clinician might elect to use the nearly equivalent but less costly therapeutic alternative. Second, they may help engage patients in their own care.9 A robust literature supports the premise that increased patient involvement in care is associated with better outcomes.8,34 By helping patients attend to their own outcomes and think critically about treatments, n-of-1 trials can awaken patients’ “inner scientist” and give them a greater stake in the process of clinical care. Third, n-of-1 trials can blur the boundaries between clinical practice and clinical research, making research more like practice and practice more like research. Making research more like practice is desirable to increase the relevance and generalizability of clinical research findings. On the other hand, making practice more like research will create opportunities for developing the clinical evidence base by enhancing systematic data collection on the comparative effectiveness of treatments by real health care professionals treating real patients. As n-of-1 trials become better integrated into practice, the downstream benefits may include:

Patients become more acquainted with the scientific method and in particular the value of rigorous clinical experiments.

Clinicians become more connected to the process of generating clinical evidence, more engaged in clinical research, and potentially more interested in participating in clinical trials.

Practices start collecting data on the relationship between treatments and outcomes and making such data available for use in routine patient care. If leveraged to full advantage, these data could become the linchpin of a “learning health care system” as envisioned by the Institute of Medicine.35

For such benefits to be realized, however, a number of challenges must be overcome. Most importantly, a business case must emerge that leaves patients, clinicians, and health care organizations convinced that increased therapeutic precision afforded by n-of-1 trials is worth the trouble. In addition, institutional ethics boards need to accept n-of-1 trials as an extension of clinical care; statistical procedures for the design and analysis of n-of-1 trials need to be automated into user-friendly tools accessible to clinicians and patients; health informatics systems must be created to support the seamless integration of n-of-1 trials into clinicians’ practices and patients’ lives; and all those concerned with improving the quality of therapeutic decisionmaking need adequate training and support. These topics are taken up in the remainder of the User’s Guide.

Outline for the Rest of the User’s Guide

In the rest of this User’s Guide, authors will expand on the themes introduced here.

Chapter 2 addresses human subjects issues germane to n-of-1 trials, in particular where n-of-1 trials fall on the continuum between clinical care and research, including hybrids of the two. This chapter also provides guidance for IRBs considering applications to conduct n-of-1 trials.

Chapter 3 takes on the very practical issue of how much n-of-1 trials cost, how much value they offer, and what factors organizations should consider before constructing or endorsing an n-of-1 trial service.