Blog Archives

In a recent post, I discussed a panel discussion on May 14, 2011, at the American Heart Association Quality of Care and Outcomes Research in Cardiovascular Disease and Stroke conference. The discussion addressed lessons from experiences with three drugs that were withdrawn or greatly restricted because they caused cardiovascular (CV) harm — rofecoxib (Vioxx), rosiglitazone (Avandia) and sibutramine (Meridia). I summarized the introduction by Sanjay Kaul and the presentations by Steve Nissen and Milton Packer. In this post I will discuss the presentations by statistician Dean Follmann of the National Institute of Allergy and Infectious Diseases, NIH, and Ellis Unger of the Center for Drug Evaluation and Research, FDA.

Follmann’s presentation was similar to one he gave at the July 2010 joint meeting of the Endocrinologic and Metabolic Drugs Advisory Committee and Drug Safety and Risk Management Advisory Committee that was held to discuss Avandia. Follmann discussed the hierarchy of study designs, with randomized controlled trials (RCTs) that are double-blind superiority trials being at the top. In such a design, randomization ensures that the groups are similar and double blinding ensures that the investigators can’t favor one arm over another. In addition, in a superiority trial the incentives encourage good study conduct, because sloppiness (e.g., missing data, loose inclusion criteria, lack of adherence) makes it more difficult to show that the drug is effective. At the next level of reliability, according to Follmann, are RCT noninferiority trials and meta-analyses. In a noninferiority trial, the goal is to conclude that a drug is not “unacceptably worse” than a comparator. In Follmann’s view, the incentives in a noninferiority trial “encourage sloppiness,” since sloppiness will tend to make the two arms more similar and thus meet the goal of noninferiority. (The RECORD trial was a noninferiority trial and was used to assess the safety of Avandia.) A meta-analysis is a quantitative synthesis of RCTs. In Follmann’s view, the quality of evidence of a meta-analysis is a bit less than that of an RCT, because (1) there may be unpublished trials that are not available for inclusion in the meta-analysis, (2) studies may be heterogeneous in population, endpoints, and comparators, and (3) the decisions on how to conduct the meta-analysis (e.g., what to include, how to analyze, endpoint definition) are made with knowledge of the potential safety signal. For example, to counter the Nissen-Wolski and FDA Avandia meta-analyses, which used myocardial infarction (MI) as the endpoint, GlaxoSmithKline chose a wider endpoint of serious and nonserious ischemia, resulting in a smaller hazard ratio.
In addition, GSK used a “very unconventional and some would say illegitimate method of analyzing the data,” according to Follmann. Follmann also stated that it was a “revelation” to him to learn from Nissen’s presentation that GSK had done previous meta-analyses with results similar to those of the Nissen-Wolski meta-analysis.
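To make Follmann’s noninferiority point concrete, here is a toy calculation (all counts and the margin are hypothetical, not taken from any trial discussed above): a drug is judged noninferior when the upper bound of the confidence interval for the risk ratio falls below a prespecified margin. Anything that pulls the two arms’ event rates together, such as crossovers, nonadherence, or missing data, shrinks the observed ratio toward 1 and makes the margin easier to meet.

```python
import math

def risk_ratio_ci(events_t, n_t, events_c, n_c):
    """Risk ratio with a Wald 95% CI computed on the log scale."""
    rr = (events_t / n_t) / (events_c / n_c)
    # Standard error of log(RR) for binomial event counts
    se = math.sqrt(1/events_t - 1/n_t + 1/events_c - 1/n_c)
    z = 1.959964  # 97.5th percentile of the standard normal
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)

# Hypothetical counts: 60/2000 events on the drug vs. 50/2000 on comparator
rr, lo, hi = risk_ratio_ci(60, 2000, 50, 2000)
margin = 1.8  # illustrative prespecified noninferiority margin
print(f"RR = {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f}); noninferior: {hi < margin}")
```

With these made-up counts the upper bound comes in just under the margin, so the drug would be declared noninferior. Note that nondifferential sloppiness moves the point estimate toward 1, which works in the sponsor’s favor — the asymmetry of incentives Follmann describes.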

Next in the hierarchy, Follmann stated, are observational studies. Because observational studies are not randomized, drug choice may be based on patient characteristics, doctor preference, and unquantifiable factors. Statistical adjustment can be done, but the result is less reliable than an RCT. Below observational studies are the FDA’s Adverse Event Reporting System (AERS) and data collected for other purposes, such as data collected by HMOs or the Centers for Medicare & Medicaid Services (CMS). In summary, Follmann stated that assessing a post-marketing safety signal is difficult. RCTs are the best data source but are not always available.

Ellis Unger’s first remark was that Nissen had a “retrospectoscope in his back pocket” and was being a “Monday morning quarterback” with respect to the FDA’s actions concerning Vioxx and Avandia. He pointed out that the FDA has to make decisions in real time, which is not so easy, and he is not convinced that the FDA did the wrong thing, based on what it knew at the time. He does agree with the ultimate outcome for Vioxx, Avandia and Meridia.

With respect to Vioxx, Unger stated that at the time of approval it was known that there were associations between Vioxx and hypertension and edema, but in the preapproval trials there were no differences with respect to MI and stroke. The VIGOR trial showed a hazard ratio of 1.94 for the composite endpoint of death, MI and stroke. For non-fatal MI, the hazard ratio was 4.51 (p < 0.05). He does not believe the VIGOR data were sufficient grounds for removing Vioxx from the market at that point (2000). Unger next discussed the APPROVe trial, which was stopped two months early due to an excess of serious thrombotic events in the Vioxx group (RR 1.92), and resulted in the voluntary removal of Vioxx from the market. In the wake of Vioxx’s withdrawal, the FDA held a joint meeting of the Arthritis and Drug Safety and Risk Management Advisory Committees on February 16-18, 2005 to discuss Cox-2 inhibitors. Unger summarized the data presented at the meeting as follows: (1) “all Cox-2-selective agents seem to increase CV risk (no ranking)” and (2) “available data do not support greater CV risk for selective agents as compared to non-selective agents.” After the meeting, the FDA added labeling warning of the potential for increased risk of CV thrombotic events to all NSAIDs.

With respect to rosiglitazone, Unger stated that the evidence of cardiovascular risk is “neither robust nor conclusive” and “remains an open question,” while acknowledging that there were “multiple signals of concern from various sources of data, without reliable evidence to refute risk.” He stressed the limitations of the Nissen/Wolski meta-analysis, including that the results were based on a relatively small number of events. Interestingly, Unger said that the FDA was more worried about the finding for cardiovascular death (odds ratio 1.64, p = 0.06) than the finding for MI (odds ratio 1.43, p = 0.03), even though the result for CV death was not statistically significant. Unger views the ADOPT and DREAM trials as being neutral on cardiovascular death, with both showing trends for increased MI.

With respect to the RECORD trial, Unger criticized the open-label design and the possibility of ascertainment bias but also stated that the results for all-cause death are “unlikely to be influenced by bias,” and showed a favorable trend for rosiglitazone. With respect to MI, the results were “inconclusive,” as neither the GSK nor the FDA analysis showed a statistically significant increase in MIs. Unger stated that viewed as a means to test the two hypotheses generated by the Nissen/Wolski meta-analysis — that rosiglitazone causes MI and that it increases the risk of CV death — RECORD “does not substantiate the findings of the Nissen/Wolski meta-analysis.” (For more on Unger’s views on RECORD, see his slides from the 2010 advisory committee meeting on rosiglitazone here.) Finally, Unger noted that the David Graham epidemiological study of Medicare patients did not find a statistically significant higher risk of MI with rosiglitazone as compared to pioglitazone. Why didn’t the FDA take rosiglitazone off the market instead of leaving it on the market with restricted access? Unger cited conflicting data on the existence and magnitude of the risk, the need for detailed re-adjudication and analysis of RECORD, and the fact that some patients were taking rosiglitazone and wanted to stay on it even with knowledge of the risk.

With respect to sibutramine (Meridia), a weight loss drug that is an inhibitor of norepinephrine, serotonin and dopamine reuptake, Unger noted that at approval in 1997 the drug was known to increase blood pressure and heart rate and result in miscellaneous ECG changes, but the adverse effects were deemed “monitorable.” The European regulators, however, required a post-marketing cardiovascular outcomes study. This was the SCOUT trial, a large randomized, double-blind, placebo-controlled trial in obese patients over age 55 with a history of coronary artery disease, peripheral vascular disease, or stroke and/or Type 2 diabetes with at least one other risk factor. The primary endpoint was a composite of CV death, resuscitation after cardiac arrest, non-fatal MI and non-fatal stroke, which occurred in 11.4% of the patients on sibutramine and 10.0% of the patients on placebo (HR 1.16, p = 0.02). Following this trial, sibutramine was removed from the market in the U.S. and Europe.

Unger noted that post-marketing safety used to focus on rare, severe events that were detectable from spontaneous reporting. In recent years, there has been greater interest in small increases in common but serious events, such as MI, stroke, and CV death. Quantifying such risks is challenging and requires longer, larger studies. If the drug is for a symptomatic condition such as depression or pain, it is difficult to keep patients from dropping out of the trial, and a trial with many dropouts is difficult to interpret.
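To get a feel for the scale these studies require, here is a back-of-the-envelope power calculation using the standard normal-approximation formula for comparing two proportions. The event rates are borrowed from the SCOUT results quoted below; the calculation itself is my illustration, not Unger’s.

```python
import math

def n_per_arm(p_control, p_treated):
    """Approximate per-arm sample size to compare two proportions
    (two-sided alpha = 0.05, 80% power, normal approximation)."""
    z_alpha = 1.959964  # z for two-sided 5% significance
    z_beta = 0.841621   # z for 80% power
    variance = p_control*(1 - p_control) + p_treated*(1 - p_treated)
    return math.ceil((z_alpha + z_beta)**2 * variance
                     / (p_control - p_treated)**2)

# SCOUT-like event rates: 10.0% on placebo vs. 11.4% on the drug
print(n_per_arm(0.100, 0.114))
```

Detecting a 10.0% vs. 11.4% difference with 80% power takes on the order of 7,600 patients per arm, which is why quantifying small increases in common events demands trials with many thousands of patients followed for years.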

Unger stated that when the FDA reviews clinical trial data, it is interested in imbalances in virtually any safety issue, so, in his words, we “always see safety signals because we look at 150 adverse events.” The agency has to consider a number of issues in assessing causality: whether there is a plausible mechanism of action, whether the effect has been observed with related drugs, whether there is a dose-response relationship, etc.

Comment: I think the problem of post-approval safety is not entirely solvable, because there will always be safety signals that crop up after drugs are approved. However, I am in sympathy with Dr. Nissen’s view that safety signals should be investigated and acted on as early as possible, and preferably before approval.

A panel discussion on May 14, 2011, at the American Heart Association Quality of Care and Outcomes Research in Cardiovascular Disease and Stroke conference addressed lessons from three drugs — rofecoxib (Vioxx), rosiglitazone (Avandia) and sibutramine (Meridia) — that were found post-approval to increase cardiovascular risk and subsequently either withdrawn or severely restricted. The panelists were Steven Nissen of the Cleveland Clinic, Milton Packer of the University of Texas Southwestern Medical Center, Dean Follmann of the National Institute of Allergy and Infectious Diseases, NIH, and Ellis Unger of the Center for Drug Evaluation and Research, FDA; the moderator was Sanjay Kaul of Cedars-Sinai Medical Center.

In his introductory remarks, Kaul emphasized the asymmetry in the evidence base for assessing efficacy and safety in drug approval. Efficacy is assessed pre-approval by randomized controlled trials (RCTs) with prespecified, adjudicated endpoints. If a safety signal emerges in these efficacy trials, the adverse events are not prespecified and the studies are often not adequately powered to reliably determine risk. Few RCTs are conducted to assess safety pre-approval. Instead, safety is assessed post-approval with meta-analyses, observational databases, the FDA’s Adverse Event Reporting System (AERS), or RCTs that involve limited exposure and/or narrow populations. Thus, the evidence for determining efficacy is often much superior to the evidence for determining safety.

Steven Nissen outlined “critical lessons learned” from rofecoxib and rosiglitazone experiences. First, post-approval studies and spontaneous AE reporting are ineffective at detecting increased risk of common sources of morbidity and mortality, such as cardiovascular disease. Second, for the cardiovascular hazards of rofecoxib and rosiglitazone, strong signals suggesting harm appeared early, but were missed or actively concealed. Third, dedicated post-approval safety studies take many years, and are vulnerable to manipulation, mischief, and flaws in study design or conduct. Thus, the last line of defense against unsafe drugs is often the drug approval process, because “once the genie gets out the bottle, it is very hard to put it back.”

In the case of rofecoxib, which was approved in 1999, Nissen said the safety signal emerged the following year when the VIGOR trial was published. Buried in the “General Safety” section was a sentence stating that “Myocardial infarctions were less common in the naproxen group than in the rofecoxib group (0.1 percent vs. 0.4 percent; 95% confidence interval for the difference, 0.1 to 0.6 percent; relative risk 0.2; 95% confidence interval 0.1 to 0.7)” (emphasis added). Nissen called this phrasing “diabolical” and noted that “no one saw this.” Moreover, the table and Kaplan-Meier curves for thrombotic events were omitted from the manuscript. After the table showing the number of events and the Kaplan-Meier curves were made available in connection with an FDA advisory committee meeting, Nissen and colleagues published the data in JAMA, “creating a furor and lots of slings and arrows, but it didn’t do a thing,” according to Nissen. Vioxx sales continued to grow, and ultimately a total of 105 million prescriptions were written, exposing 20 million Americans to the drug. In 2004, the APPROVe study was stopped by the Data Safety Monitoring Board when an excess of thrombotic events became evident, leading to the drug’s withdrawal. Nissen described how the APPROVe study too was published in a misleading way, making it appear that there was an 18-month delay before the excess risk became evident. Documents disclosed in litigation revealed a previously undisclosed intention-to-treat analysis, which showed an early hazard with no 18-month delay. In Nissen’s view, these misleading trial publications demonstrate that the medical community just can’t trust that industry-sponsored clinical trial data will be published in a way that is not misleading.

Nissen conducted a similar analysis of the history of rosiglitazone, arguing that a safety signal was evident in the pre-approval trials, in which there was an excess of ischemic myocardial events for rosiglitazone, as well as a worrisome 18.6% increase in LDL. Nissen also described corporate misconduct, including the intimidation of a leading diabetes researcher, buried data, and GSK meta-analyses that were conducted in 2005 and 2006, before the 2007 Nissen/Wolski meta-analysis. The GSK meta-analyses showed an increased risk of ischemic myocardial events and were shared with the FDA but not with physicians or patients. (For a detailed rosiglitazone chronology, see Nissen’s 2010 editorial, “The rise and fall of rosiglitazone,” as well as Nissen’s slides from the July 2010 FDA advisory committee meeting on rosiglitazone). Of note, the Nissen/Wolski meta-analysis was only made possible because a settlement with the New York attorney general’s office had required GSK to make all its clinical trial data available. Nissen and Wolski found the data on a GSK website and published their meta-analysis. Without the disclosure required by the settlement with New York state, the meta-analysis would not have been possible, as 35 of the 42 clinical trials were unpublished. As for the RECORD trial, Nissen described it as a textbook example of “how not to perform a safety study.” The trial was completely unblinded to patients and physicians, and there was unrestricted availability of treatment codes to the contract research organization and GlaxoSmithKline (GSK). In addition, the study leadership removed silent heart attacks (10 to 5, rosiglitazone vs. control) from the database after analyzing the data.
Nonetheless, Nissen believes, based on the reanalysis of the RECORD data by the FDA’s Thomas Marciniak (see Marciniak’s slides), that RECORD didn’t show, as argued by GSK, that rosiglitazone was safe; “it demonstrated that the drug was unsafe.” The lesson Nissen believes we should learn from Vioxx, Avandia and Meridia is that “you’ve got to stop these things at the approval process, and when early safety signals are seen, it requires aggressive regulatory action, at the very least demanding that well-conducted safety trials be done. In these three cases, that didn’t happen. Drugs stayed on the market too long and too many people were harmed.”

Milton Packer gave a presentation that was a greatly condensed version of one he gave in February 2005 at the FDA advisory committee meeting on Cox-2s (slides here; transcript here). He emphasized the difficulty of interpreting observed differences in the frequency of events when the number of events is small. The difficulty is that an efficacy trial is sized for efficacy, not safety. Where the number of events is small, the point estimates will be extremely imprecise and the confidence intervals will be wide. Even if the result is statistically significant and the effect is biologically plausible, it is often not possible to be certain the effect is real. Packer gave the example of a Vioxx meta-analysis conducted by Peter Juni and colleagues and published in The Lancet in December 2004, after the withdrawal of Vioxx. Based on a cumulative meta-analysis, the authors concluded that by the end of 2000 the relative risk was 2.30 (95% CI 1.22-4.33, p = 0.01) and that Vioxx should have been withdrawn at that time. Packer pointed out that this analysis was based on only 52 events, which he believes was not enough to draw reliable conclusions. As evidence, Packer gave examples of small pilot trials whose results were not confirmed when larger definitive trials were done. Packer doesn’t disagree with the actions that were ultimately taken with respect to Vioxx, Avandia and Meridia; he just questions whether it was possible to know what the risks were early on.
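Packer’s point about imprecision can be seen directly in the arithmetic. With rare events, the variance of the log relative risk is dominated by the reciprocals of the event counts in each arm, so roughly 52 events cannot pin the estimate down. The sketch below assumes an illustrative 36/16 split of the events between the arms (the actual split is not given in the talk):

```python
import math

# With rare events, the variance of log(RR) is approximately
# 1/e_drug + 1/e_comp (the 1/n terms are negligible).
e_drug, e_comp = 36, 16   # hypothetical split of ~52 total events
rr = 2.30                 # point estimate reported by Juni et al.
se = math.sqrt(1/e_drug + 1/e_comp)
z = 1.959964              # 97.5th percentile of the standard normal
lo, hi = rr * math.exp(-z * se), rr * math.exp(z * se)
print(f"approx. 95% CI ({lo:.2f}, {hi:.2f}), a {hi/lo:.1f}-fold span")
```

The resulting interval spans more than a threefold range, close to the 1.22-4.33 that Juni and colleagues reported. A point estimate of 2.30 carrying that much uncertainty is exactly the situation Packer warns about: the direction of the signal may be right, but its magnitude, and even its reality, are hard to establish from so few events.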

In Part 2 of this post, I will discuss the presentations by Dean Follmann and Ellis Unger.