Diener HC, Rahlfs VW, Danesch U. The first placebo-controlled trial of a special butterbur root extract for the prevention of migraine: reanalysis of efficacy criteria. Eur Neurol. 2004;51(2):89-97.

Importance: Reanalyses of randomized clinical trial (RCT) data may help the scientific community assess the validity of reported trial results.

Objectives: To identify published reanalyses of RCT data, to characterize methodological and other differences between the original trial and reanalysis, to evaluate the independence of authors performing the reanalyses, and to assess whether the reanalysis changed interpretations from the original article about the types or numbers of patients who should be treated.

Design: We completed an electronic search of MEDLINE from inception to March 9, 2014, to identify all published studies that completed a reanalysis of individual patient data from previously published RCTs addressing the same hypothesis as the original RCT. Four data extractors independently screened articles and extracted data.

Main Outcomes and Measures: Changes in direction and magnitude of treatment effect, statistical significance, and interpretation about the types or numbers of patients who should be treated.

Results: We identified 37 eligible reanalyses in 36 published articles, 5 of which were performed by entirely independent authors (2 based on publicly available data and 2 on data that were provided on request; data availability was unclear for 1). Reanalyses differed most commonly in statistical or analytical approaches (n = 18) and in definitions or measurements of the outcome of interest (n = 12). Four reanalyses changed the direction and 2 changed the magnitude of treatment effect, whereas 4 led to changes in statistical significance of findings. Thirteen reanalyses (35%) led to interpretations different from those of the original article, 3 (8%) showing that different patients should be treated; 1 (3%), that fewer patients should be treated; and 9 (24%), that more patients should be treated.

Conclusions and Relevance: A small number of reanalyses of RCTs have been published to date. Only a few were conducted by entirely independent authors. Thirty-five percent of published reanalyses led to changes in findings that implied conclusions different from those of the original article about the types and number of patients who should be treated.

Introduction

Whether trial investigators should be required to make patient data from randomized clinical trials (RCTs) available for reanalysis is controversial.1-5 Because reanalyses of raw data from oseltamivir trials led to conclusions different from those of the original trials and subsequent meta-analyses, some authors have argued that a standard of data sharing and reanalysis should be more widely adopted, that such a standard could have major consequences for individual and public health, and that paying consumers (the public) should have access to complete information about drugs and devices.6

Arguments against accessibility to raw data and reanalyses include potential risk to trial patient confidentiality7; inappropriate dredging of data sets, resulting in spurious findings6; release of commercially sensitive information6; the requirement for a data infrastructure for sharing data and reanalysis8; and “rogue” reanalysis by nonexperts or by analysts who have conflicts of interest, as in the case of the Methane Awareness Resource Group Diesel Coalition, which tried to thwart a study showing an association between diesel exhaust and cancer outcomes through multiple requests for raw data for reanalysis.9

In this study, we identified published articles that reported reanalyses of patient-level data from RCTs testing the same hypothesis as the original article. We evaluated the authorship of the reanalyses, how the findings compare with those from the original analysis, and whether the reanalysis could modify interpretations from the original article about which patients should be treated.

Methods

Search

We searched MEDLINE from inception to March 9, 2014, for articles reporting reanalyses of previously published RCTs, in which reanalysis was defined as testing of an identical hypothesis (eg, identical population, intervention, comparator, outcome, study design).

We identified articles by using a combination of relevant title words and text words: (replicat*[title] OR reproduc*[title] OR reapprais*[title] OR re-apprais*[title] OR re-evaluat*[title] OR reevaluat*[title] OR re-assess*[title] OR reassess*[title] OR revis*[title]) OR (re-analysis OR reanalysis OR reanalyzed OR re-analyzed) NOT reproductiv*. We limited the search to English-language articles and clinical trials and excluded meta-analyses that used patient-level data.
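The search string above can be assembled programmatically so that it is easy to rerun or audit. The sketch below is an illustration under stated assumptions, not the authors' actual procedure: it assumes the search is submitted through the NCBI E-utilities esearch endpoint, it expresses the English-language and clinical-trial limits with the PubMed field tags english[la] and clinical trial[pt], it stands in "1800/01/01" for "from inception", and the helper names build_query and esearch_url are hypothetical. It only builds the request URL and sends nothing over the network. The exclusion of individual patient data meta-analyses happened at screening and is not part of the query.

```python
from urllib.parse import urlencode

# Truncated title words and free-text words as listed in the Methods.
TITLE_TERMS = ["replicat*", "reproduc*", "reapprais*", "re-apprais*",
               "re-evaluat*", "reevaluat*", "re-assess*", "reassess*", "revis*"]
TEXT_TERMS = ["re-analysis", "reanalysis", "reanalyzed", "re-analyzed"]


def build_query() -> str:
    """Assemble the Boolean search string described in the Methods (hypothetical helper)."""
    title_block = " OR ".join(f"{term}[title]" for term in TITLE_TERMS)
    text_block = " OR ".join(TEXT_TERMS)
    # PubMed processes Boolean operators left to right, so inside the outer
    # parentheses NOT is applied to the result of the whole OR block.
    core = f"(({title_block}) OR ({text_block}) NOT reproductiv*)"
    # Assumed field tags for the English-language and clinical-trial limits.
    return f"{core} AND english[la] AND clinical trial[pt]"


def esearch_url(query: str, maxdate: str = "2014/03/09") -> str:
    """Build (but do not send) an E-utilities esearch request URL (hypothetical helper)."""
    params = {
        "db": "pubmed",
        "term": query,
        "datetype": "pdat",       # assumed: limit by publication date
        "mindate": "1800/01/01",  # assumed stand-in for "from inception"
        "maxdate": maxdate,
        "retmax": 10000,
    }
    return ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?"
            + urlencode(params))


print(esearch_url(build_query()))
```

Keeping the term lists as data rather than one long literal string makes it straightforward to see exactly which truncations were searched and to rerun the query with a later cutoff date.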

Study Selection and Data Extraction

We screened titles and abstracts of identified citations to flag potentially eligible studies and read the full text to identify those that met the eligibility criteria. We developed items for data extraction after reviewing a random sample of 10 potentially eligible studies. Four extractors completed extraction exercises for all data items on 6 articles until 100% consensus was achieved among them; the remaining articles were then divided among the extractors, and 1 author verified all the extracted data.

We extracted information about trial characteristics (participants’ disease or condition, intervention and comparators, and definition or measurement of the primary outcome), authorship (countries and overlap in authors or in research group or consortium affiliations), and analyses (differences in methods used, handling of missing data, use of the intention-to-treat principle, and whether reanalysis authors identified any errors in the original data set or analysis); more than 1 type of reanalysis may have been performed in the same article. We also assessed public availability of patient-level data. For reanalyses in which no author came from the original article’s team or consortium, whenever this information could not be discerned from the reanalysis article, we contacted the corresponding author of the reanalysis to clarify whether patient-level data from the RCT were publicly available or whether the authors had to undertake an approval process to obtain them.

We categorized differences between original trial and reanalysis findings as changes in direction and magnitude of treatment effect, changes in statistical significance, and changes that could lead to differences in interpretation about the types or numbers of patients who should be treated with the active intervention, or the newer or more experimental intervention when 2 active interventions were compared.

Statistical Analysis

We describe data as proportions and means or medians as appropriate. To assess changes in magnitude of treatment effect (defined as nonoverlapping 95% CIs between the original analysis and the reanalysis), we extracted effect estimates and associated 95% CIs; expressed within-trial treatment effects as standardized mean differences for articles reporting continuous outcomes and as ratio or difference measures (risk ratios, odds ratios [ORs], risk differences, or hazard ratios [approximated by dividing the median survival times of the relevant arms]) for articles reporting dichotomous outcomes; and evaluated whether the CIs overlapped. We used the Fisher exact test to compare, by authorship (overlapping vs independent authors), the proportions of reanalyses that recommended a change in the number of patients who should be treated.
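A minimal sketch of two of the effect measures named above. The article does not state which standardized mean difference variant was used, so Cohen's d with a pooled SD and its usual large-sample CI are assumptions here; the hazard ratio from medians assumes exponential survival in both arms, under which the hazard equals ln(2) divided by the median survival time. All function names and the example inputs are hypothetical.

```python
from math import sqrt


def standardized_mean_difference(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Cohen's d (assumed variant): difference in means divided by the pooled SD."""
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / sqrt(pooled_var)


def smd_ci(d, n_t, n_c, z=1.959964):
    """Approximate 95% CI for Cohen's d using its large-sample variance."""
    se = sqrt((n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c)))
    return d - z * se, d + z * se


def hr_from_medians(median_t, median_c):
    """Hazard ratio (treatment vs control) approximated from median survival.

    Assumes exponential survival, where hazard = ln(2) / median, so the
    ln(2) terms cancel and HR = median_c / median_t.
    """
    return median_c / median_t


# Hypothetical inputs: two arms of 50 patients with SD 2 and means 10 vs 8,
# and median survival of 24 vs 12 months.
d = standardized_mean_difference(10, 2, 50, 8, 2, 50)
print(d, smd_ci(d, 50, 50), hr_from_medians(24, 12))
```

With these inputs the pooled SD is 2, so d is 1.0, and a treatment arm whose median survival is twice as long yields an approximate hazard ratio of 0.5.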

We used a significance threshold of .05 and reported 2-tailed P values for this comparison.

We completed all analyses with SPSS version 22.0.0. We did not require approval from a research ethics board because we were using summary data from published trials and reanalyses and did not require individual patient data.

Results

Search Results

We identified 2950 citations in an initial search and screened 2948 for eligibility (Figure). We assessed the full text of 226 and excluded 186. We were unable to retrieve 2 articles to determine eligibility10,11 and were unable to retrieve the original trials10,11 for 2 reanalyses.12,13 Our final sample comprised 36 articles (Appendix A in the Supplement), and because one article included 2 separate reanalyses of different articles,14 our evaluation is based on 37 eligible reanalyses (Figure).
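Reading the paragraph as removing both the 2 articles that could not be retrieved and the 2 reanalyses whose original trials could not be retrieved from the full texts that were not excluded, the counts reconcile as follows (the final step adds the second reanalysis reported in the same article14):

$$
226 - 186 = 40, \qquad 40 - 2 - 2 = 36 \ \text{articles}, \qquad 36 + 1 = 37 \ \text{reanalyses}
$$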

Study Characteristics

Table 1 summarizes the characteristics of the studies (Appendix B in the Supplement). Thirty-one reanalyses (84%) had an overlap of at least 1 author and 32 (86%) were published by the same research group or consortium as the original article. Of the 5 reanalyses from authors entirely independent from those in the original article, individual patient data were publicly available for 2, authors of 2 reanalyses sought and received approval from the original authors, research group, or consortium, and we could not clarify data availability for 1 reanalysis. In 3 instances, patient-level data availability was unclear and we needed to contact the corresponding author of reanalyses to clarify whether those data were publicly available or whether the authors had to undertake an approval process to obtain them.

Twelve of the original RCTs (32%) were published in general medical journals; in contrast, only 3 of the reanalyses (8%) were published in general medical journals (Table 1, Appendix C in the Supplement). Most original studies and reanalyses were completed by authors from Europe (n = 23 and n = 22, respectively) and the United States (n = 19 and n = 18, respectively). The median time between publication of the original trial and its corresponding reanalysis was 48 months (interquartile range, 23-98 months).

Differences in Methods

We identified 46 differences in methods in the 37 reanalyses: differences in statistical or analytical methods (n = 18), definition or measurement of outcomes (n = 12), approaches to handling missing data (n = 8), use of intention to treat (n = 2), and other (n = 6) (Table 2). The distribution was similar when only the difference that authors identified as most important was counted for each reanalysis: statistical or analytical methods (n = 17), definition or measurement of outcomes (n = 11), approaches to handling missing data (n = 2), use of intention to treat (n = 1), and other (n = 6).

Four reanalyses addressed errors in the data set or analysis of the original article: 1 article reporting 2 reanalyses excluded patients who should have been deemed ineligible in the original article,14 1 reanalysis identified cases misclassified in the original article because of errors in clinical data collection and a lack of blood sample validation,15 and 1 reanalysis reported a misinterpretation of findings based on assumptions of the original analysis.16 All errors were identified by authors from the same research group as the original article.

Differences in Findings

Fifteen reanalyses (41%) reported only P values without treatment effect sizes, only treatment effects without P values or measures of precision, or effects in units not comparable to those in the original analyses (Appendix Table D in the Supplement). Of 42 comparisons reported in the remaining 22 reanalyses, the direction of treatment effect in the original and reanalysis was the same in 38 cases and different in 4 (in 2 reanalyses a previously non-null treatment effect became null,17-19 in 1 reanalysis a previously null effect became non-null,20 and in 1 the direction of treatment effect was on the opposite side of the null compared with that in the original21).

To assess changes in the magnitude of treatment effect with reanalysis, we calculated standardized effect differences for 32 comparisons in 22 study pairs (Appendix D in the Supplement); 2 reanalyses had CIs that did not overlap with those of the original trials.

One reanalysis of a trial comparing Holter monitoring with electrophysiologic testing used 2 revised criteria for predicting drug efficacy; there was a decrease in drug efficacy prediction (OR, 0.59; 95% CI, 0.41-0.85) with one of the criteria in the reanalysis compared with that in the original trial (OR, 0.24; 95% CI, 0.16-0.36).22

A second reanalysis, motivated by changes in erythropoiesis-stimulating agent recommendations about the hemoglobin level at which to initiate therapy, showed a smaller benefit of fixed-dose darbepoetin alfa every 3 weeks vs weight-based weekly dosing when a hemoglobin threshold of less than 10 g/dL was used (OR, 0.88; 95% CI, 0.85-0.88) than with the original trial threshold of less than 11 g/dL (OR, 0.77; 95% CI, 0.76-0.80).23
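The nonoverlap criterion from the Statistical Analysis section can be checked directly against the 2 pairs of intervals quoted above. This is a minimal sketch with a hypothetical helper name; because it compares interval endpoints only, it gives the same answer on the OR scale and on the log-OR scale.

```python
def cis_overlap(ci_a, ci_b):
    """True if two closed intervals (lower, upper) share at least one value."""
    (a_lo, a_hi), (b_lo, b_hi) = ci_a, ci_b
    return a_lo <= b_hi and b_lo <= a_hi


# (original trial 95% CI, reanalysis 95% CI) for the ORs quoted above
pairs = {
    "Holter monitoring vs electrophysiologic testing": ((0.16, 0.36), (0.41, 0.85)),
    "darbepoetin alfa dosing": ((0.76, 0.80), (0.85, 0.88)),
}

for label, (original, reanalysis) in pairs.items():
    status = "overlap" if cis_overlap(original, reanalysis) else "no overlap"
    print(f"{label}: {status}")
```

Both pairs print "no overlap", which is why these 2 reanalyses are counted as changing the magnitude of treatment effect.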

Two reanalyses showed a loss in statistical significance14,24 and 2 showed a gain.25,26

Thirteen reanalyses (35%) reported a change in findings that implied a difference in interpretation about who should be treated (Table 3). Eight of the 13 changes in interpretation were accompanied by changes in direction of effect or in gain or loss of nominal statistical significance and 5 by changes in size of the treatment effect.

Three reanalyses (8%) implied that different patients should be treated because they changed the understanding of the reasons for benefit or of the types of patients who benefited most from the treatment. For example, a trial of sclerotherapy for esophageal varices that used proportional hazards modeling showed a reduction in mortality but not in rebleeding, whereas its reanalysis, based on a multistage competing risks model, suggested a reduction in rebleeding but not in mortality.28 One reanalysis (3%), concluding that fewer patients should be treated, reversed the conclusion that homeopathic treatment was effective for fibrositis by disaggregating a composite end point comprising pain and sleep.24 Nine reanalyses (24%) were interpreted as showing that more patients should be treated. For example, a trial comparing mycophenolate mofetil and azathioprine in heart transplant patients showed no difference between treatments in preventing growth of intimal medial thickness measured by intravascular ultrasonography, whereas its reanalysis suggested superiority of mycophenolate mofetil when data were matched by site.20

Reanalyses by Different Authors

Only 5 reanalyses (13.5%) were performed by completely independent authors.24,26,31-33 Three of the 5 used different analytical methods,26,32,33 1 considered the original analysis of a crossover RCT invalid and reanalyzed the first treatment period only,24 and 1 used a different definition for the primary outcome.31 Two of these 5 independent reanalyses did not change the original trial interpretation,32,33 2 suggested that more patients should be treated,26,31 and 1 suggested that fewer patients should be treated compared with the original article.24 We found no statistically significant difference in proportion of reanalyses leading to different conclusions about who should be treated when reanalyses were performed by overlapping vs independent authors (10/32 vs 3/5; OR, 0.30; 95% CI, 0.04-2.11; P = .32).
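The authorship comparison can be reproduced from its 2 × 2 table (10 of 32 vs 3 of 5). The sketch below is not the SPSS procedure the authors ran: it implements the two-sided Fisher exact test with the common convention of summing the probabilities of all tables no more probable than the observed one, and it assumes a Woolf (log-scale) interval for the sample OR, an assumption chosen because the article does not name its CI method. Function names are hypothetical.

```python
from math import comb, exp, log, sqrt


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, cell a follows a hypergeometric distribution;
    the P value sums the probabilities of every table that is no more
    probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Relative tolerance so floating-point ties with p_obs are counted.
    return sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7))


def odds_ratio_woolf(a, b, c, d, z=1.959964):
    """Sample odds ratio (a*d)/(b*c) with an assumed Woolf (log-scale) 95% CI."""
    or_ = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, exp(log(or_) - z * se), exp(log(or_) + z * se)


# Rows: overlapping vs independent authors; columns: changed vs unchanged
# conclusions about who should be treated.
table = (10, 32 - 10, 3, 5 - 3)
or_, lo, hi = odds_ratio_woolf(*table)
p = fisher_exact_two_sided(*table)
print(f"OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f}); P = {p:.2f}")
```

This prints "OR 0.30 (95% CI 0.04-2.11); P = 0.32", matching the values reported above. The interval spans roughly a 50-fold range, which illustrates why the comparison is inconclusive with only 5 independent reanalyses.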

Discussion

In this review, we identified 37 reanalyses of patient-level data from previously published RCTs (reported in 36 articles). Most of the reanalyses were completed by authors involved in the original trial, and most assessed the effect of different analytical methods or a change of outcome definition on the trial’s estimate of effect. Original RCT data sets were publicly available in 2 of 5 instances when trial data were reanalyzed by independent authors. Four of 42 comparisons in reanalyses resulted in a change in the direction of treatment effect, 2 reanalyses resulted in a loss of statistical significance, and 2 resulted in a decrease in the estimated treatment effect. Approximately a third (35%) of the published reanalyses led to changes in findings that implied conclusions different from those of the original article about the types and number of patients who should be treated. We performed a search of MEDLINE from inception to June 19, 2014, using a combination of search terms and review, and found no studies evaluating reanalyses of RCTs. Thus, we believe that our study represents the first empirical evaluation of reanalyses of RCT data and of changes emerging from those reanalyses.

The uncovering of distortion and bias in the reporting of trials of rofecoxib and oseltamivir explains why efforts to improve access to RCT data have attracted substantial interest from researchers, regulators, funders, and pharmaceutical companies.4 Reproducibility is an important step to ensure that the findings of original trials are not distorted, biased, or incomplete.4 Evidence is limited in the current biomedical literature about whether the results of RCTs can be reproduced by independent analysts, perhaps partly because lack of publicly available data sets prevents reanalyses by independent authors.

As a result, there have been increasing calls within the medical community for access to raw data from published trials so that reanalyses can take place. Although some large companies have committed to making trial data available, there is no consensus about the optimal data format, what data ought to be shared, and who can access them and when.4 There have been emerging data-sharing initiatives, including the Yale University Open Data Access project,34 the National Institutes of Health Data Sharing Requirements, and the International Stroke Trial.35 Some journals have also created policies that make public accessibility of data and protocols a prerequisite of publication for some types of research.36 Furthermore, the Office of Science and Technology Policy recently released a memo calling for plans to support public access to data.8,37 However, nonsharing of data by investigators remains common.38-40

Involving authors of the original article in reanalyses may be a condition for providing access to data and may ensure that direct knowledge of study nuances is accounted for in a reanalysis. Involving such authors might also limit the independence of coauthors to refute initial results if the original authors are committed to their findings. Reanalyses by independent authors might avoid those conflicts but be equally problematic if the independent authors have competing interests. For example, a trial comparing acupuncture and amitriptyline in human immunodeficiency virus–infected patients reported no effect for either intervention on neuropathic pain.41 A reanalysis by authors who were proponents of complementary and alternative medicine analyzed mortality and attrition (not the primary or secondary outcomes in the original article) and concluded that acupuncture had a lower attrition rate and lower (zero) mortality rate, with P = .05 for both comparisons.42 Thus, reanalyses with discrepant results may under some circumstances raise as many questions as the original trial.

Ideally, authors completing the reanalysis should not have conflicting financial, ideological, or political interests.43 At the least, when a reanalysis is completed, authors of the original article should be provided with the opportunity to review and comment on it before publication.43 In our review, we found no statistically significant difference in the proportion of reanalyses resulting in a change in recommendation about the number of patients who should be treated when the reanalysis was conducted by original trial vs independent authors, but there were only 13 reanalyses that could result in a change in recommendation, so the comparison was underpowered to detect a difference and the CIs were wide enough to be inconclusive.

In our evaluation, 65% of reanalyses did not change the interpretation of the original results, which may be encouraging in that the findings and conclusions of most trials were reproducible. However, 35% of the published reanalyses could alter the conclusions of the original trial about which or how many patients should be treated. It is difficult to assess whether these changes in trial conclusions eventually led to major changes in clinical practice and, if so, how large these changes were. Clinical practice choices depend only partly on trial evidence, and sometimes multiple additional trials exist that inform the same question. Nevertheless, when contradictory messages exist, it is unclear which of the 2 discrepant articles will have more influence: the original is usually published in more influential journals, but the subsequent reanalysis may be viewed as a more correct appraisal of the data.

Our study has limitations. Authors of reanalyses that led to changes in findings did not always specify how they thought the differences should be interpreted in regard to alterations in who should be treated, so we used subjective judgment to translate the change in findings into categories of changes in interpretation (treat more, fewer, or different patients). Also, we excluded meta-analyses. Authors of meta-analyses using patient-level data may routinely reanalyze data from studies they include, but whether the results of single trials have been verified or contradicted remains unclear because the authors do not typically publish each as a reanalysis, the publication emphasis is on summary results, and many data sets differ from those used in the original articles, eg, they have longer follow-up. Moreover, typically in such meta-analyses, the authors of the original articles also coauthor the meta-analysis, so accounting for trials included in patient-level meta-analyses might not increase the small number of independent reanalyses by different authors. We focused on reanalyses of single trials, but there is increasing interest in reanalyses and meta-analysis of multiple trials on the same topic, as in the case of human bone morphogenetic protein 2.44

We excluded non–English-language trials (n = 3); treatment effect estimates or associated CIs were missing in several published articles, which did not allow fully standardized comparison of effect sizes; the study was underpowered to detect a difference in the proportion of reanalyses resulting in a change in recommendation about the number of patients who should be treated when conducted by original trial vs independent authors; and our search may have missed some articles that were in fact reanalyses but were not named as reanalyses (or replications, reevaluations, reappraisals, reproductions, or related terms) by their authors.

Finally, we cannot exclude the possibility that many other reanalyses might have been performed that were never published, especially those with results and conclusions identical to those of the original article. Authors of confirmatory reanalyses may choose not to publish the results or, alternatively, they may have difficulty publishing their article because many journals may not consider it interesting. Thus, our observed estimate of different conclusions (35%) is probably an overestimate.

Conclusions

A small number of reanalyses of RCTs have been published to date; of these, only a few were conducted by entirely independent authors. Thirty-five percent of published reanalyses led to changes in findings that implied conclusions different from those of the original article about which patients should be treated.

Conflict of Interest Disclosures: All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest and none were reported.

Funding/Support: Dr Ebrahim reports receiving support from MITACS Elevate and SickKids Restracomp Postdoctoral Awards. Dr Mills reports receiving support via a Canadian Institutes of Health Research Canada Research Chair.

Role of the Funders/Sponsors: Sponsors providing individual financial support to authors did not have a role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
