The study was widely covered by the media. The NRC, for which main author Martijn Katan works as a science columnist, spent two full (!) pages on the topic, without a single critical comment [3].
As if this wasn’t enough, the latest column of Katan again dealt with his article (text freely available at mkatan.nl)[4].

I found Katan’s column “Col hors Catégorie” [4] quite arrogant, especially because he tried to belittle an (as he called it) “know-it-all” journalist who had criticized his work in a rival newspaper. This wasn’t fair, because the journalist had raised important points [5, 1] about the work.

The piece focused on the long road to getting papers published in a top journal like the NEJM.
Katan considers the NEJM the “Tour de France” among medical journals: it is a top achievement to publish in this journal.

Katan also states that “publishing in the NEJM is the best guarantee something is true”.

I think the latter statement is wrong for a number of reasons.*

First, most published findings are false[6]. Thus journals can never “guarantee” that published research is true.
Factors that make it less likely that research findings are true include a small effect size, a greater number and lesser preselection of tested relationships, selective outcome reporting, the “hotness” of the field (all of which apply more or less to Katan’s study; he also changed the primary outcomes during the trial [7]), a small study size, great financial interest, and a low pre-study probability (not applicable here).

It is true that the NEJM has a very high impact factor. This is a measure of how often papers in that journal are cited by others. Of course researchers want to get their papers published in a high-impact journal. But journals with high impact factors often go for trendy topics and positive results. In other words, it is far more difficult to publish a good-quality study with negative results, certainly in an English-language high-impact journal. This is called publication bias (and language bias) [8]. Positive studies will also be cited more frequently (citation bias) and are more likely to be published more than once (multiple publication bias); indeed, Katan et al. have already published about the trial [9] and have not yet presented all their data [1,7]. All these forms of bias distort the “truth”.
(This is why the search for a (Cochrane) systematic review must be very sensitive [8] and not restricted to core clinical journals; it should even include unpublished studies, for these studies might be “true” but have failed to get published.)

Indeed, the group of Ioannidis just published a large-scale statistical analysis [10] showing that medical studies reporting “very large effects” seldom stand up when other researchers try to replicate them. Studies with large effects often measure laboratory and/or surrogate markers (like BMI) instead of clinically relevant outcomes (diabetes, cardiovascular complications, death).

Importantly, the NEJM has the highest proportion of trials (RCTs) with sole industry support (35%, compared to 7% in the BMJ) [12]. On several occasions I have discussed these conflicts of interest and their impact on the outcome of studies [13, 14]; see also [15, 16]. In their study, Gøtzsche and his colleagues from the Nordic Cochrane Centre [12] also showed that industry-supported trials were more frequently cited than trials with other types of support, and that omitting them from the impact factor calculation decreased journal impact factors. The decrease was as much as 15% for the NEJM (versus 1% for the BMJ in 2007)! For the journals that provided data, income from the sale of reprints contributed 3% and 41% of total income for the BMJ and The Lancet, respectively.
A recent study, co-authored by Ben Goldacre (MD & science writer) [17], confirms that funding by the pharmaceutical industry is associated with high numbers of reprint orders. Again, only the BMJ and The Lancet provided all the necessary data.

Finally, and most relevant to the topic, is a study [18], also discussed at Retraction Watch [19], showing that articles in journals with higher impact factors are more likely to be retracted; and, surprise surprise, the NEJM clearly stands on top. Although other factors, like higher readership and scrutiny, may also play a role [20], this conflicts with Katan’s idea that “publishing in the NEJM is the best guarantee something is true”.

I wasn’t aware of the latter study and would like to thank DrVes and Ivan Oransky for responding to my crowdsourcing on Twitter.

@laikas re: "NEJM publishing is the best guarantee something is true" - Incorrect, of course. Just look at the retractions, etc.

The decision of the editors was based on the failure of at least 10 other studies to confirm these findings and on growing support that the results were caused by contamination. When the authors refused to retract their paper, Science issued an Expression of Concern [2].

In my opinion retraction is premature. Science should at least await the results of the two multi-center studies that were designed to confirm or disprove the results. These studies will continue anyway; the budget is already allocated.

Furthermore, I can’t suppress the idea that Science asked for a retraction to exonerate itself for the poor peer review (the paper had serious flaws) and its eagerness to swiftly publish a possibly groundbreaking study.

And what about the other studies linking XMRV to ME/CFS or other diseases: will these also be retracted?
And what happens in the improbable case that the multi-center studies confirm the 2009 paper? Would Science republish the retracted paper?

Thus, in my opinion, it is up to other scientists to confirm or disprove published findings. Remember that falsifiability was Karl Popper’s basic scientific principle. My conclusion was that “fraud is a reason to retract a paper and doubt is not”.

This is my opinion, but is this opinion shared by others?

When should editors retract a paper? Is fraud the only reason? When should editors issue an expression of concern? Are there guidelines?

Schekman considers it “an unusual situation to retract a paper even if the original findings in a paper don’t hold up: it’s part of the scientific process for different groups to publish findings, for other groups to try to replicate them, and for researchers to debate conflicting results.”

I don’t have any hard numbers on how often journals ask scientists to retract a paper, only my sense that it is very rare. Author retractions are more frequent, but I’m only aware of a handful of those in a year. I can recall a few other cases in which the authors were asked to retract a paper, but in those cases scientific fraud was involved. That’s not the case here. I don’t believe there is a standard policy that enumerates how such decisions are made; if they exist they are not public.

However, there is a guideline for editors: the Guidance from the Committee on Publication Ethics (COPE) (PDF) [5].

Ivan Oransky, of the great blog Retraction Watch, linked to it when we discussed reasons for retraction.

With regard to retraction the COPE-guidelines state that journal editors should consider retracting a publication if:

they have clear evidence that the findings are unreliable, either as a result of misconduct (e.g. data fabrication) or honest error (e.g. miscalculation or experimental error)

the findings have previously been published elsewhere without proper cross-referencing, permission or justification (i.e. cases of redundant publication)

it constitutes plagiarism

it reports unethical research

According to the same guidelines journal editors should consider issuing an expression of concern if:

they receive inconclusive evidence of research or publication misconduct by the authors

there is evidence that the findings are unreliable but the authors’ institution will not investigate the case

they believe that an investigation into alleged misconduct related to the publication either has not been, or would not be, fair and impartial or conclusive

an investigation is underway but a judgement will not be available for a considerable time

Thus in the case of the Science XMRV/CFS paper an expression of concern certainly applies (all 4 points), and one might even consider a retraction, because the results seem unreliable (point 1). But it is not 100% established that the findings are false. There is only serious doubt…

The guidelines seem to leave room for separate decisions. Retracting a paper in a case of plain fraud is not under discussion. But when is an error sufficiently established and important to warrant retraction?

Apparently retractions are on the rise. Although still rare (0.02% of all publications by the late 2000s), retractions have increased tenfold compared to the early 1980s (see the review at Scholarly Kitchen [6] about two papers: [7] and [8]). However, it is unclear whether increasing retraction rates reflect more fraudulent or erroneous papers or greater diligence. The first paper [7] also highlights that, out of fear of litigation, editors are generally hesitant to retract an article without the author’s permission.

At the blog Nerd Alert they give a nice overview [9] (based on Retraction Watch, but summarized in one post 😉). They clarify that papers are retracted for “less dastardly reasons than those cases that hit the national headlines and involve purposeful falsification of data”, such as the fraudulent papers of Andrew Wakefield (autism caused by vaccination). Besides the mistaken publication of the same paper twice, data over-interpretation, plagiarism and the like, the reason can also be more trivial: ordering the wrong mice or using an incorrectly labeled bottle.

Still, scientists don’t unanimously agree that such errors should lead to retraction.

“I don’t give a fig what any journals might wish to enact as a policy to overcompensate for their failures of the past. In my view, a correction suffices” (provided that search engines like Google and PubMed make clear that the paper was in fact corrected).

Drug Monkey has a point there. A clear watermark should suffice.

However, we should note that most papers are retracted by authors, not by editors/journals, and that the majority of “retracted papers” remain available. Just 13.2% are deleted from the journal’s website, and 31.8% are not noted as retracted anywhere.

Summary of how the naïve reader is alerted to paper retraction (from Table 2 in [7], see: Scholarly Kitchen [6])

Watermark on PDF (41.1%)

Journal website (33.4%)

Not noted anywhere (31.8%)

Note appended to PDF (17.3%)

PDF deleted from website (13.2%)

My conclusion?

Of course fraudulent papers should be retracted. Also papers with obvious errors that invalidate the conclusions.

However, we should be extremely hesitant to retract papers that can’t be reproduced, if there is no undisputed evidence of error.

All retracted papers (and papers with major deficiencies and shortcomings) should be clearly labeled as such (as Drugmonkey proposed, not only at the PDF and at the Journal website, but also by search engines and biomedical databases).

One day the scientific community will trade the static print-type approach of publishing for a dynamic, adaptive model of communication. Imagine a manuscript as a living document, one perhaps where all raw data would be available, others could post their attempts to reproduce data, authors could integrate corrections or addenda….

The reliability of science is increasingly under fire. We all know that the media often give a distorted picture of scientific findings (e.g. Hot news: Curry, Curcumin, Cancer & cure). But there is also an ever-growing number of scientific misreports and even fraud (see the BMJ editorial announcing retraction of the Wakefield paper about a causal relation between MMR vaccination and autism). Apart from real scientific misconduct there are ghost marketing and publication bias, which makes (large) positive studies easier to find than those with negative or non-significant results.
Then there are also the ever-growing contradictions, which make the public sigh: what IS true in science?

With Ioannidis as an editor, a new PLOS ONE paper has recently been published on the topic [1]. The authors, Gonon, Bezard and Boraud, state that there is often a huge gap between neurobiological facts and the firm conclusions stated by the media. They suggest that the misrepresentation often starts in the scientific papers and is echoed by the media.

In a (non-systematic) review of 360 ADHD articles, Gonon et al. [1] found two studies with “obvious” discrepancies between results and claimed conclusions. One paper claimed that dopamine is depressed in the brain of ADHD patients. Mitigations were only mentioned in the results section, and of course only the positive message was echoed by the media, without further questioning any alternative explanation (in this case a high baseline dopamine tone). The other paper [3] claimed that treatment with stimulant medications was associated with more favorable long-term school outcomes. However, the average reading score and the number of school drop-outs did not differ significantly between treatment and control groups. The newspapers nevertheless trumpeted that “ADHD drugs help boost children’s grades”.

2. Fact Omission

To quantify fact omission in the scientific literature, Gonon et al. systematically searched for ADHD articles mentioning the D4 dopamine receptor (DRD4) gene. Among the 117 primary human studies with actual data (like odds ratios), 74 articles stated in their summaries that alleles of the DRD4 gene are significantly associated with ADHD, but only 19 summaries mentioned that the risk was small. Fact omission was even more pronounced in articles that only cited studies about DRD4. Not surprisingly, 82% of the media articles didn’t report that DRD4 confers only a small risk either.
In accordance with Ioannidis’ findings [2], Gonon et al. found that the most robust effects were reported in initial studies: odds ratios decreased from 2.4 in the oldest study, in 1996, to 1.27 in the most recent meta-analysis.
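Translated into absolute numbers, these odds ratios are indeed small. A rough sketch of the conversion; the ~5% ADHD prevalence among non-carriers is my own illustrative assumption, not a figure from the paper:

```python
def risk_from_or(baseline_prob, odds_ratio):
    """Turn a baseline probability plus an odds ratio into the carrier-group probability."""
    odds = baseline_prob / (1 - baseline_prob) * odds_ratio
    return odds / (1 + odds)

# Assumed ~5% ADHD prevalence among non-carriers (illustrative only).
baseline = 0.05
for or_ in (2.4, 1.27):  # oldest study (1996) vs most recent meta-analysis
    print(f"OR {or_}: carrier risk ~ {risk_from_or(baseline, or_):.1%}")
```

Under these assumptions the meta-analytic OR of 1.27 moves a 5% baseline risk to only about 6%, which illustrates why omitting the word “small” from a summary matters.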

3. Extrapolating basic and pre-clinical findings to new therapeutic prospects

Animal ADHD models have their limitations, because investigations based on mouse behavior cannot capture the complexity of ADHD. Analysis of all ADHD-related studies in mice showed that 23% of the conclusions were overstated. The frequency of overstatement was positively related to the impact factor of the journal.

Again, the positive message was copied by the press. (see Figure below)

Discussion

The article by Gonon et al. is another example that “published research findings are false” [2], or at least not completely true. The authors show that the press isn’t culprit number one, but that it “just” copies the overstatements in the scientific abstracts.

The merit of Gonon et al. is that they have extensively looked at a great number of articles, and at the press articles citing those articles.

The first type of misrepresentation wasn’t systematically studied, but type 2 and type 3 misrepresentations were studied by analyzing papers on a specific ADHD topic obtained by a systematic search.

One of the solutions the authors propose is that “journal editors collectively reject sensationalism and clearly condemn data misrepresentation”. I agree, and would like to add that reviewers should check that the summary actually reflects the data. Some journals already have strict criteria in this respect. It struck me that the few summaries I checked were very unstructured and short, unlike most summaries I see. Possibly, unstructured abstracts are more typical of journals about neuroscience and animal research.

The choice of the ADHD topics investigated doesn’t seem random. A previous review [4], written by Francois Gonon, deals entirely with “the need to reexamine the dopaminergic hypothesis of ADHD”. The type 1 misrepresentation data stem from this opinion piece.

The putative ADHD-DRD4 gene association and the animal studies, taken as examples for type 2 and type 3 misrepresentations respectively, can also be seen as topics of the “ADHD is a genetic disease” -kind.

Gonon et al. clearly favor the hypothesis that ADHD is primarily caused by environmental factors. In his opinion piece, Gonon starts by saying:

This dopamine-deficit theory of ADHD is often based upon an overly simplistic dopaminergic theory of reward. Here, I question the relevance of this theory regarding ADHD. I underline the weaknesses of the neurochemical, genetic, neuropharmacological and imaging data put forward to support the dopamine-deficit hypothesis of ADHD. Therefore, this hypothesis should not be put forward to bias ADHD management towards psychostimulants.

I wonder whether it is fair of the authors to limit the study to ADHD topics they oppose, in order to (indirectly) confirm their “ADHD has a social origin” hypothesis. Indeed, in the paragraph “social and public health consequences” Gonon et al. state:

Unfortunately, data misrepresentation biases the scientific evidence in favor of the first position stating that ADHD is primarily caused by biological factors.

I do not think this conclusion is justified by their findings, since similar data misrepresentation might also occur in papers investigating social causes or treatments, but this was not investigated. (Mmm, a misrepresentation of the third kind??)

I also wonder why impact factor data were only given for the animal studies.

Gonon et al. interpret a lot, also in their results section. For instance, they mention that 2 out of 360 articles show obvious discrepancies between results and claimed conclusions. This is not much. Then they reason:

Our observation that only two articles among 360 show obvious internal inconsistencies must be considered with caution, however. First, our review of the ADHD literature was not a systematic one and was not aimed at pointing out internal inconsistencies. Second, generalization to other fields of the neuroscience literature would be unjustified.

Furthermore, they report selectively themselves. The Barbaresi paper [3], a large retrospective cohort study, did not find an effect on average reading scores or school drop-outs, but it did find significantly lowered grade retention, which is, after all, an important long-term school outcome.

Direct-to-consumer (DTC) genetic testing refers to genetic tests that are marketed directly to consumers via television, print advertisements, or the Internet. This form of testing, which is also known as at-home genetic testing, provides access to a person’s genetic information without necessarily involving a doctor or insurance company in the process. [definition from NLM’s Genetic Home Reference Handbook]

Almost two years ago I wrote about 23andMe (23andMe: 23notMe, not yet), a well-known DTC company that offers a genetic scan (SNP genotyping) to the public ‘for research’, ‘for education’ and ‘for fun’:

“Formally 23andMe denies there is a diagnostic purpose (in part, surely, because the company doesn’t want to antagonize the FDA, which strictly regulates diagnostic testing for disease). However, 23andme does give information on your risk profile for certain diseases, including Parkinson”

In another post, Personalized Genetics: Too Soon, Too Little?, I summarized an editorial by Ioannidis on the topic. His (and my) conclusion was that “the promise of personalized genetic prediction may be exaggerated and premature”. The most important issue is that the predictive power to individualize risks is relatively weak. Ioannidis emphasized that despite the poor evidence, direct-to-consumer genetic testing has already begun and is here to stay. He proposed several safeguards, including transparent and thorough reporting, unbiased continuous synthesis and grading of the evidence, and alerting the public that most genetic tests have not yet been shown to be clinically useful.

And now these “precautionary measures” actually seem to be happening. Last week the FDA sent five DTC companies, including 23andMe, a letter saying “their tests are medical devices that must receive regulatory approval before they can be marketed” (see, e.g., the NY Times article).

“Premarket review allows for an independent and unbiased assessment of a diagnostic test’s ability to generate test results that can reliably be used to support good health care decisions,”

“These letters are part of an initiative to better explain the FDA’s actions by providing information that supports clinical medicine, biomedical innovation, and public health.” (May 19 New England Journal of Medicine commentary; source: AMED-news)

Although it doesn’t look like the tests will be taken off the market, 23andMe takes quite a rebellious attitude: one of its directors called the FDA “appallingly paternalistic.”

Many support this view: “people have the right to know their own genetic make-up”, so to say. Furthermore as discussed above, 23andMe denies that their genetic scans are meant for diagnosis.

In my view, the latter is largely untrue: 23andMe at least suggests that such a scan does tell you something about your risks for certain diseases.
However, the risks are often not that straightforward. You just can’t “measure” the risk of a multifactorial disease like diabetes by “scanning” a few weakly predisposing genes. Often the results are given as relative risks, which is highly confusing. In her TED talk, the 23andMe director Anne Wojcicki said her husband Sergey Brin (Google) had a 50% chance of getting Parkinson’s, but his relative risk (RR, based on the LRRK2 mutation, which isn’t the most crucial gene for getting Parkinson’s) varies from 20% to 80%, which means that this mutation increases his absolute risk of getting Parkinson’s from 2-5% (the normal chance) to 4-10% at the most (see this post).
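The relative-to-absolute translation above can be sketched in a few lines. The 2-5% baseline comes from the text; the relative risk of 2 is an assumed round number chosen only to reproduce the 4-10% figure:

```python
def absolute_risk(baseline, relative_risk):
    """Translate a relative risk into an absolute risk (capped at certainty)."""
    return min(baseline * relative_risk, 1.0)

# Hypothetical round numbers echoing the text: a 2-5% baseline lifetime
# risk of Parkinson's and an assumed relative risk of 2 for the mutation.
for baseline in (0.02, 0.05):
    print(f"baseline {baseline:.0%} -> carrier {absolute_risk(baseline, 2):.0%}")
```

This is exactly why reporting a doubled relative risk sounds dramatic while the absolute risk stays modest.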

Furthermore, as reported by Venter in Nature (October 8, 2009): for seven diseases, 50% or less of the predictions of two companies agreed across five individuals (e.g. for one disease, 23andMe: RR 4.02, Navigenics: RR 1.25). On the other hand, *fun* diagnoses could lead to serious concern in patients, or to wrong/unnecessary decisions (removal of ovaries, changing drug doses).

There are also concerns with regard to their good-practice standards, as 23andMe just flipped a 96-well plate of customer DNA (see Genetic Future for a balanced post), which upset a mother who noticed that her son didn’t have compatible genes. But let’s assume that proper precautions will prevent this from happening again.

There are also positive aspects: results of a preliminary study showed that people who find out they have a high genetic risk for cardiovascular disease are more likely to change their diet and exercise patterns than those who learn they have a high risk from family history (Technology Review: Genetic Testing Can Change Behavior).

Furthermore, people buy those tests themselves and, indeed, their genes are their own.

However, I agree with Dr. Gutierrez of the FDA saying: “We really don’t have any issues with denying people information. We just want to make sure the information they are given is correct.” (NY-Times). The FDA is putting the consumers first.

However, it will be very difficult to be consistent. What about total body scans in normal healthy people, detecting innocent incidentalomas? Or the controversial XMRV tests offered by the Whittemore Peterson Institute (WPI) directly to CFS patients? (see these posts) And one step further (although not in the diagnostic field): the ineffective CAM/homeopathic products sold over the counter?

I wouldn’t mind if these tests/products were held up to the light. Consumers should not be misled by the results of unproven or invalid tests, and where needed should be offered the guidance of a healthcare provider.

But if the tests are valid and the risk predictions correct, it is up to the “consumer” whether he/she wants to purchase such a test.

Also recommended: the post “FDA to regulate genetic tests as “devices”” at the PHG Foundation. It highlights that simply classifying the complete genomic testing service as “a device” is inadequate and will not address the difficult issues at hand. One of the biggest issues is that, while classifying DTC genetic tests as devices is certainly appropriate for assessing their analytical validity and direct safety, it does not and cannot provide an assessment of the service, and thus of the predictions and interpretations resulting from the genome scans. Although standard medical testing has traditionally been overseen by professional medical bodies, the current genomic risk profiling tests are simply not good enough to be used by health care services.

Personalized medicine is the concept that managing a patient’s health should be based on the individual patient’s specific characteristics instead of on standards of care. Often the term ‘personalized medicine’ is restricted to the use of information about a patient’s genotype or gene expression profile to further tailor medical care to an individual’s needs (see [1]).

This so called Personalized Genetics is a beautiful concept. Suppose you could predict people’s risk for a certain disease and be able to prevent it by encouraging positive lifestyle changes and/or start a tailor made therapy, suppose you could predict which patients would respond to an intervention and which people should avoid certain medications. Wouldn’t that be wonderful and much better than treating everybody the same way only to benefit a few?

Research like the Human Genome Project and recent advances in genomics have boosted progress in the discovery of susceptibility genes and fueled expectations about the opportunities of genetic profiling for personalizing medicine.

But are the high expectations justified?

For personalized genetics to be (clinically) effective it must fulfill the following requirements (based on [2]):

Clear and strong association of the gene (expression) variant with the susceptibility to a disease or the outcome of a treatment

…as determined in good-quality studies with a sufficient number of events (if events are rare, you cannot accurately predict the outcome)
(1-3 make up the predictive performance)

The availability of effective interventions or effective alternatives

Cost-effectiveness

According to an editorial in the January issue of the Annals of Internal Medicine, “the promise of personalized genetic prediction may be exaggerated and premature” [2]. This is especially true for many complex diseases, where one variant alone is unlikely to make the difference.

The editorial is written by John Ioannidis, who is a professor at the University of Ioannina School of Medicine in Greece and has an adjunct appointment at Tufts University School of Medicine in Boston. His research focuses on meta-analysis and evidence-based medicine, with special emphasis on research methodology. Ioannidis is a brilliant researcher, epidemiologist and inspiring lecturer (I once attended a lecture of his at a Cochrane Colloquium). Therefore I would urge everyone interested in personalized genetics to read his editorial.

Here I will give a summary of the editorial, entitled “Personalized genetic prediction: too limited, too expensive, or too soon?” [2]. The editorial summarizes two publications in the same issue of the journal [3,4] and gives an overview of the literature.

Ioannidis stresses that recent studies of the predictive performance of common genetic traits have several shortcomings, including an often weak design with few events (*3), incomplete comparison with traditional risk factors (*2), and exaggerated prediction of effects because of the models used (*1).

To date, the genotypic information does not substantially improve the prediction of future cardiovascular disease (CVD), prostate cancer and type 2 diabetes beyond traditional risk factors. In the case of age-related macular degeneration, genetic information does increase the ability to predict progression to the disease. However the predictive power to individualize risks remains relatively weak.

Indeed, a recent paper published in PLOS [5] reinforces that a strong association between single nucleotide polymorphisms (SNPs) and a multifactorial disease like age-related macular degeneration, type 2 diabetes, CVD or Crohn disease may be very valuable for establishing etiological hypotheses, but does not guarantee effective discrimination between cases and controls, and is therefore of little clinical value yet. For further details on the methods used to determine the clinical validity of genetic testing, you are encouraged to read the entire (free) paper [5].

Likewise, the study by Paynter et al. [3], reviewed by Ioannidis, shows that genetic variation in chromosome 9p21.3 (rs10757274) was strongly and consistently associated with incident CVD in a cohort of white women, but did not improve the discrimination or classification of predicted risk achieved with traditional risk factors, high-sensitivity C-reactive protein, and family history of premature myocardial infarction. Thus “knowing a patient’s rs10757274 genotype would not help a clinician make better preventive or therapeutic decisions to reduce future risk for heart disease”.
This also holds true for many other potentially causal single SNPs: they have a relatively small effect on their own. Complex diseases are probably the result of numerous gene-gene and gene-environment interactions, which may differ from one population to another and only explain a small proportion of the trait variance.
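How a real association can still fail to discriminate is easiest to see for a single present/absent marker: its ROC curve has one bend, so the AUC is simply (sensitivity + specificity)/2. The genotype frequency and odds ratio below are assumed, illustrative numbers, not values from Paynter et al.:

```python
def binary_marker_auc(freq_in_controls, odds_ratio):
    """AUC for a single present/absent marker: (sensitivity + specificity) / 2."""
    odds_in_cases = freq_in_controls / (1 - freq_in_controls) * odds_ratio
    sensitivity = odds_in_cases / (1 + odds_in_cases)  # marker frequency among cases
    specificity = 1 - freq_in_controls                 # marker-negative controls
    return (sensitivity + specificity) / 2

# Assumed numbers: risk genotype carried by 25% of controls,
# with a per-carrier odds ratio of 1.3.
print(round(binary_marker_auc(0.25, 1.3), 3))
```

With these assumptions the AUC comes out around 0.53, barely better than a coin flip, even though the association itself is genuine and statistically solid.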

Even improved prediction (*1-3) does not necessarily make a predictive test useful. The prevalence of the disease is also an important determinant: people with high-risk gene variants for a rare disease may have a significantly higher-than-average risk, yet still a negligible probability of developing the disease.
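The prevalence point is easy to put in numbers. Both the 1-in-10,000 prevalence and the tripled risk below are hypothetical figures, chosen only to illustrate the contrast with a common disease:

```python
def carrier_risk(prevalence, relative_risk):
    """Absolute risk for variant carriers of a disease with the given prevalence."""
    return prevalence * relative_risk

# Hypothetical: the same threefold relative risk applied to a rare
# (1 in 10,000) versus a common (1 in 10) disease.
for prevalence in (1e-4, 0.10):
    print(f"prevalence {prevalence:.2%} -> carrier risk {carrier_risk(prevalence, 3):.2%}")
```

A tripled risk of a 1-in-10,000 disease still leaves a carrier with a 0.03% absolute risk; the same relative risk on a common disease is a different matter entirely.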

Clinical utility of genetic prediction also depends on the availability of effective interventions (*4) and on cost-effectiveness (*5). Another paper in the same Ann. Intern. Med. issue [4] shows that although CYP2C9 and VKORC1 strongly predict the chance of bleeding as a side effect of warfarin treatment, genotype-guided dosing appeared not to be cost-effective for patients initiating warfarin therapy. A piquant detail: the FDA has approved this kind of genetic testing, although there is no good evidence that such genotyping actually reduces the risk of hemorrhage in everyday clinical practice. Such knowledge would require large, well-designed RCTs.

Ioannidis emphasizes that despite the poor evidence, genetic testing and commercial use (direct to consumer genetic testing) have already begun and are here to stay. He proposes several safeguards, including transparent and thorough reporting, unbiased continuous synthesis and grading of the evidence and alerting the public that most genetic tests have not yet been shown to be clinically useful. He concludes the editorial as follows:

Helping patients and physicians to decide when to do genetic tests will be a tough task because neither knows much about the rapidly emerging field of genomics. We need to learn more about what our genome can tell us and, more important, what it cannot tell us.