Recently Storebø et al (2015a) published a Cochrane systematic review on the efficacy and tolerability of methylphenidate (MPH) in children and adolescents for the treatment of attention deficit hyperactivity disorder (ADHD), a summary of which was then published in the BMJ (Storebø et al, 2015b).

In recent years, numerous systematic reviews and meta-analyses have consistently demonstrated the high degree of efficacy of stimulants, such as MPH, for the treatment of ADHD. Consequently, those making treatment recommendations, such as the UK NICE guidelines (NICE, 2008), have identified MPH as a first-line treatment for school-age children with moderate to severe ADHD.

Hence, the new Storebø et al review has generated controversy: it challenges the quality of this prior evidence base and suggests that there is significant uncertainty about the effectiveness of methylphenidate for the treatment of ADHD.

Estimates suggest that ADHD affects around 2-5% of school-aged children and young people.

Methods

The authors report a Cochrane systematic review and meta-analysis of randomised controlled trials (RCTs) of methylphenidate for the treatment of children and adolescents (age 18 years or younger) with ADHD.

Major databases were searched and 185 trials were included in the review:

38 were parallel-group trials

147 were cross-over trials.

The primary outcome reported in the BMJ paper was ADHD symptoms rated by teachers. Secondary outcomes included general behaviour and quality of life.

Risk of bias

Risk of bias in trials was recorded using the Cochrane ‘risk of bias’ (RoB) tool. The authors added an additional domain of ‘vested interests’ to the RoB tool, identifying trials with industry involvement. In effect, every trial that was funded by a manufacturer of the drug, authored by someone with ties to a manufacturer, or where this was unclear was considered to have a ‘high risk’ of bias, irrespective of the scientific quality of the trial, i.e. even if it had a low risk of bias in the other methodological RoB domains.

Quality of evidence was summarised using the GRADE approach.

All trials with clear or unclear links to industry were considered to have a high risk of bias, irrespective of their actual scientific quality.

Results

The Storebø et al Cochrane review and meta-analysis reports a large effect size (SMD -0.77) for teacher reported ADHD symptoms, indicating clinically meaningful improvement with methylphenidate. These findings are in line with previous systematic reviews.

Furthermore, the authors report additional benefits of methylphenidate in improving:

General behaviour (SMD -0.68, 95% CI -0.78 to -0.60)

Quality of life (SMD 0.61, 95% CI 0.48 to 0.80)

However, Storebø et al cast serious doubt on the quality of the evidence and the certainty of these effects. They identify 96.8% of included trials as ‘high risk of bias trials’ and assess all outcomes as ‘very low quality’ using GRADE.

Conclusions

The authors conclude in their BMJ paper that:

Given the risk of bias in the included studies and the very low quality of outcomes, the magnitude of effects [of methylphenidate] is uncertain and the strength of evidence is insufficient to guide practice.

The accompanying BMJ commentary by Mina Fazel (2015) states that the authors’:

Findings are potentially important but confusing for clinicians and affected families, thanks to poor overall quality of the evidence.

Strengths and limitations

Storebø et al have conducted a large and comprehensive systematic review and meta-analysis of the effects of methylphenidate in children and adolescents with ADHD. The main limitation of the study concerns the idiosyncratic methods used to assess study bias and quality of evidence. These methods, although reported under the aegis of a Cochrane review, deviate significantly from standard Cochrane methodology and result in an exaggeration of study bias and excessive downgrading of the quality of evidence.

The most misleading and potentially damaging part of the paper relates to the way the quality of studies is assessed. Storebø et al adopt the GRADE approach, which includes the Cochrane risk of bias tool (RoB). However, the authors deviate from standard Cochrane methodology in two important ways:

Firstly, their idiosyncratic broadening of RoB domains to include ‘vested interests’. The Cochrane Handbook recommends that rather than presuming bias on the basis of industry involvement it is preferable to assess whether there are any reasons to believe that vested interests may have led to bias in each trial individually. If this were the case, then the bias would be coded under the standard RoB tool domains. Adding an additional ‘vested interests’ category runs the risk of ‘double counting’ bias.

Secondly, in summarising risk of bias, the authors combined the ‘uncertain’ and ‘high’ risk of bias categories and reported these all as ‘high’ risk of bias trials. Using this approach, the authors categorised 96.8% of trials as ‘high risk of bias trials’ when the standard Cochrane RoB methodology would categorise just over one third (37%; 69/185) of trials as being in the high risk of bias category. For the 19 trials contributing to the primary outcome of teacher rated ADHD symptoms, less than a third of trials (31%; 6/19) had a high risk of bias using the standard Cochrane RoB tool. If the authors’ ‘vested interests’ domain is included in the RoB tool, then 63% (12/19) would have a high risk of bias. These figures are significantly lower than the 96.8% of studies having high risk of bias reported as the headline figure in the BMJ paper.
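The difference between the two ways of summarising risk of bias can be made concrete with a minimal sketch. It assumes the three-level within-trial summary that the Cochrane Handbook (2011) describes in Table 8.7a, simplified here so that every domain counts as a key domain, and contrasts it with a two-level rule in which any ‘unclear’ or ‘high’ domain makes the whole trial ‘high’. The trial names and domain ratings are hypothetical and are not taken from the review; only the counts 69/185, 6/19 and 12/19 come from the text above.

```python
def summary_three_level(domains):
    """Three-level summary: 'high' if any domain is high,
    otherwise 'unclear' if any domain is unclear, otherwise 'low'.
    Simplifying assumption: every domain is treated as a key domain."""
    ratings = set(domains.values())
    if "high" in ratings:
        return "high"
    if "unclear" in ratings:
        return "unclear"
    return "low"


def summary_collapsed(domains):
    """Two-level summary: a trial is 'low' only if every domain is low;
    any 'unclear' or 'high' domain makes the whole trial 'high'."""
    return "low" if all(r == "low" for r in domains.values()) else "high"


# Hypothetical trials, for illustration only (not data from the review).
trials = {
    "trial_A": {"sequence_generation": "low", "allocation_concealment": "unclear", "blinding": "low"},
    "trial_B": {"sequence_generation": "low", "allocation_concealment": "low", "blinding": "high"},
    "trial_C": {"sequence_generation": "low", "allocation_concealment": "low", "blinding": "low"},
}

for name, domains in trials.items():
    print(name, summary_three_level(domains), summary_collapsed(domains))


def percent(count, total):
    """Proportion expressed as a percentage."""
    return 100.0 * count / total


# Counts quoted in the text above.
for count, total in [(69, 185), (6, 19), (12, 19)]:
    print(f"{count}/{total} = {percent(count, total):.1f}%")
```

Under these assumptions, a trial with a single ‘unclear’ domain is ‘unclear’ under the three-level rule but ‘high’ under the collapsed rule, which is the mechanism by which the two approaches produce such different headline proportions.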

Turning to GRADE, the authors misapplied the rules for downgrading (and upgrading) evidence as recommended in the Cochrane Handbook.

Firstly, risk of bias is just one of five factors contributing to a GRADE assessment. A high risk of bias would typically lead to downgrading by one or occasionally two points. The authors downgraded all studies assessing ADHD (teacher) outcomes by two points and then by an additional point (total downgrade by 3 points) for study heterogeneity (I²). Thresholds for the interpretation of I² can be misleading, since the importance of inconsistency depends on several factors. The Cochrane Handbook gives a rough guide to interpretation of I² as follows: ‘0% to 40%: might not be important; 30% to 60%: may represent moderate heterogeneity (depending on the magnitude and direction of effects)’. In this review, I² = 37% for the 19 trials contributing to the primary outcome is of questionable significance and would not be expected to result in a downgrading using GRADE.
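For readers unfamiliar with I², the following is a minimal sketch of how the statistic relates to Cochran's Q, using the standard definition I² = max(0, (Q − df)/Q) × 100 with df = number of trials − 1. The Q value is not reported in the text above, so it is back-calculated here from I² = 37% and 19 trials purely for illustration. Only the two interpretation bands quoted above are encoded.

```python
def i_squared(q, n_trials):
    """I-squared (percent) from Cochran's Q and the number of trials."""
    df = n_trials - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def q_from_i_squared(i2_percent, n_trials):
    """Back-calculate the Q that corresponds to a given I-squared (< 100%)."""
    df = n_trials - 1
    return df / (1.0 - i2_percent / 100.0)


# The two interpretation bands quoted in the text; note that they overlap.
QUOTED_BANDS = [
    ((0, 40), "might not be important"),
    ((30, 60), "may represent moderate heterogeneity"),
]


def matching_bands(i2_percent):
    """Return every quoted band that contains the given I-squared value."""
    return [label for (low, high), label in QUOTED_BANDS if low <= i2_percent <= high]


q = q_from_i_squared(37, 19)      # illustrative, not a reported value
print(round(q, 2))                # Q implied by I-squared = 37% with 19 trials
print(round(i_squared(q, 19), 1)) # recovers 37.0
print(matching_bands(37))         # 37% falls inside both quoted bands
```

Because the quoted bands overlap between 30% and 40%, a value of 37% can be read either as ‘might not be important’ or as ‘may represent moderate heterogeneity’, which is why the threshold alone cannot settle whether downgrading is warranted.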

Finally, GRADE can also be upgraded where there are large effect sizes and narrow confidence intervals: (SMD) -0.77, 95% confidence interval (CI) -0.90 to -0.64, represents a large effect estimated with moderate to high precision. In summary, the authors were excessively stringent in downgrading all GRADE scores by 3 points (all GRADE scores were ‘very low’ quality).
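As a rough check on the precision claim, the sketch below recovers the standard error implied by the reported 95% confidence interval, assuming the interval is symmetric and based on a normal approximation (an assumption, since the text does not state how the interval was computed).

```python
from statistics import NormalDist


def se_from_ci(lower, upper, level=0.95):
    """Standard error implied by a symmetric normal-approximation CI."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    return (upper - lower) / (2 * z)


# Values reported in the text for teacher-rated ADHD symptoms.
smd, lower, upper = -0.77, -0.90, -0.64

se = se_from_ci(lower, upper)
z_stat = smd / se  # how many standard errors the estimate lies from zero

print(round(se, 3))
print(round(z_stat, 1))
```

Under this assumption the interval half-width is 0.13 SMD units, so the whole interval lies well away from zero. This describes statistical precision only and says nothing about whether the estimate is biased.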

Summary

The outcomes reported in this review show that methylphenidate is a highly effective, safe and generally well-tolerated treatment for ADHD, with findings similar to those of previous meta-analyses.

However, the idiosyncratic approach used by the authors for assessing quality of evidence deviates significantly from the standard Cochrane method and as a result, exaggerates the risk of bias assessment and excessively downgrades the quality of evidence.

Crucially, the authors themselves showed in the full Cochrane review (but did not report this in the BMJ paper) that ‘vested interests’ bias did not materially affect the results. Therefore, the authors’ interpretation of the results and conclusion that the ‘strength of evidence is insufficient to guide practice’ is misleading and potentially dangerous as it could undermine the confidence of practitioners, children and parents in what is an effective and generally safe treatment.

The conclusions of this Cochrane review are misleading and potentially dangerous.

Chris Hollis is Professor of Child & Adolescent Psychiatry at the University of Nottingham and Consultant in Developmental Neuropsychiatry with Nottinghamshire Healthcare NHS Trust. Before moving to Nottingham, Chris trained in psychiatry at the Maudsley Hospital and Institute of Psychiatry, where he completed his PhD. In addition to his role as Clinical Director of the NIHR Healthcare Technology Co-operative (HTC) in mental health and neurodevelopmental disorders (MindTech), Chris is Director of the Centre for ADHD and Neurodevelopmental Disorders Across the Lifespan (CANDAL) and leads the NIHR CLAHRC NDL Children and Young People's research theme.
His areas of research interest include ADHD, Tourette syndrome, early onset schizophrenia and the development and implementation of digital technologies to enhance assessment and monitoring of mental health disorders. Over the last 5 years Chris has a research income of over £2.5 million. Chris was a member of the RAE 2008 panel for psychiatry, clinical psychology and neuroscience. He has served on numerous academic advisory and funding bodies including the MRC and Mental Health Foundation.
Chris is lead clinician for the Developmental Neuropsychiatry Service at QMC, with special interests in lifespan neurodevelopmental disorders including ADHD, Tourette's, ASD, child and adolescent-onset psychoses and psychopharmacology. Since 2007, Chris has developed and led the Adult ADHD Clinic at QMC. Chris was a member of the NICE Guideline Development Group (GDG) for ADHD (2005-8), NICE ADHD Quality Standard Advisory Committee (2012-13) and chaired the NICE GDG for schizophrenia and psychosis in children and young people (2011-13).

I have now, thanks to you Sarah. They have agreed to respond within a week. They’ve also said they don’t agree with the blog! I hope that the wider Cochrane community might respond as well. Cheers, André

I agree with Prof Hollis that Storebø et al. might have been overly strict in applying the GRADE criteria for heterogeneity and effect size. But anyone who has used GRADE knows that this could be debated at length, as the GRADE criteria are quite subjective despite efforts to build a reliable grading system.
Now, I think that Hollis failed to provide three crucial points of information in his summary and interpretation:
1. What time to outcome are we talking about here? It is sub-par to exclude this piece of information when talking about psychoactive drugs because most drug trials concern only short-term effects.
2. What about all studies with uncertain risk of bias? Focussing only on the proportion of studies with high risk of bias fails to acknowledge the problems that arise due to uncertain risk of bias. Hollis argues that Storebø et al. exaggerate the proportion of high-risk studies by combining unclear and high risk of bias – but how does he value the uncertain studies? Should we not acknowledge uncertainty in outcomes when there is uncertain risk of bias?
3. Highly relevant to the issue of vested interests, it surprises me that The Mental Elf chooses to have Hollis, funded by pharma, to write this piece. His credentials notwithstanding he apparently conducts research on ADHD funded by Shire (they sell ADHD medication)–without acknowledging this in the conflicts of interest on this page. This may explain the spin he puts on the results. The test of effects in vested-interests trials vs. others is definitely not sufficient to discard the risk that conflicts of interest have influenced the outcomes of vested-interests trials.

Strengths and limitations
Chris Hollis writes:
Storebø et al have conducted a large and comprehensive systematic review and meta-analysis of the effects of methylphenidate in children and adolescents with ADHD.

RESPONSE
We thank you for these very positive comments.

Chris Hollis writes:
The main limitation of the study concerns the idiosyncratic methods used to assess study bias and quality of evidence. These methods, although reported under the aegis of a Cochrane review, deviate significantly from standard Cochrane methodology and result in an exaggeration of study bias and excessive downgrading of the quality of evidence. The most misleading and potentially damaging part of the paper relates to the way the quality of studies is assessed.

RESPONSE
We thank you for the interest shown in our results. We fully understand that Chris Hollis, due to his long involvement in advocating methylphenidate for ADHD patients, for instance as part of NICE panels, may feel uncomfortable with the results of our systematic review. However, we find that the wording “deviate significantly”, “exaggeration”, and “misleading” may be taking it a bit too far from the truth. We shall try to elaborate on this in our responses below.

RESPONSE
Hollis makes this statement as if the use of the GRADE system including the Cochrane risk of bias tool to evaluate the quality of the evidence is something we have imposed on the systematic review. First, GRADE is a comprehensive evaluation of evidence, taking not only risk of bias into consideration, but also imprecision, heterogeneity, directness (generalizability), and risks of publication bias. Second, GRADE ought to be a central part of Cochrane systematic reviews and has for years been endorsed by Cochrane as well as Cochrane’s Methodological Expectations of Cochrane Intervention Reviews (MECIR) guidelines (1, 2). These guidelines specify the methodological expectations one can reasonably expect when one wants to conduct a contemporary assessment of any piece of medical evidence. Third, anyone not using GRADE runs the risk of erring dramatically when trying to assess medical evidence.
References
1. Higgins JPT, Green S. The Cochrane Handbook for Systematic Reviews of Interventions, Version 5.1.0. The Cochrane Collaboration; 2011. Available from http://www.cochrane-handbook.org.
2. Chandler J, Churchill R, Higgins J, Lasserson T, Tovey D. Methodological Expectations of Cochrane Intervention Reviews (MECIR): Methodological standards for the conduct of new Cochrane Intervention Reviews, Version 2.3, 02 December 2013. Cochrane Library.

Chris Hollis writes: However, the authors deviate from standard Cochrane methodology in two important ways:
Firstly, their idiosyncratic broadening of RoB domains to include ‘vested interests’. The Cochrane Handbook recommends that rather than presuming bias on the basis of industry involvement it is preferable to assess whether there are any reasons to believe that vested interests may have led to bias in each trial individually. If this were the case, then the bias would be coded under the standard RoB tool domains. Adding an additional ‘vested interests’ category runs the risk of ‘double counting’ bias.

RESPONSE
Hollis assumes that assessing the risks of bias due to vested interests introduces ‘double counting’ of bias. We are not sure on what evidence Hollis is making this statement. Andreas Lundh and colleagues have shown that there are many subtle mechanisms through which sponsorship and conflict of interest may influence intervention effects on outcomes [1]. They also demonstrated that vested interests per se were enough to lead to overestimation of benefits and underestimation of harms, even when all other bias domains were assessed as low risk of bias.[1] Moreover, the AMSTAR tool for methodological quality assessment of systematic reviews includes funding and conflicts of interest as a domain (http://amstar.ca/). For years, there has been an intense discussion on risks of bias from vested interests within Cochrane. For more information, please see editorials by Bero 2013 [2] and Sterne 2013 [3] as well as the commentary by Gøtzsche 2014 [4].
Hollis’ comment seems to be his means to try to undermine our results. As we shall see below, this attack does not seem to contain much ‘bite’ as our results would have been comparable even if we had ignored industry bias and vested interests.

Chris Hollis writes: Secondly, in summarising risk of bias, the authors combined the ‘uncertain’ and ‘high’ risk of bias categories and reported these all as ‘high’ risk of bias trials. Using this approach, the authors categorised 96.8% of trials as ‘high risk of bias trials’ when the standard Cochrane RoB methodology would categorise just over one third (37%; 69/185) of trials as being in the high risk of bias category. For the 19 trials contributing to the primary outcome of teacher rated ADHD symptoms, less than a third of trials (31%; 6/19) had a high risk of bias using the standard Cochrane RoB tool. If the authors’ ‘vested interests’ domain is included in the RoB tool, then 63% (12/19) would have a high risk of bias. These figures are significantly lower than the 96.8% of studies having high risk of bias reported as the headline figure in the BMJ paper.

RESPONSE
In our Cochrane protocol we planned to consider trials with one (or more) unclear or high risk of bias domain as trials with high risk of bias [1]. Assessing trials with unclear domains alongside trials with high risk of bias domains is standard in all meta-epidemiological studies that have used these procedures [2-7]. These studies are the evidence underpinning the Cochrane risk of bias tools. The studies provide substantial evidence showing that randomised clinical trials with unclear or high risk of bias tend to overestimate benefits and underestimate harms when compared to trials with low risk of bias [2-7]. The following bias risk domains have all been shown to be of particular importance: allocation sequence generation, allocation concealment, blinding of participants and clinicians, blinding of outcome assessors, incomplete outcome data, selective outcome reporting, and industry funding (or vested interests) [2-9]. When one or more of these domains is treated as high risk of bias, whether because the domain is unclear or frankly at high risk of bias, there are often overestimates of beneficial effects in the order of 10% to 30% risk reductions or underestimates of harmful effects in the order of 10% to 30% risk reductions.
A randomised clinical trial should, therefore, only be classified as overall ‘low risk of bias’ if all of the bias domains are assessed as ‘low risk of bias’ [1-8]. This does not seem ‘too strict’ as there is substantial evidence showing that trials with unclear or high risk of bias significantly overestimate benefits and underestimate harms [1-8]. In all the meta-analyses of our review, all of the randomised clinical trials were assessed as having one or more ‘unclear risk of bias’ or ‘high risk of bias’ domains. There were no trials with only the ‘vested interest bias’ domain assessed as ‘unclear risk of bias’ or ‘high risk of bias’. Accordingly, disregarding this domain would have resulted in exactly the same level of risk of bias for all trials.
We have used evidence-based bias methods and GRADE methodologies to evaluate the 185 included randomised clinical trials. These methodologies lead us to warn against the previous way methylphenidate has been conceived as a very effective treatment. Due to the risks of bias in all 185 trials, we are not sure that the small effects we observe are real or caused by bias. Furthermore, if there should be small benefits of the methylphenidate intervention, are these benefits able to balance the risks of using methylphenidate? These risks encompass (a) adverse events and serious adverse events (we are presently reviewing about 350 observational studies reporting adverse events and serious adverse events in connection with methylphenidate and here there are several reports of serious adverse events); (b) the lack of developing and accessing other interventions that may benefit patients due to the belief “we are having a treatment that benefits”, the so-called ‘sleeping pillow effect’; and (c) the costs of the medication.

Chris Hollis writes: Turning to GRADE, the authors misapplied the rules for downgrading (and upgrading) evidence as recommended in the Cochrane Handbook. Firstly, risk of bias is just one of five factors contributing to a GRADE assessment. A high risk of bias would typically lead to downgrading by one or occasionally two points. The authors downgraded all studies assessing ADHD (teacher) outcomes by two points and then by an additional point (total downgrade by 3 points) for study heterogeneity (I²). Thresholds for the interpretation of I² can be misleading, since the importance of inconsistency depends on several factors. The Cochrane Handbook gives a rough guide to interpretation of I² as follows: ‘0% to 40%: might not be important; 30% to 60%: may represent moderate heterogeneity (depending on the magnitude and direction of effects)’. In this review, I² = 37% for the 19 trials contributing to the primary outcome is of questionable significance and would not be expected to result in a downgrading using GRADE.

RESPONSE
We did downgrade the evidence according to GRADE by two points for risk of bias as we considered this risk to be serious. We also downgraded by one point for inconsistency, giving the GRADE rating ‘very low quality evidence’. The I² was 37% and this may represent moderate heterogeneity, as Hollis points out. Hollis should note that we were in fact only able to include 19 of the 185 included trials in this analysis. Inconsistency or heterogeneity is not very well estimated in an individual meta-analysis and tends to increase as more evidence from larger trials makes estimation more precise [1,2].
At the end of the day, whether one calls these trials as representing ‘very low quality of evidence’ or just ‘low quality of evidence’ would lead to exactly the same conclusions for the present and the future: we need to treat the results with critical disbelief and we need more research on the topic.
References:
1. Ioannidis JPA. Interpretation of tests of heterogeneity and bias in meta-analysis. Journal of Evaluation in Clinical Practice 2008;14:951-957.
2. Rücker G, Schwarzer G, Carpenter JR, Schumacher M. Undue reliance on I² in assessing heterogeneity may mislead. BMC Medical Research Methodology 2008;8:79.

Chris Hollis writes: Finally, GRADE can also be upgraded where there are large effect sizes and narrow confidence intervals: (SMD) -0.77, 95% confidence interval (CI) -0.90 to -0.64, represents a large effect estimated with moderate to high precision. In summary, the authors were excessively stringent in downgrading all GRADE scores by 3 points (all GRADE scores were ‘very low’ quality).

RESPONSE
We do not believe that it is reasonable to upgrade for a large effect size with high precision. First, we are not sure which ‘high effect’ Hollis is referring to. As there is risk of bias in all trials we do not know anything about the true magnitude of the effect. Second, that something seems estimated with high precision does not make the result any closer to an unknown ‘true’ effect. Moreover, it is important to understand that even when there is statistical significance for the difference between the group that received methylphenidate and the placebo group, it is not possible to state anything about the true magnitude of the effect size when the evidence is of very low quality. Bias tends to overestimate beneficial intervention effects and to underestimate harmful effects (please see above). Our GRADE assessments of the quality of the evidence underpin that we are dealing with evidence that may be open to serious changes in the future, based on better conducted and larger trials with longer follow-up periods.

RESPONSE
Once again, as described above, risk of bias due to vested interests had no effect on our assessment of overall risk of bias of the individual trials. All trials with potential bias due to vested interests had other bias risk domains signaling: take care, here overestimation of benefits and underestimations of harms may be taking place!
In our Cochrane review we have described that there were in fact no trials with low risk of bias due to the deblinding issue. We need trials in which methylphenidate is evaluated against a nocebo in order to secure an objective and unbiased assessment. If such trials show significant superiority that could overpower the risks of adverse effects then we would be willing to recommend the treatment. But based on the current evidence only individuals disregarding the risks of bias are able to recommend the drug.

Chris Hollis writes: The outcomes reported in this review show that methylphenidate is a highly effective, safe and generally well-tolerated treatment for ADHD, with findings similar to those of previous meta-analyses. However, the idiosyncratic approach used by the authors for assessing quality of evidence deviates significantly from the standard Cochrane method and as a result, exaggerates the risk of bias assessment and excessively downgrades the quality of evidence. Crucially, the authors themselves showed in the full Cochrane review (but did not report this in the BMJ paper) that ‘vested interests’ bias did not materially affect the results. Therefore, the authors’ interpretation of the results and conclusion that the ‘strength of evidence is insufficient to guide practice’ is misleading and potentially dangerous as it could undermine the confidence of practitioners, children and parents in what is an effective and generally safe treatment.

RESPONSE
We refer the reader to our previous responses above and ask them to consider: who is really dangerous here?

I would like to thank Ole Storebo for his very detailed and constructive response to my blog. Storebo raises a number of important points which I would like to respond to:

1. Ole Storebo writes: Chris Hollis assumes that assessing the risk of bias introduces ‘double counting’. We are not sure on what evidence Hollis is making this statement.

RESPONSE

My argument is not that knowledge of the funding source of trials is unimportant when assessing evidence – it is that the RoB tool is not the appropriate place to include it. As Jonathan Sterne states [1] “problems associated with industry-funded trials lie mainly in two areas: selective reporting of outcomes and whole studies, and problematic choice of comparator”. Hence, if bias occurs in these domains in industry studies, including a separate ‘vested interest’ domain will result in ‘double counting’ in the RoB tool.

Storebo also makes the point that magnitude of effects may be overestimated in industry studies even where risk of bias is ‘low’ [2]. The answer in these circumstances is not to infer ‘unmeasured’ methodological bias in every industry funded study (however well conducted it may be) by adding an additional domain in the RoB tool. A much better solution to investigating the potential moderating effect of industry funding is to perform a subgroup analysis comparing industry vs. non-industry funded studies. Punja et al. adopted this approach in the recent Cochrane review of amphetamines for children with ADHD [3]. They reported no differences in outcomes between industry-funded and publicly funded studies. I would like to invite Storebo to report a similar subgroup analysis from their review based on the 19 trials contributing to the primary outcome (teacher rated ADHD symptoms).
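A subgroup comparison of this kind can be sketched as a simple z-test for the difference between two pooled estimates. This is only one common way of testing for subgroup differences, not necessarily the method used in any of the reviews discussed. The pooled SMDs and standard errors below are hypothetical placeholders chosen for illustration; they are not results from Storebø et al, Punja et al, or any other review.

```python
from math import sqrt
from statistics import NormalDist


def subgroup_difference(est_a, se_a, est_b, se_b):
    """z-test for the difference between two independent pooled estimates.
    Returns (difference, z statistic, two-sided p-value)."""
    diff = est_a - est_b
    se_diff = sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, z, p


# HYPOTHETICAL inputs, for illustration only (not data from any review).
industry_smd, industry_se = -0.80, 0.09
non_industry_smd, non_industry_se = -0.74, 0.10

diff, z, p = subgroup_difference(industry_smd, industry_se,
                                 non_industry_smd, non_industry_se)
print(f"difference = {diff:.2f}, z = {z:.2f}, p = {p:.2f}")
```

With these placeholder inputs the difference between subgroups is small relative to its standard error. A real analysis would use the pooled estimates from the actual industry-funded and non-industry-funded trials, and a non-significant result would not by itself rule out an effect of funding.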

Storebo points out that ‘For years, there has been intense discussion on the risk of bias for vested interests in Cochrane’. Clearly, this internal debate continues without resolution or consensus. In these circumstances, it is surely premature for Storebo to unilaterally change the RoB tool in their review before the publication of the next edition of the Cochrane Handbook. Changing the instrument makes it impossible to compare the quality of the evidence across other systematic reviews that have used the standard Cochrane RoB tool.

2. Ole Storebo writes: Hollis makes this statement as if the use of the GRADE system including the Cochrane risk of bias (RoB) tool to evaluate the quality of evidence is something we have imposed on the systematic review.

RESPONSE

Storebo misrepresents my position with respect to using GRADE to assess quality of evidence in systematic reviews. GRADE has an important and accepted place in the assessment of the quality of evidence – but the significant limitations of GRADE, in particular the problem of subjective judgments, lack of reliability and consistency in ratings and potential for reviewer bias should also be understood in light of Storebo’s review and conclusions. The main concern is that GRADE is being applied differently across Cochrane reviews – making it very hard for readers to have confidence in how these reviews describe quality of evidence.
The key finding in Storebo’s review (which underpins their main conclusions) is the very low GRADE quality rating of the methylphenidate trials in ADHD. As I have argued in my blog, Storebo’s GRADE ratings excessively downgraded the quality of evidence – with triple downgrading to ‘very low quality’ for all outcomes. In GRADE, Storebo double downgraded for risk of bias and single downgraded for study heterogeneity.

Firstly, double downgrading for risk of bias (RoB) alone is very unusual – and wasn’t justified in Storebo’s review, where the largest share (47%) of the 19 studies selected for the primary outcome have a summary RoB rating of ‘unclear’ [using the summary risk of bias rating definition in the Cochrane Handbook (2011), Chapter 8: Table 8.7a].
Crucially, the approach taken by Storebo to GRADE was very different to Epstein et al.’s recent Cochrane review of methylphenidate in adults with ADHD [4]. To quote Epstein’s abstract: “Most included studies were judged to have unclear risk of bias for most categories. However, as all studies were randomized, double blind, and placebo-controlled and, in general, did not contain factors that significantly decreased the quality of the body of evidence, the quality of evidence was assessed as ‘high’ for most outcomes according to the GRADE (Grades of Recommendation, Assessment, Development, and Evaluation) approach. For one outcome-inattentiveness-most information came from studies at unclear risk of bias, and so the quality of evidence for this outcome was judged as ‘moderate’.”

Clearly, Storebo and Epstein applied different rules and interpretations to GRADE and the RoB tool. Unlike Storebo, Epstein’s Cochrane review didn’t combine the ‘uncertain’ and ‘high’ RoB categories, nor did they downgrade GRADE ratings based on only moderate heterogeneity.

Secondly, downgrading for study inconsistencies should only occur in GRADE where there is ‘serious’ evidence of heterogeneity. As Storebo noted, heterogeneity of 37% is only ‘moderate’ and hence shouldn’t have been downgraded by a point in GRADE. Again, there are inconsistencies with other recent Cochrane reviews. Punja et al. [3] downgraded by one level due to presence of significant statistical heterogeneity; I² >50%.

If Storebo et al. had followed the guidance in the Cochrane Handbook with respect to GRADE, and the approach of other recent Cochrane reviews of stimulants in ADHD, they should have only downgraded evidence by a maximum of one point to ‘moderate’ quality rather than ‘very low’ quality. Storebo’s review is an ‘outlier’ with respect to other similar systematic reviews of ADHD treatments (including the 2008 NICE Guideline CG72) that have used the GRADE assessment of quality of evidence.
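The arithmetic behind the competing ratings can be laid out in a short sketch. It assumes the usual GRADE convention that evidence from randomised trials starts at ‘high’ and drops one level per downgrade point, and it ignores upgrading. Attributing the single downgrade in the second scenario to risk of bias is an assumption made for illustration, since the text only says “a maximum of one point”.

```python
# GRADE quality levels, ordered from lowest to highest.
GRADE_LEVELS = ["very low", "low", "moderate", "high"]


def grade_after_downgrades(downgrades, start="high"):
    """Apply downgrade points per domain to a starting GRADE level.
    The result cannot fall below 'very low'."""
    index = GRADE_LEVELS.index(start) - sum(downgrades.values())
    return GRADE_LEVELS[max(index, 0)]


# Downgrades as described in the thread: two points for risk of bias
# plus one point for inconsistency.
review_downgrades = {"risk_of_bias": 2, "inconsistency": 1}

# Alternative argued for above: at most one point in total
# (assigning it to risk of bias is an illustrative assumption).
alternative_downgrades = {"risk_of_bias": 1}

print(grade_after_downgrades(review_downgrades))       # very low
print(grade_after_downgrades(alternative_downgrades))  # moderate
```

The two-level gap between ‘very low’ and ‘moderate’ comes entirely from the second risk-of-bias point and the inconsistency point, which are the two judgments contested in this exchange.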

In summary, the inconsistencies and idiosyncrasies in the use of GRADE and the RoB tool leave the reader with very little confidence in the results reported by Storebo with respect to the quality of evidence.

Chris Hollis writes:
I would like to thank Ole Storebo for his very detailed and constructive response to my blog. Storebo raises a number of important points which I would like to respond to:

Response:
We (Ole Jakob Storebø and Christian Gluud) thank Chris Hollis very much for the positive comments!

Chris Hollis writes:
My argument is not that knowledge of the funding source of trials is unimportant when assessing evidence – it is that the RoB tool is not the appropriate place to include it. As Jonathan Sterne states [1] “problems associated with industry-funded trials lie mainly in two areas: selective reporting of outcomes and whole studies, and problematic choice of comparator”. Hence, if bias occurs in these domains in industry studies, including a separate ‘vested interest’ domain will result in ‘double counting’ in the RoB tool.

Storebo also makes the point that magnitude of effects may be overestimated in industry studies even where risk of bias is ‘low’ [2]. The answer in these circumstances is not to infer ‘unmeasured’ methodological bias in every industry-funded study (however well conducted it may be) by adding an additional domain in the RoB tool. A much better solution to investigating the potential moderating effect of industry funding is to perform a subgroup analysis comparing industry vs. non-industry funded studies. Punja et al. adopted this approach in the recent Cochrane review of amphetamines for children with ADHD [3]. They reported no differences in outcomes between industry-funded and publicly funded studies. I would like to invite Storebo to report a similar subgroup analysis from their review based on the 19 trials contributing to the primary outcome (teacher rated ADHD symptoms).
Storebo points out that ‘For years, there has been intense discussion on the risk of bias for vested interests in Cochrane’. Clearly, this internal debate continues without resolution or consensus. In these circumstances, it is surely premature for Storebo to unilaterally change the RoB tool in their review before the publication of the next edition of the Cochrane Handbook. Changing the instrument makes it impossible to compare the quality of the evidence across other systematic reviews that have used the standard Cochrane RoB tool.
1. Jonathan Sterne. Why the Cochrane risk of bias tool should not include funding source as a standard item[editorial]. Cochrane Database of Systematic Reviews 2013;(12): 10.1002/14651858.ED000076
2. Lundh A. et al. Industry sponsorship and research outcome. Cochrane Database of Systematic Reviews 2013; (12): 10.1002/14651858.MR000033.pub2
3. Punja S, Shamseer L, Hartling L, Urichuk L, Vandermeer B, Nikles J, Vohra S. Amphetamines for attention deficit hyperactivity disorder (ADHD) in children and adolescents. Cochrane Database of Systematic Reviews 2016; 2: CD009996.

Response:
As stated earlier there are different views on this. We do believe that there are problems associated with industry-funded trials. The AMSTAR tool for methodological quality assessment of systematic reviews includes funding and conflicts of interest as a domain (http://amstar.ca/). Andreas Lundh and colleagues have shown that there are many subtle mechanisms through which sponsorship and conflict of interest may influence intervention effects on outcomes (1). Lundh and colleagues also demonstrated that vested interests per se were enough to lead to overestimation of benefit and underestimation of harm, even when all other domains were assessed as low risk of bias (1). There are many other ways that the industry may bias trials results than through the traditional and well accepted bias domains, e.g., through ‘creative’ and selective statistical analyses and through spinning (2, 3, 4, 5, 6, 7). Accordingly, assessment of bias from vested interests has been endorsed (8, 9) and used for years by the Cochrane Hepato-Biliary Group (10).

We reiterate that the results of our review would have been the same even if we had disregarded the issue of vested interests.

We have conducted the requested subgroup analysis comparing those studies with high risk of vested interest bias to those with low risk of vested interest bias on the outcome ADHD symptoms-teacher rated. The effect size of methylphenidate for the 14 trials with high risk of vested interest bias was SMD -0.86 [95% confidence interval (CI) -0.99 to -0.72] compared to SMD -0.50 [95% CI -0.69 to -0.31] in the 5 trials with low risk of vested interest bias. Test for subgroup differences is Chi² = 8.67, df = 1, P = 0.003. So even in this small sample we see a significant difference.
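The subgroup test quoted above can be roughly re-derived from the reported summary figures alone. The following is a minimal sketch, not the authors' actual analysis: it assumes each standard error can be recovered from the rounded 95% CI, and it uses the two-subgroup, inverse-variance form of the test. The function names are illustrative only.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Approximate a standard error from a reported 95% confidence interval."""
    return (upper - lower) / (2 * z)

def subgroup_difference_test(est_a, ci_a, est_b, ci_b):
    """Chi-square test (df = 1) for a difference between two subgroup estimates."""
    var_a = se_from_ci(*ci_a) ** 2
    var_b = se_from_ci(*ci_b) ** 2
    chi2 = (est_a - est_b) ** 2 / (var_a + var_b)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Figures as reported in the response: 14 high-risk trials vs 5 low-risk trials.
chi2, p_value = subgroup_difference_test(
    -0.86, (-0.99, -0.72),
    -0.50, (-0.69, -0.31),
)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

Because the inputs are rounded to two decimals, the statistic will only approximate the reported Chi² of 8.67; the point is that the order of magnitude and the conclusion (P well below 0.05) can be checked from the published numbers.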

In the review by Punja et al. (11) to which Hollis refers, the authors performed a subgroup analysis comparing trials funded by industry to trials funded by public organisations. In the ‘total ADHD symptoms score-parent rated’ they included five industry funded trials (one cross-over trial and four parallel group trials) and one publicly funded cross-over trial. They used end-of-period data from the cross-over trials without taking the unit of analysis error into account. All the included trials had several domains with ‘high risk of bias’. This may make the comparison erratic. Moreover, the analysis lacked power, with only one trial in the ‘public group’.

We recommend that Hollis read the essay by John P Ioannidis just published in the Journal of Clinical Epidemiology (12). Ioannidis writes: “I am not against the industry, quite the opposite, entrepreneurship is crucial for translation, development, and growth. However, corporations should not be asked to practically perform the assessments of their own products. If they are forced to do this, I cannot blame them, if they buy the best advertisement (i.e., ‘evidence’) for whatever they sell. Clinical investigators flock to try to get coauthorship in multicenter trials, meta-analyses, and powerful guidelines to which they contribute little of essence. Vested interests dictate preemptively large segments of the research agenda and its evidence-based aura which is further propagated in professional societies and large conferences” (12). We wholeheartedly agree with the points of view expressed by Ioannidis.

Chris Hollis writes:
Storebo misrepresents my position with respect to using GRADE to assess quality of evidence in systematic reviews. GRADE has an important and accepted place in the assessment of the quality of evidence – but the significant limitations of GRADE, in particular the problem of subjective judgments, lack of reliability and consistency in ratings and potential for reviewer bias should also be understood in light of Storebo’s review and conclusions. The main concern is that GRADE is being applied differently across Cochrane reviews – making it very hard for readers to have confidence in how these reviews describe quality of evidence.
The key finding in Storebo’s review (which underpins their main conclusions) is the very low GRADE quality rating of the methylphenidate trials in ADHD. As I have argued in my blog, Storebo’s GRADE ratings excessively downgraded the quality of evidence – with triple downgrading to ‘very low quality’ for all outcomes. In GRADE, Storebo double downgraded for risk of bias and single downgraded for study heterogeneity.
Firstly, double downgrading for risk of bias (RoB) alone is very unusual – and wasn’t justified in Storebo’s review where the majority (47%) of the 19 studies selected for the primary outcome studies have a summary RoB rating of ‘unclear’ [using the summary risk of bias rating definition in the Cochrane Handbook (2011), Chapter 8: Table 8.7a].

Response:
We do not think we misinterpret Hollis’ positions.
We do believe that downgrading two points for bias is justified by the evidence. There is substantial evidence showing that randomised clinical trials with high risk of bias tend to overestimate benefits and underestimate harms (1-7). The bias risk domains of allocation sequence generation, allocation concealment, blinding of participants and clinicians, blinding of outcome assessors, incomplete outcome data, selective outcome reporting, and industry funding have been shown to be of particular importance (1-8).
A randomised clinical trial should, therefore, only be classified as overall ‘low risk of bias’ if all of the bias domains are assessed as ‘low risk of bias’ (1-8). This does not seem ‘too strict’ as there is substantial evidence showing that trials with unclear or high risk of bias significantly overestimate benefits and underestimate harms (1-8). Most of the randomised clinical trials in our meta-analyses were assessed as having one or more domains with ‘high risk of bias’. There were few trials with only the ‘vested interest bias’ domain assessed as ‘high risk of bias’.
Furthermore, methylphenidate is associated with a number of common adverse events which may have led to loss of blinding in placebo controlled trials, making all trials at high risk of bias. It is likely that there have been several more non-serious adverse events in many of the included trials as many patients typically experienced more than one adverse event. Furthermore, for methodological reasons, we only used dichotomous outcomes reflecting the number of participants affected by the event per the total number of participants. This means that the increased risk of non-serious adverse events is in fact much higher than calculated in our Cochrane review. If blinding is broken in just 20% or 30% of patients on methylphenidate, then this bias alone can explain the small but significant findings.
Finally, GRADE is providing the framework on which to assess the quality of the overall evidence. GRADE does not influence the results per se. So, Hollis is welcome to call all the evidence low quality (or even moderate quality) instead of trusting our assessment resulting in very low quality. And then we ask: how does such a change of the GRADE assessment change the assessments of the intervention effects of methylphenidate for ADHD? We think very little.
1. Savovic J, Jones HE, Altman DG, Harris RJ, Juni P, Pildal J, Als-Nielsen B, Balk EM, Gluud C, Gluud LL, Ioannidis JP, Schulz KF, Beynon R, Welton NJ, Wood L, Moher D, Deeks JJ, Sterne JA. Influence of reported study design characteristics on intervention effect estimates from randomized, controlled trials. Ann Intern Med 2012, 157(6):429–438.
2. Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 1995, 273(5):408–412.
3. Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, Stroup DF. Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. Quality of reporting of meta-analyses. Lancet 1999, 354(9193):1896–1900.
4. Kjaergaard L, Villumsen J, Gluud C. Reported methodologic quality and discrepancies between large and small randomized trials in meta-analysis. Ann Int Med 2001, 135(11):982–989.
5. Gluud LL, Thorlund K, Gluud C, Woods L, Harris R, Sterne JA. Correction: reported methodologic quality and discrepancies between large and small randomized trials in meta-analyses. Ann Int Med 2008, 149(3):219.
6. Wood L, Egger M, Gluud LL, Schulz KF, Juni P, Altman DG, Gluud C, Martin RM, Wood AJ, Sterne JA. Empirical evidence of bias in treatment effect estimates in controlled trials with different interventions and outcomes: meta-epidemiological study. BMJ 2008, 336(7644):601–605.
7. Lundh A, Sismondo S, Lexchin J, Busuioc OA, Bero L. Industry sponsorship and research outcome. Cochrane Database Syst Rev 2012, 12:MR000033.
8. Higgins JPT, Green S: The Cochrane Handbook for Systematic Reviews of Interventions, Version 5.1.0. In The Cochrane Collaboration; 2011. Available from http://www.cochrane-handbook.org.

Chris Hollis writes:
Crucially, the approach taken by Storebo to GRADE was very different to Epstein et al.’s recent Cochrane review of methylphenidate in adults with ADHD [4]. To quote Epstein’s abstract: “Most included studies were judged to have unclear risk of bias for most categories. However, as all studies were randomized, double blind, and placebo-controlled and, in general, did not contain factors that significantly decreased the quality of the body of evidence, the quality of evidence was assessed as ‘high’ for most outcomes according to the GRADE (Grades of Recommendation, Assessment, Development, and Evaluation) approach. For one outcome-inattentiveness-most information came from studies at unclear risk of bias, and so the quality of evidence for this outcome was judged as ‘moderate’.”
Clearly, Storebo and Epstein applied different rules and interpretations to GRADE and the RoB tool. Unlike Storebo, Epstein’s Cochrane review didn’t combine the ‘uncertain’ and ‘high’ RoB categories, nor did they downgrade GRADE ratings based on only moderate heterogeneity.
Secondly, downgrading for study inconsistencies should only occur in GRADE where there is ‘serious’ evidence of heterogeneity. As Storebo noted, heterogeneity of 37% is only ‘moderate’ and hence shouldn’t have been downgraded by a point in GRADE. Again, there are inconsistencies with other recent Cochrane reviews. Punja et al. [3] downgraded by one level due to presence of significant statistical heterogeneity; I² >50%.
If Storebo et al. had followed the guidance in the Cochrane Handbook with respect to GRADE, and the approach of other recent Cochrane reviews of stimulants in ADHD, they should have only downgraded evidence by a maximum of one point to ‘moderate’ quality rather than ‘very low’ quality. Storebo’s review is an ‘outlier’ with respect to other similar systematic reviews of ADHD treatments (including the 2008 NICE Guideline CG72) that have used the GRADE assessment of quality of evidence.
In summary, the inconsistencies and idiosyncrasies in the use of GRADE and the RoB tool leave the reader with very little confidence in the results reported by Storebo with respect to the quality of evidence.
References:

Response:
One year ago, we submitted comments on the Epstein and coworkers’ Cochrane review, which we copy in full below:
“Submitted to The Cochrane Library
Storebø OJ, Gluud C. Criticism to “Immediate-release methylphenidate for attention deficit hyperactivity disorder (ADHD) in adults” [personal communication]. Email to: T Epstein via Wiley Online Feedback form 10 May 2015.
Date of Submission: 10-May-2015
Name: Ole Jakob Storebø and Christian Gluud Email Address: ojst@regionsjaelland.dk
Affiliation: Ole J. Storebø: MA, Ph. D., senior researcher, honorary associate professor, Psychiatric Research Unit, Region Zealand, Denmark; Christian Gluud: M.D., Dr. Med. Sci., Head of Department, Copenhagen Trial Unit, Centre for Clinical Intervention Research, Department 7812, Rigshospitalet, Copenhagen University Hospital, Denmark
Comment:
Dear Tamir Epstein, Nikolaos A Patsopoulos, and Mark Weise
We have carefully read your review Immediate-release methylphenidate for attention deficit hyperactivity disorder (ADHD) in adults published in Cochrane Library in issue 9, 2014. You conclude that “immediate-release methylphenidate is efficacious for treating adults with ADHD with symptoms of hyperactivity, impulsivity, and inattentiveness, and for improving their overall clinical condition. Trial data suggest that adverse effects from immediate-release methylphenidate for adults with ADHD are not of serious clinical significance, although this conclusion may be limited, certainly in the case of weight loss, by the short duration of published studies”. Furthermore, you state that “overall, the body of evidence about use of methylphenidate in adults with ADHD is of high quality. It shows that methylphenidate improves ADHD symptoms in adults and suggests that side effects are not serious”.

We do not agree with the assessment of the quality of included studies and we do not agree with your conclusions. We judge all the trials in this particular review to be at high risk of bias; the meta-analyses show high inconsistency between studies, and the estimates are highly imprecise. Your review showed the SMD for the outcome of hyperactivity was -0.60 (95% CI -1.11 to -0.09; 6 studies; n = 245; high-quality evidence). All the studies in this meta-analysis were high risk of bias studies (most of the studies had three or more unclear risk of bias domains), the heterogeneity was I² = 80% and the 95% confidence interval was wide, from -1.11 to -0.09. In our opinion this outcome should have been downgraded to very low quality in the GRADE assessment.
The SMD for impulsivity was -0.62 (95% CI -1.08 to -0.17; 5 studies; n = 207; high-quality evidence). All the studies in this meta-analysis were high risk of bias studies (most of the studies had three or more unclear risk of bias domains), the heterogeneity was I² = 81% and the 95% confidence interval was wide. In our opinion this outcome should also have been downgraded to very low quality in the GRADE assessment.
You state that there are some factors also that may increase the quality level such as high magnitude of effect. We are not sure which ‘high magnitude effect’ you are referring to?

We therefore strongly believe that your conclusion is incorrect. Based on our assessments of your evidence, our conclusion is that it is not possible at present to either recommend or refute methylphenidate for adults with ADHD.

Ole Jakob Storebø and Christian Gluud.”

We have no good explanation why this comment has not yet been posted on The Cochrane Library.
We still think that Hollis is wrong when he takes the review by Epstein and colleagues as evidence that methylphenidate has major beneficial effects for ADHD.

I would like to thank Ole Storebø and Christian Gluud for their thoughtful response to my comments. However, there remain some important issues which I would like to invite Storebø and Gluud to respond to:

1. Storebo & Gluud write: We have conducted the requested subgroup analysis comparing those studies with high risk of vested interest bias to those with low risk of vested interest bias on the outcome ADHD symptoms-teacher rated. The effect size of methylphenidate for the 14 trials with high risk of vested interest bias was SMD -0.86 [95% confidence interval (CI) -0.99 to -0.72] compared to SMD -0.50 [95% CI -0.69 to -0.31] in the 5 trials with low risk of vested interest bias. Test for subgroup differences is Chi² = 8.67, df = 1, P = 0.003. So even in this small sample we see a significant difference

RESPONSE: The effect of including studies with non-placebo or active controls

I would like to thank Ole Storebø and Christian Gluud for conducting the suggested sub-group analysis of the 19 trials selected for the primary outcome. The study selection criteria specified randomised clinical trials comparing all types of methylphenidate with placebo or no intervention. However, 3 of the 5 trials with low risk of vested interest bias did not meet the study selection criteria: BROWN 1985, FIRESTONE, JENSEN 1999 (MTA) compared methylphenidate to an active control condition; cognitive training, parent training and community treatment as usual (TAU) respectively. The impact of an active control will typically reduce the effect size of an intervention when compared to trials using placebo or inactive control. Hence, I invite Storebø and Gluud to remove the 3 studies which do not meet their study selection criteria from the meta-analysis and to re-run both the combined and sub-group analysis.

2. Storebø & Gluud write: We also downgraded GRADE by one point for inconsistency giving the GRADE ‘very low quality evidence’. The I2 was 37% and this may represent moderate heterogeneity as Hollis points out.

RESPONSE: GRADE should not have been downgraded for inconsistency

I’m unclear from Storebø & Gluud’s latest response whether or not they accept that GRADE should not have been downgraded with an I2 of 37%. Given that there is minimal evidence of inconsistency if I2 is <50% – do Storebo & Gluud now agree that this does not constitute ‘serious inconsistency’ and that GRADE should not have been downgraded?

3. Storebø & Gluud write: We reiterate that the results of our review would have been the same even if we had disregarded the issue of vested interests.

RESPONSE: The impact of including vested interest bias in the RoB tool

I remain puzzled how including vested interests bias could not have affected the summary risk of bias assessment across studies and the GRADE rating. For the 19 studies selected for the primary outcome meta-analysis, in 9/19 (47%) vested interests was the only RoB domain rated at high risk of bias. If vested interests were excluded, only 7/19 (37%) of studies had high risk of bias in one or more domains. In GRADE, downgrading for serious risk of bias should only be considered if there is important risk of bias across the majority of studies. Given that the most common reason across all studies for rating high risk of bias was vested interests bias, it’s difficult to understand how the results (for GRADE) could have been the same had the authors disregarded the issue of vested interests.
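The arithmetic behind this point can be laid out explicitly. The sketch below uses only the counts quoted in the paragraph above and counts a trial as ‘high risk’ only when at least one domain is actually rated high. That is an assumption matching Hollis’ reading; it does not treat ‘unclear’ domains as high, which is the rule Storebø and Gluud say they followed. Variable names are illustrative.

```python
# Counts quoted for the 19 trials in the primary outcome meta-analysis.
total_trials = 19
only_vested_interest_high = 9   # vested interests is the sole 'high' domain
other_domain_high = 7           # 'high' in at least one standard RoB domain

# The two groups are disjoint, so with the extra domain included
# the number of trials with any 'high' rating is their sum.
high_with_domain = only_vested_interest_high + other_domain_high
high_without_domain = other_domain_high

share_with = high_with_domain / total_trials
share_without = high_without_domain / total_trials

print(f"With vested-interest domain:    {high_with_domain}/{total_trials} = {share_with:.0%}")
print(f"Without vested-interest domain: {high_without_domain}/{total_trials} = {share_without:.0%}")
print("Majority at high risk without the domain?", share_without > 0.5)
```

On these counts, whether a majority of trials is rated at high risk depends on whether the vested-interest domain is included.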

4. Storebø & Gluud write: Furthermore, methylphenidate is associated with a number of common adverse events which may have led to loss of blinding in placebo controlled trials, making all trials at high risk of bias.…… If blinding is broken in just 20% or 30% of patients on methylphenidate, then this bias alone can explain the small but significant findings.

RESPONSE: How were teacher ratings unblinded by adverse effects?

The argument that teacher outcome ratings were unblinded as a result of common methylphenidate adverse effects such as loss of appetite or sleep loss is implausible and lacks supporting evidence. While I agree that self-report ratings and parent ratings are potentially susceptible to awareness of drug adverse effects – this argument is far less likely to hold for teachers who routinely would be unaware of these symptoms. Hence, the assumption that loss of blinding of outcome assessors (teachers) occurred is not supported by any evidence. Storebø & Gluud’s judgement that this makes all trials at high risk of bias is also without basis.

5. Storebø & Gluud write: Finally, GRADE is providing the framework on which to assess the quality of the overall evidence. GRADE does not influence the results per se. So, Hollis is welcome to call all the evidence low quality (or even moderate quality) instead of trusting our assessment resulting in very low quality. And then we ask: how does such a change of the GRADE assessment change the assessments of the intervention effects of methylphenidate for ADHD? We think very little.

RESPONSE: GRADE affects the interpretation of the results and confidence in the effect estimate

This is an important point, and again, I’m puzzled by Storebø & Gluud’s response. If we accept their original ‘very low quality’ GRADE rating, this means that we should have very little confidence in the moderate to large effect estimate reported for methylphenidate – and that the ‘true’ estimate is likely to be substantially different from this estimate of effect. In these circumstances, the reader may well decide to disregard the effect estimate and not use it to guide clinical decision making. In contrast, if the GRADE rating were ‘moderate quality’ – then the conclusions drawn would be very different as the true effect is likely to be close to the estimate of effect. Although there is still a possibility that the true effect may be substantially different, this level of confidence is typically sufficient to support clinical practice and guideline recommendations. This is indeed a substantial change.

I would like to thank Ole Storebø and Christian Gluud for their thoughtful response to my comments. However, there remain some important issues which I would like to invite Storebø and Gluud to respond to:

1.

Storebo & Gluud write: “We have conducted the requested subgroup analysis comparing those studies with high risk of vested interest bias to those with low risk of vested interest bias on the outcome ADHD symptoms-teacher rated. The effect size of methylphenidate for the 14 trials with high risk of vested interest bias was SMD -0.86 [95% confidence interval (CI) -0.99 to -0.72] compared to SMD -0.50 [95% CI -0.69 to -0.31] in the 5 trials with low risk of vested interest bias. Test for subgroup differences is Chi² = 8.67, df = 1, P = 0.003. So even in this small sample we see a significant difference.”

Hollis’ response:

The effect of including studies with non-placebo or active controls
I would like to thank Ole Storebø and Christian Gluud for conducting the suggested sub-group analysis of the 19 trials selected for the primary outcome. The study selection criteria specified randomised clinical trials comparing all types of methylphenidate with placebo or no intervention. However, 3 of the 5 trials with low risk of vested interest bias did not meet the study selection criteria: BROWN 1985, FIRESTONE, JENSEN 1999 (MTA) compared methylphenidate to an active control condition; cognitive training, parent training and community treatment as usual (TAU) respectively. The impact of an active control will typically reduce the effect size of an intervention when compared to trials using placebo or inactive control. Hence, I invite Storebø and Gluud to remove the 3 studies which do not meet their study selection criteria from the meta-analysis and to re-run both the combined and sub-group analysis.

RESPONSE:

We can assure Hollis that we have not included any trials that did not meet our inclusion criteria. It is clearly written in our review that we included randomised trials if both intervention groups (experimental and control) received the cointervention(s) similarly.

The Brown 1985 trial was a twelve-week randomised, parallel group trial with 4 intervention groups: 1) methylphenidate combined with cognitive training; 2) cognitive training; 3) methylphenidate; 4) no treatment (not randomly assigned). The cognitive training programme was: We used the methylphenidate plus cognitive training group as the experimental group and compared it with the cognitive training group. The Firestone 1981 trial was a three-month randomised, double-blind, placebo-controlled, parallel group trial, wherein participants were randomly assigned to 3 intervention groups: 1) methylphenidate; 2) methylphenidate plus parent training; and 3) placebo plus parent training. We used the methylphenidate plus parent training group versus the placebo plus parent training group in our comparison. The Jensen 1999 trial was a 14-month multi-centre randomised, parallel group trial with 4 intervention groups: 1) medication management; 2) behavioural treatment; 3) medication management plus behavioural treatment; 4) community care (control group). We used the medication management plus behavioural treatment as experimental versus behavioural treatment as control in our comparison. In the medication management group most of the patients received methylphenidate (77%) and 10% received dextroamphetamine. 12.5% received no medication due to high placebo response. In our review, we show that the in- or exclusion of the latter trial does not affect our results (Storebø 2015).

We have now conducted the requested subgroup analysis with the three trials removed, and we get this result: The effect size of methylphenidate for the 14 trials with high risk of vested interest bias was SMD −0.86 [95% confidence interval (CI) −0.99 to −0.72] compared to SMD −0.60 [95% CI −0.95 to −0.26] in the 2 trials with low risk of vested interest bias. There is still a lower effect in the low risk of vested interest bias group, which now includes only two trials, but the significant difference between the two subgroups disappears. This is, however, an erratic analysis as the three trials described above must be included in our analysis according to our Cochrane protocol. Moreover, due to the paucity of data in the second subgroup (only two trials) such an analysis does not have sufficient power.
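One way to see why the subgroup difference is no longer significant is to look at the confidence interval for the difference between the two estimates. The sketch below is an illustration under stated assumptions, not a reproduction of the review's analysis: standard errors are back-calculated from the rounded 95% CIs and the two subgroups are treated as independent. Function names are illustrative.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Approximate a standard error from a reported 95% confidence interval."""
    return (upper - lower) / (2 * z)

def difference_with_ci(est_a, ci_a, est_b, ci_b, z=1.96):
    """Difference between two independent estimates with an approximate 95% CI."""
    diff = est_a - est_b
    se_diff = math.sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    return diff, diff - z * se_diff, diff + z * se_diff

# 14 trials at high risk of vested interest bias vs the 2 remaining low-risk trials.
diff, ci_low, ci_high = difference_with_ci(
    -0.86, (-0.99, -0.72),
    -0.60, (-0.95, -0.26),
)
print(f"difference = {diff:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
print("Interval includes zero:", ci_low < 0 < ci_high)
```

The wide interval in the two-trial subgroup dominates the uncertainty of the difference, which is consistent with the remark about insufficient power.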

2.

Storebø & Gluud write: “We also downgraded GRADE by one point for inconsistency giving the GRADE ‘very low quality evidence’. The I2 was 37% and this may represent moderate heterogeneity as Hollis points out.”

Hollis’ response:

GRADE should not have been downgraded for inconsistency
I’m unclear from Storebø & Gluud’s latest response whether or not they accept that GRADE should not have been downgraded with an I2 of 37%. Given that there is minimal evidence of inconsistency if I2 is <50% – do Storebo & Gluud now agree that this does not constitute ‘serious inconsistency’ and that GRADE should not have been downgraded?

RESPONSE:

We have written that we consider the I2 of 37% to be a moderate inconsistency and we still think it is fully correct to downgrade the evidence for this heterogeneity.
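For readers unfamiliar with the statistic under dispute, I² is derived from Cochran's Q as I² = max(0, (Q − df) / Q). The sketch below shows the relationship and where 37% falls in the rough interpretation bands given in the Cochrane Handbook (version 5.1.0, section 9.5.2). It assumes the 37% was computed across the 19 trials in the primary analysis (df = 18), which the exchange does not state explicitly; the helper names are illustrative.

```python
def i_squared(q, df):
    """I² as a proportion: share of variability attributed to heterogeneity."""
    return max(0.0, (q - df) / q)

def implied_q(i2, df):
    """Invert I² = (Q - df) / Q to recover the Q statistic it implies."""
    return df / (1.0 - i2)

# Rough, deliberately overlapping guide bands from the Cochrane Handbook 5.1.0.
HANDBOOK_BANDS = [
    (0.00, 0.40, "might not be important"),
    (0.30, 0.60, "may represent moderate heterogeneity"),
    (0.50, 0.90, "may represent substantial heterogeneity"),
    (0.75, 1.00, "considerable heterogeneity"),
]

def bands_for(i2):
    """Return every Handbook band label whose range contains the given I²."""
    return [label for low, high, label in HANDBOOK_BANDS if low <= i2 <= high]

q = implied_q(0.37, df=18)   # assumption: 19 trials, so df = 18
labels = bands_for(0.37)
print(f"Implied Q = {q:.2f} on 18 df")
print("Handbook bands containing 37%:", labels)
```

Because the Handbook bands overlap, 37% sits in both the ‘might not be important’ and the ‘moderate’ ranges, which is part of why the two sides read the same figure differently.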

3.

Storebø & Gluud write: “We reiterate that the results of our review would have been the same even if we had disregarded the issue of vested interests.”

Hollis’ response:

The impact of including vested interest bias in the RoB tool
I remain puzzled how including vested interests bias could not have affected the summary risk of bias assessment across studies and the GRADE rating. For the 19 studies selected for the primary outcome meta-analysis, in 9/19 (47%) vested interests was the only RoB domain rated at high risk of bias. If vested interests were excluded, only 7/19 (37%) of studies had high risk of bias in one or more domains. In GRADE, downgrading for serious risk of bias should only be considered if there is important risk of bias across the majority of studies. Given that the most common reason across all studies for rating high risk of bias was vested interests bias, it’s difficult to understand how the results (for GRADE) could have been the same had the authors disregarded the issue of vested interests.

RESPONSE:

We have clearly written in our protocol that we would consider a trial with one or more unclear risk of bias domains as a trial with high risk of bias. Furthermore, there is the deblinding issue which we have now described several times. We consider the blinding to have been broken in all the trials, as methylphenidate is associated with a number of common adverse events which may have led to loss of blinding for just a subset of participants, making all trials at high risk of bias. We do believe that adverse events such as “decreased appetite” and “sleeping problems” can usually be detected by the teachers. It is likely that there have been several more non-serious adverse events in many of the included trials as many patients typically experienced more than one adverse event. Furthermore, for methodological reasons, we only used dichotomous outcomes reflecting the number of participants affected by the event per the total number of participants.

4.

Storebø & Gluud write: “Furthermore, methylphenidate is associated with a number of common adverse events which may have led to loss of blinding in placebo controlled trials, making all trials at high risk of bias.…… If blinding is broken in just 20% or 30% of patients on methylphenidate, then this bias alone can explain the small but significant findings.”

Hollis’ response:

How were teacher ratings unblinded by adverse effects?
The argument that teacher outcome ratings were unblinded as a result of common methylphenidate adverse effects such as loss of appetite or sleep loss is implausible and lacks supporting evidence. While I agree that self-report ratings and parent ratings are potentially susceptible to awareness of drug adverse effects – this argument is far less likely to hold for teachers who routinely would be unaware of these symptoms. Hence, the assumption that loss of blinding of outcome assessors (teachers) occurred is not supported by any evidence. Storebø & Gluud’s judgement that this makes all trials at high risk of bias is also without basis.

RESPONSE

We considered the teacher-rated analyses as the primary analysis because the symptoms of ADHD are more easily detectable in the school setting, and due to the following reasons:

1) The importance of teacher report and involvement in the process of diagnosis and treatment of ADHD is well established (Scott 2012).
2) Having two reports (parents and teachers) may not be necessary to assess the efficacy of ADHD medications, as teacher reports are better predictors of parent reports, but parent reports are not better than teacher reports of ADHD symptomatology (Faraone 2005).
3) In the Multimodal Treatment of ADHD (MTA) trial, information provided by parents was not always thought to be strong (Efstratopoulou 2013).

We do believe that the symptoms of adverse events are also easily detectable in the school setting. Why should teachers be unaware of adverse events such as decreased appetite and sleep problems in the school setting? The children do eat in school and the teachers can detect tiredness. In the MTA titration trial it is reported that the adverse event of appetite suppression, related to increasing dose, was indeed detected by teachers (Greenhill et al. 2001).

We are happy to read that you agree with us that self-report ratings and parent ratings are potentially susceptible to awareness of drug adverse effects, thereby introducing a deblinding issue. We do believe that this is also likely the case in the school setting. As stated, the MTA titration trial reports exactly this (Greenhill et al. 2001).

Moreover, and importantly, we observed broadly similar assessments of intervention effects irrespective of who the observer was (teacher, other observers, parents). This can represent either true (small) intervention effects or broadly similar bias mechanisms. Only properly nocebo-controlled trials will be able to examine which mechanism is operative, as stated in our review (Storebø 2015).

Storebø & Gluud write: “Finally, GRADE is providing the framework on which to assess the quality of the overall evidence. GRADE does not influence the results per se. So, Hollis is welcome to call all the evidence low quality (or even moderate quality) instead of trusting our assessment resulting in very low quality. And then we ask: how does such a change of the GRADE assessment change the assessments of the intervention effects of methylphenidate for ADHD? We think very little.”

Hollis’ response:

GRADE affects the interpretation of the results and confidence in the effect estimate
This is an important point, and again, I’m puzzled by Storebø & Gluud’s response. If we accept their original ‘very low quality’ GRADE rating, this means that we should have very little confidence in the moderate to large effect estimate reported for methylphenidate – and that the ‘true’ estimate is likely to be substantially different from this estimate of effect. In these circumstances, the reader may well decide to disregard the effect estimate and not use it to guide clinical decision making. In contrast, if the GRADE rating were ‘moderate quality’ – then the conclusions drawn would be very different as the true effect is likely to be close to the estimate of effect. Although there is still a possibility that the true effect may be substantially different, this level of confidence is typically sufficient to support clinical practice and guideline recommendations. This is indeed a substantial change.

RESPONSE:

We still stand by our assessment of the quality of the evidence: very low quality. We just want Hollis to reflect on this: how do you know that there is a large effect estimate? We do not believe that the standardised mean difference (SMD) of −0.77 can be considered a large effect. It is necessary to transform the SMD to a mean difference on a well-known rating scale and compare this mean difference to a minimal relevant difference (MIREDIF). We have done this for the outcome of teacher-rated ADHD symptoms, and the SMD of −0.77 (95% confidence interval −0.90 to −0.64) was transformed into a mean difference of −9.6 points on the ADHD rating scale. This is larger than the minimal clinically relevant difference of −6.6 points we selected, but it is only three points beyond the MIREDIF value (Storebø 2015). If Hollis wants to call the evidence low quality or even moderate quality, this still means that the true effect size could be smaller than −9.6 and thus closer to or below the MIREDIF value. Even in those (erratic) situations, one would also report that there seems to be some effect of methylphenidate but that one cannot be sure of the size of this effect. We need trials with low risk of bias, and we could not identify such trials. We also reported in our review: The results suggest that among children and adolescents with a diagnosis of ADHD, methylphenidate may improve teacher-reported symptoms of ADHD and general behaviour and parent reported quality of life. However, the magnitude of this effect is uncertain. We do not think that there is a large difference between our assessment of the data and that of Hollis. Even if there should be small benefits of using methylphenidate, are these benefits able to balance the risks of using methylphenidate?
These risks encompass (a) adverse events and serious adverse events (we are presently reviewing about 320 observational studies reporting adverse events and serious adverse events in connection with methylphenidate); (b) the failure to develop and access other interventions that may benefit patients, due to the belief that “we are having a treatment that benefits”, the so-called ‘sleeping pillow effect’; and (c) the costs of the medication.
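
For readers who want to follow the arithmetic in the response above, the conversion from an SMD to points on a rating scale can be sketched as a simple multiplication by a standard deviation. This is only a rough illustration and not the review authors' actual calculation: the text does not state which standard deviation was used, so the value below is an assumption, back-calculated from the two reported figures (−0.77 and −9.6). All variable and function names are made up for the example.

```python
# Rough sketch: converting a standardised mean difference (SMD) into
# points on a rating scale via MD = SMD * SD.

SMD = -0.77              # pooled SMD quoted in the response
SMD_CI = (-0.90, -0.64)  # 95% confidence interval quoted in the response
MD_REPORTED = -9.6       # mean difference in rating-scale points, as quoted
MIREDIF = -6.6           # minimal relevant difference chosen by the authors

# ASSUMPTION: the scale SD is not given in the text. It is back-calculated
# here from the reported figures and may differ from the value actually used.
IMPLIED_SD = MD_REPORTED / SMD


def smd_to_md(smd: float, sd: float) -> float:
    """Convert an SMD to a mean difference on a scale with the given SD."""
    return smd * sd


md_point = smd_to_md(SMD, IMPLIED_SD)
md_ci = tuple(smd_to_md(bound, IMPLIED_SD) for bound in SMD_CI)

# Distance of the point estimate beyond the MIREDIF, in rating-scale points.
margin = abs(md_point) - abs(MIREDIF)

print(f"implied SD: {IMPLIED_SD:.2f} points")
print(f"mean difference: {md_point:.1f} points")
print(f"scaled interval: {md_ci[0]:.1f} to {md_ci[1]:.1f} points")
print(f"margin beyond MIREDIF: {margin:.1f} points")
```

Under the assumed standard deviation, the margin between the point estimate and the MIREDIF comes out at the three points mentioned above. The scaled interval reflects sampling uncertainty only; it says nothing about the risk of bias concerns that the GRADE rating is meant to capture.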

Thanks to Chris Hollis for this thoughtful post, to Andre Tomlin for flagging it up to the Cochrane community, and to Ole Jakob Storebø for responding comprehensively on behalf of the author team.

On behalf of Cochrane, I would like to add the recommendation that those interested in commenting on the conduct or findings of individual Cochrane Reviews use our Feedback system. A link to this is provided on every published Cochrane Review, in the Article Tools menu (http://onlinelibrary.wiley.com/doi/10.1002/14651858.CD009885.pub2/abstract), and comments submitted through this system will then undergo review by the Review Group’s Feedback Editor for further action, including potentially updating the review in question.

Dear Nancy,
Thank you for this very kind invitation. But does Cochrane still accept comments? In our response above by Ole Jakob Storebø and myself, posted today on the 5th of April, you can see why I pose such a provocative question to you. Ole Jakob and I have tried to get our comments into The Cochrane Library for almost a year now. Can you please explain why?

I notice that two of the authors on the Storebø review have links with pharma, one with Novartis and the other with the Lundbeck Foundation, which owns 70% of the shares in Lundbeck. Does this mean that we have to downgrade the quality of the review and consider it unreliable?

[…] However, two recent Cochrane reviews (Storebo et al, 2015; Punja et al, 2016a) questioned the quality of the evidence from available RCTs (see also a previous Mental Elf blog by Chris Hollis on this review). […]

Methylphenidate for ADHD: has Chris Hollis also got it wrong this time?

In the above arguments against our Cochrane systematic review on methylphenidate for children and adolescents with attention deficit hyperactivity disorder (ADHD) (1), Chris Hollis refers to a Cochrane review by Epstein, Patsopoulos, and Weiser on immediate-release methylphenidate (Ritalin®) for adults with ADHD. Hollis underlines that these authors managed the evaluation of the trials they included in their meta-analysis differently from us. We think that Chris Hollis as well as the readers of this blog should be informed that the review by Epstein, Patsopoulos, and Weiser, published in 2014, was withdrawn from The Cochrane Library on 26 May 2016 (2). The decision to withdraw the review was taken after prolonged criticism of serious flaws in the included trials; serious defects in the assessments, analyses, and conclusions drawn; the authors’ lack of response to the critique raised; and the authors’ conflicts of interest (2). So that people can see the withdrawn review, as well as the severe criticism raised against it by us and three other groups, we also refer to our homepage, where you may find a copy from just before withdrawal (3).

Chris Hollis has, in his criticism of our systematic review, highlighted the Epstein, Patsopoulos, and Weiser review as an example of correct GRADE assessment. So now we ask, did Chris Hollis also get it wrong this time?