Pitfalls in the Use of Statistical Methods in Systematic Reviews of Therapeutic Interventions: A Cross-sectional Study

Objective Researchers have identified several problems in the application of statistical methods in published systematic reviews (SRs). However, these evaluations have been narrow in scope, focusing only on one particular method (such as sensitivity analyses) or restricting inclusion to Cochrane SRs, which make up only 15% of all SRs of biomedical research. We aimed to investigate the application and interpretation of various statistical methods in a cross-section of SRs of therapeutic interventions, without restriction by journal, clinical condition, or specialty.

Design We selected articles from a database of SRs we assembled previously. The database comprised a random sample of 300 SRs addressing various questions (therapeutic, diagnostic, or etiologic) that were indexed in MEDLINE in February 2014. In the current study, we included only those SRs that focused on a therapeutic question, reported at least 1 meta-analysis, and were written in English. We collected data on 61 prespecified items that characterized how well random-effects meta-analysis models, subgroup analyses, sensitivity analyses, and funnel plots were applied and interpreted. Data were extracted from articles and online appendices by a single reviewer, with a 20% random sample extracted in duplicate.

Results Among the 110 included SRs, 78 (71%) were non-Cochrane SRs and 55 (50%) investigated a pharmacological intervention. The SRs presented a median of 13 (interquartile range, 5-27) meta-analyses. Among the 110 primary meta-analyses (1 per SR), 62 (56%) used the random-effects model, but only 5 of 62 (8%) interpreted the pooled result correctly (that is, as the average of the intervention effects across all studies). Subgroup analyses were reported in 42 of 110 SRs (38%), but findings were not interpreted with respect to a test for interaction in 29 of 42 cases (69%), and the issue of potential confounding in the subgroup analyses was not raised in any SR. Sensitivity analyses were reported in 51 of 110 SRs (46%), without any rationale in 37 of 51 cases (73%). Authors of 37 of 110 SRs (34%) reported that visual inspection of a funnel plot led them to not suspect publication bias. However, in 28 of 37 cases (76%), fewer than 10 studies of varying size were included in the plot.
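To make two of the assessed methods concrete, the following is a minimal Python sketch of a random-effects pooled estimate and of a test for interaction between 2 subgroups. It is an illustration only, not the procedure used in any of the reviewed SRs: the DerSimonian-Laird estimator is assumed as the random-effects method (the abstract does not say which estimators the SRs used), the interaction test is the simple z-test on the difference between 2 subgroup estimates, and the function names and example numbers are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool study effects with a DerSimonian-Laird random-effects model.

    effects   -- per-study effect estimates (for example, log odds ratios)
    variances -- the corresponding within-study variances
    Returns (pooled_mean, standard_error, tau2), where tau2 is the
    estimated between-study variance.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                      # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0  # truncated at zero
    # Random-effects weights add tau2 to each within-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


def interaction_test(est_a, se_a, est_b, se_b):
    """Two-sided z-test for a difference between 2 subgroup estimates.

    Returns (difference, z, p_value). The p-value uses the standard
    normal distribution via the complementary error function.
    """
    diff = est_a - est_b
    z = diff / math.sqrt(se_a ** 2 + se_b ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return diff, z, p_value


if __name__ == "__main__":
    # Hypothetical log odds ratios and variances, invented for illustration
    log_or = [-0.40, -0.10, -0.65, 0.05]
    var = [0.04, 0.09, 0.12, 0.06]
    pooled, se, tau2 = dersimonian_laird(log_or, var)
    print(f"average effect across studies: {pooled:.3f} (SE {se:.3f}, tau2 {tau2:.3f})")

    # Hypothetical subgroup estimates with their standard errors
    diff, z, p = interaction_test(-0.50, 0.15, -0.10, 0.20)
    print(f"subgroup difference: {diff:.2f}, z = {z:.2f}, p = {p:.3f}")
```

Under this model the pooled value estimates the average of the intervention effects across studies rather than a single effect common to all of them, which is the interpretation counted as correct above. A subgroup difference would be judged from the interaction p-value, not from whether each subgroup's own result is statistically significant.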

Conclusions There is scope for improvement in the application and interpretation of statistical analyses in SRs of therapeutic interventions. Guidelines such as PRISMA may need to be extended to provide more specific statistical guidance.