Statistical Mistakes
Misuse of statistical inference in medical research
https://statisticalmistakes.com


Misinterpretation of treatment effects
Thu, 04 Jan 2018

The odds ratio is a frequently used outcome measure in randomised trials with binary endpoints. The odds ratio, however, overestimates the relative risk, and the degree of overestimation depends on the baseline risk (the incidence of the outcome). In some cases the overestimation can be substantial.
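
To make the distinction concrete, here is a minimal sketch (illustrative numbers, not from the post or from Knol et al.) using the standard conversion RR = OR / (1 - p0 + p0 * OR), where p0 is the baseline risk:

```python
# Minimal sketch: how far an odds ratio (OR) drifts from the relative
# risk (RR) as the baseline risk p0 of the outcome grows.
def rr_from_or(odds_ratio: float, p0: float) -> float:
    """Relative risk implied by an odds ratio at baseline risk p0
    (conversion: RR = OR / (1 - p0 + p0 * OR))."""
    return odds_ratio / (1.0 - p0 + p0 * odds_ratio)

odds_ratio = 2.0
for p0 in (0.01, 0.10, 0.30, 0.50):
    rr = rr_from_or(odds_ratio, p0)
    print(f"baseline risk {p0:.2f}: OR = {odds_ratio:.1f}, RR = {rr:.2f}, "
          f"overestimation = {100 * (odds_ratio - rr) / rr:.0f}%")
```

At a baseline risk of 1% the two measures nearly coincide; at 50%, an odds ratio of 2 overstates the relative risk by 50%.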

Knol et al. (1) reviewed 288 randomised trials published in 2008 in Annals of Internal Medicine, British Medical Journal, Journal of the American Medical Association, Lancet, and New England Journal of Medicine.

Of the 193 trials with a binary primary endpoint, the majority presented the outcome as a hazard ratio, since the focus was on time to event; 24 trials presented odds ratios. In 5 of these 24 trials, the odds ratio differed by more than 100% from the corresponding relative risk. Of the 41 trials presenting binary secondary endpoints, 19 presented at least one odds ratio that differed by more than 100% from its corresponding relative risk. None of the trials presenting odds ratios warned the reader about the risk of misinterpreting them as relative risks.

The authors conclude that the misinterpretation of odds ratios can seriously affect treatment decisions and policy making.

Phlebotomy issues
Thu, 13 Apr 2017

Ialongo and Bernardini (1) reviewed 36 investigations of pre-analytical factors related to phlebotomy published between 1996 and April 2016. Most of the studies used a cohort of healthy volunteers or outpatients, the former typically with a smaller sample size. A sample size calculation for detecting a specified effect was addressed in only one manuscript. Student's t-test and Wilcoxon's test were often used, but many papers assessed neither bias (12) nor agreement (24).
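
As a hedged sketch of the two neglected assessments (illustrative measurements, not data from the review): bias can be examined with a paired t-test on the differences between two sampling conditions, and agreement with Bland-Altman limits of agreement:

```python
# Minimal sketch: assessing bias and agreement between paired
# measurements taken under two phlebotomy conditions on the same
# subjects. Illustrative values only.
import numpy as np
from scipy import stats

cond_a = np.array([4.1, 5.0, 4.7, 5.3, 4.9, 5.6, 4.4, 5.1])  # e.g. with tourniquet
cond_b = np.array([4.0, 5.2, 4.5, 5.5, 4.8, 5.9, 4.3, 5.2])  # e.g. without

diff = cond_a - cond_b

# Bias: is the mean difference systematically different from zero?
t_stat, p_value = stats.ttest_rel(cond_a, cond_b)
print(f"mean difference (bias): {diff.mean():.3f}, paired t-test p = {p_value:.3f}")

# Agreement: Bland-Altman 95% limits of agreement (mean diff +/- 1.96 SD).
sd = diff.std(ddof=1)
print(f"95% limits of agreement: ({diff.mean() - 1.96 * sd:.3f}, "
      f"{diff.mean() + 1.96 * sd:.3f})")
```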

Meta-analyses of non-randomized studies
Tue, 13 Dec 2016

Many systematic reviews include data from non-randomized studies. Are the methods used in these meta-analyses appropriate?

Faber et al. (1) searched MEDLINE for meta-analyses published in 2013 that included non-randomized studies. Two reviewers then assessed the characteristics and key methodological components of these publications.

Of the 188 initially selected papers, 119 included both randomized and non-randomized intervention studies, and 69 included only non-randomized intervention studies. An assessment of bias was reported in 135 papers (72%), but this assessment covered confounding bias in only 33 papers (18%). In 130 papers (69%) the design of the included non-randomized studies was not clearly specified, and in 131 papers (70%) it was unclear whether crude or adjusted estimates were used.
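
To illustrate the pooling step where the crude-versus-adjusted distinction matters, here is a generic inverse-variance random-effects sketch (DerSimonian-Laird; illustrative numbers, not the method or data of Faber et al.):

```python
# Minimal sketch: inverse-variance random-effects pooling
# (DerSimonian-Laird) of study log odds ratios. Whether the inputs are
# crude or confounder-adjusted estimates changes the meaning of the
# pooled result, which is why reviews should say which was used.
import numpy as np

log_or = np.array([0.10, 0.80, -0.05, 0.60])  # study log odds ratios
se = np.array([0.15, 0.20, 0.18, 0.25])       # their standard errors

w = 1.0 / se**2                                # fixed-effect weights
fixed = np.sum(w * log_or) / np.sum(w)

# Between-study heterogeneity (DerSimonian-Laird estimate of tau^2).
q = np.sum(w * (log_or - fixed) ** 2)
df = len(log_or) - 1
tau2 = max(0.0, (q - df) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

w_re = 1.0 / (se**2 + tau2)                    # random-effects weights
pooled = np.sum(w_re * log_or) / np.sum(w_re)
pooled_se = np.sqrt(1.0 / np.sum(w_re))
print(f"pooled OR = {np.exp(pooled):.2f} "
      f"(95% CI {np.exp(pooled - 1.96 * pooled_se):.2f} to "
      f"{np.exp(pooled + 1.96 * pooled_se):.2f}), tau^2 = {tau2:.3f}")
```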

The authors conclude that some important methodological aspects of the systematic review process are not adequately reported in meta-analyses that include non-randomized intervention studies.

Matched case-control studies
Sun, 04 Dec 2016

Case-control studies are commonly used to study rare diseases with long latency periods, and matching cases and controls individually is a common way to control for the effects of potential confounding variables. The analysis of such matched data, however, requires special statistical methods.
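
As a minimal illustration of what "special statistical methods" means here (illustrative counts, not data from the studies reviewed below): in a 1:1 matched design only the exposure-discordant pairs are informative, and McNemar's test replaces the ordinary chi-squared test:

```python
# Minimal sketch: McNemar's test for a 1:1 matched case-control study.
# Illustrative counts; only the discordant pairs (b, c) carry information.
from scipy import stats

b = 25  # pairs where the case is exposed, the control unexposed
c = 10  # pairs where the case is unexposed, the control exposed

matched_or = b / c                      # conditional (matched-pairs) odds ratio
chi2 = (abs(b - c) - 1) ** 2 / (b + c)  # McNemar statistic, continuity-corrected
p = stats.chi2.sf(chi2, df=1)

print(f"matched-pairs OR = {matched_or:.2f}, "
      f"McNemar chi2 = {chi2:.2f}, p = {p:.4f}")
```

With more than one control per case, conditional logistic regression generalizes the same idea.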

Niven et al. (1) investigated how many published, peer-reviewed matched case-control studies were analysed using appropriate statistical methodology. They identified and reviewed 37 matched case-control studies, of which 16 (43%) were adequately analysed. Adequately analysed studies more often had cases with cancer or heart disease (10/16, 63%, versus 5/21, 24%) and more often used multiple controls (14/16, 88%, versus 13/21, 62%). They were also more often published in high impact journals.

The authors conclude that their study raises concern that a majority of matched case-control studies present findings based on inadequate statistical analyses.

Prediction models for cardiovascular disease risk
Sun, 04 Dec 2016

Damen et al. (1) undertook a systematic review to provide an overview of prediction models for the risk of cardiovascular disease in the general population. They searched Medline and Embase up to June 2013 and found 9,965 articles, of which 212 were included in the review. These articles described 363 prediction models and 473 external validations.

Most of these models were developed in Europe (n=167, 46%) and predicted the risk of coronary heart disease (n=118, 33%) over a 10-year period (n=209, 58%). Common predictors were smoking (n=325, 90%) and age (n=321, 88%). Most of the models were sex specific (n=250, 69%).

The authors found substantial heterogeneity in predictor and outcome definitions, and important information was often missing. For 49 models (13%) the prediction horizon was not specified, and for 92 (25%) crucial information needed to apply the model to individual risk prediction was missing.

Only 132 models (36%) were externally validated, and only 70 (19%) by independent investigators. Model performance was heterogeneous, and discrimination and calibration were reported for only 65% and 58% of the external validations, respectively.
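
For readers unfamiliar with the two performance measures, here is a minimal numpy-only sketch of discrimination (the C-statistic) and calibration-in-the-large for an external validation (illustrative predictions and outcomes, not data from the review):

```python
# Minimal sketch: discrimination (C-statistic) and calibration-in-the-large
# when externally validating a risk prediction model. Illustrative data.
import numpy as np

pred = np.array([0.05, 0.10, 0.20, 0.15, 0.40, 0.60, 0.30, 0.70])  # model risks
obs = np.array([0, 0, 0, 1, 0, 1, 1, 1])                           # observed events

# Discrimination: probability that a randomly chosen event has a higher
# predicted risk than a randomly chosen non-event (pairwise comparison).
cases, controls = pred[obs == 1], pred[obs == 0]
pairs = cases[:, None] - controls[None, :]
c_stat = np.mean(pairs > 0) + 0.5 * np.mean(pairs == 0)

# Calibration-in-the-large: mean predicted risk vs observed event rate.
print(f"C-statistic = {c_stat:.3f}")
print(f"mean predicted risk = {pred.mean():.3f}, "
      f"observed event rate = {obs.mean():.3f}")
```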

The authors conclude that there is an excess of models predicting cardiovascular disease in the general population, and that the usefulness of most of these is unclear because of methodological shortcomings, incomplete presentation, lack of external validation, and lack of model impact studies. Future work should primarily focus on external validation and comparisons of already existing models.

Reporting errors in psychology
Fri, 26 Aug 2016

Nuijten et al. (1) developed a computer program, statcheck (2), that finds and checks p-values reported in APA style in published papers. They checked over 250,000 p-values presented in 30,717 psychology papers published in the Journal of Applied Psychology, Journal of Consulting and Clinical Psychology, Developmental Psychology, Journal of Experimental Psychology, Journal of Personality and Social Psychology, Psychological Science, Frontiers in Psychology, and the Public Library of Science.

Of the reviewed papers, 16,700 included tests of statistical significance reported in a format that statcheck could check. Half of these reported at least one erroneous p-value, and one in eight contained at least one erroneous p-value that may have affected the conclusion of the paper. The prevalence of erroneous p-values was stable or declining over time, but was higher among p-values reported as significant than among those reported as non-significant.
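
The kind of consistency check statcheck automates can be sketched as follows (a simplified illustration, not the actual statcheck code; the function name and tolerance are assumptions): recompute the p-value from the reported test statistic and degrees of freedom, then compare it with the reported p-value.

```python
# Minimal sketch of a statcheck-style consistency check (simplified, not
# the real statcheck program): given an APA-style report such as
# "t(28) = 2.20, p = .036", recompute p from the statistic and df.
from scipy import stats

def check_t_report(t_value: float, df: int, reported_p: float,
                   tol: float = 0.005) -> bool:
    """Return True if the reported two-tailed p matches the recomputed one."""
    recomputed = 2 * stats.t.sf(abs(t_value), df)
    consistent = abs(recomputed - reported_p) <= tol
    print(f"t({df}) = {t_value}: recomputed p = {recomputed:.4f}, "
          f"reported p = {reported_p} -> {'OK' if consistent else 'inconsistent'}")
    return consistent

check_t_report(2.20, 28, 0.036)   # consistent
check_t_report(2.20, 28, 0.010)   # inconsistent: p was misreported
```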

The authors suggest that data sharing, letting co-authors check results, and checking manuscripts using statcheck could reduce the number of such reporting errors.

Stratified randomisation in clinical trials
Sat, 13 Aug 2016

The design of randomised trials often includes stratified randomisation or minimisation to achieve balance with regard to important prognostic factors. One consequence of such balancing is that the treatment groups become correlated, which violates the usual assumption of statistically independent observations. To obtain correct p-values and confidence intervals, it is important to adjust for the balancing factors in the statistical analysis; otherwise p-values will be too large and confidence intervals too wide.
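
A minimal simulation sketch of this phenomenon (assumed parameter values, not Kahan and Morris's data): with a strongly prognostic stratification factor and no true treatment effect, the unadjusted test rejects far less often than the nominal 5%, while adjusting for the stratum restores it.

```python
# Minimal sketch: under stratified randomisation with a prognostic
# stratum effect, an unadjusted t-test is conservative. Assumed values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n_per_arm_stratum, stratum_effect = 2000, 20, 2.0
rej_unadj = rej_adj = 0

for _ in range(n_sims):
    # Two strata; within each, exactly half the patients get each
    # treatment (stratified randomisation). No true treatment effect.
    y, trt, stratum = [], [], []
    for s in (0, 1):
        alloc = rng.permutation([0] * n_per_arm_stratum + [1] * n_per_arm_stratum)
        y.append(s * stratum_effect + rng.normal(size=2 * n_per_arm_stratum))
        trt.append(alloc)
        stratum.append(np.full(2 * n_per_arm_stratum, s))
    y, trt, stratum = map(np.concatenate, (y, trt, stratum))

    # Unadjusted analysis: two-sample t-test ignoring the strata.
    p_unadj = stats.ttest_ind(y[trt == 1], y[trt == 0]).pvalue
    # Adjusted analysis: subtract stratum means first (essentially
    # equivalent here to including stratum as a covariate).
    resid = y - np.array([y[stratum == s].mean() for s in (0, 1)])[stratum]
    p_adj = stats.ttest_ind(resid[trt == 1], resid[trt == 0]).pvalue

    rej_unadj += p_unadj < 0.05
    rej_adj += p_adj < 0.05

print(f"type I error, unadjusted: {rej_unadj / n_sims:.3f}")  # well below 0.05
print(f"type I error, adjusted:   {rej_adj / n_sims:.3f}")    # near 0.05
```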

Kahan and Morris (1) reviewed randomised trials published in 2010 in the British Medical Journal, Journal of the American Medical Association, Lancet, and the New England Journal of Medicine. The purpose was to see whether the method of randomisation was adequately reported, how often balancing was used, and whether the balancing factors were adjusted for in the statistical analysis.

Balanced randomisation was common: while the randomisation method was unclear in 37% of the 258 published reports, 63% of the trials balanced on at least one factor. A majority of the trials with balanced randomisation were, however, inadequately analysed; only 26% of them adjusted for all balancing factors in the statistical analysis. Trials whose statistical analysis did not adjust for the balancing factors were less likely to show a statistically significant result (57% versus 78%).

Kahan and Morris conclude that balancing is common but often poorly understood.

One-tailed testing
Tue, 02 Aug 2016

The use of one-tailed tests is controversial, as a one-tailed p-value is half the two-tailed one, which tempts some researchers to choose one-tailed tests. This is inappropriate when the choice is made after the test has been performed, or when the direction of the test is not specified in advance. To avoid the problem related to one-tailed testing, regulatory guidelines state that "The approach of setting type I errors for one-sided tests at half the conventional type I error used in two-sided tests is preferable in regulatory settings".
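
A minimal sketch of the halving (illustrative data; note that the alternative argument of SciPy's ttest_ind requires SciPy 1.6 or later):

```python
# Minimal sketch: one-tailed vs two-tailed p-values on the same data.
from scipy import stats

a = [5.6, 6.1, 5.9, 6.4, 6.0, 5.8]   # group hypothesized (in advance) to be larger
b = [5.4, 5.7, 5.5, 6.0, 5.6, 5.3]

p_two = stats.ttest_ind(a, b).pvalue
# One-tailed in the prespecified direction: exactly half the two-tailed p.
p_one = stats.ttest_ind(a, b, alternative='greater').pvalue

print(f"two-tailed p = {p_two:.4f}")
print(f"one-tailed p = {p_one:.4f}  (= p_two / 2)")
# Choosing the tail after seeing the data effectively doubles the type I
# error rate, which is why the direction must be specified in advance.
```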

Lombardi and Hurlbert (1) reviewed the frequency of one-tailed testing in the 1989 and 2005 volumes of Animal Behaviour and Oecologia. They found one-tailed testing in 24% of the relevant articles in Animal Behaviour and in 13% of Oecologia articles. One-tailed tests were used more often with non-parametric tests than with parametric ones, and twice as often in 1989 as in 2005.

The authors refer to the criterion that one-tailed tests should only be used when a societal (not individual) interest results in a null hypothesis having just one direction, and they claim that according to this criterion all the uses of one-tailed tests in the two reviewed journals were invalid.

The conclusion of the investigation is that “One-tailed tests rarely should be used for basic or applied research in ecology, animal behaviour or any other science.”

Bias in logistic regression
Thu, 23 Jun 2016

Logistic regression is a statistical method widely used in cross-sectional and cohort studies to identify and estimate the effects of risk factors. Imperfect diagnostic tests may, however, bias the results, a negative test being interpreted as absence of infection and a positive test as its presence. As long as the covariates do not influence the sensitivity and specificity of the diagnostic test, the bias is towards the null, but if sensitivity and specificity are influenced by covariates, the direction of the bias cannot easily be predicted.
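
A minimal simulation sketch of the attenuation (assumed sensitivity, specificity, and effect size, not values from the review): misclassify the outcome with an imperfect test and watch the logistic regression coefficient shrink towards the null.

```python
# Minimal sketch: outcome misclassification by an imperfect diagnostic
# test biases logistic regression estimates towards the null. Assumed
# values: true log odds ratio 1.0, sensitivity 0.85, specificity 0.95.
import numpy as np

rng = np.random.default_rng(0)
n, beta0, beta1 = 200_000, -1.0, 1.0
sens, spec = 0.85, 0.95

x = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-(beta0 + beta1 * x)))
y_true = rng.random(n) < p_true

# The test detects true infections with probability sens and gives false
# positives with probability 1 - spec (independent of x in this sketch).
y_obs = np.where(y_true, rng.random(n) < sens, rng.random(n) > spec)

def fit_logistic(x, y, iters=25):
    """Plain Newton-Raphson fit of y ~ 1 + x (no library dependency)."""
    X = np.column_stack([np.ones_like(x), x])
    b = np.zeros(2)
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ b))
        grad = X.T @ (y - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        b += np.linalg.solve(hess, grad)
    return b

print("true beta1:", beta1)
print("beta1 with true outcome:     %.3f" % fit_logistic(x, y_true)[1])
print("beta1 with observed outcome: %.3f" % fit_logistic(x, y_obs)[1])  # attenuated
```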

Valle et al. performed a systematic review, searching PubMed with different combinations of the search terms 'malaria', 'logistic', 'models', 'regression', 'diagnosis', and 'diagnostic'. Of the 36 studies that satisfied the inclusion criteria, 70% did not address the issue of imperfect detection of the malaria outcome.

The authors interpret their results as suggesting that malaria epidemiologists are generally unaware of the consequences that imperfect detection can have on parameter estimates from logistic regression. They also recommend using Bayesian models instead of standard logistic regression.

Post hoc tests
Sun, 22 May 2016

Statistical testing of more than two groups is common in behavioural science, but a clear methodological motivation for the chosen approach is usually not provided. Ruxton and Beauchamp (1) surveyed 12 issues of Behavioral Ecology (volume 18) and Animal Behaviour (volume 73) with regard to such multiple comparisons and found 70 papers in which more than two groups were tested.

In 68 of the 70 papers, the statistical analysis involved a test of homogeneity across all groups, using an ANOVA or an analogous distribution-free test. Comparisons between specific groups were not performed when the null hypothesis of homogeneity across all groups could not be rejected, but followed invariably when this hypothesis was rejected, using several different tests. The authors stress the importance of distinguishing between planned and unplanned comparisons, and of considering what constitutes a biologically interesting effect. They recommend avoiding test procedures that include all pairwise comparisons when only a small subset of them is of interest.
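
A minimal sketch of the distinction the authors draw (illustrative data; generic textbook procedures, not the specific methods tallied in the survey): an omnibus ANOVA followed by all pairwise comparisons with a Bonferroni correction, versus a single planned comparison that needs no multiplicity penalty.

```python
# Minimal sketch: omnibus ANOVA followed by all pairwise t-tests
# (Bonferroni-corrected) versus a single planned comparison.
from itertools import combinations
from scipy import stats

groups = {
    "A": [5.1, 4.8, 5.5, 5.0, 4.9, 5.3],
    "B": [5.2, 5.0, 5.4, 4.9, 5.1, 5.3],
    "C": [6.0, 5.8, 6.3, 5.9, 6.1, 5.7],
}

# Omnibus test of homogeneity across all groups.
f_stat, p_omnibus = stats.f_oneway(*groups.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p_omnibus:.4f}")

# Unplanned follow-up: all pairwise t-tests, Bonferroni-corrected.
pairs = list(combinations(groups, 2))
for g1, g2 in pairs:
    p = stats.ttest_ind(groups[g1], groups[g2]).pvalue
    print(f"{g1} vs {g2}: raw p = {p:.4f}, "
          f"Bonferroni p = {min(1.0, p * len(pairs)):.4f}")

# Planned comparison: if only "A vs C" was of interest a priori, testing
# just that pair avoids the multiplicity penalty entirely.
p_planned = stats.ttest_ind(groups["A"], groups["C"]).pvalue
print(f"planned A vs C: p = {p_planned:.4f}")
```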

The authors conclude from their survey that statistical testing of comparisons among more than two groups remains common in behavioural science, but that common practice is variable and almost always suboptimal.