Biostatistics for Busy Doctors: Letting go of the beloved “p value”

(I initially published this article in a Sermo weekly column titled “Biostatistics weekly” under the moniker Sciencebased. The following is a revision of that article.)

The medical literature uses p-values to decide which therapies work and which don’t. The p-value is the probability of observing a result at least as extreme as the one seen in the study if the treatment truly had no effect, that is, if chance alone were at work. By convention, the scientific community uses a p-value of less than 0.05 (5%) as the cutoff between chance and real effects. Based on this arbitrary value, a drug trial showing a p-value of 0.06 is considered negative but another showing a p-value of 0.04 is considered positive.

The p-value by itself does not provide enough information to evaluate results, as the following example shows. A weight loss drug called Slimnow is compared with placebo in a double-blinded study and is shown to be effective at weight loss with a p-value of 0.000001. A different drug called Fatnomore is also compared with placebo but is not shown to be effective, with a p-value of 0.07. At first glance, one would be inclined to recommend Slimnow to patients, but more information is needed to make an informed decision.

Slimnow was given to half of 1000 patients, and it showed a mean weight loss of 0.5 pounds (95% confidence interval of 0.4 to 0.6). Fatnomore was tried in half of 30 patients where the mean weight loss was 15 pounds (95% CI of -1 to 30 pounds).

Confidence intervals (CI) inform about statistical significance and also establish the magnitude and precision of the effect. For Slimnow, the size of the effect is not clinically relevant at 0.5 pounds, although this is a precise estimate given the narrow confidence interval of 0.4 to 0.6. Because this CI does not include zero (no effect), the p-value is <0.05. On the other hand, Fatnomore offers a more clinically relevant weight loss of 15 pounds, but given the small sample size, the estimate is imprecise: the CI runs from -1 to 30 pounds. Because this CI includes zero (its lower limit is negative), the p-value is >0.05.
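For readers who like to check the arithmetic, here is a minimal sketch of how a 95% CI relates to the p-value. It assumes a normal approximation and a symmetric interval, and the function names are my own. The figures in the example are rounded illustrations, so the computed p-values will not reproduce the quoted ones exactly; the point is only on which side of 0.05 they fall.

```python
import math

Z_95 = 1.96  # standard normal quantile behind a 95% CI

def p_value_from_ci(estimate, lower, upper, null_value=0.0):
    """Approximate two-sided p-value implied by an estimate and its 95% CI.

    Assumption: normal approximation with a roughly symmetric interval.
    """
    se = (upper - lower) / (2 * Z_95)        # standard error recovered from the CI width
    z = (estimate - null_value) / se         # distance from "no effect" in standard errors
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided tail area of the normal curve

def excludes(lower, upper, null_value=0.0):
    """True when the CI does not contain the no-effect value."""
    return not (lower <= null_value <= upper)

slimnow = p_value_from_ci(0.5, 0.4, 0.6)
fatnomore = p_value_from_ci(15, -1, 30)

print(f"Slimnow:   CI excludes 0: {excludes(0.4, 0.6)}, p < 0.05: {slimnow < 0.05}")
print(f"Fatnomore: CI excludes 0: {excludes(-1, 30)}, p < 0.05: {fatnomore < 0.05}")
```

In both cases, whether the interval excludes zero and whether the p-value is below 0.05 give the same answer, which is why the CI alone is enough to judge significance.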

When the effect is expressed as an odds ratio, the value that means no effect is 1 rather than zero: if the 95% confidence interval includes 1 (e.g. 0.8 to 2.6), the result is not statistically significant.
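The same check can be sketched for ratios. Intervals for odds ratios are usually built on the log scale, so the sketch below works with logarithms. The point estimate is not given for the interval above, so the code assumes it is the geometric mean of the two limits; that assumption and the function name are mine.

```python
import math

def ratio_ci_p_value(lower, upper, estimate=None):
    """Approximate two-sided p-value implied by a 95% CI for an odds ratio.

    Assumption: the CI is symmetric on the log scale. If no point estimate
    is supplied, the geometric mean of the limits is used as a stand-in.
    """
    se_log = (math.log(upper) - math.log(lower)) / (2 * 1.96)
    if estimate is None:
        estimate = math.sqrt(lower * upper)
    z = math.log(estimate) / se_log          # log(1) = 0 is the no-effect value
    return math.erfc(abs(z) / math.sqrt(2))

lower, upper = 0.8, 2.6
includes_one = lower <= 1.0 <= upper
p = ratio_ci_p_value(lower, upper)

print(f"CI includes 1: {includes_one}, p < 0.05: {p < 0.05}")
```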

Top medical journals now require that the magnitude of the effect be expressed as an odds ratio, relative risk, relative risk reduction or absolute risk reduction, all with 95% confidence intervals. Including a p-value is optional and often unnecessary: one can determine whether the results are significant by looking at the CI.
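As an illustration of these measures, the sketch below computes each one with a 95% CI from a two-by-two table. The counts are invented for the example (20 events among 100 treated patients versus 40 among 100 controls), and the intervals use common large-sample approximations (log method for the ratios, Wald interval for the risk difference), not the only possible choice.

```python
import math

Z_95 = 1.96

def effect_measures(events_treated, n_treated, events_control, n_control):
    """Effect sizes with approximate 95% CIs from a 2x2 table (large-sample methods)."""
    a, b = events_treated, n_treated - events_treated
    c, d = events_control, n_control - events_control
    risk_t, risk_c = a / n_treated, c / n_control

    def log_ci(ratio, se_log):
        # CI computed on the log scale, then transformed back
        return (ratio * math.exp(-Z_95 * se_log), ratio * math.exp(Z_95 * se_log))

    rr = risk_t / risk_c
    rr_ci = log_ci(rr, math.sqrt(1 / a - 1 / n_treated + 1 / c - 1 / n_control))

    odds_ratio = (a * d) / (b * c)
    or_ci = log_ci(odds_ratio, math.sqrt(1 / a + 1 / b + 1 / c + 1 / d))

    arr = risk_c - risk_t
    se_arr = math.sqrt(risk_t * (1 - risk_t) / n_treated
                       + risk_c * (1 - risk_c) / n_control)
    arr_ci = (arr - Z_95 * se_arr, arr + Z_95 * se_arr)

    rrr = 1 - rr
    rrr_ci = (1 - rr_ci[1], 1 - rr_ci[0])  # limits swap when subtracting from 1

    return {
        "relative risk": (rr, rr_ci),
        "odds ratio": (odds_ratio, or_ci),
        "relative risk reduction": (rrr, rrr_ci),
        "absolute risk reduction": (arr, arr_ci),
    }

# Hypothetical trial: 20/100 events on treatment, 40/100 on placebo
results = effect_measures(20, 100, 40, 100)
for name, (value, (low, high)) in results.items():
    print(f"{name}: {value:.3f} (95% CI {low:.3f} to {high:.3f})")
```

For the ratios, significance means the CI excludes 1; for the two risk reductions, it means the CI excludes 0.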

CIs are particularly enlightening when reporting negative studies. Very frequently, results are reported as non-significant, but a look at the CI shows that it is very wide, which makes it apparent that the study was underpowered: it could not rule out a clinically important effect.
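To see why small studies produce wide intervals, the sketch below computes the half-width of a 95% CI for a difference in means between two equal arms. The standard deviation of 22 pounds is a made-up value chosen only for illustration, and the formula assumes equal variances and a normal approximation.

```python
import math

def ci_half_width(sd, n_per_arm, z=1.96):
    """Half-width of a 95% CI for a difference in means between two equal arms.

    Assumptions: same standard deviation in both arms, normal approximation.
    """
    return z * sd * math.sqrt(2 / n_per_arm)

sd = 22  # hypothetical standard deviation of weight change, in pounds

small_trial = ci_half_width(sd, 15)    # 30 patients in total
large_trial = ci_half_width(sd, 500)   # 1000 patients in total

print(f"15 per arm:  +/- {small_trial:.1f} pounds")
print(f"500 per arm: +/- {large_trial:.1f} pounds")
```

The width shrinks with the square root of the sample size, so a trial needs roughly four times as many patients to halve its CI.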

Conclusion

Confidence intervals provide the same information as the p-value, but also communicate the magnitude and precision of an effect. A p-value reported by itself, without a CI, provides insufficient information to evaluate the results of a study.