Quantity and the Biological Context

The second half of a typical presentation of a mediocre study in biology goes something like this: a graph is shown with the effect sizes, asterisks are added to denote statistical significance, and the presenter explains that some differences are “significant” whereas others show a “tendency [towards significance]”. Then, in the results section, “significant” has morphed into clinical significance: the differences are taken to be of real clinical value, and all the differences that did not pass the significance test are either claimed to be approaching clinical relevance or simply claimed to be equivalent.

The problem(s)

As pointed out in an earlier post on annoying statistical fallacies, statistical significance means that the probability of obtaining the data you got, or more extreme data, given that the null hypothesis is true, is low. This is not the same as practical (clinical, medical, biological, sociological etc.) significance, which says that the differences are large enough to be of clinical value. Clearly, it is possible for a certain difference to be very unlikely given the null hypothesis, yet so small as to be without clinical relevance.
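The point can be made concrete with a minimal sketch. The numbers below are entirely hypothetical: a two-group z-test (built from the standard library's `math.erf`) on a mean difference of 0.02 units in an enormous sample, chosen so that the result is statistically significant but clinically negligible.

```python
import math

def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a z statistic, via the normal CDF (math.erf)."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

# Hypothetical example: a mean difference of 0.02 units (sd = 1.0) that is
# far too small to matter clinically, measured in a huge two-group sample.
mean_diff = 0.02
sd = 1.0
n = 1_000_000  # per group

se = sd * math.sqrt(2.0 / n)  # standard error of a two-group difference
z = mean_diff / se
p = two_sided_p_from_z(z)

print(f"p = {p:.3g}")            # vanishingly small: statistically significant
print(f"difference = {mean_diff}")  # ...yet clinically negligible
```

With a large enough sample, essentially any non-zero difference clears the 0.05 bar, which is exactly why statistical significance on its own says nothing about clinical value.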

Using asterisks may look clean, but it really obfuscates the result of the significance test. It is possible to have two differences that are very close to the cutoff level, say one with p = 0.051 and the other with p = 0.048. In a graph with asterisks, the latter would be shown as statistically significant but the former would not, despite the fact that there is only a 0.003 difference in p value. In practice, there is not much more evidence against the null in the latter than in the former.
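A small sketch of the problem, using the conventional (and here illustrative) asterisk bins: two p-values separated by only 0.003 land in different bins, and the binning throws away exactly the information that shows how similar they are.

```python
def star_code(p: float) -> str:
    """Conventional asterisk coding; collapses p-values into coarse bins."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"

# Two nearly identical p-values end up in different bins:
p_a, p_b = 0.048, 0.051
print(star_code(p_a))  # "*"  -- presented as a real finding
print(star_code(p_b))  # "ns" -- presented as nothing at all
print(f"exact values: {p_a:.3f} vs {p_b:.3f}")  # the difference is tiny
```

Reporting the exact p-values (recommendation 2 below notwithstanding) keeps that distinction visible to the reader.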

The “there is a tendency [towards statistical significance]” claim for differences with a p value not too far above 0.05 is in some sense intellectually dishonest, and is really a tactic to inflate the relevance of the results. This is easy to see because these same people would never describe a result with a p value just below 0.05 as “approaching non-significance”, but would tout it as “significant”. In other words, there is a bias towards wanting the results to appear statistically significant, as people only seem to care about adjusting results from the lower side of the 0.05 cutoff.

In a similar fashion, a statistically non-significant difference does not imply equivalence. This is primarily for two reasons: (1) p-values depend on sample size, so a large sample size will give statistical significance for even negligible differences and a small sample size will make it so that large differences fail to yield statistical significance, and (2) while p-values do tell you something about the degree of overlap between 95% confidence intervals, they do not tell you anything about the absolute size of the error bars. This means that a treatment with an OR of, say, 1.01 (where 1 denotes equivalence) could have a 95% CI of [0.5, 1.5], meaning that the plausible values for the population parameter range from clinically beneficial to clinically harmful. Clearly, you do not want to call two treatments equivalent if one of them has a real possibility of being clinically harmful.
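A rough sketch of point (2), using a standard Wald confidence interval for the odds ratio of a 2×2 table. The cell counts are invented for illustration: a small trial where the point estimate sits near 1 but the interval is far too wide to support any claim of equivalence.

```python
import math

def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Wald 95% CI for the odds ratio of a 2x2 table [[a, b], [c, d]]."""
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log odds ratio
    lo = math.exp(math.log(or_hat) - z * se_log)
    hi = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lo, hi

# Hypothetical small trial: event/non-event counts for treatment vs control.
or_hat, lo, hi = odds_ratio_ci(a=21, b=20, c=20, d=21)
print(f"OR = {or_hat:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
# The point estimate is close to 1 ("no effect"), but the interval spans
# everything from clinically beneficial to clinically harmful, so calling
# the treatments equivalent would be indefensible.
```

A non-significant test on this table would say nothing more than "we cannot tell"; the width of the interval is what reveals that equivalence has not been demonstrated.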

The solution(s)

What can we do about these problems? Here are a couple of recommendations.

1. If you have to use significance testing, clearly distinguish between statistical and practical significance and under no circumstance use only statistical significance as a basis for concluding practical significance.

2. When reporting p-values, report exact p-values instead of asterisks (*) or comparisons with an arbitrary cutoff (p < 0.05).

3. Use 95% confidence intervals over significance testing as much as you can. Also, do not use the confidence intervals as just another way of doing a significance test. Think of them as the range of plausible values for the population parameter.

4. Use the actual differences in effect size, together with the 95% confidence intervals, to interpret the difference in the biological context. Focus on quantitative interpretations! Is this difference minor and clinically negligible, moderate and clinically promising, or large and clinically useful? Strictly speaking, sometimes minor differences are clinically useful etc., so prefer biological context over arbitrary cutoffs if you can.
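Recommendation 4 can be sketched as a small decision rule. Everything here is illustrative: the `interpret` helper, the blood-pressure example, and the minimal clinically important difference (MCID) of 5 mmHg are assumptions, not part of any real guideline. The idea is simply that the confidence interval, compared against a clinically meaningful threshold, drives the interpretation rather than the p-value.

```python
def interpret(diff_lo: float, diff_hi: float, mcid: float) -> str:
    """
    Classify a difference by comparing its 95% CI to a minimal clinically
    important difference (MCID). The CI, not the p-value, drives the call.
    """
    if diff_lo >= mcid:
        return "clinically useful (whole CI above the MCID)"
    if diff_hi <= mcid:
        return "clinically negligible (whole CI below the MCID)"
    return "inconclusive (CI straddles the MCID)"

# Hypothetical blood-pressure reductions (mmHg) with an assumed MCID of 5:
print(interpret(6.0, 12.0, mcid=5.0))  # clearly useful
print(interpret(0.5, 2.0, mcid=5.0))   # may be statistically significant,
                                       # but clinically minor
print(interpret(1.0, 9.0, mcid=5.0))   # cannot tell; needs more data
```

Note that the second case can easily be "statistically significant" and the third "non-significant", yet the clinically honest conclusions run in the opposite direction.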

References and further reading

Ellis, Paul D. (2010). The Essential Guide to Effect Sizes: Statistical Power, Meta-Analysis, and the Interpretation of Research Results.
