We use tools from mathematics, physics, and statistics to answer questions about how organisms evolve, power themselves, and live in an unpredictable world. We are particularly interested in scientific questions that can help people -- from medicine to crop engineering. And we're very good buddies with Systems & Signals, with whom much research is shared and fun is had!

Wednesday, 27 January 2016

ARTICLE: Great technological power, great statistical responsibility

Several papers perform incorrect and misleading statistical analyses in seeking links between mtDNA and cancer: these statistical issues must be corrected before scientific and policy progress can be made from these investigations

Biologists often report a result as a "significant" sign of exciting new science if there is less than a 1-in-20 chance that the result they observe could have emerged by chance from boring old science. This is silly (although we do it too!) -- by contrast, physicists, for example, require less than a 1-in-3,500,000 chance. But this post won't discuss too many problems with this state of affairs -- that is done admirably elsewhere.
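For the curious: the physicists' 1-in-3,500,000 figure comes from their "5-sigma" discovery threshold, the one-sided tail probability of a standard normal distribution at five standard deviations. A quick sketch of that arithmetic:

```python
import math

# One-sided tail probability of a standard normal at 5 sigma --
# the particle-physics discovery threshold.
p_physics = 0.5 * math.erfc(5 / math.sqrt(2))
print(f"5-sigma p-value: {p_physics:.3g}")   # ~2.87e-07
print(f"roughly 1 in {1 / p_physics:,.0f}")  # ~1 in 3.5 million
```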

The problem can be compounded when scientists take lots of measurements. Say we take 50 measurements of a boring old system, and every time we see something that has less than a 1-in-20 chance of appearing in a boring old system, we call it "significant". We're playing the odds 50 times, so we expect to see 1-in-20 results appear around 2 or 3 times; just as if we roll a die 50 times, we'd expect to roll a good few sixes. If we call every 1-in-20 result "significant" without accounting for the fact that we've looked at lots of measurements (and are thus more likely to see 1-in-20s by chance), we are in danger of reporting exciting new science when in fact the boring old science has been true all along.
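The arithmetic here is simple enough to check directly. If each test independently has a 1-in-20 chance of a spurious "hit", the chance of at least one hit across n tests is 1 - (0.95)^n:

```python
# Chance of at least one spurious "1-in-20" result when running
# n independent tests on a boring old (null) system.
alpha = 0.05
for n in (1, 6, 50):
    p_at_least_one = 1 - (1 - alpha) ** n
    print(f"{n} tests: {p_at_least_one:.3f}")

# With 50 tests we expect alpha * 50 = 2.5 spurious hits on average,
# and the chance of seeing at least one is over 92%.
```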

There are lots of ways of doing this accounting, but a series of recently published papers linking mtDNA to diseases has made no attempt to do it. Generally, these papers look at the mtDNA of people without the disease and the mtDNA of people with the disease. If any mtDNA features appear more in the people with the disease, the paper calculates the chance of that difference occurring in the boring old picture (in which there is no link between the mtDNA feature and the disease). If that chance drops below the 1-in-20 mark, they report an exciting new link between that feature and the disease. But they test dozens of features and never account for this multiple testing -- so, as above, we'd expect them to see "significant" results emerging just by chance. In a paper in Mitochondrial DNA here (free here) I show, by creating artificial data, that this problem is rife, that most of these reported links are spurious, and that scientists really need to be more responsible, before their flawed analysis starts to misguide health policy and medicine.
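One of the simplest of those accounting methods is the Bonferroni correction: with m tests, demand p < 0.05/m rather than p < 0.05 before calling anything "significant". A minimal sketch (the p-values below are invented for illustration, not taken from any of the papers discussed):

```python
# Bonferroni correction: with m tests, require p < alpha/m.
# These p-values are made up purely to illustrate the idea.
alpha = 0.05
m = 40                               # say we tested 40 mtDNA features
p_values = [0.001, 0.03, 0.04, 0.2]

threshold = alpha / m                # 0.00125
significant = [p for p in p_values if p < threshold]
print(significant)                   # [0.001] -- only one link survives
```

Note that under the naive p < 0.05 rule, three of these four invented features would have been reported as exciting new links; after correction, only one survives. Bonferroni is conservative, and less blunt methods exist (Holm, false discovery rate control), but doing nothing at all is not a defensible option.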

The top graph shows how the probability of seeing a 1-in-20 occurrence (p < 0.05 in the jargon), when in fact there is nothing new and exciting to report, increases as a scientist investigates more things. If an experiment consists of one test, then a 1-in-20 occurrence indeed has a 1-in-20 probability (0.05). But as soon as we do more tests, the chance of seeing at least one 1-in-20 occurrence starts to increase, as we are "playing the game" more times. If we do 6 tests there is a 0.27 probability -- between a 1-in-4 and 1-in-3 chance -- that we will see at least one 1-in-20 event. This is illustrated below, where we have six dice and think some of them may be unfair. We roll each one five times and count the number of 6s. One of them shows a 6 three times -- the chance of this happening for one fair die is less than 1-in-20. But because we've looked at six dice, not one, we should be less surprised to see this rare event. We need more evidence to claim that this die is unfair.
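The dice example can be worked out exactly with the binomial distribution:

```python
from math import comb

# Chance that one fair die shows at least 3 sixes in 5 rolls.
p_one = sum(comb(5, k) * (1 / 6) ** k * (5 / 6) ** (5 - k)
            for k in range(3, 6))
print(round(p_one, 4))   # 0.0355 -- below the 1-in-20 mark

# But watching six dice at once, the chance that at least one of
# them produces this "suspicious" result:
p_any = 1 - (1 - p_one) ** 6
print(round(p_any, 3))   # 0.195 -- about 1 in 5, not rare at all
```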

This quick note only represents the tip of the iceberg. MtDNA studies are often statistically unsound; statistical misdemeanours in biomedical studies are so common that most published research is wrong; scientists increasingly focus on the 1-in-20 chance as opposed to the size and importance of the effect they're measuring; the majority of hallmark papers in vital fields like cancer science are unreproducible (though this last point may have causes other than statistical problems). The 1-in-20 idea was only ever meant to be a step in identifying interesting scientific avenues, not the final measure of scientific truth. This is a big, and growing, problem! Iain

(For accessibility I have used "exciting", "boring", and "1-in-20" instead of their usual, more technical labels; they of course are usually called the "alternative hypothesis", "null hypothesis", and "p < 0.05" respectively).