In each case, I analyze studies that seem to “prove” these surprising findings, and identify the errors (cherry-picking data, inadequate statistical adjustment, and fishing expeditions) that lead to these conclusions. Basically, it’s a “how to assess medical research” primer. Read the whole thing here.

I don’t mean to fixate on the topic of running form, but I just want to recommend Gina Kolata’s piece in today’s New York Times, which reads a lot like a response to Chris McDougall’s recent piece (and echoes much of what I wrote a couple of days ago). Her basic point: people are desperate for advice that will “explain” how to run, when in reality the biggest barrier is simply that running is hard (especially for people who have been inactive for years) and takes more time to adapt to than most people expect.

Researchers who have no financial ties to running programs or shoe manufacturers say that most of those complications are unnecessary and some of the advice is even risky, because it can make running harder and can increase the chance of injury. [...]

“There is good evidence that your body is exquisitely lazy and will find the easiest way for you to run,” said Carl Foster, professor of exercise and sports medicine at the University of Wisconsin-La Crosse. [...]

Running form is just one example of the confusions buffeting beginning runners. Running, said John Raglin, professor of kinesiology at Indiana University, “is so prone to these sorts of trends.”

People “will latch onto anything,” he added, and an anecdote or two about what is supposed to be an ideal running form often passes for evidence.

Kolata’s articles can sometimes seem a little nihilistic, as she writes about the surprising lack of evidence for very common treatments in sports medicine and physiotherapy, and common practices like warming up. The point isn’t that we shouldn’t do anything that isn’t “evidence-based” — life is complicated, and we inevitably have to make lots of decisions armed only with imperfect knowledge. But we should be aware of that, and not mistake our current guesses and hypotheses for “the one true way.”

I have a new article over at the Men’s Health website that tries to explain a bit about how the process of “statistical adjustment” works in large observational studies, and what some of the pitfalls are. My favourite example from the piece: a 2009 study of over half a million people that found that red meat intake is associated with increased risk of death — even after “adjusting” for potentially confounding factors like age, education, race, BMI, smoking, exercise, vegetable consumption and so on. Needless to say, that study got lots of press at the time. But when you dig into the study’s stats (as Stanford prof Kristin Sainani pointed out to me), you find out that red meat also increases your risk of sudden accidental death from causes like car crashes and guns!

“Unless red-meat eaters are swerving to avoid cows, that doesn’t make any sense,” Sainani says. Instead, the most likely explanation is that red-meat eaters take more risks in other areas of life. But the study didn’t collect any data on the driving habits and gun collections of its volunteers, so the researchers were unable to adjust for these factors—and as a result, the conclusions were skewed.

Another interesting nugget is a rough estimate of how big an effect needs to be before you can be fairly sure you’re not just seeing residual confounding, according to statistical simulations:

Bad data can easily generate an apparent risk increase of up to 60 percent, according to a research paper on statistical adjustment published in the journal PM&R. Effects bigger than that are very difficult to explain without serious errors in the design of the study.
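The red meat example above can be illustrated with a quick Monte Carlo sketch. Here an unmeasured trait (risk-taking) drives both red-meat consumption and mortality, while the meat itself has zero causal effect; the numbers are made up for illustration and are not from the 2009 study.

```python
import random

random.seed(0)

N = 100_000
deaths_meat = alive_meat = deaths_none = alive_none = 0

for _ in range(N):
    risk_taker = random.random() < 0.5  # unmeasured confounder
    # Risk-takers are more likely to eat red meat...
    eats_meat = random.random() < (0.7 if risk_taker else 0.3)
    # ...and independently more likely to die (car crashes, etc.).
    # Note: eating meat has NO causal effect on death in this model.
    dies = random.random() < (0.012 if risk_taker else 0.008)
    if eats_meat:
        deaths_meat += dies
        alive_meat += not dies
    else:
        deaths_none += dies
        alive_none += not dies

rate_meat = deaths_meat / (deaths_meat + alive_meat)
rate_none = deaths_none / (deaths_none + alive_none)
rr = rate_meat / rate_none
print(f"apparent relative risk for meat-eaters: {rr:.2f}")  # comes out above 1
```

Because the researchers never measured risk-taking, no amount of adjusting for age, BMI, or smoking will remove this spurious association — which is exactly why modest relative risks from observational data deserve skepticism.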

The confidence we experience as we make a judgment is not a reasoned evaluation of the probability that it is right. Confidence is a feeling, one determined mostly by the coherence of the story and by the ease with which it comes to mind, even when the evidence for the story is sparse and unreliable. The bias toward coherence favors overconfidence. An individual who expresses high confidence probably has a good story, which may or may not be true.

[...] When a compelling impression of a particular event clashes with general knowledge, the impression commonly prevails.

The main example he discusses in the excerpt is the world of finance — many, many people (including just about everyone I know, seemingly) are convinced that they or their financial advisors are capable of outperforming the market, despite ample evidence that this is nearly impossible to do on a consistent basis. But the good stock picks they’ve made over the years make such a vivid impression that they remain convinced of their abilities.

The reason I’m blogging about this here is that I think this phenomenon is also nearly universal when it comes to health and fitness. Of course, there are many people who either don’t believe in or don’t understand the scientific method. They trust their instincts in figuring out which potions and pills are helping them in vague and unquantifiable ways. This is not surprising at all. What is surprising to me is the number of people who understand and profess belief in the scientific method, who murmur all the right catchphrases about “correlation is not causation” and “of course n=1 anecdotes don’t mean anything,” and yet are still absolutely convinced of their ability to determine which stretch has enhanced their power or saved them from injury, or which pill makes them feel more energetic, or which type of training has enhanced their lactate clearance.

There is some good news at the end of Kahneman’s excerpt: it is possible to have real intuitive expertise. (“You are probably an expert in guessing your spouse’s mood from one word on the telephone,” he notes. Chess players and medical diagnosticians are other examples.) But there’s a necessary condition:

Is the environment in which the judgment is made sufficiently regular to enable predictions from the available evidence? The answer is yes for diagnosticians, no for stock pickers.

Maybe I’m just a particularly complicated human, or unusually incapable of reading my body’s signals. But given the huge number of factors, both intrinsic and extrinsic, that affect the day-to-day variation in my mood, energy and physical performance, I don’t consider my own body “sufficiently regular” to be able to make accurate judgements about the efficacy of any particular single intervention.

I posted a couple of days ago about some famous research showing that the “hot hand” in basketball is (apparently) an illusion. As one of the researchers, Amos Tversky, said:

I’ve been in a thousand arguments over this topic, won them all, but convinced no one.

The reason that story interested me was because I’d been thinking about similar issues in relation to the recent news that a U.S. medical panel has recommended against routine screening for prostate cancer for healthy men. A couple of recent articles in the New York Times have explored the rationale behind this recommendation, and why our minds are ill-equipped to weigh the pros and cons in these cases. As Daniel Levitin wrote in a review of Jerome Groopman and Pamela Hartzband’s new book “Your Medical Mind”:

Yet studies by cognitive psychologists have shown that our brains are not configured to think statistically, whether the question is how to find the best price on paper towels or whether to have back surgery. In one famous study, Amos Tversky [yes, the guy who did the "hot hand" study -AH] and Daniel Kahneman found that even doctors and statisticians made an astonishing number of inference errors in mock cases; if those cases had been real, many people would have died needlessly. The problem is that our brains overestimate the generalizability of anecdotes… The power of modern scientific method comes from random assignment of treatment conditions; some proportion of people will get better by doing nothing, and without a controlled experiment it is impossible to tell whether that homeopathic thistle tea that helped Aunt Marge is really doing anything.

I actually got into a bit of a debate at (Canadian!) Thanksgiving dinner a few nights ago about prostate screening. Two of my elder relatives have had prostate cancer, and undergone the whole shebang — surgery, radiation, etc. One of them, in particular, was highly critical of the recommendation not to be screened. It had saved his life, he said. How did he know, I asked. He just knew — and he knew dozens of other men in his survivors’ support group who had also been saved.

I feel bad about having argued over what is clearly a very emotional topic. Nonetheless, I’m ashamed to admit, I did try to explain the concept of “number needed to treat.” Surely everyone would agree that, say, taking a million men and amputating their legs wouldn’t be worthwhile if it saved one man from dying of foot cancer, right? And if you accept that, then you realize that it’s not a debate about the absolute merit of saving lives — it’s a debate about weighing the relative impact and likelihood of different outcomes. That’s captured very nicely in another NYT piece, by a professor of medicine at Dartmouth, Gilbert Welch, who compared the similarities between breast cancer screening (recommended) and prostate cancer screening (not recommended):

Overall, in breast cancer screening, for every big winner whose life is saved, there are about 5 to 15 losers who are overdiagnosed [i.e. undergo treatments such as surgery, radiation, etc.]. In prostate cancer screening, for every big winner there are about 30 to 100 losers.

So what’s the message here? Is it worth putting 100 men through surgery to extend one man’s life? How about 30 men? (The average extension of life after prostate surgery is six weeks.) Of course, there’s no “right” answer. As Welch writes, reasonable people can reach different conclusions based on the same input data. In the end, you have to assess the odds and the stakes, and choose how you want to gamble. But before you do, you should at least understand what those odds are — and that means taking a good look at number needed to treat and other ways of assessing treatments.
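The number-needed-to-treat arithmetic above is simple enough to spell out. Using the Welch figures quoted earlier (roughly 30 to 100 men overdiagnosed for every life saved by prostate screening), here is what the odds look like for an individual man who undergoes treatment:

```python
# For every 1 "winner" whose life is saved, there are N "losers" who are
# overdiagnosed and treated anyway. So out of (N + 1) men treated, exactly
# one benefits. Figures below are the rough range Welch cites for prostate
# cancer screening.
for losers_per_winner in (30, 100):
    treated = losers_per_winner + 1  # total men undergoing treatment
    lives_saved_pct = 100 / treated  # chance a given treated man benefits
    print(f"{treated} men treated -> 1 life saved "
          f"({lives_saved_pct:.1f}% of treated men benefit)")
```

In other words, even at the optimistic end of the range, a treated man has only about a 3 percent chance of being the one whose life was saved — which is why “it saved my life” is a feeling, not a statistic.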