Opportunity lost

Like Andrew Gelman, I was disappointed with Steven Strogatz's recent column on Bayes Theorem for the New York Times. (Disclosure: I am a fan of Strogatz's Nonlinear Dynamics and Chaos book, and I'm pleased as punch that the New York Times is exposing readers to this type of scientific material.)

Unlike Andrew, my displeasure has nothing to do with the fact that Bayes Theorem is behind the natural-frequency calculations. (Further disclosure: I used the same approach as Strogatz and Gigerenzer in Chapter 4 of Numbers Rule Your World, and in the Conclusion, connected the computation to Bayes Theorem. I didn't see it as "new thinking" though.)

I think the column exemplifies all that is wrong with the way our schools teach probability and statistics to nontechnical students. Much of the column is concerned with computation. The opportunity to help readers get behind the numbers, and interpret what's being computed, is lost.

***

If I were writing about mammogram screening in my book, or teaching it in my intro stats class, I'd say the following:

When asked about the accuracy of a mammogram, doctors cite the "false positive rate". Ignore the false positive rate; what patients really need to know is the "positive predictive value" (PPV), that is, the chance of having breast cancer given that one has a positive mammogram.

The PPV for mammography is 9 percent. Nine percent! You heard right. For every 100 patients who test positive, only 9 have breast cancer; the other 91 do not. This may sound like a travesty but it's easily explained: breast cancer is a rare disease, afflicting only 0.8% of women, so almost all women who take mammograms do not have cancer, and even a highly accurate test, when applied to large numbers of cancer-free women, will generate a lot of false alarms.

The false positive rate is the chance of testing positive given that one does not have breast cancer. This number is irrelevant for patients because if you know you don't have breast cancer, you don't need to take the test! Doctors talk about it (they shouldn't but they do) because when they developed the mammogram, they needed to know how accurate the test is -- they needed to find a group of known positives and a group of known negatives, and then check if the test provided accurate results. Now that the test has been rolled out and is an annual rite for many women, doctors should be talking about PPV, not the false positive rate.

I don't see the point in dwelling on the A given B, B given A confusion. I'd just as soon cover this by stressing the importance of asking the right question. Some students may be shocked by the juxtaposition of the sensitivity of 90 percent (ability to detect positives) and the PPV of 9 percent. Again, this is easily explained: only 0.8% of those tested have breast cancer, so however high the sensitivity is, it only affects a very small number of patients.
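For readers who do want to see where the 9 percent comes from, here is a minimal sketch of the count-the-women arithmetic in Python. The 0.8% prevalence and 90 percent sensitivity are the figures quoted above; the 7% false positive rate is an assumption on my part (this post never states it), and the function name `ppv` is just an illustrative label.

```python
def ppv(prevalence, sensitivity, false_positive_rate):
    """Chance of having the disease given a positive test."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


PREVALENCE = 0.008           # from the text: 0.8% of women
SENSITIVITY = 0.90           # from the text: 90 percent
FALSE_POSITIVE_RATE = 0.07   # ASSUMPTION: not stated in this post

# Same calculation as head counts in an imagined group of 1,000 women.
women = 1000
with_cancer = women * PREVALENCE
true_alarms = with_cancer * SENSITIVITY
false_alarms = (women - with_cancer) * FALSE_POSITIVE_RATE

print(f"women with cancer:       {with_cancer:.1f}")
print(f"true alarms:             {true_alarms:.1f}")
print(f"false alarms:            {false_alarms:.1f}")
print(f"PPV from head counts:    {true_alarms / (true_alarms + false_alarms):.1%}")
print(f"PPV from the function:   {ppv(PREVALENCE, SENSITIVITY, FALSE_POSITIVE_RATE):.1%}")
```

The false alarms swamp the true alarms simply because the cancer-free group is so much larger, which is the whole explanation in one line of arithmetic.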

***

I'd stop there for Level 1. For more advanced students, there is much more to interpreting these numbers. Say:

This PPV of 9 percent sounds really low. Does this prove that the mammogram is worthless? Worse than flipping a coin? Well, the coin flip is not the right way to think about this. In the absence of the test, we know that 0.8% of women have breast cancer; given the test results, the PPV tells us that 9% of those testing positive have breast cancer, so those testing positive are roughly 10 times more likely than average to have breast cancer. This demonstrates the value of the test. Is this high enough? That's for the experts to say, but it is wrong to conclude that the test is worse than a coin flip.

Said differently, the mammogram provides valuable information. Prior to testing, we know X. A posteriori, we know more than X. Statisticians call 0.8% the prior probability of having cancer and 9% the posterior probability of having cancer given a positive test.

Despite this, why would doctors knowingly recommend a test in which 9 out of 10 positives will turn out to be false positives? To understand this, we need to know the "negative predictive value", that is, the chance that one does not have breast cancer given that one has a negative mammogram.

The NPV is 99.9%. That is to say, those testing negative can be almost sure that they don't have cancer. It turns out that these two metrics (PPV and NPV) are linked; if doctors improve PPV, NPV must decline. What does this tell us about the incentives of those developing the screening test? Are they more afraid of false positives or false negatives? Why?
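The 99.9% figure can be checked the same way as the PPV. The sketch below is only that, a sketch: the prevalence (0.8%) and sensitivity (90 percent) are the numbers used earlier in this post, while the 7% false positive rate is an assumed value that this post does not state, and `npv` is an illustrative name.

```python
def npv(prevalence, sensitivity, false_positive_rate):
    """Chance of NOT having the disease given a negative test."""
    # Cancer-free women who correctly test negative.
    true_negatives = (1 - prevalence) * (1 - false_positive_rate)
    # Women with cancer whom the test misses.
    false_negatives = prevalence * (1 - sensitivity)
    return true_negatives / (true_negatives + false_negatives)


PREVALENCE = 0.008           # from the text: 0.8% of women
SENSITIVITY = 0.90           # from the text: 90 percent
FALSE_POSITIVE_RATE = 0.07   # ASSUMPTION: not stated in this post

result = npv(PREVALENCE, SENSITIVITY, FALSE_POSITIVE_RATE)
print(f"NPV: {result:.1%}")
```

Notice that the missed cancers are a sliver of a sliver (one in ten of the 0.8%), which is why a negative result is so reassuring.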

Now, we really get to what I consider the crux of the issue. The PPV and NPV reflect how the test has been calibrated. The calibration is only half science; the other half is human experience, politics, emotions, etc. I get deep into this stuff in Chapter 4.

***

In my view, classroom time (or reader time) is much better spent explaining these things than on the arithmetic of computing conditional probabilities. Knowing how to work the formulas but not noticing the real-life implications is a tragedy; knowing how to think about these tests but failing to compute probabilities is an inconvenience, and an excuse to befriend a quantitative mind.

Too often, we train our students to declare victory upon turning to the end of the textbook. And this is why I think Strogatz missed a great opportunity.

Comments

I have to disagree a bit here. I really like your explanations but I do not think it is really one or the other - this explanation OR the computations with A, B etc.

An intro class is for a wide population of students. Some of these students will be using stats in their other courses quite a bit (and likely often in their working lives). Others are simply fulfilling a requirement and only really "need" the statistical literacy part.

Teaching to both audiences requires a good prof to go over the thinking behind the numbers AND teach the computations. You may say a student can find that in the text, which is true, but many have trouble understanding the textbook's explanation of these computations.

I hadn't thought about emphasizing NPV, but that's a great idea for demonstrating tradeoffs. I also have my stats students calculate PPV as a function of disease prevalence so they can see how it varies among risk groups.
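An exercise along these lines might look like the sketch below. All the specifics are assumptions for illustration: the 90 percent sensitivity is the figure from the post, the 7% false positive rate is not stated anywhere in the post, and the prevalence values are hypothetical stand-ins for different risk groups rather than real data.

```python
def ppv(prevalence, sensitivity, false_positive_rate):
    """Chance of having the disease given a positive test."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


SENSITIVITY = 0.90           # figure quoted in the post
FALSE_POSITIVE_RATE = 0.07   # ASSUMPTION: not stated in the post

# HYPOTHETICAL prevalence values, chosen only to show the trend.
prevalences = [0.001, 0.008, 0.05, 0.20]
values = [ppv(p, SENSITIVITY, FALSE_POSITIVE_RATE) for p in prevalences]

for p, v in zip(prevalences, values):
    print(f"prevalence {p:6.1%} -> PPV {v:6.1%}")
```

With the test itself held fixed, the PPV climbs as the disease becomes more common in the group being screened.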

I completely agree that your description ends up teaching people more about conditional probabilities... but...

From the "curious" reader's point of view (as opposed to the "interested" reader), Strogatz's opinion piece is a much easier read. Why?

Because he doesn't get into technicalities or new acronyms. He just shows the curious something they probably were doing badly... And that makes a much better read to most of his readers (and probably the editor as well, IMHO).

Now, whether what the average NYT reader prefers to read is what is best for him to read, that's a whole different question.

JW: in my view, there should be two tracks for introductory classes, one to prepare students for hands-on statistical analyses, the other to prepare students to practice statistical thinking in their everyday and/or work lives, accepting the fact that they would never do any hands-on work.

If we treat the amount of class time as a scarce resource, then there is a tradeoff between time spent teaching formulas and time spent teaching interpretation and reasoning. Unfortunately, it's a tough choice.

The usefulness of a mammogram depends on what happens *after* the test. I assume the results show up some suspiciously cancer-like tissue. If someone has a positive test, then the best bet would be an additional, more precise test (such as a biopsy), to determine whether or not the person actually has cancer.
The big advantage of a mammogram is that it is non-invasive, and has very few negative effects on the patient. So, patients have little to lose.

Any general screening program should have low costs, low risks to the patient, and a low false negative rate. A high false positive rate or low predictive value is an acceptable price to pay, *provided* that any positive result is followed up with a more precise test.