December 2012

Innovation and the NYC subway don't usually go together, but something has changed in the past year or so. One of the greatest life-changers has been the installation of countdown clocks in many stations, telling riders how long until the next several trains arrive. Now there is a smartphone app for this. (link)

***

Readers of Chapter 1 of Numbers Rule Your World will recognize the concept behind these countdown clocks as a "congestion management" tool. It turns out that the psychology of waiting is as important as the mathematics of queuing. A minute of waiting is not the same as a minute of waiting: a minute spent not knowing how long the total wait will be is far more irritating than a minute spent knowing. By installing these clocks, the MTA removes the uncertainty that causes anxiety.

My life has genuinely improved as a result of these clocks. I feel much calmer knowing when the next train will arrive. The most remarkable part of this congestion management tool is that the actual waiting time has not changed at all, yet riders perceive their wait to have shortened.

Chapter 1 looks at the problem of congestion and how statistical concepts help us understand and tackle the problem.

For the last few hours, Yahoo! has decided that I'd be interested in reading this piece of news. Every time I go there, I get this front page:

I don't really know how this sort of study gets published in journals, nor am I interested in spending an hour figuring out where it went wrong. The snippet summarizing the research is here.

The ability to look at some data and develop a sense of whether it holds up is an important practical skill, especially these days when so much data is being bandied about. There are quite a few things we already know without reading the research. This is not like a cholesterol screening test, for example, where a high reading can be addressed with medication and diet changes that bring the level down, potentially extending one's life. The test proposed here does not lend itself to any kind of remedy. Also, I'm not sure whether they demonstrated predictive ability or merely a correlation; it sounds like the latter. Finally, even if the finding holds, my guess is that the population they studied, aged 51-80, has relatively few deaths over a six-year follow-up period, making even small differences look huge on a ratio scale.
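The ratio-scale point is easy to see with a toy calculation. The numbers below are entirely hypothetical (they are not the study's data); they just show how a low base rate lets a modest absolute difference masquerade as a dramatic relative risk:

```python
# Hypothetical numbers, chosen for illustration only: two groups of 200
# subjects each, followed for six years, with few deaths overall.
n = 200
deaths_low_score = 26   # deaths among low-scoring subjects (made up)
deaths_high_score = 4   # deaths among high-scoring subjects (made up)

rate_low = deaths_low_score / n     # 0.13
rate_high = deaths_high_score / n   # 0.02

relative_risk = rate_low / rate_high    # 6.5x -- sounds sensational
absolute_diff = rate_low - rate_high    # 0.11 -- eleven percentage points

print(f"relative risk: {relative_risk}x, absolute difference: {absolute_diff:.2f}")
```

The same 6.5x headline could describe rates of 13% versus 2%, or 0.13% versus 0.02%; the ratio alone tells you nothing about how many deaths are actually at stake.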

***

For those who don't have time to click, the test that has a 6.5 times (!!!) lift in predictive accuracy involves sitting and standing. Sorry to spoil your dreams of eternal life.

Nate Silver first attracted attention two election cycles
ago with the launch of his fivethirtyeight.com website (538 is the number of electoral
votes in the United States). He makes clean charts, which I like a lot. Since that time, he has earned a platform on
the New York Times website, which goes some way to explaining the vitriol hurled
at him during the just-concluded election season by the right wing. Being in
the national conversation is convenient when one has a book coming off the
press – and I write this admiringly, as I believe Silver’s popularity and
influence further the cause of anyone in favor of the data-driven mindset. Surely,
the predictive success of his model, as well as those of a number of copycats,
has resoundingly humbled the pundits and tealeaf readers, who talked themselves
into ignoring the polling data.

The book is titled The Signal and the Noise (link). As explained by
Silver, these terms originated in the electrical engineering realm, and have
long served as a metaphor for the statistician’s vocation, that is, separating
the signal from the noise. Imagine making a long-distance call from California
to Tokyo. Your voice, the signal, is encoded and sent along miles of cables and
wires from one handset to the other, picking up interference, the noise, along
the way. The job of electrical engineers is to decipher the garbled audio at
the other end, by estimating and removing the noise. When the technology fails, you
have a “bad connection”, and you can literally hear the noise.

***

It is in the subtitle—“why so many predictions fail – but
some don’t”—that one learns the core philosophy of Silver: he is most concerned
with the honest evaluation of the performance of predictive models. The failure
to look in the mirror is what I often describe as the elephant in the data
analyst’s room. Science reporters and authors keep bombarding us with stories
of success in data mining, when in fact most statistical models in the social
sciences have high rates of error. As Silver’s many case studies demonstrate, these
models are still useful but they are far from infallible; or, as Silver would
prefer to put it, the models have a quantifiable chance of failing.

In 450 briskly moving pages, Silver takes readers through case studies on polling, baseball, the weather, earthquakes, GDP, pandemic flu, chess, poker, the stock market, global warming, and terrorism. I appreciate the refreshing modesty with which he discusses the limitations of various successful prediction
systems. For example, one of the subheads in the chapter about a baseball player
performance forecasting system he developed prior to entering the world of
political polls reads: “PECOTA Versus Scouts: Scouts Win” (p. 88). Unlike many
popular science authors, Silver does not portray his protagonists as uncomplicated
heroes, does not draw overly general conclusions, and does not flip from
one anecdote to another but instead provides details for readers to gain a fuller
understanding of each case study. In other words, we can trust his conclusions,
even if his book contains little Freakonomics-style counter-intuition.

***

Performance measurement is a complex undertaking. To
illustrate this point, I list the evaluation methods deployed in the key case
studies of the book:

1. McLaughlin Group panel predictions (p. 49): proportion of predictions that turn out "completely true" or "mostly true," ignoring predictions that cannot be or are not yet verifiable.

2. Election forecasts (p. 70): proportion of Republican wins among those districts predicted to be "leaning Republican" (underlying this type of evaluation is some criterion for calling a race "leaning").

3. Baseball prospect forecasting (p. 90): number of major-league wins generated by players on the Top 100 prospect list within a specified window of time; the wins attributed to individual players are computed via a formula known as "wins above replacement player."

4. Daily high-temperature forecasts (p. 132): average difference between the predicted temperature (x days in advance) and the actual temperature, relative to "naïve" methods of prediction, such as always predicting the average temperature or predicting tomorrow's temperature to equal today's.

5. Rainfall forecasts (p. 135): how close to, say, 20% is the proportion of days on which it actually rained, among those days when the weather service forecast a 20% chance of rain.

6. Earthquake forecasts (p. 160): whether or not an earthquake in the predicted range of magnitude occurred in the predicted range of time in the predicted region of the world (a binary outcome).

7. GDP growth forecasts (p. 182): the proportion of times the economist's prediction intervals contain the actual GDP growth.

8. Chess (Ch. 9): winning games.

9. Poker (p. 311): amount of earnings.

10. Long-range global temperature forecasts (pp. 398, 402): actual trend against predicted trend. (Note that this is the same method as #7 but with only one prediction interval.)
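The rainfall-forecast evaluation (p. 135) is a calibration check, and it is simple enough to sketch in a few lines of Python. The data below are made up for illustration:

```python
# Calibration check sketch: among days when the forecast said "20% chance
# of rain", did it actually rain about 20% of the time? (Data are made up.)
forecasts = [0.2, 0.2, 0.5, 0.2, 0.2, 0.5, 0.2]   # hypothetical forecast probabilities
rained    = [0,   1,   1,   0,   0,   0,   0]     # 1 = it rained that day

# Collect the outcomes on days with a 20% forecast, then compare the
# observed rain rate against the stated 20%.
days_at_20 = [r for f, r in zip(forecasts, rained) if f == 0.2]
observed_rate = sum(days_at_20) / len(days_at_20)

print(f"forecast 20% on {len(days_at_20)} days; it rained {observed_rate:.0%} of them")
```

A well-calibrated forecaster's 20% days really do see rain about one time in five; the same check is repeated at each forecast level.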

If you are thinking the evaluation methods listed above seem numerous and arbitrary, you'd be right. After reading Silver's book, you should be thinking critically about how predictions are evaluated (and, in some cases, how they may be impossible to verify). The probabilistic forecasts that Silver advocates are even harder to validate. Silver tells it like it is: this is difficult but crucial work, and one must look out for forecasters who don't report their errors, as well as those who hide their errors behind inappropriate measurement.

***

Throughout the book, Silver makes many practical recommendations that reveal his practitioner's perspective on forecasting. As an applied statistician, I endorse without hesitation specific pieces of advice, such as: use probability models; recognize that more data can make predictions worse; mix art and science; try hard to find the right data rather than settling for readily available data; and avoid excessive precision.

The only exaggeration in the book is his elevation of
“Bayesian” statistics as the solution to predictive inaccuracy. What he
packages as Bayesian has been part of statistical science even before the
recent rise of modern Bayesian statistics. (The disagreement between Bayesians
and non-Bayesians is over how these concepts are utilized.) Silver’s exposition
focuses on probability updating in sequential decision-making, which is
understandable given his expertise in sequential settings with a rich tradition
of data collection, such as baseball and polling. (At one point, he makes an
astute comment about data analysts selecting more promising settings in which
to work.) The modern Bayesian movement is much broader than probability
updating, and I’d point you to Professor Andrew Gelman’s blog and/or books as a place to
explore what I mean by that. It must be said, though, that the technicalities
of Bayesian statistics are tough to convey in a mass-market book.
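The probability updating that Silver's exposition centers on can be sketched concretely. The numbers here are hypothetical, not from the book: we start with a prior belief that a candidate leads, then revise it with Bayes' rule as each new poll arrives:

```python
# A minimal sketch of sequential Bayesian updating, with hypothetical numbers.
prior = 0.5                # prior probability the candidate truly leads
p_poll_if_leads = 0.8      # chance a poll shows a lead, given a true lead
p_poll_if_trails = 0.3     # chance a poll shows a lead, given no true lead

def update(belief, poll_shows_lead):
    """One Bayes'-rule update of the belief on a new poll result."""
    if poll_shows_lead:
        num = p_poll_if_leads * belief
        den = num + p_poll_if_trails * (1 - belief)
    else:
        num = (1 - p_poll_if_leads) * belief
        den = num + (1 - p_poll_if_trails) * (1 - belief)
    return num / den

# The posterior after one poll becomes the prior for the next.
belief = prior
for poll_shows_lead in [True, True, False]:
    belief = update(belief, poll_shows_lead)

print(f"posterior after three polls: {belief:.3f}")
```

Note how the third, contrary poll pulls the belief back down but does not erase the evidence of the first two; each update weighs the new data against everything seen so far.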

***

In spite of the minor semantic issue, I am confident my
readers will enjoy reading Silver’s book (link). It is one of the more balanced,
practical books on statistical thinking on the market today by a prominent public
advocate of the data-driven mindset.