
Chapter 1 of Numbersense (link) uses the example of the U.S. News ranking of law schools to explore the national pastime of ranking almost anything. Since there is no objective standard for the "correct" ranking, it is pointless to complain about "arbitrary" weighting and so on. Every replacement has its own assumptions.

A more productive path forward is to understand how the composite ranking is created, and shine a light on the underlying assumptions.

***

The New York Times recently published an article entitled "What's the Matter with Eastern Kentucky?" (link). The problem with Eastern Kentucky, as the reporter saw it, is that those counties rank at the bottom of their list. Here is their ranking methodology:

The team at The Upshot, a Times news and data-analysis venture, compiled six basic metrics to give a picture of the quality and longevity of life in each county of the nation: educational attainment, household income, jobless rate, disability rate, life expectancy and obesity rate. Weighting each equally, six counties in eastern Kentucky’s coal country (Breathitt, Clay, Jackson, Lee, Leslie and Magoffin) rank among the bottom 10.

There is a companion blog post at The Upshot, giving more context, and a county-level map of the ranking (link). Here are the relevant sentences.

The Upshot came to this conclusion by looking at six data points for each county in the United States: education (percentage of residents with at least a bachelor’s degree), median household income, unemployment rate, disability rate, life expectancy and obesity. We then averaged each county’s relative rank in these categories to create an overall ranking.

(We tried to include other factors, including income mobility and measures of environmental quality, but we were not able to find data sets covering all counties in the United States.)

We used disability — the percentage of the population collecting federal disability benefits but not also collecting Social Security retirement benefits — as a proxy for the number of working-age people who don’t have jobs but are not counted as unemployed.
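To make the recipe concrete, here is a minimal Python sketch of the rank-and-average method, with made-up numbers for three hypothetical counties; the column names and values are purely illustrative, not The Upshot's data.

```python
# A minimal sketch of the rank-and-average recipe, using made-up values
# for three hypothetical counties (the names and numbers are invented).
import pandas as pd

df = pd.DataFrame({
    "county": ["County A", "County B", "County C"],
    "pct_bachelors": [32.0, 12.5, 18.0],     # higher is better
    "median_income": [61000, 30000, 41000],  # higher is better
    "unemployment": [4.5, 11.2, 7.8],        # lower is better
    "disability": [3.1, 11.9, 6.4],          # lower is better
    "life_expectancy": [79.2, 72.6, 76.1],   # higher is better
    "obesity": [24.0, 39.5, 31.0],           # lower is better
}).set_index("county")

# Flip the sign of "higher is better" columns so that, for every metric,
# a smaller value means a better county; then rank 1 = best.
higher_is_better = ["pct_bachelors", "median_income", "life_expectancy"]
signed = df.copy()
signed[higher_is_better] = -signed[higher_is_better]
ranks = signed.rank()            # ranks each column independently

# Equal weighting: the composite score is simply the mean of the six ranks.
composite = ranks.mean(axis=1).sort_values()
print(composite)                 # smallest average rank = "best" county
```

Note that the composite depends only on each county's ordering within a metric, not on how far apart the counties actually are, which is where the trouble discussed below comes in.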

How should we read this article?

***

What is this a ranking of? What is the research question? The answer is "how hard it is to live in specific counties". Right away, we know any answer is subjective, even if data is proffered.

Look out for the relative weights. The authors tell us the metrics are equally weighted. "Equal weighting" sounds fair, but it frequently hides inequities. Are those six factors equally important? Are there strong correlations among some of those factors?

The blog post discloses that each of the six metrics is first converted to ranks before being averaged. This means we need to worry about how much each metric varies from county to county. Take the obesity rate, for example. Here is a map of obesity at the county level published by the CDC, based on a model estimate (link).

The people who made this map placed the counties into five groups. The middle groups are narrowly defined, for example, 29.2% to 30.8%. Any analyst who converts the county-level obesity rates to ranks creates over 3,000 gradations of obesity rate. Said differently, the ranking treats the worst county as more than 3,000 steps worse than the best. Yet in the case of obesity, the medical community would consider most of these counties unhealthy.

This example shows that too much granularity can hurt an analysis, a core insight of statistics that may seem counterintuitive.
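To see the point in miniature, here is a small simulation, using invented rates rather than the CDC's estimates, contrasting a full county ranking against a five-group map.

```python
# Simulated county obesity rates (invented, roughly centered at 30%),
# contrasting a full ranking with a five-group map.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
rates = pd.Series(rng.normal(loc=30.0, scale=2.0, size=3100)).clip(20, 45)

ranks = rates.rank()                         # 3,100 distinct gradations
quintiles = pd.qcut(rates, 5, labels=False)  # the map's five groups

# Counties within one percentage point of 30% obesity: each still gets
# its own rank position, even though their rates are nearly identical.
print(rates.between(29.5, 30.5).sum())        # roughly 600 counties
print(quintiles.value_counts().sort_index())  # vs. about 620 counties per group
```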

***

Ultimately, it's for you to decide whether this ranking makes sense. I'm not here to dismiss it because, as I said in Numbersense (link), you can replace this methodology with something else, but the new method will also have its own assumptions.

Nate Silver first attracted attention two election cycles ago with the launch of his fivethirtyeight.com website (538 is the number of electoral votes in the United States). He makes clean charts, which I like a lot. Since that time, he has earned a platform on the New York Times website, which goes some way to explaining the vitriol hurled at him during the just-concluded election season by the right wing. Being in the national conversation is convenient when one has a book coming off the press – and I write this admiringly, as I believe Silver's popularity and influence further the cause of anyone in favor of the data-driven mindset. Surely, the predictive success of his model, as well as those of a number of copycats, has resoundingly humbled the pundits and tea-leaf readers, who talked themselves into ignoring the polling data.

The book is titled The Signal and the Noise (link). As explained by Silver, these terms originated in the electrical engineering realm, and have long served as a metaphor for the statistician's vocation, that is, separating the signal from the noise. Imagine making a long-distance call from California to Tokyo. Your voice, the signal, is encoded and sent along miles of cables and wires from one handset to the other, picking up interference, the noise, along the way. The job of electrical engineers is to decipher the garbled audio at the other end, by sizing and removing the noise. When the technology fails, you have a "bad connection", and you can literally hear the noise.

***

It is in the subtitle—"why so many predictions fail – but some don't"—that one learns the core philosophy of Silver: he is most concerned with the honest evaluation of the performance of predictive models. The failure to look in the mirror is what I often describe as the elephant in the data analyst's room. Science reporters and authors keep bombarding us with stories of success in data mining, when in fact most statistical models in the social sciences have high rates of error. As Silver's many case studies demonstrate, these models are still useful, but they are far from infallible; or, as Silver would prefer to put it, the models have a quantifiable chance of failing.

In 450 briskly moving pages, Silver takes readers through case studies on polling, baseball, the weather, earthquakes, GDP, pandemic flu, chess, poker, the stock market, global warming, and terrorism. I appreciate the refreshing modesty in discussing the limitations of various successful prediction systems. For example, one of the subheads in the chapter about a baseball player performance forecasting system he developed prior to entering the world of political polls reads: "PECOTA Versus Scouts: Scouts Win" (p. 88). Unlike many popular science authors, Silver does not portray his protagonists as uncomplicated heroes; he does not draw overly general conclusions; and he does not flip from one anecdote to another but instead provides details for readers to gain a fuller understanding of each case study. In other words, we can trust his conclusions, even if his book contains little Freakonomics-style counter-intuition.

***

Performance measurement is a complex undertaking. To illustrate this point, I list the evaluation methods deployed in the key case studies of the book:

1. McLaughlin Group panel predictions (p. 49): proportion of predictions that become "completely true" or "mostly true," ignoring predictions that cannot be or are not yet verifiable

2. Election forecasts (p. 70): proportion of Republican wins among those districts predicted to be "leaning Republican" (underlying this type of evaluation is some criterion for calling a race "leaning")

3. Baseball prospect forecasting (p. 90): number of major-league wins generated by players on the Top 100 prospect list in a specified window of time; the wins attributed to individual players are computed via a formula known as "wins above replacement player"

4. Daily high-temperature forecasts (p. 132): average difference between predicted temperature (x days in advance) and actual temperature, relative to "naïve" methods of prediction, such as always predicting the average temperature, or predicting tomorrow's temperature to equal today's

5. Rainfall forecasts (p. 135): how close to, say, 20% is the proportion of days on which it actually rained, among those days when the weather service forecast a 20% chance of rain

6. Earthquake forecasts (p. 160): whether or not an earthquake in the predicted range of magnitude occurred in the predicted range of time in a predicted region of the world (a binary outcome)

7. GDP growth forecasts (p. 182): the proportion of times in which the economist's prediction intervals contain the actual GDP growth

8. Chess (Ch. 9): winning games

9. Poker (p. 311): amount of earnings

10. Long-range global temperature forecasts (pp. 398, 402): actual trend against predicted trend (note that this is the same method as #7 but with only one prediction interval)

If you are thinking the evaluation methods listed above seem numerous and arbitrary, you'd be right. After reading Silver's book, you should be thinking critically about how predictions are evaluated (and, in some cases, how they may be impossible to verify). The probabilistic forecasts that Silver advocates are even harder to validate. Silver tells it like it is: this is difficult but crucial work, and one must look out for forecasters who don't report their errors, as well as those who hide their errors by using inappropriate measurement.
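For readers who want to try these ideas, here is a rough sketch of two of the checks above, calibration of rainfall forecasts (item 5) and coverage of prediction intervals (item 7), using simulated forecasts and outcomes rather than any real data.

```python
# Simulated forecasts and outcomes only; none of these numbers come from
# the book or from any real forecasting agency.
import numpy as np

rng = np.random.default_rng(1)

# Item 5: among days forecast at a 20% chance of rain, how often did it rain?
n_days = 500
rained = rng.random(n_days) < 0.20        # a world where 20% is well calibrated
print(f"forecast 20%, observed {rained.mean():.0%}")

# Item 7: what fraction of actual GDP growth values fall inside the
# forecasters' stated (here, nominal 90%) prediction intervals?
actual = rng.normal(2.0, 1.5, size=40)                  # pretend annual growth rates
point_forecast = actual + rng.normal(0, 1.0, size=40)   # pretend forecasts
lower, upper = point_forecast - 1.645, point_forecast + 1.645
coverage = np.mean((actual >= lower) & (actual <= upper))
print(f"interval coverage: {coverage:.0%}")             # compare with the stated 90%
```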

***

Throughout the book, Silver makes many practical recommendations that reveal his practitioner's perspective on forecasting. As an applied statistician, I endorse without hesitation specific pieces of advice, such as: use probability models; recognize that more data can make predictions worse; mix art and science; try hard to find the right data rather than settling for readily available data; and avoid too much precision.

The only exaggeration in the book is his elevation of "Bayesian" statistics as the solution to predictive inaccuracy. What he packages as Bayesian has been part of statistical science even before the recent rise of modern Bayesian statistics. (The disagreement between Bayesians and non-Bayesians is over how these concepts are utilized.) Silver's exposition focuses on probability updating in sequential decision-making, which is understandable given his expertise in sequential settings with a rich tradition of data collection, such as baseball and polling. (At one point, he makes an astute comment about data analysts selecting more promising settings in which to work.) The modern Bayesian movement is much broader than probability updating, and I'd point you to Professor Andrew Gelman's blog and/or books as a place to explore what I mean by that. It must be said, though, that the technicalities of Bayesian statistics are tough to convey in a mass-market book.
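For concreteness, here is a toy version of the sequential updating Silver emphasizes, applying Bayes' rule poll by poll; the prior and likelihood numbers are invented for illustration.

```python
# A toy version of sequential probability updating via Bayes' rule.
# The prior and the likelihoods below are invented for illustration.
def bayes_update(prior, p_data_if_true, p_data_if_false):
    """Posterior probability of a hypothesis after one new observation."""
    numerator = prior * p_data_if_true
    return numerator / (numerator + (1 - prior) * p_data_if_false)

# Start with a 50% belief that a candidate leads, then update on a
# sequence of polls; each poll result is more likely under one hypothesis.
belief = 0.5
for poll_shows_lead in [True, True, False, True]:
    if poll_shows_lead:
        belief = bayes_update(belief, p_data_if_true=0.7, p_data_if_false=0.4)
    else:
        belief = bayes_update(belief, p_data_if_true=0.3, p_data_if_false=0.6)
    print(round(belief, 3))
```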

***

In spite of the minor semantic issue, I am confident my readers will enjoy reading Silver's book (link). It is one of the more balanced, practical books on statistical thinking on the market today by a prominent public advocate of the data-driven mindset.

The NYT has a nice article about the challenges of predicting hurricane intensity. A researcher pointed out it's difficult to get inside a storm to measure wind speeds and so:

it is not enough data to plug into a numerical model and yield a forecast that has a high degree of certainty.

I had the TV on most of the weekend, and there was around-the-clock coverage of Irene. I did not hear a single instance in which the forecaster (or broadcaster) provided information on the level of uncertainty of any of the predictions they were giving out. This would typically involve talking about probabilities or margins of error.

In other words, the experts admit that their forecasts are highly uncertain, but the news reports them as if they are 100% certain. We keep hearing comments such as "Irene will remain a Category 1 hurricane when it arrives in New York around Sunday morning and the maximum wind speed is expected to be around 70 mph."

***

The experts tell us the wind speed number is highly uncertain, and they quantify this uncertainty. Look at this Friday evening release from the National Hurricane Center, for example.

In the section labeled "wind speed probability table for selected locations", I looked up New York City. There was only a 5 percent chance of "sustained wind speeds (1 minute average)" of over 74 mph while there was a 56 percent chance that the wind speeds would be below 39 mph!
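Taken together, those two numbers already tell a story; a quick back-of-the-envelope calculation (my arithmetic, not anything printed in the release) fills in the middle of the distribution.

```python
# Back-of-the-envelope reading of the two quoted probabilities
# (my arithmetic, not the NHC's presentation):
p_over_74 = 0.05   # sustained winds above hurricane force (74 mph)
p_under_39 = 0.56  # sustained winds below tropical-storm force (39 mph)
p_between = 1 - p_over_74 - p_under_39
print(f"{p_between:.0%} chance of sustained winds between 39 and 74 mph")  # 39%
```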

The press release is difficult to read, requiring readers to translate times, speeds, etc. into familiar scales. However, if you are on the hurricane beat, there is no excuse for not spending the time to understand these data and convey this information to citizens.

***

As discussed in Chapter 1 of Numbers Rule Your World, how much things vary around the average value is crucial information that should not be overlooked.

[Update, Sep 2]

Larry Cahoon, a reader of the blog and the book, emailed me about this post, and he makes several good points that I didn't make clear in my original post. It is in the hurricane wind speeds that the reporting failed; when it comes to the projected track of the storm, the reports typically include information on uncertainty. In addition, I'm complaining about the reporting of the science, not the science itself. Here, in Larry's words:

I read your piece on the error reporting for Hurricane Irene and would say you have missed much of the reporting on the quality of the forecasts for the storm.

Perhaps I see more of the quality stuff as my main source for hurricane watching is the National Weather Service through http://www.hwn.org/ . The discussion piece put out with each forecast is particularly interesting and reveals the weather service's concerns with their forecasts in considerable detail. The error in the forecast track is plotted at http://hwn.org/stormpulse_atlantic.html and is easy to see. This is picked up by many in the media and is sometimes referred to as the "Hurricane Cone."

The other area where the error in the forecast track shows up is in the plots of the "spaghetti models," which seem to have been picked up by a number of media outlets and even the major networks. A Google search of the phrase will give you many hits, and some go into greater detail and discuss the impact of some of the forecast tracks.

There were areas where the media here in MD failed, mainly on wind forecasts that seemed to assume that the winds were of a uniform speed at a given distance from the storm regardless of which direction from the center of the storm someone was located. That was clearly contrary to what the weather service tells everyone.

What I have failed to see are good discussions of the accuracy of the wind speed forecasts, although I did see a piece in the last two days discussing the National Hurricane Center's concerns with the quality of the forecasts they are able to give for wind speeds.

David Ropeik wrote in the Guardian about our irrational fears and misperception of risks. Worth reading (link). He asks: Why, if the actual risk for any given person is so low, does it feel so scary to so many?

Russia tried to ban all vegetable imports from the EU, which is a horrible idea. It is almost certain that, by this time, the contaminated batch of greens is no longer in circulation, so any such measures are no more than PR stunts. Apparently, the EU convinced Russia to lift the ban (link).

Because of fear and, I must say, the lack of leadership in tackling the fear, a great deal of unnecessary economic loss has been suffered. The EU estimates that farmers will lose $300 million.

Also, don't forget the amount of uncontaminated produce that has gone to waste.

***

One other point: the level of risk is not the same for everyone. Most E. coli fatalities in past outbreaks have been elderly women or children with already compromised immune systems. In this case, 13 of 19 deaths were adult women, which is a little unusual but still a concentration of risk within a subset of the population. (link)

As in many situations, the "average" risk is not useful here. It's important to know whether or not you are in the high-risk subgroup.

This is the promised second post in reaction to Phil's piece on Andrew's blog about dealing with dirty, complex climate data. In a prior post, I considered the issue of a perverse incentive in data processing, and showed how it also affects credit reporting and scoring.

***

At the end of his post, Phil surfaces a topic that will clearly irk some: when there is a gap between the data and the model, should one fix the data or fix the model?

Since I wrote about this topic here as it relates to predicting Olympic medals, in a post called "False belief in true models", you might understand that my first reaction was: fix the model, the data is reality! By contrast, Phil indicated that it is often prudent for climate scientists to fix the data to bring it closer to the model. How might one reconcile the two points of view?

The reason for my post on "false belief in true models" was my displeasure with many business and economics folks who talk incessantly about over- or under-performance relative to a "model". For example, the employment statistic did better than "expected" even though the growth in jobs did not keep up with population growth, meaning the nation was worse off. This type of statement is tantamount to saying the model is always true, but a statistical model can never be true.

Then, Phil brought up this scary prospect:

The models are close to being correct. In this case, gross discrepancies between data and models will indicate problems with the data. Fixing those problems will lead to data that are in better agreement with the models.

In effect, he is saying Data is Not Reality. Uh oh.

He explained further:

When the data are complicated ... then it's not necessarily a surprise to find problems with the data, and to find that when those problems are fixed, the result is better agreement with a model.

I fully understand what he means. The data environment faced by climate scientists is many orders of magnitude more complex than that of, say, businesses. If I need to count the number of gadgets sold through a website, those transactions are recorded, and the data is relatively clean. Climate data is extremely hard to collect... Phil talked about thermometers installed on 3,000 undersea robots, for example. The errors, such as failing to notice that satellites were moved closer to earth, are not easy to catch, since presumably the data analysts were not the ones ordering the satellite locations to change.

***

In other words, adjusted data is reality. Unadjusted data is not. I should have made this clear. In statistical analysis, the first step is to inspect the data, and correct any errors. It is best if such data cleansing is completed prior to the analysis; what Phil appears to be saying is that some of these errors are so subtle that it is only when compared to a reasonable model that they would come to light.
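As an illustration of that last point, here is a minimal sketch of how comparing data against a simple, reasonable model can flag suspect observations for inspection; the simulated series, the linear-trend model, and the threshold are my own assumptions, not the climate scientists' actual procedure.

```python
# A minimal sketch of using a simple model to surface suspect data points.
# The series, the linear-trend "model", and the 3-sigma threshold are all
# assumptions for illustration, not how climate scientists actually work.
import numpy as np

rng = np.random.default_rng(2)
x = np.arange(100, dtype=float)
y = 0.02 * x + rng.normal(0, 0.5, size=100)  # slow trend plus noise
y[40] += 5.0                                 # a planted glitch (e.g., a sensor jump)

# Fit a simple trend and examine the residuals.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)
suspects = np.where(np.abs(residuals) > 3 * residuals.std())[0]
print(suspects)  # index 40 should surface, for inspection rather than silent deletion
```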

After reading his thoughtful piece, I again feel that climate scientists are not giving themselves enough credit. I don't think it is correct to describe the data cleansing activity as "bringing the data closer to the model". Instead, he should describe it as correcting obvious errors in the data, or reducing measurement error.