The article is about the Joint Statistical Meetings in Washington this week and how the job market for statisticians is booming. This is said to be a product of the recent explosion of digital data. Here is the accompanying picture:

Formulated by Godwin way back in 1990 when the Internet was truly the domain of geeks, [Godwin's law] states that "As an online discussion continues, the probability of a reference or comparison to Hitler or to Nazis approaches 1."

Rick Casey, Houston Chronicle, August 14, 2009

This illustrates why the controversy over statistical significance is exaggerated. Whether you consider the first or second analysis, the observed effect of the Thai candidates was either just above or below the level of statistical significance. Statisticians will tell you it is possible to observe an effect and have reason to think it’s real even if it’s not statistically significant. And if you think it’s real, you ought to examine it carefully.

Forsooth

Life expectancy at birth has risen to a new high, now standing at nearly 78 years. The increase is due mainly to falling rates in almost all the leading causes of death.

This quotation appeared in The Hartford Courant, August 20, 2009, credited to AP. It appears to be based on a NYT article [1], which in turn is based upon a National Vital Statistics Report [2].

A brief Wall Street Journal article [3] summarized a JAMA report about two recent studies of the relationship between the Mediterranean Diet and cognitive decline. Apparently an (undated) older study had suggested the existence of a relationship.

One of the two recent studies involved 2,000 elderly people and found that “those who adhered more closely to a Mediterranean diet … had less risk of developing [dementia].” The other study, from France, involved 1,410 people and found that “[a]dherence to the diet didn’t change the risk of dementia.”
JAMA’s editorial conclusion:

[A]ll told, there is “moderately compelling evidence that adherence to the Mediterranean-type diet is linked to less late-life cognitive impairment.”

Thomas Bayes … was … also the mathematician who formulated a probability theorem that can be used to solve problems that stymie conventional statistics. The crux of his theorem can be stated as follows:
The probability of any event is the ratio between the value at which an expectation depending on the happening of the event ought to be computed, and the value of the thing expected upon its happening.

Many countries could see swine flu cases double every three to four days for several months until peak transmission is reached, once cold weather returns to the northern hemisphere, said WHO's Western Pacific director, Shin Young-soo.

Fraud Doesn't Always Happen to Someone Else

From a Wall Street Journal article [4] describing the results of a number of research studies about victims of investment fraud:

The typical investment-scam victim is an optimistic married man in his later 50s who has a higher-than-average knowledge of financial matters and deep confidence in his own judgment ….

[A man told] an FTC fraud forum that he preferred speaking with a man because "you can lather him up and push all the green buttons." Women were more cautious and asked too many questions, he said, prompting an office maxim, "Don't pitch to the b—."

Kuklo's Fellow Infuse Worker

From The Pioneer Press we learn that there is more to the Kuklo story. "Dr. David Polly, the University of Minnesota spine surgeon ... received nearly 1.2 million dollars in consulting fees from medical device giant Medtronic over a five-year period." The details "of Polly's billing records were released this week by Sen. Charles Grassley, R-Iowa, as an attachment to a letter to University of Minnesota President Robert Bruininks. The letter raised questions about how the U polices conflicts of interest among doctors."

Polly's recordkeeping was indeed detailed:

Download CDs from meeting, 15 minutes, 125 dollars

Dinner meeting, 240 minutes, 2,000 dollars

E-mail Medtronic employee, 5 minutes, 49.48 dollars

Conference call, 90 minutes, 890.63 dollars

Teach at scoliosis meeting, 330 minutes, 2,750 dollars

According to the newspaper, Dr. Charles Rosen, a spine surgeon in California who leads a medical ethics group, said he was among those surprised by the details.

"I've not seen anybody bill the way he did," said Rosen, of the University of California-Irvine, who acknowledged that he doesn't do paid consulting work with the device industry.

"In my opinion, it sounds more like an investment banker," he said of the detailed billing. "It doesn't sound like someone in medicine."

Defining a clunker

Temporarily substituting for Carl Bialik (The Numbers Guy), Forelle reports on the government’s cash-for-clunkers program and critiques the EPA’s recently revised definition of a clunker.

The EPA stated that “more precise” data calculated “to four decimal places” led it to revise its miles-per-gallon cutoff figures.

"It is ludicrous to suggest that you can get fuel-consumption accuracy anywhere past the first decimal place, let alone the second," says … an independent U.K. auto tester.

Forelle discusses the “faux precision” of estimates that are often based on sampling, but reported as final counts or measurements without their sometimes large margins of error, as in the case of population or unemployment-rate estimates.

He cites another source of misleadingly precise estimates: failure to follow the conventional rules for significant figures in arithmetic.

The principle is simple: When combining measured numbers, the final answer is only as precise as the least-precise piece of data that went into it; you can't just add a tail of decimal places, even if they show up on the calculator. So a room that's 2.5 meters (two significant digits) by 3.87 meters (three) has an area of 9.7 square meters, though the two numbers multiply to 9.675.
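The room-area arithmetic in the quotation can be checked with a short sketch. The helper names below (sig_figs, round_to_sig, multiply_measured) are illustrative, not from the article, and measurements are passed as strings so that their written precision is preserved. Trailing zeros in whole numbers such as "100" are counted as significant here, which is a simplifying assumption.

```python
from decimal import Decimal, ROUND_HALF_UP

def sig_figs(text):
    """Count significant digits in a decimal string such as '2.5' or '3.87'."""
    # Decimal drops leading zeros but keeps trailing ones, e.g. '0.050' -> (5, 0)
    return len(Decimal(text).as_tuple().digits)

def round_to_sig(value, n):
    """Round a Decimal to n significant digits (half-up)."""
    if value == 0:
        return value
    # adjusted() is the power of ten of the leading digit
    quantum = Decimal(1).scaleb(value.adjusted() - n + 1)
    return value.quantize(quantum, rounding=ROUND_HALF_UP)

def multiply_measured(a, b):
    """Multiply two measurements, keeping only as many significant
    digits as the less precise of the two."""
    n = min(sig_figs(a), sig_figs(b))
    return round_to_sig(Decimal(a) * Decimal(b), n)

print(Decimal("2.5") * Decimal("3.87"))   # 9.675
print(multiply_measured("2.5", "3.87"))   # 9.7
```

Using Decimal rather than binary floats avoids representation noise in the raw product, so the only rounding that happens is the deliberate one.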

Fuel mileage calculations are apparently based upon tailpipe emissions of carbon dioxide, because released carbon dioxide from burning fuel provides a more accurate measure of gas consumption than direct measurement of consumed fuel. Not only does the EPA believe that the results of two lab tests on each car must be recorded to four decimal places by law, but it also added tests that were not done on older cars, and “created a formula that estimated from the old data what would happen had the new tests been run,” this despite the different precision levels of numbers that went into the formulas. An EPA spokesman said, "Repeatability and accuracy is something we spend a lot of time on."

Discussion

1. What’s the difference between accuracy and precision?

2. How do you count significant digits in numbers, or report correctly in arithmetic results? (Two websites, un-vetted by contributor, may be of interest [5] or [6].)

3. A blogger commented [7], “Regards your room area example, if both length and width are measured by a person who makes the same direction of error on each measurement -- so that both are either too high or too low -- then the area will not only have almost twice the percentage error of either measurement, it will, on average be too high.” Do you agree with all, or part of, this statement?

4. The author described a 1991 court case in which an Alaskan man failed a bar exam and “missed by 0.5 point the threshold needed for a re-evaluation of his test.” The man claimed that, since the essays were graded with integers, his score should have been rounded up to the next integer. Although the man lost the case, the Alaska Supreme Court found his argument “convincing from a purely mathematical standpoint.” A blogger argued [8] that there are an infinite number of significant digits in counts (e.g., 1.000…), because “the error in the value of these numbers is ZERO,” and so arithmetic results “can be rounded to as many sig figures as you want to.” Do you agree with the Alaskan man or with the blogger?

All that jazz

Based on the National Endowment for the Arts’ 2008 Survey of Public Participation in the Arts [9], popular interest in jazz on the part of adult Americans appears to be experiencing a serious decline. The study was conducted “in participation with” the U.S. Census Bureau.

Two causes for concern about the future of jazz are a decline in the share of adults attending at least one jazz performance per year (down from about 11% to 8% for the period 2002-2008) and an increase in the median age of those who do attend (up from 29 to 46 years for the period 1982-2008).

In his book, Leinweber “dissects the shoddy thinking that underlies most of [the quantitative] techniques” in use today, and he refers to data-mined numbers as “one of the leading causes of the evaporation of money.”

Zweig describes how Leinweber decided to satirize data mining with an example, meant to be a joke. He found that annual butter production in Bangladesh “explained” 75% of the variation in the annual returns of the S&P 500-stock index over a 13-year period.

By tossing in U.S. cheese production and the total population of sheep in both Bangladesh and the U.S., Mr. Leinweber was able to "predict" past U.S. stock returns with 99% accuracy.

Leinweber has advice for avoiding “falling into a data mine”: (a) Check that the results make sense; (b) Check that the claim still holds for smaller subsets of the data; (c) Check the results after costs, fees, and taxes are subtracted; (d) Wait to see if the claim continues to hold true as time goes by.
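A small simulation suggests why such “explanations” are easy to find and why checks (b) and (d) matter. This is only a sketch under stated assumptions: it is not Leinweber's analysis, the 1,000 candidate series and the 13 "later" years are hypothetical, and every series is pure random noise. It searches for the single noise series that best fits 13 years of noise "returns", then tests that same series on 13 further years.

```python
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def mine(n_years=13, n_candidates=1000, seed=0):
    """Return (best in-sample R^2, R^2 of that same series on later years)."""
    rng = random.Random(seed)
    # Hypothetical "returns": noise for the mined years plus later years
    returns = [rng.gauss(0, 1) for _ in range(2 * n_years)]
    best_r2, best_series = -1.0, None
    for _ in range(n_candidates):
        # Each candidate "predictor" is unrelated noise
        series = [rng.gauss(0, 1) for _ in range(2 * n_years)]
        r2 = pearson_r(series[:n_years], returns[:n_years]) ** 2
        if r2 > best_r2:
            best_r2, best_series = r2, series
    out_r2 = pearson_r(best_series[n_years:], returns[n_years:]) ** 2
    return best_r2, out_r2

in_sample, out_of_sample = mine()
print(f"best in-sample R^2:     {in_sample:.2f}")
print(f"same series, later R^2: {out_of_sample:.2f}")
```

With only 13 observations and many candidates, the best in-sample fit is typically large even though nothing real is being measured, while the fit on the later years is whatever chance happens to give.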

Students may enjoy a 4-minute video [12] of Jason Zweig interviewing Nerds author David Leinweber. Or they may be interested in YouTube videos (8 minutes, in four parts) [13] of a lecture by David Leinweber.

Discussion

The article contained a chart [14] showing the “Correlation of Super Bowl wins by original NFL teams with positive return[s] for the S&P 500.” The bar chart shows the S&P 500 return for each year 1967 through August 6, 2009, with 32 blue bars for years in which a “correlation held” and 11 red bars for years in which it did not.

1. What do you think the chart’s author meant by stating that a “correlation held”? What else would you like to know about his/her “correlation”?

2. Suppose there were a positive correlation, even a relatively high one, between NFL Super Bowl team wins and positive S&P 500 returns. Would you be surprised that Super Bowl wins did not predict positive S&P returns for every one of the years?

Football: How about a “time-in”?

The author, a postdoctoral fellow at Harvard Medical School, proposes a new football rule - the “time-in” - which would force the game clock to resume. He would limit its use to once per game per team.

The possibility of a sudden time-in would loom large in every coach’s mind at the most tense points in the game, introducing just enough concern and uncertainty to make the game different.

On his website Advanced NFL Stats [15], Brian Burke has developed a mathematical model for computing the probability of victory for a team during a game. According to Burke, the time-in would “only be used by the leading team on defense near the end of the game when there’s a small point difference.” In that case, his prediction is that the time-in “could produce up to a one-third drop in win probability for the losing team.”

On the other hand, if the time-in has not yet been used during a game, it could be a potential threat to an opposing team, who might have to face the “Unexpected Hanging paradox”:

Imagine a prisoner is told that sometime during the next week he will be hanged, and it will be a complete surprise. The prisoner … reasons thus: I can’t be hanged on Friday, because it’s the final day of the week, and therefore not unexpected. So, I can only be hanged sometime between Monday and Thursday. However, it can’t be Thursday, because now that’s the last possible day to be hanged, and so it won’t be a surprise then either. Continuing this train of thought, the prisoner coolly deduces that he can’t be hanged any day of the week at all, and therefore will not die. He is therefore quite surprised when he is woken up early on Wednesday and sent to his death.

The author feels that the surprise element of a time-in would provide more excitement for fans.

He cites John Reed’s Clock Management [16] as “the authoritative ... book on time in football.”

The author is a professor of psychology at Berkeley and author of The Philosophical Baby [18].

She describes three recent experiments that show that “even the youngest children have sophisticated and powerful learning abilities” and “show how brilliant a baby’s mind really is.”

In 2008 researchers at the University of British Columbia “proved that babies could understand probabilities.” One-year-old babies were shown a box that contained red balls (20%) and white balls (80%). The babies appeared more surprised (“looked longer and more intently at the experimenter”) when an experimenter pulled a sample of mostly red balls (80%) from the box of mostly white balls.

The babies concluded that the researcher must like the red balls more than the white ones as when she held out her hand, they gave her a red ball rather than a white one. Far from being illogical and egocentric they could learn from statistics and use the logic of what they saw to figure out what someone else wanted.
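To see why a mostly red sample from the mostly white box is "surprising" in the statistical sense, one can compute how rarely it would happen by chance. The sketch below assumes a sample of 5 balls and treats the draws as independent (reasonable only if the box is large relative to the sample); the summary above gives the percentages but not the sample size, so the 5 is an assumption.

```python
from math import comb

def prob_at_least(k, n, p):
    """Binomial probability of at least k successes in n independent draws."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Box is 20% red, 80% white; assumed sample size is 5 (hypothetical)
p_mostly_red = prob_at_least(4, 5, 0.2)     # at least 4 of 5 red
p_mostly_white = prob_at_least(4, 5, 0.8)   # at least 4 of 5 white

print(f"P(at least 4 of 5 red):   {p_mostly_red:.5f}")
print(f"P(at least 4 of 5 white): {p_mostly_white:.5f}")
```

Under these assumptions the mostly red sample has a chance of well under 1%, while a mostly white sample is the typical outcome.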

In 2007 researchers at M.I.T. “demonstrated that when young children play, they are also exploring cause and effect.”

One group [of pre-schoolers] was shown that when you pressed one lever, the duck appeared and when you pressed the other, the puppet popped up. The second group observed that when you pressed both levers at once, both objects popped up, but they never got a chance to see what the levers did separately, which left mysterious the causal relation between the levers and the pop-up objects.

When the children were subsequently given the toy to play with, the latter group was more interested in playing with it and “just by playing around, they figured out how it worked.”

In 2007 the author, at Berkeley, “discovered that pre-schoolers can use probabilities to learn how things work.” The children were shown two blocks that could light up a machine, with a yellow one making the machine light up 2 out of 3 times and a blue block making it light up only 2 out of 6 times. When the children were given the blocks and asked to light up the machine, they were more likely to choose the yellow block.

These astonishing capacities for statistical reasoning, experimental discovery and probabilistic logic allow babies to rapidly learn all about the particular objects and people surrounding them.

The author also advocates letting babies experiment through play, developing their “capacity for statistical reasoning, experimental discovery and logic,” rather than enrolling them in programs and buying products that claim to “make their babies even smarter.”

In the experiment you mention with the red and white ping-pong balls, it would be interesting to know whether the results were the same when the colors were switched, or whether it was simply a question of the color red.

Poisson performance in sports

Two researchers at Canada’s Royal Military College were interested in whether an athletic team that scores the first goal has an improved chance of winning a game.

They have constructed a formula that gives a team's probability of winning, based on game time remaining after the first goal is scored. Their model includes: (a) a weighting to account for overtime; (b) the assumption, which is true for hockey and soccer, that the number of goals scored follows a Poisson distribution; (c) parameters for league position and seasonal performance.

It appears from the article that the later in the game the first goal is scored, the better the scoring team's chance of victory.
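The flavor of such a calculation can be sketched with a much simpler model than the researchers' own. The assumptions here are all illustrative: two evenly matched teams, each scoring as an independent Poisson process at a hypothetical rate of 1.5 goals per game, game length normalized to 1, a tie after regulation counted as half a win (a crude stand-in for an overtime weighting), and no parameters for league position or seasonal performance.

```python
from math import exp, factorial

def poisson_pmf(k, mu):
    """P(K = k) for a Poisson variable with mean mu."""
    return exp(-mu) * mu**k / factorial(k)

def win_prob_after_first_goal(t, rate=1.5, max_goals=30):
    """Chance that the team scoring first at time t (0..1) wins.

    rate is the assumed goals per team per full game (hypothetical value).
    """
    mu = rate * (1.0 - t)  # expected further goals per team
    pmf = [poisson_pmf(k, mu) for k in range(max_goals + 1)]
    win = tie = 0.0
    for x, px in enumerate(pmf):        # further goals by the leading team
        for y, py in enumerate(pmf):    # further goals by the trailing team
            if x + 1 > y:
                win += px * py
            elif x + 1 == y:
                tie += px * py
    return win + 0.5 * tie              # a tie is treated as a coin flip

for t in (0.1, 0.5, 0.9):
    print(f"first goal at t={t}: win probability {win_prob_after_first_goal(t):.3f}")
```

In this toy model the win probability rises as the first goal comes later, simply because the trailing team has less time left to score twice.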

The researchers are less interested in predicting sports outcomes than they are in providing an example for statistics students of probability trees and several probability distributions.

Pay-to-play online auction

Swoopo [20] is a “seductive and controversial” website for online shoppers. For 60 cents per bid, potential customers are said to be able to win a wide range of retail products at unbelievably low prices.

Bidding starts at $0.00, can increase by as little as 1 cent, and the fee for each bid is $0.60. Also, to avoid users grabbing an item at the last minute, a few seconds are added to the clock with every new bid. Recently someone won a $1500 LG refrigerator for about $80; however, fees from all bidders totaled more than $2300 for that item.
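The refrigerator figures can be unpacked with a few lines of arithmetic. This sketch uses the rounded numbers quoted above ("about $80", "more than $2300"), so the results are rough lower bounds rather than exact values, and the variable names are illustrative.

```python
BID_FEE = 0.60          # dollars per bid, from the article

final_price = 80.0      # "about $80"
total_fees = 2300.0     # "more than $2300"
retail_price = 1500.0

# Number of bids implied by the fees collected
implied_bids = total_fees / BID_FEE

# Average amount each bid raised the price
avg_increment = final_price / implied_bids

# What the site took in, from the winner and all losing bidders combined
site_revenue = final_price + total_fees

print(f"implied bids:       {implied_bids:.0f}")
print(f"average increment:  ${avg_increment:.3f}")
print(f"site revenue:       ${site_revenue:.0f} vs retail ${retail_price:.0f}")
```

On these figures the site collects well over the retail price, even though the winner pays only a small fraction of it.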

Some criticisms of Swoopo are (a) it can be addictive, preying on people’s tendency to “overlook the small increments of money they spend to pursue alluring discounts”; (b) the odds are against the players, who number 2.5 million registered users from the U.S., Britain, and Germany; (c) Swoopo’s profits are very high; (d) the bidding process lasts much longer than it appears due to seconds being added to the clock throughout the process.

A mathematician and former quantitative hedge fund analyst stated:

In aggregate, consumers trying to obtain these products are overpaying …. Unless you have an edge over other people who are bidding, and you can get them to subsidize your purchase, you shouldn’t do it. It’s a chump’s game.

Swoopo’s chief executive countered:

We are combining elements of online auctions, skill games and traditional e-commerce …. We are trying to bring back fun and excitement into shopping, which hasn’t been there in a long time.

One recent customer won a new refrigerator for less than $10, having spent more than $60 on bids. However, she reported that “she has lost far more auctions than she has won, and that there does not appear to be a way to gain a persistent edge over rival bidders.” Another customer admitted that he had “spent more money in a winning effort than the item itself would have cost,” and added:

You have to have some skill at it, or you are not going to go anywhere …. I wouldn’t call it gambling at all.

A lawyer hired by Swoopo stated:

Lotteries are games of chance, and an auction does not have what you would call any systematic chance, a random event that determines the winner.

Inferring causation

"Association is not causation" is a mantra in our introductory classes, to warn students about the pitfalls of lurking variables. Nevertheless, in many important policy discussions one needs to make judgments about causality when the only available data are observational. The present article is an essay about the use of instrumental variables in such situations.

One example discussed at length in the article concerns the effect of years in school on earnings later in life. A lurking variable is innate scholastic ability, which might by itself confer an advantage in earnings but also lead one to spend more time in school. The idea is to find a so-called instrumental variable, which affects earnings only through its influence on time in school. It turns out that birth date can be used in this way. Children born earlier in the year are older when they start school, and therefore are legally eligible to leave with less schooling completed. Since birth date is presumably not correlated with innate scholastic ability, it can be used to assess the effect on earnings of extra time in school, at least at the high school level. However, the article points out that this analysis does not extend to additional years of college education, where the time when one could legally leave school is not relevant. In fact, such limitations are cited as an important critique of the instrumental variable technique.
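The logic of the schooling example can be illustrated with simulated data. Everything numeric below is hypothetical (the true effect of 0.10 per year of schooling, the strength of the birth-date effect, the noise levels); the point is only that an ordinary regression slope absorbs the effect of unobserved ability, while the instrumental variable ratio cov(instrument, earnings) / cov(instrument, schooling) does not.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def simulate(n=20000, true_effect=0.10, seed=1):
    """Return (naive regression slope, IV estimate) on simulated data."""
    rng = random.Random(seed)
    born_late, schooling, earnings = [], [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)   # lurking variable, never observed
        z = rng.randint(0, 1)       # instrument: 1 = born later in the year
        # Later birth -> more schooling before the legal leaving age (assumed size)
        s = 12 + 1.0 * z + 1.0 * ability + rng.gauss(0, 1)
        # Earnings depend on schooling AND directly on ability
        y = true_effect * s + 0.3 * ability + rng.gauss(0, 0.5)
        born_late.append(z)
        schooling.append(s)
        earnings.append(y)
    naive = cov(schooling, earnings) / cov(schooling, schooling)
    iv = cov(born_late, earnings) / cov(born_late, schooling)
    return naive, iv

naive, iv = simulate()
print(f"naive slope: {naive:.3f}   IV estimate: {iv:.3f}   (true effect 0.10)")
```

The sketch also shows the method's dependence on its assumptions: if birth date affected earnings by any route other than schooling, the ratio would no longer recover the true effect.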

The Economist provides links to scholarly references from the article. One of the co-authors there, Joshua Angrist, has also co-authored a recent book, Mostly Harmless Econometrics, which is intended to prepare practitioners to apply these methods. It has received numerous positive reviews, including one on Andrew Gelman's blog. Also helpful is Gelman's advice on "How to think about instrumental variables when you get confused."

Corporal punishment

This short clip refers to a joint ACLU-Human Rights Watch report, “A Violent Education: Corporal Punishment of Children in US Public Schools” [21].

According to the clip, “more than 200,000 U.S. schoolchildren were subjected to corporal punishment during the 2006-07 academic year, with disabled students receiving a disproportionate share.”

Below is a list of the states with the highest levels of corporal punishment, along with the total number of students punished in each state. Data are taken from the U.S. Department of Education’s Office for Civil Rights. Although not specified in the clip, the report itself indicates that the total number of incidents may have been greater than the total number of students because states “do not record occurrences where a student is hit multiple times in one year.”

Assessing hospital mortality rates

The BBC article appears to have been prompted by a story about 400 patients at Staffordshire General Hospital who may have died “unnecessarily.” The article’s author warns against jumping to negative conclusions about a hospital with a high mortality figure until careful statistical analysis is carried out. He draws some analogies to streaks in cards and lotteries.

He also refers interested readers to an (undated) Health Care Commission report, “Following up mortality ‘outliers’: A review of the programme for taking action where data suggest there may be serious concerns about the safety of patients” [22], which, he states, discusses “how to sift chance from real cause.” The report may be of interest to statistics students. It proposes a qualitative process for assessing mortality rates, and it has a detailed quantitative appendix describing “how to identify outliers using cumulative sum (CUSUM) methodology.”

[The CUSUM] detects significant deviations from expected outcomes. In our analysis, expected outcomes are either derived from standardised [sic] mortality rates or from the underlying risk attributed to each patient. If the plotting of data crosses a fixed ‘control limit’, then a significant run of poor outcomes is detected and an alert is signalled. Even if the underlying risk for each patient in a hospital is average, the observed rate over time will vary by chance. Therefore, limits have to be set to guard against too many ‘false alarms’ occurring as a result of random variation.
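A minimal sketch of a risk-adjusted CUSUM of this general kind follows. It is not the Commission's procedure: the log-likelihood-ratio weights (testing for a doubling of the odds of death), the control limit of 3.0, and the reset after each alert are assumptions chosen for illustration.

```python
from math import log

def cusum_alerts(outcomes, risks, odds_ratio=2.0, limit=3.0):
    """One-sided CUSUM over a sequence of patients.

    outcomes: 1 if the patient died, 0 otherwise
    risks:    expected probability of death for each patient
    Returns (chart values, indices at which an alert was signalled).
    odds_ratio and limit are hypothetical tuning choices.
    """
    c = 0.0
    path, alerts = [], []
    for i, (died, p) in enumerate(zip(outcomes, risks)):
        # Log-likelihood ratio: odds of death multiplied by odds_ratio vs. as expected
        w = died * log(odds_ratio) - log(1 - p + odds_ratio * p)
        c = max(0.0, c + w)      # the chart never drops below zero
        path.append(c)
        if c >= limit:           # crossed the control limit
            alerts.append(i)
            c = 0.0              # restart the chart after an alert
    return path, alerts

# Six consecutive deaths among patients with 10% expected risk
path, alerts = cusum_alerts([1] * 6, [0.1] * 6)
print(alerts)   # [5]
```

Raising the limit reduces false alarms from chance runs of deaths but delays the detection of a genuine problem, which is the trade-off the quotation describes.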