"Experts have a poor understanding of uncertainty. Usually, this manifests itself in the form of overconfidence: experts underestimate the likelihood that their predictions might be wrong. …. [E]xperts who use terms like “never” and “certain” too often are playing Russian roulette with their reputations."

"I used to be annoyed when the margin of error was high in a forecasting model that I might put together. Now I view it as perhaps the single most important piece of information that a forecaster provides. When we publish a forecast on FiveThirtyEight, I go to great lengths to document the uncertainty attached to it, even if the uncertainty is sufficiently large that the forecast won’t make for punchy headlines."

"Another fundamental error: when you have such little data, you should almost never throw any of it out, and you should be especially wary of doing so when it happens to contradict your hypothesis."

In Reframing the debate over using phones behind the wheel (New York Times, 17 December 2011), we read,
"Part of the lure of smartphones...is that they randomly dispense valuable information. People do not know when an urgent or interesting e-mail or text will come in, so they feel compelled to check all the time." The following sidebar appears in the online version of the article:

Fraud may just be the tip of the iceberg

A recently revealed case about fraud may point to a much larger problem.

A well-known psychologist in the Netherlands whose work has been published widely in professional journals falsified data and made up entire experiments, an investigating committee has found. Experts say the case exposes deep flaws in the way science is done in a field, psychology, that has only recently earned a fragile respectability.

The psychologist accused of fraud took advantage of some common practices in the field.

Dr. Stapel was able to operate for so long, the committee said, in large measure because he was “lord of the data,” the only person who saw the experimental evidence that had been gathered (or fabricated). This is a widespread problem in psychology, said Jelte M. Wicherts, a psychologist at the University of Amsterdam. In a recent survey, two-thirds of Dutch research psychologists said they did not make their raw data available for other researchers to see. “This is in violation of ethical rules established in the field,” Dr. Wicherts said.

The field also appears to be rather careless about its statistical analyses.

In an analysis published this year, Dr. Wicherts and Marjan Bakker, also at the University of Amsterdam, searched a random sample of 281 psychology papers for statistical errors. They found that about half of the papers in high-end journals contained some statistical error, and that about 15 percent of all papers had at least one error that changed a reported finding — almost always in opposition to the authors’ hypothesis.

This is not a surprise to psychologists.

Researchers in psychology are certainly aware of the issue. In recent years, some have mocked studies showing correlations between activity on brain images and personality measures as “voodoo” science, and a controversy over statistics erupted in January after The Journal of Personality and Social Psychology accepted a paper purporting to show evidence of extrasensory perception. In cases like these, the authors being challenged are often reluctant to share their raw data. But an analysis of 49 studies appearing Wednesday in the journal PLoS One, by Dr. Wicherts, Dr. Bakker and Dylan Molenaar [available here], found that the more reluctant that scientists were to share their data, the more likely that evidence contradicted their reported findings.

Submitted by Steve Simon

Remark

Andrew Gelman's blog has often considered questions of cheating in science. The following quote from E. J. Wagenmakers, a Dutch professor at Amsterdam University, appeared in a post from September 9 of this year:

Diederik Stapel was not just a productive researcher, but he also made appearances on Dutch TV shows. The scandal is all over the Dutch news. Oh, one of the courses he taught was on something like 'Ethical behavior in research', and one of his papers is about how power corrupts. It doesn’t get much more ironic than this. I should stress that the extent of the fraud is still unclear.

This is perhaps doubly ironic, in that the psychologists have been caught making psychological errors.

Submitted by Paul Alper

Another Remark

"Much of Prof. Stapel's work made it into newspapers in no small part because he delivered scientific evidence for contentions journalists wanted to believe …"

Marilyn tackles a dice problem

It has been a while since we've reported on an "Ask Marilyn" story. In the Sunday column referenced above, a reader asks:

I’m a math instructor and I think you’re wrong about this question [originally from Marilyn's July 23 column]: “Say you plan to roll a die 20 times. Which result is more likely: (a) 11111111111111111111; or (b) 66234441536125563152?” You said they’re equally likely because both specify the number for each of the 20 tosses. I agree so far. However, you added, “But let’s say you rolled a die out of my view and then said the results were one of those series. Which is more likely? It’s (b) because the roll has already occurred. It was far more likely to have been that mix than a series of ones.” I disagree. Each of the results is equally likely—or unlikely. This is true even if you are not looking at the result.

Marilyn responds:
"My answer was correct. To convince doubting readers, I have, in fact, rolled a die 20 times and noted the result, digit by digit. It was either: (a) 11111111111111111111; or (b) 63335643331622221214. Do you still believe that the two series are equally likely to be what I rolled?"

Many people will remember that the infamous Monty Hall problem first gained national attention after appearing in an "Ask Marilyn" column in 1990. (More nostalgia: In the inaugural issue of Chance News, Laurie Snell described a New Yorker cartoon inspired by Monty's game show, Let's Make a Deal). One important lesson from that discussion was that the host's behavior mattered, and that the problem was not well-defined without a model for how he chooses a door to open.

In that spirit, it might be relevant to consider how the strings of "rolls" are produced. Marilyn's answer suggests that she was already planning to write down a string of twenty 1s along with her string of twenty actual rolls. If you know that is going to happen, then even before the roll has occurred, you might be prepared to guess in advance that the real string will not be the one consisting of twenty 1s.
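To make the dependence on the "host's behavior" concrete, here is a minimal sketch of the Bayes' rule calculation under two assumed models for how the second (decoy) string is chosen. Both models, and the names `posterior_mixed_is_real`, `uniform_decoy` and `ones_decoy`, are illustrative assumptions and not anything stated in Marilyn's column.

```python
from fractions import Fraction

N = 6 ** 20               # number of possible 20-roll sequences
PRIOR = Fraction(1, N)    # every sequence is equally likely to be the real roll


def posterior_mixed_is_real(show_if_ones_real, show_if_mixed_real):
    """P(the mixed string is the real roll | this pair was shown).

    Each argument is the probability that this exact pair of strings is
    presented, given which of the two strings was actually rolled.
    """
    ones = PRIOR * show_if_ones_real
    mixed = PRIOR * show_if_mixed_real
    return mixed / (ones + mixed)


# Assumed model 1: the decoy is drawn uniformly from the other N - 1 sequences,
# so either string is equally likely to have been paired with the other.
uniform_decoy = posterior_mixed_is_real(Fraction(1, N - 1), Fraction(1, N - 1))

# Assumed model 2: the decoy is always twenty 1s. Only if the real roll is
# itself twenty 1s is the decoy drawn uniformly from the remaining sequences.
ones_decoy = posterior_mixed_is_real(Fraction(1, N - 1), Fraction(1))

print(uniform_decoy)
print(float(ones_decoy))
```

Under the first assumed model the two strings are exactly equally likely to be the real one, which is the reader's position. Under the second, the mixed string is the real one with probability (N - 1)/N, which is Marilyn's position. The disagreement is about the model, not the arithmetic.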

Discussion

The other lesson from the Monty Hall discussion was not to jump to the conclusion that Marilyn is wrong. So what do you think she had in mind when she wrote "because the roll has already occurred..."?

I've just rolled a die twenty times. Which of the following do you think it is: (i) 14152653532346264333; or (ii) 61655214235336553132? Does your answer change if someone points out that (i) consists of the digits of pi after the decimal point, skipping the 0s, 7s, 8s, and 9s?

Submitted by Bill Peterson

Comment

Paul Alper wrote to point out an analogy with a famous classroom experiment, in which the instructor leaves the room while students compile lists of 200 "tosses" of a fair coin. Half the students toss a real coin, while the other half produce a string of imagined tosses. Upon returning, the teacher classifies the strings as real or fake, depending on the length of the longest run. The imagined strings typically will not include long runs, but with probability 0.965 a real string of 200 tosses will contain a run of at least six consecutive heads or six consecutive tails (see the discussion in the archives of the Chance Newsletter). This activity is also described in the chapter "Streaky Behavior" in Scheaffer et al., Activity Based Statistics.
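Readers who want to check the 0.965 figure can do so with a short dynamic program. The following is a minimal sketch, assuming a fair coin and independent tosses. The function name `prob_run_at_least` is ours, and this is one of several ways to do the calculation, not necessarily the method used in the sources cited above.

```python
def prob_run_at_least(n_tosses: int, run_length: int) -> float:
    """Probability that n fair-coin tosses contain a run of at least
    run_length identical outcomes (all heads or all tails)."""
    if run_length <= 1:
        return 1.0 if n_tosses >= 1 else 0.0
    if n_tosses < run_length:
        return 0.0
    # state[j] = probability that no qualifying run has occurred so far
    # and the current run has length j + 1 (j = 0 .. run_length - 2).
    state = [0.0] * (run_length - 1)
    state[0] = 1.0  # after the first toss the current run has length 1
    for _ in range(n_tosses - 1):
        new_state = [0.0] * (run_length - 1)
        # Next toss differs from the previous one: the run resets to length 1.
        new_state[0] = 0.5 * sum(state)
        # Next toss matches: the run grows by one. A run that would reach
        # run_length leaves the "no long run yet" states for good.
        for j in range(run_length - 2):
            new_state[j + 1] = 0.5 * state[j]
        state = new_state
    return 1.0 - sum(state)


print(round(prob_run_at_least(200, 6), 3))
```

Small cases are easy to check by hand: with three tosses, only HHH and TTT contain a run of three, so `prob_run_at_least(3, 3)` is 2/8.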

A recent story on the site gives the following overall summary of performance of the news media in providing accurate coverage. Schwitzer writes:
"After 5 years and 7 months, and after reviewing 1,648 stories and publishing nearly 1,300 blog posts, we've revised the site (for the second time)."
Below is how these 1,648 stories fared on his rating system:

The stories rated above come from 20 news organizations, including newspapers, magazines and web sources. When it comes to TV presentations of medical results, however, HealthNewsReview.org has thrown in the towel and won't be reviewing them because, "After 3.5 years and 228 network TV health segments reviewed, we can make the data-driven statement that many of the stories are bad and they’re not getting much better."

Submitted by Paul Alper

The goal of reproducibility

The WSJ says "This is one of medicine's dirty secrets: Most results, including those that appear in top-flight peer-reviewed journals, can't be reproduced." The article includes the following graphic summarizing the (largely unsuccessful) attempts by Bayer to reproduce published findings.

The article goes on to discuss various reasons for this state of affairs: pressure on researchers to publish, the increasing complexity of medical experiments, and the well-known bias of journals toward publishing only positive results. Some of these issues were discussed in CN 5, which covered John Ioannidis's 2005 article in PLoS Medicine, "Why most published research findings are false."

The more reliable popular media have finally been convinced to carry pretty accurate statements about interpreting confidence intervals in polling results. Maybe the media - and even science journals themselves - need to be encouraged to carry a cigarette-like warning about study results: "Caution: Since science is an inductive process whose conclusions depend upon strong evidence that is reproducible, readers should take into account that any conclusions are preliminary, and should not be acted upon until further experiments have reinforced them."

Submitted by Margaret Cibes

QL in the Media Contest finalists

In CN 77, Margaret Cibes noted that the MAA SIGMAA on Quantitative Literacy was running a contest for best and worst examples of QL in the media. They have posted the entries here, where viewers are invited to cast their votes.

Probabilist for president?

A New Hampshire mathematician and Republican presidential primary candidate stated:

“I will accept any top-tier candidate's neutrally administered aptitude challenge that assesses the mental, physical and ethical qualities of leadership …. No other candidate comes close to my structured problem-solving abilities and demonstrated proficiency in probabilistic risk assessment.” ….
“[I will] balance the federal budget through a mathematically superior tax platform that combines personal income, flat taxes, progressive taxes and capital gains into one elegant solution that no other candidate has formulated or is capable of generating.”

Submitted by Margaret Cibes

Queues

This article discusses queuing issues and several recent studies about them. It includes a 6-minute video about wait time perceptions and their effects on customers, and a graphic showing a formula for expected wait time: “Average wait time = average number of people in line divided by their arrival rate.”
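The formula in the graphic is the relationship usually known as Little's law. As a minimal sketch of the arithmetic, the function below simply divides the average number of people in line by the arrival rate. The function name `average_wait_time` and the numbers in the example are hypothetical and do not come from the article.

```python
def average_wait_time(avg_people_in_line: float, arrival_rate: float) -> float:
    """Average wait = average number in line / arrival rate.

    The result is in the time unit of the arrival rate: if arrivals are
    measured in people per minute, the wait is in minutes.
    """
    if arrival_rate <= 0:
        raise ValueError("arrival_rate must be positive")
    return avg_people_in_line / arrival_rate


# Hypothetical example: on average 6 people are in line,
# and customers arrive at a rate of 2 per minute.
wait_minutes = average_wait_time(avg_people_in_line=6, arrival_rate=2)
print(wait_minutes)
```

The relationship describes long-run averages, so it says nothing by itself about how much an individual customer's wait may vary.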

Submitted by Margaret Cibes

Population pyramids

Chance readers may be interested in population pyramids. They are nice examples of distributions, as well as opportunities for comparisons of age and gender characteristics within one country, or in the same country over different years, or across different countries. For the U.S., they illustrate clearly what Atul Gawande calls “the ‘rectangularization’ of survival”:

Throughout most of human history, a society's population formed a sort of pyramid: young children represented the largest portion – the base – and each successively older cohort represented a smaller and smaller group. In 1950, children under the age of five were eleven per cent of the U.S. population, adults aged forty-five to forty-nine were six per cent, and those over eighty were one percent. Today [2007], we have as many fifty-year olds as five-year-olds. In thirty years, there will be as many people over eighty as there are under five.


Book review of Models Behaving Badly, in The Wall Street Journal, December 14, 2011

Emanuel Derman, author of Models Behaving Badly, is a Columbia professor who was trained as a physicist and later worked at Goldman Sachs. He writes that when people try to create financial models that involve human behavior, they “are trying to force the ugly stepsister's foot into Cinderella's pretty glass slipper”:

Although financial models employ the mathematics and style of physics, they are fundamentally different from the models that science produces. Physical models can provide an accurate description of reality. Financial models, despite their mathematical sophistication, can at best provide a vast oversimplification of reality.

Derman has a "Financial Modeler's Manifesto," co-authored by Paul Wilmott. Some points are:
"I will always look over my shoulder and never forget that the model is not the world."
"I will not be overly impressed with mathematics."
"I will never sacrifice reality for elegance."
"I will not give the people who use my models false comfort about their accuracy."

He also states, “[I]n physics you're playing against God, and He doesn't change His laws very often. In finance, you're playing against God's creatures.”