"Experts have a poor understanding of uncertainty. Usually, this manifests itself in the form of overconfidence: experts underestimate the likelihood that their predictions might be wrong. …. [E]xperts who use terms like “never” and “certain” too often are playing Russian roulette with their reputations."

"I used to be annoyed when the margin of error was high in a forecasting model that I might put together. Now I view it as perhaps the single most important piece of information that a forecaster provides. When we publish a forecast on FiveThirtyEight, I go to great lengths to document the uncertainty attached to it, even if the uncertainty is sufficiently large that the forecast won’t make for punchy headlines."

"Another fundamental error: when you have such little data, you should almost never throw any of it out, and you should be especially wary of doing so when it happens to contradict your hypothesis."

Fraud may just be the tip of the iceberg

A recently revealed case about fraud may point to a much larger problem.

A well-known psychologist in the Netherlands whose work has been published widely in professional journals falsified data and made up entire experiments, an investigating committee has found. Experts say the case exposes deep flaws in the way science is done in a field, psychology, that has only recently earned a fragile respectability.

The psychologist accused of fraud took advantage of some common practices in the field.

Dr. Stapel was able to operate for so long, the committee said, in large measure because he was “lord of the data,” the only person who saw the experimental evidence that had been gathered (or fabricated). This is a widespread problem in psychology, said Jelte M. Wicherts, a psychologist at the University of Amsterdam. In a recent survey, two-thirds of Dutch research psychologists said they did not make their raw data available for other researchers to see. “This is in violation of ethical rules established in the field,” Dr. Wicherts said.

The field also appears to be rather careless about its statistical analyses.

In an analysis published this year, Dr. Wicherts and Marjan Bakker, also at the University of Amsterdam, searched a random sample of 281 psychology papers for statistical errors. They found that about half of the papers in high-end journals contained some statistical error, and that about 15 percent of all papers had at least one error that changed a reported finding — almost always in opposition to the authors’ hypothesis.

This is not a surprise to psychologists.

Researchers in psychology are certainly aware of the issue. In recent years, some have mocked studies showing correlations between activity on brain images and personality measures as “voodoo” science, and a controversy over statistics erupted in January after The Journal of Personality and Social Psychology accepted a paper purporting to show evidence of extrasensory perception. In cases like these, the authors being challenged are often reluctant to share their raw data. But an analysis of 49 studies appearing Wednesday in the journal PLoS One, by Dr. Wicherts, Dr. Bakker and Dylan Molenaar [available here], found that the more reluctant that scientists were to share their data, the more likely that evidence contradicted their reported findings.

Submitted by Steve Simon

Remark

Andrew Gelman's blog has often considered questions of cheating in science. The following quote from E. J. Wagenmakers, a Dutch professor at Amsterdam University, appeared in a post from September 9 of this year:

Diederik Stapel was not just a productive researcher, but he also made appearances on Dutch TV shows. The scandal is all over the Dutch news. Oh, one of the courses he taught was on something like 'Ethical behavior in research', and one of his papers is about how power corrupts. It doesn’t get much more ironic than this. I should stress that the extent of the fraud is still unclear.

This is perhaps doubly ironic, in that the psychologists have been caught making psychological errors.

Submitted by Paul Alper

Another Remark

"Much of Prof. Stapel's work made it into newspapers in no small part because he delivered scientific evidence for contentions journalists wanted to believe …"

Marilyn tackles a dice problem

It has been a while since we've reported on an "Ask Marilyn" story. In the Sunday column referenced above, a reader asks:

I’m a math instructor and I think you’re wrong about this question [originally from Marilyn's July 23 column]: “Say you plan to roll a die 20 times. Which result is more likely: (a) 11111111111111111111; or (b) 66234441536125563152?” You said they’re equally likely because both specify the number for each of the 20 tosses. I agree so far. However, you added, “But let’s say you rolled a die out of my view and then said the results were one of those series. Which is more likely? It’s (b) because the roll has already occurred. It was far more likely to have been that mix than a series of ones.” I disagree. Each of the results is equally likely—or unlikely. This is true even if you are not looking at the result.

Marilyn responds:
"My answer was correct. To convince doubting readers, I have, in fact, rolled a die 20 times and noted the result, digit by digit. It was either: (a) 11111111111111111111; or (b) 63335643331622221214. Do you still believe that the two series are equally likely to be what I rolled?"

Many people will remember that the infamous Monty Hall problem first gained national attention after appearing in an "Ask Marilyn" column in 1990. (More nostalgia: In the inaugural issue of Chance News, Laurie Snell described a New Yorker cartoon inspired by Monty's game show, Let's Make a Deal.) One important lesson from that discussion was that the host's behavior mattered, and that the problem was not well-defined without a model for how he chooses a door to open.

In that spirit, it might be relevant to consider how the other string of "rolls" was chosen. Marilyn's answer suggests that she was already planning to write down a string of twenty 1s along with her string of twenty actual rolls. If you know that is going to happen, then even before the roll has occurred, you might be prepared to guess in advance that the real string will not be the one consisting of twenty 1s.
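
To make the dependence on the decoy rule concrete, here is a minimal sketch in Python. The two decoy models are assumptions made for illustration and are not taken from Marilyn's column: in the first, the writer always pairs the real rolls with the all-1s string (and, if the real rolls happen to be all 1s, picks the other string uniformly at random); in the second, the writer picks the other string uniformly at random no matter what was rolled. The function names are hypothetical.

```python
from fractions import Fraction


def posterior_mixed_is_real(like_if_mixed_real: Fraction,
                            like_if_ones_real: Fraction) -> Fraction:
    """Posterior probability that the mixed string (b) is the real roll.

    Both strings have the same prior probability, so the priors cancel and
    only the likelihoods of seeing this particular pair matter.
    """
    return like_if_mixed_real / (like_if_mixed_real + like_if_ones_real)


def planned_ones_decoy(n_rolls: int = 20, sides: int = 6) -> Fraction:
    """ASSUMED model 1: the decoy is all 1s whenever the real roll is not."""
    total = sides ** n_rolls
    # If (b) was rolled, the pair {all 1s, b} is shown with certainty.
    like_if_mixed_real = Fraction(1)
    # If all 1s was rolled, (b) is one of total - 1 equally likely decoys.
    like_if_ones_real = Fraction(1, total - 1)
    return posterior_mixed_is_real(like_if_mixed_real, like_if_ones_real)


def uniform_decoy(n_rolls: int = 20, sides: int = 6) -> Fraction:
    """ASSUMED model 2: the decoy is uniform over all other strings."""
    total = sides ** n_rolls
    # Either way, the other string is one of total - 1 equally likely decoys.
    like = Fraction(1, total - 1)
    return posterior_mixed_is_real(like, like)


if __name__ == "__main__":
    print("planned all-1s decoy:", float(planned_ones_decoy()))
    print("uniform random decoy:", float(uniform_decoy()))
```

Under the first assumed model, string (b) is almost certainly the real one. Under the second, the two strings are equally likely to be real, which is the math instructor's position. The disagreement is therefore about the unstated model, not about the arithmetic.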

Discussion

The other lesson from the Monty Hall discussion was not to jump to the conclusion that Marilyn is wrong. So what do you think she had in mind when she wrote "because the roll has already occurred..."?

I've just rolled a die twenty times. Which of the following do you think it is: (i) 14152653532346264333; or (ii) 61655214235336553132? Does your answer change if someone points out that (i) consists of the digits of pi after the decimal point, skipping the 0s, 7s, 8, and 9s?

Submitted by Bill Peterson

Comment

Paul Alper wrote to point out an analogy with a famous classroom experiment, in which the instructor leaves the room while students compile lists of 200 "tosses" of a fair coin. Half the students toss a real coin, while the other half produce a string of imagined tosses. Upon returning, the teacher classifies the strings as real or fake, depending on the length of the longest run. The imagined strings typically will not include long runs, but with probability 0.965 a real string of 200 tosses will contain a run of at least six consecutive heads or six consecutive tails (see the discussion in the archives of the Chance Newsletter). This activity is also described in the chapter "Streaky Behavior" in Scheaffer et al., Activity-Based Statistics.
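
Readers who want to check the run probability quoted above can do so with a short calculation. The following is a minimal Python sketch (the function name is our own, not from the cited sources) that tracks the length of the current run toss by toss, assuming a fair coin and independent tosses. For 200 tosses and runs of length six it should land close to the 0.965 figure.

```python
def prob_run_at_least(n_tosses: int, run_len: int) -> float:
    """Probability that n fair coin tosses contain a run of at least
    run_len identical outcomes (heads or tails)."""
    if n_tosses < run_len:
        return 0.0
    if run_len <= 1:
        return 1.0
    # state[j] = probability that no qualifying run has occurred yet and
    # the current run of identical outcomes has length j + 1.
    state = [0.0] * (run_len - 1)
    state[0] = 1.0  # after the first toss the run has length 1
    for _ in range(n_tosses - 1):
        new_state = [0.0] * (run_len - 1)
        for j, p in enumerate(state):
            # Next toss differs: the run restarts at length 1.
            new_state[0] += 0.5 * p
            # Next toss matches: the run grows by one, unless it would
            # reach run_len, in which case that probability mass is dropped.
            if j + 1 < run_len - 1:
                new_state[j + 1] += 0.5 * p
        state = new_state
    # Whatever mass survives corresponds to sequences with no long run.
    return 1.0 - sum(state)


if __name__ == "__main__":
    print(round(prob_run_at_least(200, 6), 3))
```

The same function can be used to see how quickly long runs become likely: try shorter sequences or longer runs to see where the teacher's rule of thumb stops working.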

A recent story on the site gives the following overall summary of performance of the news media in providing accurate coverage. Schwitzer writes:
"After 5 years and 7 months, and after reviewing 1,648 stories and publishing nearly 1,300 blog posts, we've revised the site (for the second time)."
Below is how these 1,648 stories fared on his rating system:

The stories rated above come from 20 news organizations, including newspapers, magazines and web sources. When it comes to TV presentations of medical results, however, HealthNewsReview.org has thrown in the towel and won't be reviewing them because, "After 3.5 years and 228 network TV health segments reviewed, we can make the data-driven statement that many of the stories are bad and they’re not getting much better."

Submitted by Paul Alper

The goal of reproducibility

The WSJ says "This is one of medicine's dirty secrets: Most results, including those that appear in top-flight peer-reviewed journals, can't be reproduced." The article includes the following graphic summarizing the (largely unsuccessful) attempts by Bayer to reproduce published findings.

The article goes on to discuss various reasons for this state of affairs, including pressure on researchers to publish, the increasing complexity of medical experiments, and the well-known bias of journals toward publishing only positive results. Some of these issues were covered in CN 5, which discussed John Ioannidis's 2005 article in PLoS Medicine, Why most published research findings are false.

The more reliable popular media have finally been convinced to carry pretty accurate statements about interpreting confidence intervals in polling results. Maybe the media - and even science journals themselves - need to be encouraged to carry a cigarette-like warning about study results: "Caution: Since science is an inductive process whose conclusions depend upon strong evidence that is reproducible, readers should take into account that any conclusions are preliminary, and should not be acted upon until further experiments have reinforced them."

Submitted by Margaret Cibes

QL in the Media Contest finalists

In CN 77, Margaret Cibes noted that the MAA SIGMAA on Quantitative Literacy was running a contest for best and worst examples of QL in the media. They have posted the entries here, where viewers are invited to cast their votes.