Math, Madison, food, the Orioles, books, my kids.

More on the end of history: what is a rational prediction?

It’s scrolled off the bottom of the page now, but there’s an amazing comment thread going on under my post on “The End of History Illusion,” the Science paper that got its feet caught in a subtle but critical statistical error.

Commenter Deinst has been especially good, digging into the paper’s dataset (kudos to the authors for making it public!) and finding further reasons to question its conclusions. In this comment, he makes the following observation: Quoidbach et al. believe there’s a general tendency to underestimate future changes in “favorites.” They test this by studying people’s predictions about their favorite movies, food, music, vacations, hobbies, and their best friends, averaging across the six domains, and finding a slightly negative bias. What Deinst noticed is that the negative bias is almost entirely driven by people’s unwillingness to predict that they might change their best friend; on four of the six dimensions, respondents actually predicted more change than occurred. That sounds much more like “people assign positive moral value to loyalty to friends” than “people have a tendency across domains to underestimate change.”

But here I want to complicate a bit what I wrote in the post. Neither Quoidbach’s paper nor my post directly addresses the question: what do we mean by a “rational prediction”? Precisely: if there is an outcome which, given the knowledge I have, is a random variable Y, what do I do when asked to “predict” the value of Y? In my post I took the “rational” answer to be EY. But this is not the only option. You might think of a rational person as one who makes the prediction most likely to be correct, i.e., the modal value of Y. Or you might, as Deinst suggests, think that rational people “run a simulation,” taking a random draw from Y and reporting that as the prediction.
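To make the three candidate notions concrete, here’s a toy computation (the distribution is invented purely for illustration) of each kind of “prediction” for the same random variable:

```python
# Three candidate "rational predictions" for a random variable Y,
# illustrated on a made-up discrete distribution.
import random

values = [0, 1, 5]        # possible outcomes of Y
probs = [0.5, 0.3, 0.2]   # their probabilities

# 1. Predict the expectation EY.
expectation = sum(v * p for v, p in zip(values, probs))

# 2. Predict the modal (most probable) value of Y -- the prediction
#    most likely to be exactly correct.
mode = max(zip(values, probs), key=lambda vp: vp[1])[0]

# 3. "Run a simulation": report a single random draw from Y.
draw = random.choices(values, weights=probs, k=1)[0]

print(expectation, mode, draw)
```

Note that the three rules can give three different answers: here the expectation is 1.3, a value Y never actually takes, while the mode is 0, and the simulated draw varies from one run to the next.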

Now suppose people do that last thing, exactly on the nose. Say X is my level of extraversion now, Y is my level of extraversion in 10 years, and Z is my prediction for the value of Y. In the model described in the first post, the value of Z depends only on the value of X; if X=a, it is E(Y|X=a). But in the “run a simulation” model, the joint distribution of X and Z is exactly the same as the joint distribution of X and Y; in particular, E(|Z-X|) and E(|Y-X|) agree.
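A quick Monte Carlo sketch of that claim (the Gaussian model and the correlation value are assumptions chosen for illustration, not anything from the paper): if Z is a fresh independent draw from the conditional distribution of Y given X, the mean absolute “predicted change” agrees with the mean absolute actual change, up to sampling noise.

```python
# One-sample ("run a simulation") predictors: (X, Z) has the same joint
# distribution as (X, Y), so E|Z - X| matches E|Y - X|.
# Assumed toy model: X ~ N(0,1), Y | X = a ~ N(rho * a, 1 - rho^2).
import random
from statistics import fmean

random.seed(0)
n = 200_000
rho = 0.6  # assumed correlation between extraversion now and in 10 years
noise = (1 - rho**2) ** 0.5

X = [random.gauss(0, 1) for _ in range(n)]
# Actual future value
Y = [rho * x + noise * random.gauss(0, 1) for x in X]
# Prediction: an independent fresh draw from the same conditional law
Z = [rho * x + noise * random.gauss(0, 1) for x in X]

actual_change = fmean(abs(y - x) for x, y in zip(X, Y))
predicted_change = fmean(abs(z - x) for x, z in zip(X, Z))
print(actual_change, predicted_change)  # essentially equal
```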

I hasten to emphasize that there’s no evidence Quoidbach et al. have this model of prediction in mind, but it would give some backing to the idea that, absent an “end of history bias,” you could imagine the absolute difference in their predictor condition matching the absolute difference in the reporter condition.

There’s some evidence that people actually do use small samples, or even just one sample, to predict variables with unknown distributions, and moreover that doing so can actually maximize utility, under some hypotheses on the cognitive cost of carrying out a more fully Bayesian estimate.

Does that mean I think Quoidbach’s inference is OK? Nope — unfortunately, it stays wrong.

It seems very doubtful that we can count on people hewing exactly to the one-sample model.

Example: suppose one in twenty people radically changes their level of extraversion in a 10-year interval. What happens if you ask people to predict whether they themselves are going to experience such a change in the next 10 years? Under the one-sample model, 5% of people would say “yes.” Is this what would actually happen? I don’t know. Is it rational? Certainly it fails to maximize the likelihood of being right. In a population of fully rational Bayesians, everyone would recognize shifts like this as events with probability less than 50%, and everyone would say “no” to this question. Quoidbach et al. would categorize this result as evidence for an “end of history illusion.” I would not.
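A little simulation of that thought experiment (every number here is stipulated, as in the paragraph above, not estimated from data): one-sample predictors say “yes” about 5% of the time, matching the actual rate of change, while the maximize-probability-of-being-right predictors unanimously say “no.”

```python
# Stipulated setup: 1 in 20 people radically changes in a 10-year interval.
import random

random.seed(1)
n = 100_000
p_change = 0.05

# Who actually changes over the decade
actually_changes = sum(random.random() < p_change for _ in range(n))

# One-sample model: each person reports one draw from the (correct)
# subjective distribution, so roughly 5% answer "yes"
one_sample_yes = sum(random.random() < p_change for _ in range(n))

# Modal model: 0.05 < 0.5, so every rational Bayesian answers "no"
modal_yes = 0

print(actually_changes / n, one_sample_yes / n, modal_yes / n)
```

The modal population predicts zero change while about 5% of them do in fact change; a test comparing predicted change to actual change would score these fully rational Bayesians as suffering an “end of history illusion.”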

Now we’re going to hear from my inner Andrew Gelman. (Don’t you have one? They’re great!) I think the real problem with Quoidbach et al’s analysis is that they think their job is to falsify the null hypothesis. This makes sense in a classical situation like a randomized clinical trial. Your null hypothesis is that the drug has no effect. And your operationalization of the null hypothesis — the thing you literally measure — is that the probability distribution on “outcome for patients who get the drug” is the same as the one on “outcome for patients who don’t get the drug.” That’s reasonable! If the drug isn’t doing anything, and if we did our job randomizing, it seems pretty safe to assume those distributions are the same.

What’s the null hypothesis in the “end of history” paper? It’s that people predict the extent of personality change in an unbiased way, neither underpredicting nor overpredicting it.

But the operationalization is that the absolute difference of predictions, |Z-X|, is drawn from the same distribution as the difference of actual outcomes, |Y-X|, or at least that these distributions have the same means. As we’ve seen, even without any “end of history illusion”, there’s no good reason for this version of the null hypothesis to be true. Indeed, we have pretty good reason to believe it’s not true. A rejection of this null hypothesis tells us nothing about whether there’s an end of history illusion. It’s not clear to me it tells you anything at all.
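A minimal sketch of why, using a Gaussian toy model (the correlation value and the normality assumption are invented for illustration): predictors who rationally report the unbiased estimate Z = E(Y|X) produce a mean |Z − X| well below the reporters’ mean |Y − X|, even though nothing in the model underestimates change.

```python
# Unbiased conditional-expectation predictors vs. reporters of actual change.
# Assumed toy model: X ~ N(0,1), Y | X = a ~ N(rho * a, 1 - rho^2).
import random
from statistics import fmean

random.seed(2)
n = 200_000
rho = 0.6  # assumed correlation between personality now and in 10 years

X = [random.gauss(0, 1) for _ in range(n)]
Y = [rho * x + (1 - rho**2) ** 0.5 * random.gauss(0, 1) for x in X]
Z = [rho * x for x in X]  # the unbiased prediction E(Y | X) = rho * X

reported_change = fmean(abs(y - x) for x, y in zip(X, Y))
predicted_change = fmean(abs(z - x) for x, z in zip(X, Z))
print(reported_change, predicted_change)  # predicted is much smaller
```

So rejecting “mean |Z − X| equals mean |Y − X|” is consistent with a population of predictors who are unbiased in every respect; the gap is built into the operationalization, not into the predictors.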

5 thoughts on “More on the end of history: what is a rational prediction?”

I have no need for an inner Andrew Gelman, but I do have an inner Don Rubin and an inner Jennifer Hill who help me when I get stuck on problems of statistics or causal inference. What I’d really like to have is an inner Xiao-Li Meng but that doesn’t seem to be available to me, unfortunately! I also have an inner Gary Becker but I just keep him around to argue with.

My understanding is that the meaning of “rational prediction” depends on what you want to use that prediction for. Among other things, it depends on what the costs are of being wrong in one direction or another. For example, the threshold at which you decide that somebody has a disease based on a test depends on the costs of a type I vs. a type II error.

Thank you for the Vul et al. reference. I was probably thinking more along the lines of this paper by Gilbert and Wilson on prospection and retrospection. Quoidbach has also written on this topic.

While I agree in general with your internal Andrew Gelman, I keep thinking that there is something I am missing. None of the authors are statistically naive, but the reporting of the statistics was sketchy at best (I would not have looked at the data if they had reported R^2 and the scalings of the response variables). The authors do not seem to be prone to wild exaggeration, but after looking at the data, exaggeration is what the abstract and conclusion seem to contain. I feel like a deconstructionist faced with unknowable authorial intent, asserting my own view on the data :)

I will write up some more impressions of the data when I get time. I want to read some of the papers on prospection and retrospection first, as well as to give the authors some time to answer some questions on the data.

Very interesting: so Gilbert and Wilson have some prior interest in the question of “what is the right model for prediction,” endorsing some form of simulation model, but what’s interesting is that they say “But research shows that we tend to base our previews on those memories that are most available rather than the most typical and that, ironically enough, our most available memories are often of the least typical events.” So at any rate they certainly seem to feel that people are not carrying out a straight sample from a subjective distribution.

@Deinst
For several years, I have attended a biology seminar attended mostly by biologists (coming from a wide variety of areas in biology), with a sprinkling of computer scientists, often some paleontologists, occasionally a biochemist or two, me, and sometimes other mathematicians (and in the past, an astronomer and some linguists). Each week we read a paper (usually one published in a high-quality journal) to discuss in the seminar. (There are some core attendees, and others who show up depending on the topic of the paper.) Very often there are some serious criticisms of the paper. Often only one or two people have noticed the problem while reading the paper, but after they explain, the criticisms are usually widely accepted by the other attendees (unless someone else points out a serious flaw in the criticism). But it’s not always the same one or two people – often it’s someone who has some specialized knowledge that is relevant. Based on this experience, I have adopted Hillary Clinton’s expression, “It takes a village,” to describe what is needed to adequately critique a scientific paper.

So, sure, there is a possibility that you have missed something in your reading of the Quoidbach et al paper. But things like reporting of statistics “being sketchy at best” are in my experience (regrettably) very common in published papers. It is often the case that looking closely at the supplemental material, the data, the captions of diagrams, etc. reveals questionable methods or conclusions. That does not imply that *you* are missing something – it is likely that you are seeing something the *authors* missed. Indeed, the authors *are* asserting their own view of the data – that is common practice. And abstracts and conclusions are often, in my experience, exaggerated. This is part of the “replicability crisis.” So if a careful reading of the paper, data, and supplemental material causes you to doubt the authors’ conclusions, your view of the data may be more realistic than the authors’.