17 Answers

It's not a paradox per se, but it is a puzzling comment, at least at first.

During World War II, Abraham Wald was a statistician for the U.S. government. He looked at the bombers that returned from missions and analyzed the pattern of the bullet "wounds" on the planes. He recommended that the Navy reinforce areas where the planes had no damage.

Why? We have selection effects at work. This sample suggests that damage inflicted in the observed areas could be withstood. Either planes were never hit in the untouched areas, an unlikely proposition, or strikes to those parts were lethal. We care about the planes that went down, not just those that returned. Those that fell likely suffered an attack in a place that was untouched on those that survived.
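
To see the selection effect in numbers, here is a minimal simulation sketch. The four sections, the one-hit-per-plane rule, and the fatality rates are all made-up assumptions for illustration, not Wald's data or method.

```python
import random

def simulate_returning_planes(n_planes=10_000, seed=0):
    """Count hits per section among the planes that make it back.

    Assumptions (illustrative only): each plane takes exactly one hit,
    uniformly at random over four sections; an engine hit is fatal 80%
    of the time, any other hit 10% of the time.
    """
    rng = random.Random(seed)
    sections = ["engine", "fuselage", "wings", "tail"]
    fatal_prob = {"engine": 0.8, "fuselage": 0.1, "wings": 0.1, "tail": 0.1}
    observed = {s: 0 for s in sections}
    for _ in range(n_planes):
        hit = rng.choice(sections)
        if rng.random() >= fatal_prob[hit]:  # plane survives and gets inspected
            observed[hit] += 1
    return observed

counts = simulate_returning_planes()
print(counts)
```

Every section is hit equally often, yet the returning planes show far fewer engine hits, which is exactly the pattern that points to the engine as the place to reinforce.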

Expanding upon a theme, according to this blog post, during World War I, the introduction of a tin helmet led to more head wounds than a standard cloth hat. Was the new helmet worse for soldiers? No; though injuries were higher, fatalities were lower.

Example
Suppose that we look for a relationship between voting and income by regressing the vote share for then-Senator Obama on the median income of a state (in thousands). We get an intercept of approximately 20 and a slope coefficient of 0.61.

Many would interpret this result as saying that higher income people are more likely to vote for Democrats; indeed, popular press books have made this argument.

But wait, I thought that rich people were more likely to be Republicans? They are.

What this regression is really telling us is that rich states are more likely to vote for a Democrat and poor states are more likely to vote for a Republican. Within a given state, rich people are more likely to vote Republican and poor people are more likely to vote Democrat. See the work of Andrew Gelman and his coauthors.

Without further assumptions, we cannot use group-level (aggregate) data to make inferences about individual-level behavior. This is the ecological fallacy. Group-level data can only tell us about group-level behavior.

To make the leap to individual-level inferences, we need the constancy assumption. Here, the voting choice of individuals must not vary systematically with the median income of a state; a person who earns \$X in a rich state must be just as likely to vote for a Democrat as someone who earns \$X in a poor state. But people in Connecticut, at all income levels, are more likely to vote for a Democrat than people in Mississippi at those same income levels. Hence, the constancy assumption is violated and we are led to the wrong conclusion (fooled by aggregation bias).
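
A toy construction makes the sign flip concrete. The five states, the income offsets, and the within-state slope of -0.005 are hypothetical numbers chosen for illustration; only the intercept of 20 and slope of 0.61 echo the regression quoted above.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

state_mean_income = [40, 50, 60, 70, 80]   # hypothetical states, income in thousands
offsets = [-20, -10, 0, 10, 20]            # individuals relative to their state mean

def individual_support(income, state_mean):
    """Hypothetical probability of voting Democrat for one individual."""
    state_level = (20 + 0.61 * state_mean) / 100        # richer state, more support
    return state_level - 0.005 * (income - state_mean)  # richer person, less support

state_share, within_slopes = [], []
for m in state_mean_income:
    incomes = [m + d for d in offsets]
    support = [individual_support(i, m) for i in incomes]
    state_share.append(sum(support) / len(support))
    within_slopes.append(ols_slope(incomes, support))

between_slope = ols_slope(state_mean_income, state_share)
print(between_slope, within_slopes)
```

The state-level slope is positive while every within-state slope is negative, so the aggregate regression says nothing about how an individual's income relates to that individual's vote.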

This topic was a frequent hobbyhorse of the late David Freedman; see this paper, for example. In that paper, Freedman provides a means for bounding individual-level probabilities using group data.

Comparison to Simpson's paradox
Elsewhere in this CW, @Michelle proposes Simpson's paradox as a good example, as it indeed is. Simpson's paradox and the ecological fallacy are closely related, yet distinct. The two examples differ in the natures of the data given and analysis used.

The standard formulation of Simpson's paradox is a two-way table. In our example here, suppose that we have individual data and we classify each individual as high or low income. We would get an income-by-vote 2x2 contingency table of the totals. We'd see that a higher share of high income people voted for the Democrat relative to the share of low income people. Were we to create a contingency table for each state, however, we'd see the opposite pattern.
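
As a sketch of that two-way-table version, here are two hypothetical states with invented counts, chosen only so that the reversal appears:

```python
# (state, income group) -> (Democrat votes, total voters); all counts are made up
votes = {
    ("rich_state", "high"): (48, 80),
    ("rich_state", "low"): (14, 20),
    ("poor_state", "high"): (6, 20),
    ("poor_state", "low"): (32, 80),
}

def dem_share(income, state=None):
    """Democrat vote share for an income group, overall or within one state."""
    rows = [v for (s, i), v in votes.items()
            if i == income and (state is None or s == state)]
    return sum(d for d, _ in rows) / sum(n for _, n in rows)

for state in (None, "rich_state", "poor_state"):
    print(state, dem_share("high", state), dem_share("low", state))
```

Pooled over both states, high-income voters look more Democratic (0.54 versus 0.46), but within each state the low-income group has the higher Democrat share.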

In the ecological fallacy, we don't collapse income into a dichotomous (or perhaps multichotomous) variable. To get state-level data, we take the mean (or median) state income and the state vote share, run a regression, and find that higher income states are more likely to vote for the Democrat. If we kept the individual-level data and ran the regression separately by state, we'd find the opposite effect.

In summary, the differences are:

Mode of analysis: We could say, following our SAT prep skills, that Simpson's paradox is to contingency tables as the ecological fallacy is to correlation coefficients and regression.

Degree of aggregation/nature of data: Whereas the Simpson's paradox example compares two numbers (Democrat vote share among high income individuals versus the same for low income individuals), the ecological fallacy uses 50 data points (i.e., one per state) to calculate a correlation coefficient. To get the full story in the Simpson's paradox example, we'd just need the two numbers from each of the fifty states (100 numbers), while in the ecological fallacy case, we need the individual-level data (or else be given state-level correlations/regression slopes).

General observation
@NeilG comments that this just seems to be saying that you can't have any selection on unobservables/omitted variables bias issues in your regression. That's right! At least in the regression context, I think that nearly any "paradox" is just a special case of omitted variables bias.

Selection bias (see my other response on this CW) can be controlled for by including the variables that drive the selection. Of course, these variables are typically unobserved, driving the problem/paradox. Spurious regression (my other other response) can be overcome by adding a time trend. These cases say, essentially, that you have enough data, but need more predictors.

In the case of the ecological fallacy, it's true, you need more predictors (here, state-specific slopes and intercepts). But you also need more observations (individual-level rather than group-level) to estimate these relationships.

(Incidentally, if you have extreme selection where the selection variable perfectly divides treatment and control, as in the WWII example that I give, you may need more data to estimate the regression as well; there, the downed planes.)

How is it possible to formalize the consistency assumption? It sounds like assuming that there are no (causal) confounders missing from one's model.
–
Neil GFeb 29 '12 at 1:26

Also, the example provided is also an example of Simpson's paradox because conditioning on the state reverses the correlation between income and party. When is the ecological fallacy different from Simpson's paradox?
–
Neil GFeb 29 '12 at 1:28

The Sleeping Beauty problem is a recent invention; it was heavily discussed within a small set of philosophy journals over the last decade. There are staunch advocates for two very different answers (the "Halfers" and "Thirders"). It raises questions about the nature of belief, probability, and conditioning, and has caused people to invoke a quantum-mechanical "many worlds" interpretation (among other bizarre things).

Here is the statement from Wikipedia:

Sleeping Beauty volunteers to undergo the following experiment and is
told all of the following details. On Sunday she is put to sleep. A
fair coin is then tossed to determine which experimental procedure is
undertaken. If the coin comes up heads, Beauty is awakened and
interviewed on Monday, and then the experiment ends. If the coin comes
up tails, she is awakened and interviewed on Monday and Tuesday. But
when she is put to sleep again on Monday, she is given a dose of an
amnesia-inducing drug that ensures she cannot remember her previous
awakening. In this case, the experiment ends after she is interviewed
on Tuesday.

Any time Sleeping Beauty is awakened and interviewed, she is asked,
"What is your credence now for the proposition that the coin landed
heads?"

The Thirder position is that S.B. should respond "1/3" (this is a simple Bayes' Theorem calculation) and the Halfer position is that she should say "1/2" (because that's the correct probability for a fair coin, obviously!). IMHO, the entire debate rests on a limited understanding of probability, but isn't that the whole point of exploring apparent paradoxes?

Although this is not the place to try to resolve paradoxes--only to state them--I don't want to leave people hanging and I'm sure most readers of this page don't want to wade through the philosophical explanations. We can take a tip from E. T. Jaynes, who replaces the question “how can we build a mathematical model of human common sense”—which is something we need in order to think through the Sleeping Beauty problem—by “How could we build a machine which would carry out useful plausible reasoning, following clearly defined principles expressing an idealized common sense?” Thus, if you like, replace S. B. by Jaynes' thinking robot. You can clone this robot (instead of administering a fanciful amnesiac drug) for the Tuesday portion of the experiment, thereby creating a clear model of the S. B. setup that can be unambiguously analyzed. Modeling this in a standard way using statistical decision theory then reveals there are really two questions being asked here (what is the chance a fair coin lands heads? and what is the chance the coin has landed heads, conditional on the fact that you were the clone who was awakened?). The answer is either 1/2 (in the first case) or 1/3 (in the second, using Bayes' Theorem). No quantum mechanical principles were involved in this solution :-).
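
The two questions correspond to two ways of counting, which a short simulation sketch can show. It only tallies frequencies under the protocol as stated; it does not settle which quantity "credence" ought to mean.

```python
import random

def sleeping_beauty(n_experiments=100_000, seed=0):
    """Return (share of experiments with heads, share of awakenings with heads)."""
    rng = random.Random(seed)
    heads_experiments = 0
    heads_awakenings = 0
    total_awakenings = 0
    for _ in range(n_experiments):
        heads = rng.random() < 0.5
        awakenings = 1 if heads else 2   # heads: Monday only; tails: Monday and Tuesday
        total_awakenings += awakenings
        if heads:
            heads_experiments += 1
            heads_awakenings += 1
    return heads_experiments / n_experiments, heads_awakenings / total_awakenings

per_experiment, per_awakening = sleeping_beauty()
print(per_experiment, per_awakening)
```

Counted per experiment, heads comes up about half the time; counted per awakening, about a third of awakenings follow heads.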

Do you think it's equally effective to formulate the solution in terms of "base units"? By that I mean, you have to consider whether the base unit is the person, or the interview. 1/2 of persons will have had a head, but 1/3 of interviews will. Then to choose our base unit, we can revisit the question and phrase as "What is the chance that this interview is associated with a 'heads' result?"
–
JonathanFeb 29 '12 at 18:35

SB does not know how many interviews there have been and the question is about her assessment of the probability, not the experimenters' assessment. From her point of view, the number of interviews cannot be determined.
–
whuber♦Feb 29 '12 at 18:52

I think you should read the arguments in the literature first, Aaron. (I confess that I am a thirder, but I think the halfers will not find your reasoning convincing. At the very least, you need to show them why their argument is flawed.)
–
whuber♦Mar 8 '12 at 22:01

Fair point, @whuber, I've now had a further look at the literature. I'm reading Ellis's Sleeping Beauty: reply to Elga. It's this sentence that worries me, at the start of section '4. My argument'. "Only new relevant evidence, centred or uncentred, produces a change in credence". I'll think further and maybe blog about it again. I had a long discussion with seven other PhD students about this!
–
Aaron McDaidMar 10 '12 at 17:01

Is Sleeping Beauty allowed to look at the calendar when awakened? If Monday, then she ought to reply P(X=head)=0.5. If Tuesday, then P(X=head)=0.
–
RobertFOct 24 '12 at 15:33

There are no paradoxes in statistics, only puzzles waiting to be solved.

Nevertheless, my favourite is the two envelope "paradox". Suppose I put two envelopes in front of you and tell you that one contains twice as much money as the other (but not which is which). You reason as follows. Suppose the left envelope contains $x$; then with 50% probability the right envelope contains $2x$ and with 50% probability it contains $0.5x$, for an expected value of $1.25x$. But of course you can simply reverse the roles of the envelopes and conclude instead that the left envelope is expected to contain $1.25$ times the value of the right envelope. What happened?
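
One way to probe the argument is to fix a concrete prior and enumerate. The sketch below assumes, purely for illustration, that the smaller amount is uniform on the integers 1 to 100; the puzzle itself specifies no prior.

```python
from fractions import Fraction

def envelope_outcomes(max_small=100):
    """All equally likely (amount in hand, amount in other envelope) pairs."""
    outcomes = []
    for small in range(1, max_small + 1):
        outcomes.append((small, 2 * small))   # you hold the smaller envelope
        outcomes.append((2 * small, small))   # you hold the larger envelope
    return outcomes

def expected_other_given(amount, outcomes):
    """Exact expected content of the other envelope given the amount in hand."""
    others = [b for a, b in outcomes if a == amount]
    return Fraction(sum(others), len(others))

outcomes = envelope_outcomes()
keep = Fraction(sum(a for a, _ in outcomes), len(outcomes))
switch = Fraction(sum(b for _, b in outcomes), len(outcomes))
print(keep, switch)
print(expected_other_given(50, outcomes))    # both cases possible: 1.25 * 50
print(expected_other_given(7, outcomes))     # odd amount: must be the smaller one
print(expected_other_given(150, outcomes))   # above 100: must be the larger one
```

Under this prior, keeping and switching have the same expected value, and the "50% either way" step holds only for some observed amounts, not for every $x$.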

Brilliant paradox. Interestingly, if we go with the "second" interpretation on Wikipedia and try to calculate $E[B|A=a]$, we find that in order to prevent a preference for switching we require $E[B|A=a]=a=2ap+\frac{a}{2}(1-p)$ where $p=Pr(A<B|A=a)$. Solving for $p$, we get $p=\frac{1}{3}$. Similarly we can calculate $E[A|B=b]=b=2bq+\frac{b}{2}(1-q)$ where $q=Pr(B<A|B=b)$ and get $q=\frac{1}{3}$ ... Bizarre!
–
probabilityislogicMar 1 '12 at 13:05

I have given presentations on this paradox in which the game is actually played with the audience, with real amounts of money (usually a check to the host institution). It gets their attention...
–
whuber♦Mar 1 '12 at 15:50

Think I solved this one... The paradox is solved when we recognize the two envelope paradox incorrectly proposes 1) there are three possible quantities: 0.5x, x, and 2x, when there are only two quantities in the envelopes (say x and 2x), and 2) that we a priori know the left envelope contains x (in which case the right envelope would contain 2x with 100% certainty!). Given possible values of x and 2x randomly assigned to the two envelopes, the correct answer is an expected value of 1.5x whether I choose the left envelope or right envelope.
–
RobertFOct 25 '12 at 13:55

@RobertF The situation is more complicated. Suppose that it is known that the money is distributed in the two envelopes as follows. Toss a fair coin until it lands heads and count the number n of times the coin was tossed. Place 2^n dollars in one envelope and 2^(n+1) in the other. You can now perform very exact expectation computations and still retain the paradox.
–
Ittay WeissDec 24 '12 at 23:13

The Jeffreys-Lindley paradox, which shows that under some circumstances default frequentist and Bayesian methods of hypothesis testing can give completely contradictory answers. It really forces users to think about exactly what these forms of testing mean, and to consider whether that's what they really want. For a recent example see this discussion.
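
A small numerical sketch of the disagreement, under assumptions that are mine rather than the answer's: a normal mean with known $\sigma$, a point null $\theta=0$, and a $N(0,\tau^2)$ prior on $\theta$ under the alternative. With $z=\sqrt{n}\,\bar x/\sigma$ and $r=n\tau^2/\sigma^2$, the Bayes factor in favour of the null is $\sqrt{1+r}\,\exp\left(-\frac{z^2}{2}\frac{r}{1+r}\right)$.

```python
import math

def two_sided_p_value(z):
    """Two-sided p-value of a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

def bayes_factor_null(z, n, tau=1.0, sigma=1.0):
    """Bayes factor for H0: theta = 0 against H1: theta ~ N(0, tau^2)."""
    r = n * tau**2 / sigma**2
    return math.sqrt(1 + r) * math.exp(-0.5 * z**2 * r / (1 + r))

z = 2.5
print(two_sided_p_value(z))
for n in (10, 1_000, 1_000_000):
    print(n, bayes_factor_null(z, n))
```

Holding $z$ fixed, the p-value stays below 0.05 while the Bayes factor increasingly favours the null as $n$ grows.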

The St. Petersburg paradox, which makes you think differently about the concept and meaning of expected value. Intuition (mainly for people with a background in statistics) and the calculations give different results.
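
For readers who have not seen it, a minimal sketch, assuming one common payoff convention (a fair coin is tossed until the first head, and a first head on toss $k$ pays $2^k$):

```python
from fractions import Fraction

def truncated_expected_payoff(max_tosses):
    """Expected payoff if the game is cut off after max_tosses tosses."""
    return sum(Fraction(1, 2**k) * 2**k for k in range(1, max_tosses + 1))

def prob_payoff_at_most(amount, max_tosses=60):
    """Probability that the payoff 2**k does not exceed amount."""
    return sum(Fraction(1, 2**k) for k in range(1, max_tosses + 1) if 2**k <= amount)

print(truncated_expected_payoff(10), truncated_expected_payoff(40))
print(prob_payoff_at_most(4))
```

Each additional allowed toss adds exactly 1 to the expected payoff, so the untruncated expectation is infinite, yet three quarters of the time the game pays 4 or less.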

Here is another that I like that seems so insufficiently known that it has no name attached to it, but has a similar flavor and an interesting statistical lesson: There exists a sequence of independent random variables $X_1,X_2,\ldots$ with mean zero and uniformly bounded variance such that $\sqrt{n} \bar X_n$ converges in distribution to a standard normal $\mathcal N(0,1)$ (just like the CLT). However, $\mathrm{Var}(\sqrt{n} \bar X_n) \to 17$ (or your favorite positive number).
–
cardinal♦Feb 28 '12 at 17:36

@cardinal Any chance you could post some details of this as a separate answer?
–
SilverfishJan 28 at 11:12

@Silver Let each $X_i$ have a Normal distribution with mean zero and variance $f(n)$. What would $f$ have to look like asymptotically for $\text{Var}(\sqrt{n}\bar X_n)$ to converge?
–
whuber♦Mar 23 at 22:13

@whuber Presumably I should read that as $X_i$ having variance $f(i)$; in which case (using independence of the $X_i$) we have $\mathrm{Var}(\sqrt{n}\bar X_n) = \frac{1}{n}\sum_{i=1}^n f(i)$ so we need the sequence $f(i)$ to be Cesàro summable if $\mathrm{Var}(\sqrt{n}\bar X_n)$ is to converge?
–
SilverfishMar 24 at 0:30

In a family with two children, what are the chances, if one of the children is a girl, that both children are girls?

Most people intuitively say 1/2, but the answer is 1/3. The issue, fundamentally, is that uniformly choosing "one girl, from all girls with one sibling" at random is not the same as uniformly choosing "one family, from all families with two children and at least one girl."

This one is simple enough to mesh with intuition, once you understand it, but there are more complicated versions that are more difficult to comprehend:

In a family with two children, what are the chances, if one of the children is a boy born on Tuesday, that both children are boys? (Answer: 13/27)

In a family with two children, what are the chances, if one of the children is a girl named Florida, that both children are girls? (Answer: very close to 1/2, assuming "Florida" is an extremely rare name)
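
These numbers can be checked by brute-force enumeration. The sketch below assumes one specific model: a family is drawn uniformly from all two-child families, sexes and weekdays of birth are independent and equally likely, and we condition on "at least one child has the property". Other ways of learning the information give other answers.

```python
from fractions import Fraction
from itertools import product

def prob_both(sex, has_property):
    """P(both children are `sex` | at least one child satisfies has_property)."""
    children = list(product("BG", range(7)))    # (sex, weekday), all equally likely
    families = list(product(children, repeat=2))
    selected = [f for f in families if any(has_property(c) for c in f)]
    both = [f for f in selected if all(c[0] == sex for c in f)]
    return Fraction(len(both), len(selected))

TUESDAY = 2  # arbitrary label for one of the seven weekdays
print(prob_both("G", lambda c: c[0] == "G"))         # at least one girl
print(prob_both("B", lambda c: c == ("B", TUESDAY)))  # at least one Tuesday boy
```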

The answer is 1/3 not 2/3 surely? Only one out of GB, BG, GG
–
Martin SmithFeb 29 '12 at 13:29

The "boy born on Tuesday" article is good. Its main point, which is made very clearly ("the problem is under-defined"), is that the answer depends on the probability model one adopts. Saying that "the" answer is 13/27 is misleading (at best).
–
whuber♦Feb 29 '12 at 16:01

The reason these problems are so confusing is that the question is worded so that it's very difficult to ascertain what the hypothesis space is. This in turn makes it confusing as to what the "equally likely" cases actually are (and hence what should be counted).
–
probabilityislogicMar 1 '12 at 11:13

I feel like being cheeky, and note that the way the question is worded really only indicates that children are exchangeable in terms of their order - knowing that the child is a girl doesn't tell us whether or not they were the first or second child. What this means is that $p(B_1G_2)=p(G_1B_2)$. But nothing else! So all we can really say is that the probability of another girl equals $\frac{p(G_1G_2)}{2p(B_1G_2)+p(G_1G_2)}$. To get a numerical value requires us to assign probabilities, which cannot be done with the information given.
–
probabilityislogicMar 1 '12 at 11:24

Again, perhaps not a paradox per se and another example of omitted variables bias.

Spurious causation/regression
Any variable with a time trend is going to be correlated with another variable that also has a time trend. For example, my weight from birth to age 27 is going to be highly correlated with your weight from birth to age 27. Obviously, my weight isn't caused by your weight. If it was, I'd ask that you go to the gym more frequently, please.

Then the regression
$$y_t = \gamma_0 + \gamma_1 x_t + \nu_t$$
has an omitted variable---the time trend---that is correlated with the included variable, $x_t$. Hence, the coefficient $\gamma_1$ will be biased (in this case, it will be positive, as our weights grow over time).

When you are performing time series analysis, you need to be sure that your variables are stationary or you'll get these spurious causation results.
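
A minimal sketch of the effect: two series that share nothing except a linear time trend. The slopes, noise level, and the use of first differences to remove the trend are illustrative assumptions; adding a time trend to the regression is another way to handle it.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def trending_series(slope, n, rng, noise_sd=5.0):
    """Linear trend plus independent Gaussian noise."""
    return [slope * t + rng.gauss(0, noise_sd) for t in range(n)]

def first_differences(series):
    return [b - a for a, b in zip(series, series[1:])]

rng = random.Random(0)
x = trending_series(1.0, 500, rng)   # "my weight"
y = trending_series(2.0, 500, rng)   # "your weight", generated independently
corr_levels = pearson(x, y)
corr_diffs = pearson(first_differences(x), first_differences(y))
print(corr_levels, corr_diffs)
```

The levels are almost perfectly correlated, while the differenced series, with the trend removed, are essentially uncorrelated.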

One of my favorites is the Monty Hall problem. I remember learning about it in an elementary stats class and telling my dad; as both of us were in disbelief, I simulated random numbers and we tried the problem. To our amazement, it was true.

Basically, the problem states that there are three doors on a game show: behind one is a prize, and behind the other two is nothing. You choose a door, and you are then shown that one of the two remaining doors is not a prize door and allowed to switch your choice. If you are given this option, you should switch from your current door to the remaining door.
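
A simulation along those lines might look like the sketch below. It assumes the standard rules: the host knows where the prize is, always opens a non-prize door other than yours, and picks at random when two such doors are available.

```python
import random

def monty_hall(n_games=100_000, seed=0):
    """Return (win rate when staying, win rate when switching)."""
    rng = random.Random(seed)
    stay_wins = switch_wins = 0
    for _ in range(n_games):
        prize = rng.randrange(3)
        choice = rng.randrange(3)
        # Host opens a door that is neither the contestant's choice nor the prize.
        opened = rng.choice([d for d in range(3) if d != choice and d != prize])
        switched = next(d for d in range(3) if d != choice and d != opened)
        stay_wins += (choice == prize)
        switch_wins += (switched == prize)
    return stay_wins / n_games, switch_wins / n_games

stay_rate, switch_rate = monty_hall()
print(stay_rate, switch_rate)
```

Switching wins about two thirds of the time; staying wins about one third.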

It's interesting that the Two Child Problem and the Monty Hall Problem so often get mentioned together in the context of paradox. Both illustrate an apparent paradox first illustrated in 1889, called Bertrand's Box Paradox, which can be generalized to represent either. I find it a most interesting "paradox" because the same very-educated, very-intelligent people answer those two problems in opposite ways with respect to this paradox. It also compares to a principle used in card games like bridge, known as the Principle of Restricted Choice, where its resolution is time-tested.

Say you have a randomly selected item that I'll call a "box." Every possible box has at least one of two symmetric properties, but some have both. I'll call the properties "gold" and "silver." The probability that a box is just gold is P; and since the properties are symmetric, P is also the probability that a box is just silver. That makes the probability that a box has just one property 2P, and the probability that it has both 1-2P.

If you are told a box is gold, but not whether it is silver, you might be tempted to say the chances it is just gold are P/(P+(1-2P))=P/(1-P). But then you would have to state the same probability for a one-color box if you were told it was silver. And if this probability is P/(1-P) whenever you are told just one color, it has to be P/(1-P) even if you aren't told a color. Yet we know it is 2P from the last paragraph.

This apparent paradox is resolved by noting that if a box has only one color, there is no ambiguity about what color you will be told. But if it has two, there is an implied choice. You have to know how that choice was made in order to answer the question, and that is the root of the apparent paradox. If you aren't told, you can only assume a color was chosen at random, making the answer P/(P+(1-2P)/2)=2P. If you insist P/(1-P) is the answer, you are implicitly assuming there was no possibility the other color could have been mentioned unless it was the only color.

In the Monty Hall Problem, the analogy for the colors is not very intuitive, but P=1/3. Answers based on the two unopened doors originally being equally likely to have the prize are assuming Monty Hall was required to open the door he did, even if he had a choice. That answer is P/(1-P)=1/2. The answer allowing him to choose at random is 2P=2/3 for the probability that switching will win.

In the Two Child Problem, the colors in my analogy compare quite nicely to genders. With four cases, P=1/4. To answer the question, we need to know how it was determined that there was a girl in the family. If it was possible to learn about a boy in the family by that method, then the answer is 2P=1/2, not P/(1-P)=1/3. It's a little more complicated if you consider the name Florida, or "born on Tuesday," but the results are the same. The answer is exactly 1/2 if there was a choice, and most statements of the problem imply such a choice. And the reason "changing" from 1/3 to 13/27, or from 1/3 to "nearly 1/2," seems paradoxical and unintuitive, is because the assumption of no choice is unintuitive.
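
The two competing formulas can be put side by side in a few lines. The function name and the "random"/"forced" labels are my own shorthand for the two assumptions about how the named property was chosen.

```python
from fractions import Fraction

def single_property_given_told(p, choice="random"):
    """P(box has only the named property | you are told it has that property).

    p is the probability that a box has just one given property, so a box
    has both properties with probability 1 - 2p.
    """
    both = 1 - 2 * p
    if choice == "random":   # a two-property box names either property with probability 1/2
        return p / (p + both / 2)
    if choice == "forced":   # a two-property box always names this property
        return p / (p + both)
    raise ValueError(choice)

for label, p in (("Monty Hall", Fraction(1, 3)), ("Two Child", Fraction(1, 4))):
    print(label,
          single_property_given_told(p, "random"),
          single_property_given_told(p, "forced"))
```

With a random choice the result is $2P$ (2/3 and 1/2); with a forced choice it is $P/(1-P)$ (1/2 and 1/3).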

In the Principle of Restricted Choice, say you are missing some set of equivalent cards - like the Jack, Queen, and King of the same suit. The chances start out even that any particular card belongs to a specific opponent. But after an opponent plays one, his chances of having any one of the others are decreased because he could have played that card if he had it.

I don't follow your probabilities. If by "symmetric", you mean $P_G=P_S$ (which I think you mean), then shouldn't the probability of both be $P^2$, rather than $2P$? (This assumes independence, which I think you mean, although it would help to state that explicitly.) Furthermore, I think the probability of the box being neither should be $(1-P)^2$, rather than $1-2P$, shouldn't it? These can easily be seen if we consider the case where $P_G=P_S=.8$--then $P_{GS}=1.6$ & $P_{-G-S}=-.6$, unless by "symmetric" you mean that $P=.5$ & the properties are perfectly dependent. Sorry to nitpick.
–
gungMar 7 '12 at 14:57

Sorry, maybe I didn't explain it well trying to be as brief as possible. My P was not the probability a box has the color gold, it was the probability it was only gold. The probability it has the color gold is 1-P. And while the two properties are symmetric, they do not have to be independent, so you can't just multiply probabilities. Also, no box is "neither." Bertrand used three boxes with two coins in each: gold+gold, gold+silver, and silver+silver. A box with any number of gold coins is "gold" in my generalization.
–
JeffJoMar 7 '12 at 16:10

+1, that helps. I now see the phrase "at least one of two" and the word "just", which I must have skimmed over.
–
gungMar 7 '12 at 17:25

From Wikipedia: "Parrondo's paradox, a paradox in game theory, has been described as: A combination of losing strategies becomes a winning strategy. It is named after its creator, Juan Parrondo, who discovered the paradox in 1996. A more explanatory description is:

There exist pairs of games, each with a higher probability of losing
than winning, for which it is possible to construct a winning strategy
by playing the games alternately.

Parrondo devised the paradox in connection with his analysis of the Brownian ratchet, a thought experiment about a machine that can purportedly extract energy from random heat motions popularized by physicist Richard Feynman. However, the paradox disappears when rigorously analyzed."

As alluring as the paradox might sound to the financial crowd, it does have requirements that are not readily available in financial time series. Even though a few of the component strategies can be losing, the offsetting strategies require unequal and stable probabilities of much greater or less than 50% in order for the ratcheting effect to kick in.
It would be difficult to find financial strategies, whereby one has $P_B(W)=3/4+\epsilon$ and the other, $P_A(W)=1/10 + \epsilon$, over long periods.
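
To see the ratchet at work, here is a sketch using a commonly cited parameterization, which is an assumption on my part: game A wins with probability $1/2-\epsilon$; game B wins with probability $1/10-\epsilon$ when the capital is a multiple of 3 and $3/4-\epsilon$ otherwise; $\epsilon=0.005$. Instead of strict alternation, the combined game picks A or B by a fair coin each round, which is one of the standard variants.

```python
def long_run_drift(win_prob_by_state, n_iter=10_000):
    """Expected gain per round once capital mod 3 has settled to its stationary law.

    win_prob_by_state[s] is the win probability when capital % 3 == s;
    a win adds 1 to the capital and a loss subtracts 1.
    """
    dist = [1 / 3, 1 / 3, 1 / 3]
    for _ in range(n_iter):           # power iteration on the 3-state chain
        new = [0.0, 0.0, 0.0]
        for s, mass in enumerate(dist):
            p = win_prob_by_state[s]
            new[(s + 1) % 3] += mass * p
            new[(s - 1) % 3] += mass * (1 - p)
        dist = new
    return sum(mass * (2 * p - 1) for mass, p in zip(dist, win_prob_by_state))

eps = 0.005
game_a = [0.5 - eps] * 3
game_b = [0.1 - eps, 0.75 - eps, 0.75 - eps]
mixed = [(a + b) / 2 for a, b in zip(game_a, game_b)]  # fair coin picks the game

drift_a, drift_b, drift_mixed = (long_run_drift(g) for g in (game_a, game_b, mixed))
print(drift_a, drift_b, drift_mixed)
```

Both games lose on their own, while the random mixture has positive drift, because mixing changes how often the capital sits on the unfavourable multiple-of-3 state.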

There's also a more recent related paradox called the "Allison mixture," which shows that we can take two IID and non-correlated series and randomly scramble them such that certain mixtures create a resulting series with non-zero autocorrelation.

I like the following: The host is using an unknown distribution on $[0,1]$ to choose, independently, two numbers $x,y\in [0,1]$. The only thing known to the player about the distribution is that $P(x=y)=0$. The player is then shown the number $x$ and is asked to guess whether $y>x$ or $y<x$. Clearly, if the player always guesses $y>x$ then the player will be correct with probability $0.5$. However, at least surprisingly if not paradoxically, the player can improve on that strategy. I'm afraid I don't have a link to the problem (I heard it many years ago during a workshop).
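
One well-known way to beat 0.5 is a randomized threshold: draw a private number $t$ and guess $y>x$ exactly when $x<t$. Whenever $t$ falls between $x$ and $y$ the guess is right, and otherwise it is right half the time. The sketch below assumes a uniform threshold and two example host distributions, none of which come from the problem statement.

```python
import random

def threshold_strategy_accuracy(draw, n_rounds=100_000, seed=0):
    """Share of correct guesses when the player uses a uniform random threshold.

    draw(rng) returns one number from the host's (unknown to the player) distribution.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_rounds):
        x, y = draw(rng), draw(rng)
        threshold = rng.random()         # player's private threshold on [0, 1]
        guess_y_larger = x < threshold
        correct += (guess_y_larger == (y > x))
    return correct / n_rounds

acc_uniform = threshold_strategy_accuracy(lambda rng: rng.random())
acc_beta = threshold_strategy_accuracy(lambda rng: rng.betavariate(2, 5))
print(acc_uniform, acc_beta)
```

The size of the improvement depends on the host's distribution, but it is strictly positive whenever the threshold has a chance of landing between $x$ and $y$.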

Dear Ittay, I believe Tom Cover is the original source of this problem. I think it is also listed in his Open Problems in Communication and Computation, but I don't have it handy to check. It's a nice problem. The restriction to $[0,1]$, or, even a random $y$ (or $x$, for that matter) is inessential. Cheers.
–
cardinal♦Mar 2 '13 at 19:50

I find a simplified graphical illustration of the ecological fallacy (here the rich State/poor State voting paradox) helps me to understand on an intuitive level why we see a reversal of voting patterns when we aggregate State populations:

@Nick: this particular example is actually distinct from Simpson's Paradox, but it can be hard to know which fallacy/paradox applies in a particular situation because they look the same statistically. The difference is that SP is a "false effect" that appears only when analyzing subgroups. The trend shown is thought to be a "true effect" that appears only when analyzing subgroups. In this case, it suggests that while income as a raw number doesn't affect voting patterns in aggregate, income as related to your neighbors (your state) does influence voting patterns.
–
JonathanJan 15 '13 at 20:56

@Charlie 'below' and 'above' are functions of whatever way a reader of the page is sorting (active/oldest/votes), and in any case the order under some of the sorting criteria can change over time (including the default). As such, it's probably better to mention the person that posted the discussion you refer to, or even link to it.
–
Glen_bJun 20 '13 at 8:13

Suppose you obtained data on births in the royal family of some kingdom.
In the family tree each birth was noted. What is peculiar about this
family is that the parents kept having children only until the first
boy was born and then did not have any more children.

So your data potentially looks similar to this:

G G B
B
G G B
G B
G G G G G G G G G B
etc.

Will the proportion of boys and girls in this sample reflect the general probability of giving birth to a boy (say 0.5)? The answer and explanation can be found in this thread.
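
For anyone who wants to experiment before reading the thread, here is a sketch under an idealized assumption: births are independent with a constant probability of a boy. It computes two different "proportions" exactly (up to a negligible truncation of the series), since which one you mean matters.

```python
import math

def stop_at_first_boy(p_boy=0.5, max_girls=200):
    """Return (pooled share of boys over many families, mean within-family share)."""
    expected_girls = 0.0
    mean_family_share = 0.0
    for girls in range(max_girls + 1):
        prob = (1 - p_boy) ** girls * p_boy    # P(exactly `girls` girls, then a boy)
        expected_girls += prob * girls
        mean_family_share += prob / (girls + 1)
    pooled_share = 1 / (1 + expected_girls)    # one boy per family, in the many-family limit
    return pooled_share, mean_family_share

pooled, per_family = stop_at_first_boy()
print(pooled, per_family, math.log(2))
```

Pooling all births across many such families gives a share of boys of 0.5, whereas averaging the share of boys family by family gives $\ln 2 \approx 0.69$.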

This answer reads like a puzzle, not like a paradox. I can imagine why you wanted to post it like that, but I think for this answer to qualify as paradox and to fit this thread, you need to be more explicit.
–
amoebaMar 23 at 21:43

This question (with boys and girls interchanged) was asked at stats.stackexchange.com/questions/93830, which received a large number of answers--not entirely in agreement! (I learned something by taking the problem seriously and thinking about it in increasingly realistic ways, exploring the assumptions needed to do that.)
–
whuber♦Mar 23 at 21:52

@whuber thanks for the link! I added it into the description.
–
TimMar 23 at 21:56

I'd have been surprised if this weren't usually the case.
–
Glen_bJun 20 '13 at 8:10

It's unclear what $x/z$ and "correlated" are intended to mean. (Presumably "$x/z$" is componentwise division--assuming no components of $z$ are zero!) Is "correlated" to be interpreted in the sense of the correlation coefficient (essentially the standardized dot products) or are we to treat $X,Y,$ and $Z$ as random variables and consider their correlation coefficients in that sense?
–
whuber♦Mar 23 at 21:57