Today I want to tell you about the single most counterintuitive-but-true thing I’ve ever heard.

Suppose you’re observing something that changes over time — say the Dow Jones average, or the temperature in Barrow, Alaska, or the number of people who have been shot by terrorists so far this year. Suppose you have absolutely no prior information about how this thing behaves — in particular, you might have no way of knowing whether it changes continuously (like the temperature) or whether it’s subject to sudden changes (like the number of terrorist victims). You have no formula for it; you don’t even know whether there is a formula. It could be absolutely anything.

(For those who want more precision: This thing you’re observing is a real-valued function defined on the positive real numbers — which we can think of as a function of time. It can be any function whatsoever.)

Now suddenly, at some randomly chosen moment, your ability to observe comes to a halt. If that randomly chosen moment is, say, 6:23 AM, then your observations go right up to, but do not include, 6:23 AM.

Your task is to accurately guess the value of the variable at that first moment that you can’t observe. In fact, let me make your job a little harder. If your observations stop at 6:23 AM, then I want you to accurately guess the value at 6:23 AM and for a little while thereafter.

Now here’s the theorem: There is a strategy that will allow you to succeed at this task with probability 100%.

If that sounds even remotely plausible to you, then you haven’t understood it.

If you could implement this strategy, you could certainly beat the stock market, unless you preferred to spend all your time winning bar bets about the paths of raindrops along windowpanes. Unfortunately, the strategy is (provably) too complex to implement on any computer. But there’s a sense in which it’s easy to describe, though not to implement. Sometime soon I’ll blog the description and a sketch of why it works. If you just can’t wait, here is the paper I learned this from.

41 Responses to “The Most Counterintuitive Theorem EVER”

It’s one of those theorems that go “If you believe in the Axiom of Choice, then such-and-such implausible thing exists, but there’s no way of finding it in practice.” In other words, it’s just one more argument to reject the Axiom of Choice. I’m still waiting for somebody to find a contradiction between the Axiom of Choice and something so intuitively true that the Axiom of Choice gets rejected for good.

Looking at it, I’d say that it suffers from the non-constructiveness of AC, i.e. it’s well and good we can show it exists, but if it can’t be explicitly constructed…

Sort of reminds me of B-T paradox :) – which BTW, I’d consider even slightly more counterintuitive (due to the fact that one has to have a true scenario to start with, while in practice we usually have t that is fixed and a set of possible true scenarios).
Still, a nice exercise.

The counterintuitive nature is readily explained by the fact that “probability of 100%” does not imply that the predictions are ALWAYS correct. It merely implies that there are a finite set of circumstances (or even an infinite set, as long as this set is not dense, i.e., does not include members near all possible predictions) where the prediction will fail, but that finite set is negligible when compared to the uncountable infinity of predictions that will prove true in the mean sense.

Anyone who has studied measure theory will understand this. Since the cumulative probability is an integral over the domain of all possible outcomes, one can always consider a change in the outcome at a finite number of points (each of which has measure zero) without changing the result of the integral, i.e., the probability stays at one even after these changes, even though each of these changes invalidates the prediction. If nothing important is happening at these isolated points, then the predictions are fine. If something (e.g., a black swan) is occurring at one of them, then the predictions are not fine.

So this theorem is great for well-behaved systems in the mean sense. But for an unexpected result, it breaks down. No counter-intuition is necessary!

Hmmmm! Sounds like quantum physics applied to socio-economic phenomena. Like QP, I would expect the condition to change as a result of any action by those trying to exploit the condition. And that doesn’t even address the many orders of magnitude greater complexity of human activity than those pesky little sub-atomic particles.

@CM:
No, I don’t think you have it right. The claim is that for any function, no matter how ill behaved, you can predict accurately with measure 1. In the example in the paper, if you have an infinite number of men with hats who cannot see their own hat, all but a finite number can correctly predict the colour of their own hat. That is pretty counter-intuitive I think. It’s like claiming there is an algorithm that can correctly predict an infinite number of coin flips and be wrong only a finite number of times.

(Steve Landsberg: I’m looking forward to learning about your insights into the original topic of this post, and I don’t intend to hijack the thread with this different problem . . . many of the readers of this blog are probably familiar with this problem, but, if not, I think they’ll find it interesting. I hope you don’t mind.)

The prisoner problem at the end of the Hardin and Taylor paper reminded me of another interesting, hard, but tractable prisoner problem:

To liven up the day for his prisoners, a warden devised the following game. Each of the 100 prisoners has a unique number, and each of the numbers is written on a note card. The 100 cards are placed in random order with the numbers down on a table in a room, and the prisoners are allowed to enter the room one at a time. Each prisoner can turn over fifty cards, and, if their number is on one of the cards, they return the cards to their original positions and go to another waiting room where they cannot talk to any prisoner who has not yet entered the room. If any prisoner does not turn over their number, then the game is over for the day and they can all try again the following day. If all 100 prisoners turn over a card with their number, then the prisoners are set free.

The warden was not worried about this game because, he believed, the chance of having to release the prisoners was about one in 2^(100), which is a really, really small chance. However, one of the prisoners was really, really smart, and he devised a way to increase the chances to about one in 3. His method didn’t involve ‘cheating’ in any way. He simply devised a strategy that all of the prisoners agreed to use, and, by using this strategy, they were released within one week. What is the strategy?

“There is a strategy that will allow you to succeed at this task with probability 100%.” Is that really what is being said? I’m not a mathematician (not by a damn sight) but it seems like it’s saying you’ll be wrong for only a finite number of times. Not much help if you’re biological and only get to trade for a finite time anyway. Maybe helps if you’re a Goldman-Sachs computer and get hooked up with the singularity to trade for infinity…

PM: What it says is that out of the infinitely many instants between (say) 12PM and 1PM, there are only a finite number of instants at which your strategy is capable of failing. If you are called on to make a prediction at some time between 12PM and 1PM, the probability that you’ll be called on exactly at one of those bad moments is zero.

Assume the cards are not moved once laid out.
It suffices to have the prisoners play in numerical order each day.
Call move A flip the leftmost 50 cards, move B flip the others.
day 1 prisoner 1 makes move A and if he gets a chance p2 plays B,
and prisoners alternate as they come up to play
(maximizes chances to guess right the first time).
After day 1 all prisoners who have played now know their winning move
so they go first. Then the rest ignore them and continue playing
alternative moves as above. Repeat as needed.
I have not done the recursive sums but knowledge accumulates every day
at an accelerating rate.

Yeah – I don’t agree with the way you’re presenting this theorem at all. I haven’t verified it, and I don’t know if it’s been refereed, but let’s stipulate that it’s correct. You’re presenting measure theoretic facts as Reality when the Axiom of Choice (AC) is in play. You may as well talk about magic – measure theory with AC is about as connected with reality as Harry Potter.

Generally speaking, translating pure math to English is sorta sketchy . . . and I thought the discussion you gave of metamathematics a few posts ago suffered from that. Anytime you speak of math in normal language it’s very reductive and there’s a great deal lost in translation.

Yours,

- Descriptive Set Theorist who has successfully Piled it Higher and Deeper.

Actually, any countable set is of measure zero – which is being used as “0% probability” here. The rationals are dense and countable, and so give an example of a measure zero dense set. Remember that the measure is countably additive over disjoint sets:

m(A1 + A2 + A3 + . . .) = m(A1) + m(A2) + m(A3) + . . .

and the measure of any singleton is 0, so any countably infinite set is the countable union of measure 0 sets, and therefore is itself measure 0.
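
Spelled out for the rationals, the argument above runs as follows. This is only a sketch: the enumeration q_1, q_2, … of the rationals is an illustrative assumption (any enumeration works), and m is the measure from the line above.

```latex
\[
  m(\mathbb{Q})
  = m\Bigl(\bigcup_{n=1}^{\infty} \{q_n\}\Bigr)
  = \sum_{n=1}^{\infty} m(\{q_n\})
  = \sum_{n=1}^{\infty} 0
  = 0 .
\]
```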

I pretty much agree with the first comment. The Axiom of Choice (or at least the uncountable version) has so many “obviously wrong” corollaries, that it is obviously not true.

Or perhaps it is the case that the real numbers and all that goes along with them (eg continuity, completeness etc) are not representative of reality. And so the ancient Greeks were right in their pre-Pythagorean beliefs that everything in the material world can be represented by integers or their ratios.

Until I see mathematicians becoming insanely rich via these insights, I’m going to have to withhold my applause. I definitely didn’t fully make sense of the symbology, but the whole thing smells like a re-fry of Zeno’s Paradox.

A Guy makes a very good point. As I remarked in another thread there are models of ZF where all sets of real numbers are measurable. In such models AC fails and results like this do not happen. Nor does Banach Tarski. Is the real number system in such a model a better description for reality than ZFC? I don’t know that anyone has an answer but it feels that way to me.

From TB’s post:
“…The 100 cards are placed in random order with the numbers down on a table in a room…”

Emphasis on the notion that ‘positions on the table’ imposes another ordering on the cards. So each card actually has two numbers in the set {1,2,…,100} associated with it: one for the position, and one for writing.

So you have two different orderings on the cards at your disposal, and one of them (position on the table) is freely available to all prisoners. Now all you need is a lot of creativity (and SPOILER ALERT maybe some group theory)

Continuous functions are a thing to be treasured. A different Taylor http://en.wikipedia.org/wiki/Taylor_series told me that if you give me the value of a function and all its derivatives even for an instant, then I can reconstruct the function for all time.

I look forward to a translation, as I did not make much sense of it with a quick glance through. There may be some loss in translation, but perhaps you can make it understandable for those of us who have come across “axiom of choice” for the first time.

TB. Interesting problem. You know 1) position of the cards is fixed for each day, 2) how many prisoners have gone before you and got it right, 3) the cards the previous prisoners turned over and 4) your own number. Is this similar to acquiring the number of “less thans” in the “guess the secret number” problem?

Of course the rationals are dense in the reals — that’s what makes computational numerics possible (among other useful things).

But the paper’s authors specifically rule out density in their Theorem 5.1, and the set W defined there forms the counterexamples to the counter-intuitive result. Hence the stricture in my earlier comment.

The numbers are just labels so WLOG, the prisoners’ numbers are 1-100. When prisoner i enters the room, he first looks at the card in position i. If the card reads j, he next looks at the card in position j. If this card reads k he next looks at the card in position k and so on. I claim this works with probability 1 – ln 2 =~ .307.

The card layout can be modeled as a permutation on 1-100. If this permutation has a cycle of length 51 or larger, the prisoners will lose. But if all cycles have length 50 or less, then all prisoners traverse their number’s entire cycle and make it back to their own number in 50 steps or less.
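
For anyone who wants to check the cycle argument numerically, here is a minimal Python sketch. It is not the commenter's code: the function names are made up for illustration, and prisoners and positions are labelled 0 to n-1 rather than 1 to 100 purely for convenience.

```python
import random


def prisoner_finds_number(cards, prisoner, max_flips):
    """Follow the chain: start at the position matching the prisoner's own
    number, then move to the position named on each card that is turned over."""
    position = prisoner
    for _ in range(max_flips):
        if cards[position] == prisoner:
            return True
        position = cards[position]
    return False


def all_prisoners_succeed(cards, max_flips):
    """The group wins only if every prisoner finds their own number."""
    return all(
        prisoner_finds_number(cards, p, max_flips) for p in range(len(cards))
    )


def longest_cycle(cards):
    """Length of the longest cycle of the permutation position -> card value."""
    seen = [False] * len(cards)
    longest = 0
    for start in range(len(cards)):
        length = 0
        position = start
        while not seen[position]:
            seen[position] = True
            position = cards[position]
            length += 1
        longest = max(longest, length)
    return longest


def estimate_win_rate(n_prisoners=100, trials=20000, seed=0):
    """Monte Carlo estimate of the chance that everyone succeeds on one day."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        cards = list(range(n_prisoners))
        rng.shuffle(cards)
        wins += all_prisoners_succeed(cards, n_prisoners // 2)
    return wins / trials


if __name__ == "__main__":
    print(estimate_win_rate())
```

On any layout, `all_prisoners_succeed(cards, 50)` should agree with `longest_cycle(cards) <= 50`, which is exactly the claim in the paragraph above.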

What remains is to find the chance of no large cycles. There are (100 choose 51) * 50! * 49! = 100! / 51 permutations on 1-100 with a cycle of length 51. There are 100! * (1/51 + 1/52 + … + 1/100) permutations on 1-100 with a large cycle. The odds of a large cycle in a random permutation are (1/51 + 1/52 + … + 1/100) =~ (integral of 1/x from 50 to 100) = ln 100 – ln 50 = ln 2. Thus, the odds of winning are 1 – ln 2.
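
The sum can also be evaluated exactly rather than through the integral. The sketch below (the function name is illustrative, and it assumes an even number of prisoners who may each turn over half the cards) does so with exact fractions. For 100 prisoners the exact value comes out slightly above 1 – ln 2, at roughly 0.31.

```python
from fractions import Fraction
from math import log


def exact_win_probability(n_prisoners):
    """1 minus the chance that a random permutation of n_prisoners items has
    a cycle longer than n_prisoners // 2 (at most one such cycle can exist,
    so the chances for each cycle length simply add up)."""
    half = n_prisoners // 2
    lose = sum(Fraction(1, k) for k in range(half + 1, n_prisoners + 1))
    return 1 - lose


if __name__ == "__main__":
    p100 = exact_win_probability(100)
    print(float(p100))   # exact sum, as a float
    print(1 - log(2))    # the integral approximation used above
```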

Jeffry re Thomas Bayes- I don’t doubt the math is correct, but I do not see why it is correct. To work it through, I tried an example with 10 prisoners, each selecting 5 cards. I also assumed it does not matter which order the prisoners go in, so I started with prisoner 1 and worked through. I attempted to assign random numbers to the card just by trying a random sequence, then starting from different positions in my list to assign values to card number 1 etc. Let’s say it goes like this:
Day 1,
1=3, 2=6, 3=8, 4=1, 5=7, 6=4, 7=9, 8=10, 9=2, 10=5
Prisoner 1 draws card numbers = 3, 8, 10, 5, 7, so fail on day 1.
day 2, say cards are
1=10, 2=6, 3=7, 4=1, 5=9, 6=3, 7=4, 8=8, 9=2, 10=5.
Prisoner 1’s draws = 10, 5, 9, 2, 6. so fail on day 2
Day 3,
1=6, 2=2, 3=8, 4=4, 5=10, 6=9, 7=5, 8=1, 9=7, 10=3.
Prisoner 1 draws: 6, 9, 7, 5, 10. So fail on day 3

This does not seem to be getting very far. This method of selecting does not seem to be an improvement on any random method of choosing. Have I mis-understood the solution? Are my pseudo random numbers mucking it up? Is it different for 10 compared to 100?
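
One way to look at the three layouts above is to break each into its cycles. The Python sketch below does that (the helper name `cycle_lengths` is made up for illustration; the card values are copied from the three days listed). Each layout turns out to contain a cycle longer than 5, so the chain-following strategy loses on all three days. With ten prisoners the chance of losing on a given day is about 65%, so three losing days in a row is not unusual.

```python
def cycle_lengths(layout):
    """layout maps each position (1-based) to the number on the card there.
    Returns the cycle lengths, in order of each cycle's smallest position."""
    seen = set()
    lengths = []
    for start in sorted(layout):
        if start in seen:
            continue
        length = 0
        position = start
        while position not in seen:
            seen.add(position)
            position = layout[position]
            length += 1
        lengths.append(length)
    return lengths


# Card values at positions 1..10, copied from the three days above.
days = {
    "day 1": [3, 6, 8, 1, 7, 4, 9, 10, 2, 5],
    "day 2": [10, 6, 7, 1, 9, 3, 4, 8, 2, 5],
    "day 3": [6, 2, 8, 4, 10, 9, 5, 1, 7, 3],
}

for name, cards in days.items():
    layout = {i + 1: card for i, card in enumerate(cards)}
    print(name, cycle_lengths(layout))
# day 1 [10]
# day 2 [9, 1]
# day 3 [8, 1, 1]
```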

The figures for 10 prisoners selecting 5 cards were essentially the
same as those for the 100-prisoner game.

Though the odds of your 4-day trial having a success are around 80%,
that doesn’t mean you won’t hit an unlucky streak. My 10,000 trial
program had 2 instances where it took 19 days and 2 instances where
it took 20 days.

If you want to work it through, the easiest way is with a deck of
ten cards (ace to 10). Just shuffle and deal face up and do the
simulation that way.
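
For readers without a deck of cards handy, a trial program along these lines might look like the Python sketch below. This is not the program mentioned above, only a guess at its shape: the function names, the seed, and the 0-based labelling are all assumptions.

```python
import random


def day_is_a_win(cards):
    """Chain-following wins exactly when no cycle is longer than half the deck."""
    limit = len(cards) // 2
    seen = [False] * len(cards)
    for start in range(len(cards)):
        length = 0
        position = start
        while not seen[position]:
            seen[position] = True
            position = cards[position]
            length += 1
        if length > limit:
            return False
    return True


def days_until_release(n_prisoners, rng):
    """Reshuffle the cards each day and count days until the first win."""
    day = 1
    while True:
        cards = list(range(n_prisoners))
        rng.shuffle(cards)
        if day_is_a_win(cards):
            return day
        day += 1


def simulate(trials=10000, n_prisoners=10, seed=1):
    rng = random.Random(seed)
    return [days_until_release(n_prisoners, rng) for _ in range(trials)]


if __name__ == "__main__":
    results = simulate()
    within_four = sum(d <= 4 for d in results) / len(results)
    print("released within 4 days:", within_four)
    print("longest wait:", max(results))
```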

I re-worked by listing numbers 1 to 10 in Excel, then generating random numbers next to them and sorting on the random number to order the numbers 1-10 randomly. They all escaped on day 1 and 3. I still don’t quite get why this happens. If we have just one person, prisoner 1, we are asking what is the chance of him turning over his number within 5 cards. He starts with the card corresponding to his number and follows the chain until either 5 cards or his number turns up. I do not see why this method gives him a better chance of turning up his card than selecting 5 cards at random. I am expecting a Eureka moment sometime, where it suddenly becomes obvious.

Eureka! If it is a closed loop, then it must contain his number because that is the one he started with. If you were to assign the prisoner’s number after he had turned over the cards, he would follow closed loops that do not contain his number.

What if you fix two well-orderings of T->S? The Axiom of Choice allows us to, but doesn’t force any one choice upon us, no? Couldn’t these orderings and the μ-strategy, then, lead to two completely different functions, both claiming to be right on all but at countably many places, with measure 0 and nowhere dense?

I was hoping to make up a well-ordering of my possible futures, using the “is preferable to” ordering! Such as “reading is preferable to losing my wallet”, etc. The law of math and the universe and of well-ordered sets would then guarantee me a future that is exactly my most preferred future outcome (save countably few minor points here and there).

What if you fix two well-orderings of T->S? The Axiom of Choice allows us to, but doesn’t force any one choice upon us, no? Couldn’t these orderings and the μ-strategy, then, lead to two completely different functions, both claiming to be right on all but at countably many places, with measure 0 and nowhere dense?

Absolutely. Any well-ordering will do as well as any other.

I was hoping to make up a well-ordering of my possible futures, using the “is preferable to” ordering!

Is your “is preferable to” ordering really a well-ordering? If so, you are a very unusual person indeed.

I don’t know if this point has been made elsewhere (I couldn’t locate it) but here’s a fact related to the above that is both obvious and counterintuitive:

“Given two functions that agree almost everywhere (or even agree on an unspecified cofinite subset), the probability that they agree at a particular point is still zero.”

For instance, think of a real-valued function that is equal to the zero function everywhere on the closed interval [0,1] except at finitely many points. What is the probability that the function equals zero at the point 1/2? You might be tempted to say that this probability is nearly 1, but you’d be wrong, because the “almost everywhere” class of a function is independent of its value at any point. In fact, the probability is zero.

This isn’t very precise, but can be made so. The point is that just because you are able to obtain a function that agrees with the correct function “almost everywhere” does not mean that you get a positive probability, let alone a probability of one, of agreeing with the correct function at any specific point.

For instance, think of a real-valued function that is equal to the zero function everywhere on the closed interval [0,1] except at finitely many points. What is the probability that the function equals zero at the point 1/2? You might be tempted to say that this probability is nearly 1, but you’d be wrong, because the “almost everywhere” class of a function is independent of its value at any point. In fact, the probability is zero.

This must depend on exactly what experiment you have in mind, no?

If I take the almost-everywhere-zero function f as given, and then randomly choose a point t, then the value of f(t) is zero with probability one.

If I take the point t as given and then randomly choose an almost-everywhere-zero function f, then, depending on how you interpret “randomly choosing”, the value of f(t) can be zero with probability zero — or not.

You’re right, it depends on the experiment. I was thinking of the latter: you fix a particular point, and “randomly” pick an almost-everywhere-zero function.

My broader point (sorry about the double use of the word) is that the “almost everywhere class” of a function (whether you use it in the Lebesgue measure sense or the cofinite sense) says absolutely *nothing* about the value at a *specific* (previously chosen) point, even though it says a lot about the value at a *randomly* chosen point.

Intuitively, we may think that the “right” value of a function that’s zero a.e. should be zero at a particular point, but this is a meaningless statement when we are trying to look at the space of all functions. There is no right value. The notion of a right value does become meaningful in other, more restricted, function contexts — for instance for piecewise continuous functions, or for measurable functions if we use the Lebesgue differentiation theorem. But it doesn’t translate to a meaningful statement about probability (not that I can think of), unless the conditions are so restrictive that the value is forced to be the “right” value. (Kind of like a zero-one law).