"A good stock of examples, as large as possible, is indispensable for a thorough understanding of any concept, and when I want to learn something new, I make it my first job to build one." – Paul Halmos

Cantor’s theorem, the prisoner’s dilemma, and the halting problem

Cantor’s theorem is somewhat infamous as a mathematical result that many non-mathematicians have a hard time believing. Trying to disprove Cantor’s theorem is a popular hobby among students and cranks; even Eliezer Yudkowsky fell into this trap once. I think part of the reason is that the standard proof is not very transparent, and consequently is hard to absorb on a gut level.

The goal of this post is to present a rephrasing of the statement and proof of Cantor’s theorem so that it is no longer about sets, but about a particular kind of game related to the prisoner’s dilemma. Rather than showing that there are no surjections $X \to 2^X$, we will show that a particular kind of player in this game can’t exist. This rephrasing may make the proof more transparent and easier to absorb, although it will take some background material about the prisoner’s dilemma to motivate. As a bonus, we will almost by accident run into a proof of the undecidability of the halting problem.

Cantor’s theorem

If $X$ and $Y$ are two sets, let $Y^X$ denote the set of functions $X \to Y$. In particular, we will write the power set of $X$ as $2^X$, where $2 = \{ 0, 1 \}$, since we can identify a subset $S \subseteq X$ with its indicator function

$\chi_S : X \to 2, \quad \chi_S(x) = \begin{cases} 1 & \text{if } x \in S \\ 0 & \text{otherwise}. \end{cases}$

Theorem: Let $X$ be a set. Then there does not exist a surjection $f : X \to 2^X$.

Proof. Suppose otherwise, and let $f : X \to 2^X$ be a surjection. Consider the subset

$S = \{ x \in X : x \not\in f(x) \}$

(identifying each $f(x) \in 2^X$ with the subset of $X$ it is the indicator function of). By hypothesis, there exists $s \in X$ such that $f(s) = S$. But now if $s \in S$, then $s \not\in f(s)$, so $s \not\in S$, but if $s \not\in S$, then $s \in f(s)$, so $s \in S$; contradiction. Hence $f$ does not exist. $\Box$

Note that the proof goes through perfectly fine for a finite set, where it tells us that $n < 2^n$ for all non-negative integers $n$. This is one hint that Cantor’s theorem is not really about infinite sets, even though its main application is proving that infinite sets come in different sizes.
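Since the proof goes through for finite sets, it can be checked exhaustively. A minimal Python sketch (the three-element set and the representation of subsets as frozensets are incidental choices):

```python
from itertools import combinations, product

# Exhaustive check of the diagonal argument on a small finite set:
# for *every* function f : X -> P(X), the diagonal set
# S = { x : x not in f(x) } is missing from the image of f.
X = [0, 1, 2]
subsets = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]

checked = 0
for image in product(subsets, repeat=len(X)):
    f = dict(zip(X, image))
    S = frozenset(x for x in X if x not in f[x])
    assert S not in image  # so f is not surjective
    checked += 1
```

All $8^3 = 512$ candidate functions fail to be surjective, as the theorem predicts.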

The prisoner’s dilemma

A prisoner’s dilemma is any two-player game in which the players move simultaneously; each player has two possible moves, cooperate or defect; and each player prefers outcomes in the following order:

(I defect, you cooperate), often called exploiting the other player, followed by

(I cooperate, you cooperate), then

(I defect, you defect), and then

(I cooperate, you defect), often called being exploited.

A common way of analyzing games like the prisoner’s dilemma is to determine their Nash equilibria. To define these, we first need the notion of a mixed strategy, which is a probability distribution over possible moves. A Nash equilibrium in a game is then a choice of mixed strategy for each player such that each player, upon learning the mixed strategy of the other player, does not wish to change their own strategy. Nash famously proved that in a game where each player has finitely many possible moves, a Nash equilibrium always exists. We need to consider mixed rather than pure strategies because Nash equilibria are mixed in general; for example, the unique Nash equilibrium in rock-paper-scissors is that each player chooses rock, paper, or scissors with probability $\frac{1}{3}$ each.

The prisoner’s dilemma roughly models many real-world situations, and is of interest because the only Nash equilibrium in the game is that both players defect (since both players, upon learning what move their opponent is making, would strictly prefer to change their move to defect), but this situation is worse for both players than if both players had cooperated. This raises some interesting questions about how mutual cooperation can occur among rational agents of the kind considered in game theory, and possibly also some questions about how mutual cooperation can evolve in nature.
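That mutual defection is the unique pure-strategy equilibrium can be verified mechanically. A sketch, assuming the conventional hypothetical payoff values $T = 5$, $R = 3$, $P = 1$, $S = 0$ (any values respecting the ordering above would give the same answer):

```python
# Hypothetical payoffs satisfying the prisoner's dilemma ordering
# T (exploit) > R (mutual cooperation) > P (mutual defection) > S (exploited).
T, R, P, S = 5, 3, 1, 0

# payoff[my_move][their_move] = my payoff
payoff = {"C": {"C": R, "D": S}, "D": {"C": T, "D": P}}

def is_nash(a, b):
    """A pure-strategy profile is a Nash equilibrium if neither player
    gains by unilaterally switching their move."""
    other = {"C": "D", "D": "C"}
    return (payoff[a][b] >= payoff[other[a]][b] and
            payoff[b][a] >= payoff[other[b]][a])

equilibria = [(a, b) for a in "CD" for b in "CD" if is_nash(a, b)]
```

The only profile that survives is `("D", "D")`, even though both players would score higher at `("C", "C")`.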

One idea is to consider the iterated prisoner’s dilemma, where two players repeatedly engage in a prisoner’s dilemma for many rounds. Here it is important to assign point values to the various outcomes and specify that both players are trying to maximize the number of points they have, and then it could be a sensible strategy to cooperate rather than defect in the hopes of promoting cooperation on future rounds. Unlike a one-shot prisoner’s dilemma (one that lasts one round) where it is assumed that you don’t know anything about your opponent, in an iterated prisoner’s dilemma you and your opponent learn about each other over time, so if you each learn that the other is cooperative then mutual cooperation might occur.

Axelrod famously ran an iterated prisoner’s dilemma tournament played by programs implementing various strategies, and found empirically that greedy strategies (that try to exploit opponents) generally did worse than altruistic strategies (that offered cooperation even with the possibility of being exploited). The most successful deterministic such strategy was tit for tat, which is also very simple to describe: cooperate on the first round, then repeat whatever your opponent did on the previous round. Tit for tat achieves mutual cooperation against other players that also cooperate but immediately attempts to punish defection with defection of its own. The iterated prisoner’s dilemma is a plausible rough model of social interaction, and tit for tat (or its more altruistic cousin, generous tit for tat, which occasionally forgives defection on the part of its opponent), is a plausible rough model of how cooperation among social animals like humans works.
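Tit for tat is short enough to state as code. A sketch of an iterated match, again with the hypothetical payoffs $5, 3, 1, 0$ (the strategy interface here is my own choice, not the format of Axelrod’s tournament):

```python
# A strategy maps (my_history, their_history) to a move.
def tit_for_tat(my_history, their_history):
    # Cooperate first, then copy the opponent's previous move.
    return "C" if not their_history else their_history[-1]

def defect_bot(my_history, their_history):
    return "D"

def play(strategy1, strategy2, rounds=10):
    """Run an iterated prisoner's dilemma and return both total scores."""
    payoff = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
              ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
    h1, h2, score1, score2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = strategy1(h1, h2), strategy2(h2, h1)
        p1, p2 = payoff[(m1, m2)]
        h1.append(m1); h2.append(m2)
        score1 += p1; score2 += p2
    return score1, score2
```

Against a copy of itself, tit for tat cooperates every round; against a pure defector it is exploited exactly once and then punishes defection for the rest of the match.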

But what about trying to achieve mutual cooperation on the one-shot prisoner’s dilemma? Unlike the iterated prisoner’s dilemma, it seems that you have no chance to learn anything about your opponent. Here a standard argument makes an additional assumption: namely, that you assume your opponent reasons the same way you do and that you both have common knowledge of this. (It is also typical to further assume that the payoff matrix is symmetric.) In this case, it seems plausible that if you cooperate, then so will your opponent, and similarly for defection. In other words, exploitation is asymmetric and therefore seems to be ruled out, and in that case you should cooperate expecting your opponent to have come to the same conclusion. In the philosophy literature this is known as the symmetry argument for cooperation; Hofstadter calls this superrationality.

However, this argument is philosophically contentious. It almost seems to be claiming that choosing to cooperate would somehow cause your opponent to cooperate, while choosing to defect would cause your opponent to defect, which seems implausible. The contention here is related to issues surrounding Newcomb’s problem and decision theory which we’ll avoid for the time being. Can we say something less contentious?

The prisoner’s dilemma with visible source code

One reason the iterated prisoner’s dilemma is different from the one-shot prisoner’s dilemma is that playing multiple rounds allows you to learn about your opponent. Consider the following alternate setup for learning about your opponent while only playing one round: two players, rather than play against each other directly, write programs to play a one-shot prisoner’s dilemma for them. Moreover, the programs are fed as input each other’s source code. Call this the prisoner’s dilemma with visible source code. For a brief introduction to the literature on this game, which gives rise to the notion of program equilibrium, see for example this blog post and the resources therein. I was introduced to it at a workshop run by the Machine Intelligence Research Institute.

We’ll call a program that plays this game a bot. The two simplest bots are CooperateBot and DefectBot, which always cooperate and always defect respectively. CooperateBot is exploitable (it gets exploited by at least one other bot), and DefectBot can never achieve mutual cooperation. Can we do better? For starters, does there exist an unexploitable bot that is not DefectBot (equivalently, that achieves mutual cooperation against at least one other bot)?

As it turns out, there is. CliqueBot is the bot that checks its opponent’s source code to see if it’s identical to its own source code (which it has access to via quining; see also Kleene’s recursion theorem). If so, it cooperates; otherwise, it defects. CliqueBot defects against every bot except CliqueBot and achieves mutual cooperation with CliqueBot, so it satisfies our requirements.
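In a toy model where a bot is just a function of two source strings, CliqueBot is one line. (A real CliqueBot obtains its own source via quining; here we sidestep that by passing it in, and the source strings below are placeholder stand-ins:)

```python
# Toy CliqueBot: a bot here is a function (my_source, opponent_source) -> move.
# A real implementation would recover my_source by quining.
def cliquebot(my_source, opponent_source):
    # Cooperate exactly when the opponent's code is character-for-character ours.
    return "C" if opponent_source == my_source else "D"

CLIQUE_SRC = "<CliqueBot's own source text>"  # placeholder stand-in
OTHER_SRC = "<any other bot's source text>"   # placeholder stand-in

mirror_match = cliquebot(CLIQUE_SRC, CLIQUE_SRC)  # mutual cooperation
other_match = cliquebot(CLIQUE_SRC, OTHER_SRC)    # defection against everyone else
```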

Another interesting example is FairBot (although the version I’m about to define is not the version defined in the blog post I linked to), which executes its opponent’s source code when fed its own source code as input and returns the result as its move. FairBot can be seen as trying to instantiate the symmetry argument for cooperation. However, it ends up running forever when it plays against itself, so it doesn’t in fact achieve mutual cooperation with itself. This can be fixed by modifying FairBot so that it searches for proofs in, say, Peano arithmetic of up to some fixed length that its opponent cooperates and cooperates if it finds such a proof. Somewhat surprisingly, a variant of Löb’s theorem can be used to show that this version of FairBot cooperates with itself (see the blog post previously linked to above for details).
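The infinite regress of this naive FairBot is easy to exhibit if we model bots as functions that receive their opponent as a function rather than as source code (an assumption of this sketch, not of the original setup):

```python
import sys

# Naive FairBot: to choose its move, it simulates its opponent
# playing against FairBot itself.
def fairbot(opponent):
    return opponent(fairbot)

def cooperate_bot(opponent):
    return "C"

assert fairbot(cooperate_bot) == "C"  # terminates against CooperateBot

# Against itself the simulation recurses forever; a low recursion limit
# makes the infinite regress visible as an exception.
old_limit = sys.getrecursionlimit()
sys.setrecursionlimit(100)
try:
    fairbot(fairbot)
    diverged = False
except RecursionError:
    diverged = True
finally:
    sys.setrecursionlimit(old_limit)
```

The proof-search version avoids this regress because checking a proof of bounded length always halts, whereas simulation does not.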

The total prisoner’s dilemma with visible source code

To connect back to Cantor’s theorem, we will impose an additional requirement but relax a previous requirement. Namely, we will impose the additional requirement that bots are required to halt: in other words, they can’t run forever but must eventually output either cooperate or defect. However, we will relax the requirement that bots must be programs: in other words, we will allow them to potentially implement arbitrary functions. More precisely, define a bot environment to be a set $B$ of bots together with a function

$f : B \times B \to \{ C, D \}$

describing, for each pair of bots $(b_1, b_2)$, whether $b_1$ cooperates ($C$) or defects ($D$) against opponent $b_2$. We place no other constraints on the set $B$ or the function $f$. For example, any set of programs in the prisoner’s dilemma with visible source code which are guaranteed to halt against each other defines a bot environment.
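Concretely, a bot environment is nothing more than a lookup table. A tiny sketch, with three bots named after the bots above (the behaviors assigned here are illustrative):

```python
# A bot environment: any set B of bots plus any total function
# f : B x B -> {"C", "D"}. Nothing requires the bots to be programs.
bots = ["CooperateBot", "DefectBot", "CliqueBot"]

def f(b1, b2):
    if b1 == "CooperateBot":
        return "C"
    if b1 == "DefectBot":
        return "D"
    return "C" if b2 == b1 else "D"  # CliqueBot: cooperate only with itself

# Every pair gets a definite, halting answer, so (bots, f) is a bot environment.
table = {(b1, b2): f(b1, b2) for b1 in bots for b2 in bots}
```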

Say that a bot $b$ in a bot environment $(B, f)$ is a CantorBot if it has the property that $f(b, b') = C \iff f(b', b') = D$ for all $b' \in B$; in other words, it cooperates against precisely those bots which do not cooperate with themselves.

No-CantorBot Theorem: No bot environment contains a CantorBot.

Proof. CantorBot’s move against itself is not well-defined, since its definition requires that $f(b, b) = C \iff f(b, b) = D$. $\Box$

Cantor’s theorem follows as a corollary. To see this we need to curry the function $f : B \times B \to \{ C, D \}$ to obtain a function

$\tilde{f} : B \to \{ C, D \}^B$.

By mapping $C \mapsto 1$ and $D \mapsto 0$, we can identify $\{ C, D \}^B$ with the power set $2^B$ of $B$, and then we can think of $\tilde{f}$ as the function which describes for each bot the set of all bots it cooperates with. If $\tilde{f}$ is a surjection, then there must exist bots which exhibit all possible cooperation behaviors against other bots. The above proof shows that this can’t be the case by showing that there does not exist $b \in B$ such that

$\tilde{f}(b)(b') = C \iff f(b', b') = D \text{ for all } b' \in B,$

since such a $b$ is precisely a CantorBot. (Strictly speaking, the claim that in any bot environment, not all cooperation behaviors are exhibited is the correct restatement of Cantor’s theorem in this language.)
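In the curried picture, a CantorBot would be a bot whose row of the cooperation table is the flipped diagonal. That no row can match this can be checked exhaustively on a tiny environment (two bots, hence $2^4 = 16$ possible functions $f$):

```python
from itertools import product

# However f : B x B -> {C, D} is chosen, the hypothetical CantorBot row
# (the opposite of each bot's self-play move) never equals any actual row.
bots = ["A", "B"]
flip = {"C": "D", "D": "C"}

checked = 0
for values in product("CD", repeat=len(bots) ** 2):
    f = dict(zip(list(product(bots, repeat=2)), values))
    cantor_row = tuple(flip[f[(b, b)]] for b in bots)
    actual_rows = {tuple(f[(b, opp)] for opp in bots) for b in bots}
    assert cantor_row not in actual_rows  # no bot behaves like CantorBot
    checked += 1
```

The reason is exactly the diagonal argument: a bot whose row matched would have to disagree with itself at its own diagonal entry.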

The reasoning above is essentially the same as in the Barber paradox, and hence in Russell’s paradox, but with “$b_1$ cooperates with $b_2$” in place of “$x$ shaves $y$” and “$x$ contains $y$” respectively. Note that in the Barber paradox and in Russell’s paradox there are extra and unnecessary assumptions placed on the “shaves” and “contains” relations; in the No-CantorBot theorem, “$b_1$ cooperates with $b_2$” is an arbitrary relation on a set $B$.

The halting problem

One interesting observation about CantorBot is that it exists in the usual prisoner’s dilemma with visible source code: a bot receiving its opponent’s source code can attempt to compute what its opponent would do against itself, and then do the opposite of that. CantorBot will, of course, run forever against many opponents, including itself, which is why the above proof doesn’t apply to this case. In other words, functions described by a program are not really functions but partial functions: they may run forever and not produce an output.

We can think of a partial function $X \to Y$ as an ordinary function $X \to Y \sqcup \{ \bot \}$, where $\bot$ is the output that corresponds to running forever. And it’s only a very small jump from here to a proof of the halting theorem. We just think of the set $P$ of programs as a bot environment, equipped with the function

$f : P \times P \to \{ C, D \}$

given by $f(p_1, p_2) = C$ if $p_1$ halts given $p_2$ as input and $f(p_1, p_2) = D$ otherwise. When we run this bot environment through the proof of the No-CantorBot theorem, we get the following. Suppose the halting problem is decidable by a Turing machine. Then we can program a CantorBot which, when fed the source code of another bot $p$, halts if $p$ runs forever given itself as input and runs forever if $p$ halts given itself as input. This CantorBot can neither run forever nor halt given itself as input, and therefore doesn’t exist; contradiction.
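The case analysis at the heart of this argument fits in a few lines. A sketch in which the hypothetical oracle’s verdict is just a string (no real halting oracle is, of course, being implemented):

```python
# CantorBot does the opposite of whatever the claimed halting oracle
# says CantorBot does when run on its own source.
def cantorbot_behavior(oracle_verdict):
    return "runs forever" if oracle_verdict == "halts" else "halts"

# Whichever answer the oracle gives about CantorBot-on-CantorBot,
# CantorBot's actual behavior contradicts it.
wrong_on = [v for v in ("halts", "runs forever")
            if cantorbot_behavior(v) != v]
```

The oracle is wrong on both possible verdicts, so no such oracle (and hence no halting decider) can exist.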

Note that Cantor’s theorem, when applied to this situation, only claims that there exists a pattern of halting or running forever when fed various programs as input that isn’t exhibited by an actual program, which implies in particular that there exists an undecidable problem about Turing machines. But the No-CantorBot theorem actually exhibits such a problem, namely the halting problem.