New Dilemmas for the Prisoner

Short-Term Memory

Last year’s big surprise in Prisoner’s Dilemma research came from two distinguished polymaths. William H. Press of the University of Texas at Austin began his career as an astrophysicist, became expert in numerical methods of computation, and in recent years has turned his attention to problems in biology. Freeman J. Dyson of the Institute for Advanced Study in Princeton is renowned as a physicist, mathematician, author, visionary, and all-around deep thinker. The story of how Press and Dyson came to make their discovery is almost as interesting as the result itself, but I have room here only for the latter.

Press and Dyson studied IPD strategies in which the next move depends only on the immediately preceding round of play, ignoring all earlier history. It might seem that players with a longer memory would have an advantage against such forgetful opponents, but it turns out the “shortest-memory player sets the rules of the game.” Suppose your opponent has a “memory-one” strategy, but you base your moves on a deeper history—perhaps the past 10 or 100 plays. Press and Dyson show that there is a memory-one strategy that would have exactly the same effect on your opponent.

Every round of play in Prisoner’s Dilemma has four possible outcomes: cc, cd, dc, and dd. A memory-one strategy is a list of four numbers that specify a player’s probability of cooperating following each of these events. For example, the tit-for-tat strategy is encoded in the sequence of probabilities (1, 0, 1, 0): always cooperate after cc or dc, always defect after cd or dd. Another interesting strategy, nicknamed Pavlov, has the signature (1, 0, 0, 1): cooperate when you and your opponent have made the same choice (cc or dd), defect when you differed on the previous move (cd or dc). Both of these strategies are deterministic, with all probabilities either 0 or 1, but that’s not necessary. The strategy (½, ½, ½, ½) describes a player who chooses moves completely at random.

Given any two strategies, a computer program can simulate an IPD match between the players, keeping track of how often each outcome (cc, cd, etc.) appears in the results. With a long enough run, these frequencies converge to a “stationary state,” which represents the expected outcome of the match. A handy property of memory-one strategies is that it’s not actually necessary to run such a program; the stationary state can be calculated directly, without tracing through the game move by move. This shortcut makes it easy to explore a wide spectrum of game strategies.

Press and Dyson were doing just that when they noticed something odd: There are memory-one IPD strategies that allow you to seize complete control over your opponent’s score. Superficially, such a strategy looks like any other—it is encoded in a fixed set of four probabilities—but the strategy makes for a strangely one-sided game, where a player’s actions have no effect on his or her own score.

The one-sided rules are members of a newly recognized class of strategies that Press and Dyson call zero-determinant strategies. The name reflects the line of reasoning that led to the discovery—the strategies appear when the determinant of a certain matrix of probabilities is set to zero—but this fact is not especially helpful in understanding how the strategies work. What sets them apart from other memory-one strategies is the presence of certain algebraic relations between the four probabilities. For example, in one subset of zero-determinant strategies only the probabilities for cc and dd events are freely chosen; they then determine the cd and dc probabilities. As a result of such dependencies the stationary state of the game is controlled entirely by one player’s strategy, without any input from the opponent. This is a situation that “allows much mischief,” Press and Dyson write.