January 19, 2015

Poker's Deus Ex Machina (Part I)—How a Computer Proved Poker Is a Game of Skill

One of my favorite TV shows is the CBS drama, Person of Interest. The show's plot is driven by the concept that a fully functional artificial intelligence (AI) computer program has actually been created. The AI system—known as "The Machine"—was originally created as an omnipresent surveillance tool to detect terrorist plots for the government. The Machine's creator feared those in power would abuse its abilities and went underground, programming The Machine with an ethical code and using it to predict and prevent criminal acts.

In last week's episode (the aptly titled "If-Then-Else"), the show's protagonists were caught in a conundrum, needing both to hack into a computer system to prevent a stock market crash and to escape a trap meant to capture or kill them. The show flashed back to when The Machine was first created and its creator was teaching it to play chess. The Machine would calculate thousands of possible moves ahead, and when a chosen line of attack ultimately failed, would alter its strategy for future games, thereby learning how to play the game better. The rest of the show involved watching The Machine run through multiple alternative game plans for the team of heroes, with many of them ending in the team's demise. The Machine ultimately found a solution which gave the team a slim but real chance of survival. Of course, the show ended in a shocking cliffhanger (the apparent death of one of my favorite characters), which initially angered me, but ... well, as they say, "Spoiler Alert".

Just a few days after watching that Person of Interest episode, news broke via Science magazine that a team of scientists in Canada had "essentially weakly solved" heads-up limit hold 'em poker (HULHE). The methodology used to develop the Cepheus poker program is uncannily similar to that used with Person of Interest's fictitious Machine. Essentially, Cepheus played billions of billions of hands of poker against an identical program, beginning with random trial and error as to the proper strategy—bet, raise, call, or fold—at each decision point. Once the hand concluded, Cepheus would assign a "regret factor" to each decision based on the hindsight knowledge of the actual hand results. As tens of thousands of similar hands and situations accumulated, Cepheus would adjust its strategy to lessen or avoid decisions with higher regret factors, instead pursuing the balance of decisions which, overall, caused the least regret. For example, Cepheus might initially only check or call on most flops with a pocket pair higher than any card on the board, but would over time learn that betting or raising a high percentage of the time is a better (i.e., more profitable) strategy. Eventually, the program reached a point where further adjustments created more regret, meaning that the program had developed a non-exploitable, Game Theory Optimal (GTO) strategy.
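For readers who want to see the regret idea in motion, here is a minimal sketch. It is not the Cepheus code, and it does not play hold 'em. It assumes plain counterfactual regret minimization (the textbook member of the family of regret-based self-play methods) applied to Kuhn poker, a three-card toy game with one bet per player. The names Node, cfr, and train are my own, chosen for illustration.

```python
from itertools import permutations

ACTIONS = "pb"  # p = pass (check or fold), b = bet (bet or call)


class Node:
    """One decision point: the acting player's card plus the betting so far."""

    def __init__(self):
        self.regret_sum = [0.0, 0.0]
        self.strategy_sum = [0.0, 0.0]
        self.strategy = [0.5, 0.5]  # start with no preference

    def refresh(self):
        # Regret matching: favor each action in proportion to its positive regret.
        positive = [max(r, 0.0) for r in self.regret_sum]
        total = sum(positive)
        self.strategy = [p / total for p in positive] if total > 0 else [0.5, 0.5]

    def average_strategy(self):
        total = sum(self.strategy_sum)
        return [s / total for s in self.strategy_sum] if total > 0 else [0.5, 0.5]


def cfr(cards, history, reach, nodes):
    """Return the expected payoff for the player to act, updating regrets."""
    player = len(history) % 2
    opponent = 1 - player
    if len(history) >= 2:
        higher = cards[player] > cards[opponent]
        if history == "pp":            # both checked: showdown for the ante
            return 1.0 if higher else -1.0
        if history.endswith("bb"):     # bet and call: showdown for ante plus bet
            return 2.0 if higher else -2.0
        if history.endswith("bp"):     # opponent folded to our bet
            return 1.0
    node = nodes.setdefault(f"{cards[player]}{history}", Node())
    values = [0.0, 0.0]
    for a, label in enumerate(ACTIONS):
        next_reach = list(reach)
        next_reach[player] *= node.strategy[a]
        values[a] = -cfr(cards, history + label, next_reach, nodes)
    node_value = sum(s * v for s, v in zip(node.strategy, values))
    for a in range(2):
        # Hindsight regret: how much better action a would have done.
        node.regret_sum[a] += reach[opponent] * (values[a] - node_value)
        node.strategy_sum[a] += reach[player] * node.strategy[a]
    return node_value


def train(iterations):
    """Self-play over every possible deal; returns (average value, nodes)."""
    nodes, total = {}, 0.0
    deals = list(permutations((1, 2, 3), 2))  # 1 = Jack, 2 = Queen, 3 = King
    for _ in range(iterations):
        for cards in deals:
            total += cfr(cards, "", [1.0, 1.0], nodes)
        for node in nodes.values():
            node.refresh()  # adjust strategies only after the full pass
    return total / (iterations * len(deals)), nodes


value, nodes = train(10000)
print("average value for the first player:", round(value, 4))
print("King facing a bet [fold, call]:", nodes["3b"].average_strategy())
```

The average value should drift toward -1/18, the known value of Kuhn poker for the player who acts first, and the averaged strategy should learn the obvious plays (never fold a King to a bet, never call with a Jack) without being told them. The real problem is the same loop at a vastly larger scale.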

"At first, I was roundly stuffed by the computer’s non-stop aggression. Any bluffs I made failed miserably. To counteract this, I became more aggressive preflop and stopped bluffing almost entirely. Cepheus’s game did not adapt to my play and it made what I would consider several questionable plays. The program was reluctant to ever give up any sort of hand in a large pot making it easier to get lots of value from moderately weak hands."

Hall's claim is utterly at odds with the claim that Cepheus has solved HULHE, and reflects a lack of understanding of what GTO strategy means in game theory. If Cepheus in fact is playing a GTO strategy, then by definition Hall cannot play a style which attacks a flaw in Cepheus' strategy because GTO strategy has no flaws to exploit. As Cepheus' creators explain (emphasis added):

"Since poker is a symmetrical game, the end strategy which Cepheus plays is an unbeatable one. While chips can, of course, be won from Cepheus in the short term, there is no decision which can be made against it which will be a winner in the long term. If a perfect opponent, either human or computerized, were to play a semi-infinite number of hands against Cepheus the best possible result would be for them to break even. Any imperfect opponent, which unfortunately includes all human players, would make mistakes along the way and lose."

Rather than exposing a supposed flaw in Cepheus, Hall's short-term positive results were purely a matter of short-term variance—that is, Hall got lucky. Now, this is not to say that Hall's change in tactics had no effect; either:

Hall originally was playing sub-optimal poker and correctly adjusted toward GTO strategy, improving his results (which were augmented by short-term variance); or,

Hall incorrectly adjusted away from GTO strategy to exploit a perceived (but illusory) flaw, won over the short term because of variance, but would in fact lose over the long term using that strategy.

To be fair, Hall acknowledged that the 400 hands he played (essentially one or two decent cash game sessions) were an insufficient sample size to evaluate Cepheus. Still, Hall doubled down on his irrational doubting of Cepheus:

"Perhaps the best way to show off Cepheus would be to issue a challenge over a fixed amount of hands to a world-class professional player like Daniel Negreanu or Phil Ivey. This could create poker’s own version of Deep Blue v Garry Kasparov and would certainly be interesting for poker junkies like myself. I’d probably still take man over machine, though."

Assuming a statistically significant number of hands were played, Hall picking a human to defeat a computer playing GTO strategy? GTFO!
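To put rough numbers on "statistically significant," here is a back-of-the-envelope sketch using a normal approximation. The inputs are assumptions of mine, not measurements of Hall or Cepheus: a player who loses 0.05 big bets per hand on average, with a standard deviation of 2 big bets per hand. The function names are likewise illustrative.

```python
import math


def prob_ahead(edge_per_hand, sd_per_hand, hands):
    """Normal-approximation chance of showing a profit after `hands` hands."""
    z = edge_per_hand * math.sqrt(hands) / sd_per_hand
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def hands_needed(edge_per_hand, sd_per_hand, z=1.96):
    """Hands before the expected result sits z standard errors away from zero."""
    return math.ceil((z * sd_per_hand / abs(edge_per_hand)) ** 2)


# Hypothetical losing player: -0.05 big bets per hand, SD of 2 big bets per hand.
print(round(prob_ahead(-0.05, 2.0, 400), 3))  # 0.309
print(hands_needed(-0.05, 2.0))               # 6147
```

Under those assumed numbers, a long-term loser still books a winning 400-hand session nearly a third of the time, and it takes several thousand hands before the true edge reliably shows through the noise.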

Hall's article did indirectly point out one crucial distinction between the Cepheus GTO strategy and the strategies employed by skilled human poker players: Human players will often deviate from GTO strategy in specific situations against specific opponents in order to maximize their profits through exploitation of weak players' worst errors, even though that particular non-GTO strategy would lose money over the long run against most opponents. As Cepheus' creators readily admit (link added):

"[W]hat Cepheus cannot do is maximize its winnings against weak opponents, a skill at which humans excel. Cepheus is simply an invincible, immovable bunker, a Maginot Line that actually works."

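The trade-off between an unexploitable strategy and an exploitative one is easy to see in a far simpler game. The sketch below uses rock-paper-scissors as a stand-in (my choice, not the researchers'), with a hypothetical weak opponent who throws rock half the time. The equilibrium mix cannot lose, but it also cannot profit from that leak. The exploitative strategy profits, and in doing so opens a leak of its own.

```python
# Payoff to the first player; rows and columns are rock, paper, scissors.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def expected_value(mine, theirs):
    """Average payoff per round for `mine` against `theirs`."""
    return sum(mine[i] * theirs[j] * PAYOFF[i][j]
               for i in range(3) for j in range(3))


def best_response(theirs):
    """Pure strategy that earns the most against a known mixed strategy."""
    values = [sum(theirs[j] * PAYOFF[i][j] for j in range(3)) for i in range(3)]
    best = max(range(3), key=lambda i: values[i])
    return [1.0 if i == best else 0.0 for i in range(3)]


equilibrium = [1 / 3, 1 / 3, 1 / 3]
weak_opponent = [0.5, 0.25, 0.25]        # hypothetical: throws rock too often
exploit = best_response(weak_opponent)   # always paper
counter = best_response(exploit)         # the game is symmetric: always scissors

print(expected_value(equilibrium, weak_opponent))  # zero, up to rounding
print(expected_value(exploit, weak_opponent))      # 0.25
print(expected_value(exploit, counter))            # -1.0
```

The equilibrium player is the bunker: it breaks even against the weak opponent and against everyone else. The exploiter wins a quarter of a unit per round from the weak opponent, then loses a full unit per round once someone notices.
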
Although Cepheus is an impressive achievement in its own right, my thoughts immediately turned toward how Cepheus would play in the legal world. Does the development of a computer program capable of GTO poker play decisively prove that poker is a game of skill rather than a game of chance?

The skill game argument reached its apex in the DiCristina litigation, where a federal district court judge found poker to be a game of skill for purposes of the federal Illegal Gambling Business Act. As discussed in my analysis of the DiCristina district court decision, the court's analysis of the skill game issue was driven in large part by Dr. Randall Heeb's sophisticated statistical analysis of millions of online poker hand histories. Dr. Heeb was able to demonstrate that winning players displayed a skill edge greater than expected variance within a few thousand hands of play, and also that winning players won more money than losing players even when playing the same starting hands.

Cepheus advances the skill game argument by demonstrating that there is a theoretical strategy for playing HULHE which is optimal, in the sense of being non-exploitable over a sufficiently large number of hands. In fact, Cepheus' creators note that there may be multiple GTO strategies for HULHE: "different Nash equilibria may play differently" (p. 9).

Cepheus makes two significant contributions to the skill game argument. First, Cepheus demonstrates that individual poker decisions must be evaluated in the aggregate, over time. To this point, many of the examples of poker skill used to support the skill game argument have focused on individual or tactical poker plays—for example, how pot odds, stack size, or starting hand strength can be used to determine correct game decisions. Cepheus shows that the game is substantially more complex than any one play or hand, and that a strategic approach to game decisions is both necessary and possible. Although successful poker players recognize the importance of long-term strategic game theory concepts such as range-balancing, Cepheus is a rigorous mathematical and logical proof of the importance of poker players thinking beyond the immediate play or hand. In other words, Cepheus is a refutation of the superficial legal argument that poker is a game of chance because, regardless of skill, players are still "subject to defeat at the turn of a card" in a particular hand.

Cepheus also makes another, more significant contribution to the skill game argument. Some of the legal arguments made in favor of poker as a skill game overreach, trying to establish that nearly nothing in the game is beyond the control of the player; these arguments fall flat against the easily observable elements of chance in play. Cepheus, however, takes the element of chance head on and renders it irrelevant. Cepheus does not deny the presence of a significant element of chance in the game. Cepheus simply is indifferent, impervious even, to the effects of chance over the long term. Regardless of what cards may fall by chance, Cepheus will over the long term win against opponents playing a non-GTO strategy. For purposes of the legal skill game argument, Cepheus serves as the embodiment of the triumph of skill over chance.

So, does Cepheus mean that the legal skill game argument is over? Hardly. Cepheus is a proof of HULHE only. One of the Cepheus researchers, Neil Burch, is participating in a couple of threads in the Two Plus Two poker forums where he describes some of the limits of the Cepheus results. First, Burch does not think heads-up no-limit poker is solvable using current methods and technology because of the exponentially greater number of decision points in play. Second, Burch points out that moving from a heads-up game to even a three-person game adds significant layers of complexity to the analysis, including the interesting possibility that two players could collude to exploit a third player, even if that third player was playing an equilibrium strategy. Consequently, Cepheus is better viewed as a "proof of concept" of the degree of skill involved in poker, not a proof that every version or permutation of poker has a GTO strategy impervious to the effects of chance.

Unfortunately, what Cepheus giveth, Cepheus also taketh away. Like a demon from a bad horror flick, Chance does not want to stay dead and buried. Stay tuned, true believers, for our next episode when Cepheus resurrects Chance to haunt the skill game argument once again.

* * * * *

AUTHOR'S NOTE: This post is the first of two related posts. Part II is HERE.