Tuesday, July 29, 2014

GTO Poker and Multiple Equilibria Part 2

Today I'm going to continue examining GTO play in games with multiple equilibria. I'm going to focus this post entirely on two-player zero-sum games, and we'll look at several GTO strategies and how they perform against different types of fish.

If you haven't already, be sure to check out part 1 of this post. There we observed that in two-player zero-sum games, when both players play GTO strategies their EVs are the same no matter which equilibrium strategy each of them plays, but that against fish some GTO strategies perform better than others.

In the previous post we looked at a river situation with multiple GTO strategies, one of which performed as well as or better than every other GTO strategy against all types of suboptimal play. Today I'm going to start with a very simple example of a game with multiple GTO strategies where every strategy performs better against one type of suboptimal player and worse against another, so that there is no single "best" GTO strategy.

The key takeaway I want to convey is that, contrary to popular belief, it is entirely possible to alter your strategy in an exploitative fashion to take advantage of your opponent while continuing to play GTO, in all sorts of zero-sum games including poker.

Contrived Example: AB Game

I'm going to start by introducing a totally made-up game that has no bearing on reality. As an actual game it is quite boring, but it does a very good job of illustrating how a game might have multiple GTO strategies, each of which is better against a different type of fish.

The game works as follows. Each player privately chooses one of four options: a, A, b, or B. They then simultaneously reveal their choices and a winner is determined. The loser pays the winner a dollar; if there is a tie, no money changes hands. The options are ranked as follows: A and a both tie with b and B, but A beats a and B beats b.

From examining this game it should be quite clear what the set of GTO strategies is. Playing A is always as good as or better than playing a: if your opponent plays b or B you tie either way, but if he plays A you lose with a instead of tying. The exact same logic holds for B versus b. Because a GTO opponent may well play A or B, putting any weight on a or b can cost you money, so a GTO strategy never plays either of them.

It turns out that any strategy that plays only A and B, in any mix, is GTO in this game: always playing A is GTO, always playing B is GTO, and in general playing A with frequency x and B with frequency 1-x is GTO for every x in [0,1]. This is easy to verify, so I won't check it here. When two GTO players play each other they always tie, so their EVs are 0.
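For readers who want to check the claim anyway, here is a minimal brute-force sketch. The payoff function encodes the rules as I read them (only A vs a and B vs b are decisive, everything else ties), and the names `payoff`, `ev` and `worst_case` are hypothetical helpers, not anything from the original post. Since the game is symmetric its value is 0, so a strategy is GTO exactly when its worst case over the opponent's replies is 0; checking pure replies is enough because EV is linear in the opponent's mix.

```python
from fractions import Fraction
from itertools import product

OPTIONS = ["a", "A", "b", "B"]

def payoff(mine: str, theirs: str) -> int:
    """Winnings for `mine` against `theirs`: +1 win, -1 loss, 0 tie."""
    # Only same-letter pairs are decisive (A beats a, B beats b); everything else ties.
    if mine == theirs or mine.lower() != theirs.lower():
        return 0
    return 1 if mine.isupper() else -1

def ev(mine: dict, theirs: dict) -> Fraction:
    """Expected winnings per round when both players use mixed strategies."""
    return sum(
        p * q * payoff(m, t)
        for (m, p), (t, q) in product(mine.items(), theirs.items())
    )

def worst_case(strategy: dict) -> Fraction:
    """EV against the opponent's best reply; a pure reply always suffices."""
    return min(ev(strategy, {reply: 1}) for reply in OPTIONS)

# The value of this symmetric game is 0, so a strategy is GTO iff worst_case == 0.
for k in range(11):
    x = Fraction(k, 10)
    assert worst_case({"A": x, "B": 1 - x}) == 0

# Two GTO players always tie.
assert ev({"A": Fraction(1, 3), "B": Fraction(2, 3)}, {"A": 1}) == 0

# Any weight on a can be punished by an opponent who always plays A.
print(worst_case({"a": Fraction(1, 10), "A": Fraction(9, 10)}))  # → -1/10
```

Exact `Fraction` arithmetic is used so the equality checks against 0 are not disturbed by floating-point rounding.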

Now let's think about performance against suboptimal play. Suppose there are two fish, Fish a and Fish b. Fish a always plays a and Fish b always plays b. The GTO strategy that always plays A wins every time against Fish a and breaks even against Fish b. The GTO strategy that always plays B wins every time against Fish b and breaks even against Fish a. Furthermore, each of these GTO strategies is actually maximally exploitative against the fish it targets.
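The trade-off can be tabulated for the whole family of GTO strategies. In this sketch the hypothetical helper `gto_mix(x)` stands for "play A with frequency x and B with frequency 1-x", and `ev_vs_pure` is the per-round EV against a fish that always picks one option; both names are mine, not the post's. Raising x gains against Fish a exactly what it gives up against Fish b.

```python
from fractions import Fraction

OPTIONS = ["a", "A", "b", "B"]

def payoff(mine: str, theirs: str) -> int:
    """+1 win, -1 loss, 0 tie; only A-vs-a and B-vs-b are decisive."""
    if mine == theirs or mine.lower() != theirs.lower():
        return 0
    return 1 if mine.isupper() else -1

def ev_vs_pure(strategy: dict, fish_choice: str) -> Fraction:
    """EV per round of a mixed strategy against a fish that always picks one option."""
    return sum(p * payoff(mine, fish_choice) for mine, p in strategy.items())

def gto_mix(x: Fraction) -> dict:
    """The GTO strategy that plays A with frequency x and B with frequency 1 - x."""
    return {"A": x, "B": 1 - x}

for x in (Fraction(0), Fraction(1, 2), Fraction(1)):
    s = gto_mix(x)
    print(f"x={x}: vs Fish a {ev_vs_pure(s, 'a')}, vs Fish b {ev_vs_pure(s, 'b')}")
# x=0: vs Fish a 0, vs Fish b 1
# x=1/2: vs Fish a 1/2, vs Fish b 1/2
# x=1: vs Fish a 1, vs Fish b 0

# Winning every round is the best any strategy can do, so x = 1 is a best response to Fish a.
best_vs_a = max(payoff(o, "a") for o in OPTIONS)
assert ev_vs_pure(gto_mix(Fraction(1)), "a") == best_vs_a == 1
```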

The result is that there is room to be exploitative and adapt your strategy to your opponent while still remaining GTO. Imagine you are playing 100 rounds of this game against a random opponent drawn from a pool of fishy opponents: some play various GTO strategies, some play a more often than b, and others do the opposite.

It would be 100% reasonable (and far more profitable than picking a specific GTO strategy and sticking to it) to have a default GTO strategy that you usually play, say always playing A, and to switch to an alternate GTO strategy, always playing B, as soon as you saw that your opponent played b more often than he played a.
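Here is a minimal sketch of that adaptive rule. It assumes we only see the opponent's choice after committing to ours each round, and it assumes we switch back to A if a later overtakes b again; the post only describes the first switch. `adaptive_choice` and `play_match` are hypothetical names. Because we only ever play A or B, we never lose a single round, even against a GTO opponent.

```python
from collections import Counter
from typing import Iterable

def payoff(mine: str, theirs: str) -> int:
    """+1 win, -1 loss, 0 tie; only A-vs-a and B-vs-b are decisive."""
    if mine == theirs or mine.lower() != theirs.lower():
        return 0
    return 1 if mine.isupper() else -1

def adaptive_choice(seen: Counter) -> str:
    """Default to always-A; play B while the opponent has shown more b than a.

    Both pure choices are GTO here, so the adaptation never leaves the set of
    equilibrium strategies.
    """
    return "B" if seen["b"] > seen["a"] else "A"

def play_match(opponent_moves: Iterable[str]) -> int:
    """Total winnings over a match; each opponent move is revealed after we commit."""
    seen: Counter = Counter()
    total = 0
    for theirs in opponent_moves:
        mine = adaptive_choice(seen)
        total += payoff(mine, theirs)
        seen[theirs] += 1
    return total

print(play_match(["b"] * 100))  # → 99 (round 1 is a tie before we have seen any b)
print(play_match(["a"] * 100))  # → 100
print(play_match(["A", "B"] * 50))  # → 0 (a GTO opponent just breaks even)
```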

Note that in this case the behavior we are adapting exploitatively is on the equilibrium path, so when we play against a GTO opponent our strategy change will be observable to them. We are actually switching which equilibrium strategy we play. As we'll see below, there is another way to exploit opponent tendencies while remaining GTO, which involves altering our play only when our opponent takes actions that are off the equilibrium path.

A Poker Example -- GTO Wiggle Room

I'm going to look at a somewhat real-world scenario that might come up in a HU game in a 3-bet pot on a draw-heavy board. I kept the ranges unrealistically small for simplicity and didn't try to model stack sizes precisely, because this example is meant to illustrate a broadly applicable concept, not to accurately address a specific situation.

Imagine we're on the river on a draw-heavy board where the river card completed the flush. The OOP player's range consists of strong overpairs, while the IP player's range consists of busted straight draws, flush draws that got there, and a handful of sets and two pairs.

Specifically:

The board is: 2sTs9c5h3s

The IP range is: 22, 87, T9, QJ, Ks8s+, As2s-AsJs, 7s6s, 9s8s, Qs9s

The OOP range is: QQ+

There are 100 chips in the pot and 150 left to bet
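To get a feel for how big these ranges are once the board is dealt, here is a small combo-counting sketch. It assumes "Ks8s+" means Ks8s up through KsQs and "As2s-AsJs" means every As-Xs combo from As2s to AsJs, and it only removes combos that collide with board cards (no card removal between the two players). The helpers `unsuited_class`, `specific` and `live` are hypothetical names.

```python
from itertools import combinations

SUITS = "cdhs"
BOARD = {"2s", "Ts", "9c", "5h", "3s"}

def unsuited_class(r1: str, r2: str) -> set:
    """All combos of a class like '87' or '22' (suited and offsuit together)."""
    if r1 == r2:
        return {frozenset(pair) for pair in combinations([r1 + s for s in SUITS], 2)}
    return {frozenset((r1 + s1, r2 + s2)) for s1 in SUITS for s2 in SUITS}

def specific(*hands: str) -> set:
    """Explicit two-card combos written like 'Ks8s'."""
    return {frozenset((h[:2], h[2:])) for h in hands}

def live(combos: set) -> set:
    """Drop combos that use a card already on the board."""
    return {c for c in combos if not (c & BOARD)}

IP_RANGE = {
    "22": unsuited_class("2", "2"),
    "87": unsuited_class("8", "7"),
    "T9": unsuited_class("T", "9"),
    "QJ": unsuited_class("Q", "J"),
    # Assumed reading of 'Ks8s+': Ks8s up through KsQs.
    "Ks8s+": specific("Ks8s", "Ks9s", "KsTs", "KsJs", "KsQs"),
    # Assumed reading of 'As2s-AsJs': As2s, As3s, ..., AsJs.
    "As2s-AsJs": specific(*("As" + r + "s" for r in "23456789TJ")),
    "7s6s": specific("7s6s"),
    "9s8s": specific("9s8s"),
    "Qs9s": specific("Qs9s"),
}
OOP_RANGE = {r * 2: unsuited_class(r, r) for r in "QKA"}  # QQ+

for name, combos in IP_RANGE.items():
    print(name, len(live(combos)))
print("IP total", len(set().union(*(live(c) for c in IP_RANGE.values()))))  # → 58
print("OOP total", sum(len(live(c)) for c in OOP_RANGE.values()))  # → 18
```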

You can view GTO play for this scenario below:

One of the first things we can note about GTO play is that the OOP player should never bet half pot when his opponent is playing GTO. This means that if we run into a suboptimal player who does bet half pot at us, we have some "wiggle room": we can adjust our strategy a bit to exploit the type of range we think he is betting half pot with. We still remain GTO after the exploitative adjustment, so long as we don't adjust so much that a GTO player could increase his profit by exploiting our adjustment, that is, by betting half pot at us with some range.

Because betting from OOP in this type of situation is fundamentally a weak play (similar to playing little a or little b in the game above), there will generally be many different reactions to such a bet that keep its EV low enough for a best-responding opponent that, from his perspective, checking is still more profitable than betting, even if our response to the bet is unbalanced.

Specifically, let's consider two potential fish types, both of whom randomly lead for half pot with their entire range 10% of the time. Fish 1 is thinking "when he shoves here he's never bluffing"; he is feeling you out with his bet and plans to fold to a shove 100% of the time. Fish 2 is thinking "OMG I haz overpair" and plans to call a shove 100% of the time.
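Before asking whether any exploit stays GTO, it helps to see what the maximally exploitative response to each fish looks like in chips. This sketch assumes effective stacks of 150 behind, a 50-chip lead into the 100-chip pot, IP choosing only between fold, call, and an all-in shove, no chopped pots, and `eq` as IP's probability of winning at showdown against the fish's range. EV is counted from the moment IP faces the bet, so folding is 0. All function names are hypothetical, and this does not answer the part 3 question of whether these responses remain GTO.

```python
POT = 100        # chips in the pot before the lead (from the post)
STACK = 150      # chips left to bet, assumed to be the effective stack
LEAD = POT // 2  # the fish's half-pot lead

def ev_fold() -> float:
    return 0.0

def ev_call(eq: float) -> float:
    """Call the lead and go to showdown, winning the whole pot with probability eq."""
    final_pot = POT + 2 * LEAD
    return eq * final_pot - LEAD

def ev_shove(eq: float, fish_calls: bool) -> float:
    """Raise all-in; Fish 1 always folds to it, Fish 2 always calls."""
    if not fish_calls:
        return POT + LEAD  # we pick up the pot plus the lead
    final_pot = POT + 2 * STACK
    return eq * final_pot - STACK

def best_response(eq: float, fish_calls: bool) -> str:
    """Highest-EV action; on an exact tie the earlier (more aggressive) option wins."""
    options = {"shove": ev_shove(eq, fish_calls), "call": ev_call(eq), "fold": ev_fold()}
    return max(options, key=options.get)

for eq in (0.0, 0.3, 1.0):
    print(eq, best_response(eq, fish_calls=False), best_response(eq, fish_calls=True))
# 0.0 shove fold
# 0.3 shove call
# 1.0 shove shove
```

Against Fish 1 every hand prefers shoving, while against Fish 2 shoving beats calling only above 50% equity and calling beats folding only above 25%, so the two exploits point in very different directions.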

The question I'm going to look at in part 3 is this: is there enough wiggle room that we can have two significantly different strategies that exploit Fish 1 and Fish 2 respectively, each more profitable against its target fish, but that are both still unexploitable enough when our opponent leads out for 1/2 pot to be GTO?

Stay tuned... Hopefully part 3 of this post will be out in the next week or two :)