"It was a watershed event, but it doesn't have to do with computers
becoming intelligent," said Douglas Hofstadter, a professor of
computer science at Indiana University and author of several books
about human intelligence, including Gödel, Escher, Bach, which won a
Pulitzer Prize in 1980, with its witty argument about the connecting
threads of intellect in various fields of expression. "They're just
overtaking humans in certain intellectual activities that we thought
required intelligence. My God, I used to think chess required thought.
Now, I realize it doesn't. It doesn't mean Kasparov isn't a deep
thinker, just that you can bypass deep thinking in playing chess, the
way you can fly without flapping your wings." — Bruce Weber, "Mean
Chess-Playing Computer Tears at Meaning of Thought," The New York
Times, Feb 19, 1996

Chess, tic-tac-toe, connect-four, checkers, go, etc. etc. are
zero-sum games, meaning in the end, the gains and losses among the
players add up to zero. In the examples listed, one player wins,
another loses, so the sum of "utility" for each player at the end is
zero.

Poker is another zero-sum game, although there is one winner and many
losers. However, the amount the winner wins is equal to the sum of the
amounts the losers lose.

Study of zero-sum games began the study of game theory, which is a
mathematical subject that covers any situation involving several
agents attempting to make decisions that maximize their own utility.

We will not be studying game theory, as a whole, but we will learn how
to "play" zero-sum games.

Expansion of the search problem

The components of a search problem included the following:

Initial state

some description of the agent's starting situation

Possible actions

the set of actions (such as chess moves)
available to the agent, also called "applicable" actions; the
possible actions depend on the state

Transition model

some way of figuring out what an action does;
in other words, a resultOf(state, action) function which
returns a state; the transition model defines a state space,
which takes the form of a directed graph (vertices are states,
edges are actions)

Note that we have left out "Goal criteria" and "Path cost." In order
to support adversarial search, we add the following:

Players

a list of players (we'll look at 2-player games) and a
switch_player(player) function that switches to the
player following player

Terminal tests

a function or functions that test for final states
(winning/losing/tied states)

Utility function

a function utility(state, player) that returns
a real number (or integer) representing the value to player of
state; presumably, winning states have the highest utility,
losing states the lowest utility, tied states have zero utility,
and states that are not terminal have a utility calculated by
looking at the utilities of following states

I move, you move, etc.

The simplest zero-sum games are those that have two players who take
turns. Chess, tic-tac-toe, etc. fit this definition. Suppose a human
is playing a computer. The computer needs to figure out which move is
best (the computer wants to maximize the utility of its move). We
learned how to search (BFS, DFS, A*, and so on), but even though the
computer will be engaging in a search, those search algorithms will
not help us here.

BFS, DFS, A* and so on attempt to maximize (or minimize) every choice
equally (for example, we want to find the shortest route). But in a
game search, instead we want to maximize the payoff of our choices but
minimize the payoff of the other player's choices.

The way to do this is to treat the opponent's positive utility as our
negative utility. So if the opponent makes a winning move, that move
is worth 1.0 to the opponent but worth -1.0 to us (a tie game has
utility 0.0 for both players). We want to find the move that maximizes
our utility (best case, 1.0) and minimizes the opponent's utility
(best case, -1.0).

Minimax algorithm

The minimax algorithm does exactly that: it finds the move that
minimizes the maximum utility the opponent can obtain. You can call it
"maximin" if you'd like: maximize the minimum utility we (not the
opponent) can obtain. They are the same algorithm, just the order of
the steps (maximize or minimize) is different.

First, we consider all of our possible moves. If a move is a final
move (winning/losing/tie), then we calculate its utility directly (1.0
for a win, -1.0 for a loss, 0.0 for a tie). If it's not a final move,
we have to look further to calculate its utility. We choose one move
from which to "look further," and consider all the possible moves the
opponent might make. If the opponent wins, the utility of the
opponent's move is -1.0. Of all the possible moves the opponent might
make, we find the worst (minimum) utility. That utility is the value
of our move.

Of all our possible moves, we choose the one that has the maximum
utility; it is the maximum of the minimum utilities generated by the
opponent's moves. We are assuming the opponent is also choosing the
best move, so when we choose our move, we want to find the move that
performs best (max) assuming the opponent also performs best (min;
i.e., max for the opponent, min for us).

Here is an example search tree for tic-tac-toe. We basically have a
depth-first search but also propogate information about utilities (max
or min) upwards.

Alpha-beta pruning

The minimax search is very costly. Far too many states are
searched. We can use a very simple, and somewhat obvious, optimization
to search only a fraction as many states while coming up with the
exact same answer as minimax.

The key insight is that when we are performing the "max" step (or
equivalently, the "min" step), if we find that a certain move gives a
maximum utility X (by searching its entire subtree), we can avoid any
searches of subtrees whose minimum value is less than X.

Look here, from min_utility (similar code is found in max_utility
and minimax functions as well):

The variable min_u here will only get smaller. Recall that the
utility that min_utility returns will be used by the max_utility
function. So different calls to min_utility will be compared to pick
out the maximum min_utility value. Suppose the min_utilty function
was told (via a parameter) that some other call to min_utility
produced a minimum utility of α, and that so far, α is the
greatest minimum utility found so far (the maximum of the
minimums).

Then if min_u ever goes smaller than α, there is no reason to
continue searching the other transitions (game moves), because min_u
can only go smaller (being a minimum), and any value smaller than
α (the previously-found maximum min_u produced by earlier
function calls) will simply be ignored when maximizing.

The same can be said about max_u in the max_utility function; we
use the β parameter to keep track of the smallest max_u so far
seen.

When maximizing (in max_utility), we will only change α because
we want to know the maximum of the minimums (α is the greatest
lower bound); when minimizing (in min_utility), we will only change
β.

α

the utility of the best (highest-value) choice we have
found so far at any choice point along the path in the
"max" mode of minimax

β

the utility of the best (lowest-value) choice for the
"min" mode of minimax

As we search down the tree, we provide the recursive search procedure
our currently-known best α and β. If the recursive search
sees that some subtree cannot give any utility better than α (or
worse than β) then we don't bother searching further from there.

It's very important to realize that the alpha-beta pruning procedure
does not produce different answers than minimax. Alpha-beta pruning
simply does not bother checking subtrees when it is certain those
subtrees will not change the answer produced by minimax.

Python code for alpha-beta pruning

The minimax code above needs few modifications to support alpha-beta
pruning:

defminimax_ab(state):
max_trans = Nonemax_u = Nonealpha = Nonebeta = Nonetransitions = possible_transitions(state, 'x')
# Find the transition (move) that provides the maximum# utility, assuming the opponent also makes a best movefor trans, nextstate in transitions.iteritems():
# after making our move, find the best move the# opponent can make (best for opponent = worse for us;# if we consider the opponent winning as negative# utility, we want to find the minimum utility move# of the opponent's possible moves)u = min_utility_ab(nextstate, 'o', alpha, beta)
if max_u isNoneor u > max_u:
max_trans = trans
max_u = u
# if the utility we just found is greater than beta, and these# utilities can only get bigger (because we're in a max# stage), then there is no reason to check furtherif beta isnotNoneand u >= beta:
return trans
# if the utility we just found (from the min stage) is greater# than alpha, we found a new greatest min; this will restrict# future searches not to look further if they are minimizing# and they find a utility less than alphaif alpha isNoneor u > alpha:
alpha = u
return max_trans
defmin_utility_ab(state, player, alpha, beta):
# if the current state is a win/loss/tie, stop searchingif is_winning(state, player) or \
is_winning(state, switch_player(player)) or \
is_tie(state):
return utility(state, 'x')
else:
transitions = possible_transitions(state, player)
min_u = Nonefor nextstate in transitions.values():
# after making a move (current player is in the# "player" variable), find the minimum next# move and return its utilityu = max_utility_ab(nextstate, switch_player(player), alpha, beta)
if min_u isNoneor u < min_u:
min_u = u
# if the utility we just found is smaller than alpha, and# these utilities can only get smaller (because we're in a# min stage), then there is no reason to check furtherif alpha isnotNoneand u <= alpha:
return u
# if the utility we just found (from the max stage) is# smaller than beta, we found a new smallest max; this# will restrict future searches not to look further if# they are maximizing and they find a utility greater than# betaif beta isNoneor u < beta:
beta = u
return min_u
defmax_utility_ab(state, player, alpha, beta):
# if the current state is a win/loss/tie, stop searchingif is_winning(state, player) or \
is_winning(state, switch_player(player)) or \
is_tie(state):
return utility(state, 'x')
else:
transitions = possible_transitions(state, player)
max_u = Nonefor nextstate in transitions.values():
# after making a move (current player is in the# "player" variable), find the maximum next# move and return its utilityu = min_utility_ab(nextstate, switch_player(player), alpha, beta)
if max_u isNoneor u > max_u:
max_u = u
# if the utility we just found is greater than beta, and# these utilities can only get bigger (because we're in a# max stage), then there is no reason to check furtherif beta isnotNoneand u >= beta:
return u
# if the utility we just found (from the min stage) is# greater than alpha, we found a new greatest min; this# will restrict future searches not to look further if# they are minimizing and they find a utility less than# alphaif alpha isNoneor u > alpha:
alpha = u
return max_u

A poor example?

Tic-tac-toe is such a simple game that minimax and alpha-beta pruning
are not really necessary. Apparently, here is optimal play (from
xkcd):