The best opening move in a game of tic-tac-toe

As part of a machine learning project, I had to understand tic-tac-toe better, and so I have written an algorithm
which a) finds all the possible unique games and b) gathers statistical information about those games.

Assume X always starts, and number the cells 1 to 9 from left to right, top to bottom.
As an example, take the game where X moves top left,
followed by O moving top right, then X going middle left followed by O going
top middle. These first four moves can be written down as "1342". The game could continue
and once completed could be written as "134258769". It's not a perfect game because the first
player misses a few opportunities to win and in the end it's a draw.

Every possible sequence of moves making up a unique game of tic-tac-toe is hence
found somewhere between the numbers 123456789 and 999999999 (iterating up to 987654321 suffices, because that is the largest number which uses each digit exactly once).
Most of the numbers are illegitimate because each cell may only be filled once, so for example the number
222222222 does not represent a valid combination.
In order to find every valid combination we simply start with that lowest number and iterate up to nine nines, checking
whether each number is a valid combination and, if it is, recording the result of the game.

In order to determine if a combination is valid, we use a Javascript snippet which checks that the nine digit number
does not contain zeros and contains all of the digits 1 to 9:
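
A minimal sketch of such a check could look like the following. The function name isValidCombination is an assumption made for illustration and may differ from the name used in the full algorithm.

```javascript
// Returns true when n is a nine-digit number that uses each of the
// digits 1 to 9 exactly once (and therefore contains no zeros).
// NOTE: isValidCombination is a hypothetical name used for illustration.
function isValidCombination(n) {
  const digits = String(n);
  if (digits.length !== 9) return false;
  const seen = new Set();
  for (const d of digits) {
    if (d < '1' || d > '9') return false; // rejects '0' and non-digits
    if (seen.has(d)) return false;        // each cell may be filled only once
    seen.add(d);
  }
  return true;
}
```

For instance, isValidCombination(134258769) is true, whereas isValidCombination(222222222) is false because the digit 2 repeats.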

If a number does represent a valid combination, the next step is to iterate through each digit starting at the left, to work out when
the game is finished (it can be won after just 5 moves, or it can require all 9 moves, ending in a win or a draw).
For example, numbers starting with '12437' represent games completed after just 5 moves, because X has filled the left column.
There are 24 such valid combinations (one for each ordering of the remaining four digits), but the game is only recorded once and all other combinations
which start with 12437 are discarded.

A game is analysed using the following algorithm.
The empty board is built by creating a two dimensional array of objects using indexes i and j as shown above.
An element in an array of cells represents a cell on the board and has attributes 'i', 'j' and 'v' which records
the contents of the cell, either null, 'X' or 'O', depending on whether a player has moved in that cell.

The function checkFinished(board)
simply checks whether or not the game is over (a draw, or a player has three in a row),
and returns 'DRAW', 'X' or 'O' depending upon the outcome of the game.
'uniqueGames' is an object used to track whether or not the current game has already been seen. Using the example above,
the game '12437' will be seen 24 times (once for each valid ordering of the remaining four digits), but to all intents and purposes it is just one game of tic-tac-toe,
so we discard all subsequent sightings of it.
If the game is finished and has not yet been seen, the statistics are updated.
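
The steps described above might be sketched roughly as follows. Only checkFinished and uniqueGames are named in the text. The helpers createBoard and playCombination, the shape of the stats object, and the choice of i as the row and j as the column are assumptions made for illustration.

```javascript
// Sketch only: createBoard, playCombination and the stats object are
// hypothetical; i is assumed to be the row index and j the column index.

// Build an empty 3x3 board of cells {i, j, v}, where v is null, 'X' or 'O'.
function createBoard() {
  const board = [];
  for (let i = 0; i < 3; i++) {
    const row = [];
    for (let j = 0; j < 3; j++) row.push({ i: i, j: j, v: null });
    board.push(row);
  }
  return board;
}

// All eight winning lines as [i, j] coordinates.
const LINES = [
  [[0, 0], [0, 1], [0, 2]], [[1, 0], [1, 1], [1, 2]], [[2, 0], [2, 1], [2, 2]], // rows
  [[0, 0], [1, 0], [2, 0]], [[0, 1], [1, 1], [2, 1]], [[0, 2], [1, 2], [2, 2]], // columns
  [[0, 0], [1, 1], [2, 2]], [[0, 2], [1, 1], [2, 0]]                            // diagonals
];

// Returns 'X' or 'O' for a win, 'DRAW' for a full board, otherwise null.
function checkFinished(board) {
  for (const [a, b, c] of LINES) {
    const v = board[a[0]][a[1]].v;
    if (v && v === board[b[0]][b[1]].v && v === board[c[0]][c[1]].v) return v;
  }
  const full = board.every(row => row.every(cell => cell.v !== null));
  return full ? 'DRAW' : null;
}

// Play the digits of a valid combination until the game ends, then record
// the game once, keyed by the moves actually played.
function playCombination(combination, uniqueGames, stats) {
  const board = createBoard();
  const digits = String(combination);
  for (let m = 0; m < digits.length; m++) {
    const cell = Number(digits[m]) - 1;            // cells 1..9 -> 0..8
    board[Math.floor(cell / 3)][cell % 3].v = m % 2 === 0 ? 'X' : 'O';
    const result = checkFinished(board);
    if (result) {
      const game = digits.slice(0, m + 1);
      if (uniqueGames[game]) return null;          // already seen: discard
      uniqueGames[game] = true;
      stats[result] = (stats[result] || 0) + 1;
      return { game: game, result: result };
    }
  }
  return null;
}
```

Calling playCombination(124375689, uniqueGames, stats) with empty objects records the game '12437' as a win for X. A second call with 124375698 returns null, because the same five-move game has already been seen.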

The full algorithm is available on GitHub, and you can run it live in your browser on the page stats.html, which takes a few minutes to run.

That page records the outcome of the game from the perspective of the first move.
"cornerXWins5" shows how many times X won games that were only 5 moves long by starting in a corner.
As part of updating statistics, the results are grouped into games which are:

"stupid" - the player has missed winning in the current move
or missed stopping the opponent from winning in the next move; grouped under "stupid stats"

won with a "fork" - the winning player had more than one position to move on the board in order to win, when they won; grouped under "fork stats"

First of all, notice that there are 255,168 unique games (shown near the bottom of the page after computation is complete).
This agrees with the figure commonly quoted elsewhere for the number of possible games of tic-tac-toe.
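
As an independent cross-check, the total can also be reached by recursive enumeration. This is a different technique from the number iteration described above, and every name in the sketch below is an assumption chosen for illustration.

```javascript
// Cross-check by recursive enumeration (not the number iteration used in
// the article). All names here are hypothetical.
const WIN_LINES = [
  [0, 1, 2], [3, 4, 5], [6, 7, 8],   // rows
  [0, 3, 6], [1, 4, 7], [2, 5, 8],   // columns
  [0, 4, 8], [2, 4, 6]               // diagonals
];

// Returns 'X' or 'O' if that player has three in a row, otherwise null.
function winner(cells) {
  for (const [a, b, c] of WIN_LINES) {
    if (cells[a] && cells[a] === cells[b] && cells[a] === cells[c]) return cells[a];
  }
  return null;
}

function countGames() {
  const totals = { games: 0, X: 0, O: 0, DRAW: 0 };
  // xWinsByOpening[k] = games won by X after opening in cell k + 1
  const xWinsByOpening = new Array(9).fill(0);
  const cells = new Array(9).fill(null);

  function play(moveNumber, opening) {
    const player = moveNumber % 2 === 0 ? 'X' : 'O';
    for (let k = 0; k < 9; k++) {
      if (cells[k] !== null) continue;
      cells[k] = player;
      const first = moveNumber === 0 ? k : opening;
      const w = winner(cells);
      if (w || moveNumber === 8) {
        // The game ends here: either someone has won or the board is full.
        totals.games++;
        totals[w || 'DRAW']++;
        if (w === 'X') xWinsByOpening[first]++;
      } else {
        play(moveNumber + 1, first);
      }
      cells[k] = null;                 // undo the move before trying the next cell
    }
  }

  play(0, -1);
  return { totals, xWinsByOpening };
}
```

The value of countGames().totals.games should come out at 255168, and the entries of xWinsByOpening can be compared against the table of wins per opening cell.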

The following table is copied from that page and consolidates these raw results to show how often the
starting player can win when opening in the relevant location.

14652  14232  14652
14232  15648  14232
14652  14232  14652

For example, the first player wins in 14652 of the 255168 unique games if they start in the top left.
This number comes from adding the number of corner wins after 5, 7 and 9 moves (X cannot win after 6 or 8 moves, only O can),
for each group of games (0+992+1344 from "stats", 0+1984+0 from "fork stats" and 720+19776+33792 from "stupid stats"),
and importantly dividing the total of 58608 by four, because there are four corners and the wins are shared out equally over the four
corners of the table above. Results for the edges are also divided by
four, but since there is only one centre cell the result is left undivided.
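
The arithmetic for the corner figure can be written out as a short sketch. The variable names are assumptions, and the numbers are the ones quoted in the paragraph above.

```javascript
// Corner wins for X by game length (5, 7 and 9 moves), per group,
// as quoted in the text. Variable names are hypothetical.
const cornerXWins = {
  stats:       [0, 992, 1344],
  forkStats:   [0, 1984, 0],
  stupidStats: [720, 19776, 33792]
};

const sum = list => list.reduce((a, b) => a + b, 0);

// Add up every group, then share the total over the four corners.
const totalCornerWins = sum(Object.values(cornerXWins).map(sum));
const winsPerCorner = totalCornerWins / 4;
```

Here totalCornerWins is 58608 and winsPerCorner is 14652, the value shown in each corner of the table.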

This table demonstrates that the best location to start is in fact the centre of the board, because starting there wins the most games.
This is confirmed by Azahar Machwe's blog article
Tic Tac Toe: Statistical Analysis with arbitrary Grid Size
(albeit for different reasons than he gives) which states:

For the first movers the best starting square by far is 5 (right in the middle and opens up 4 winning lines – most of any starting square, all other squares open up to 3!). Second best option then becomes the corners (1, 3, 7, 9).

A similar result is shown when considering how often the first player loses after they start in one of the following locations
(this is done by consolidating results of games that end after 6 or 8 moves, i.e. when the second player wins, beating the first player):

 7896  10176   7896
10176   5616  10176
 7896  10176   7896

For example, the first player loses in 10176 of the 255168 unique games when they start in the middle of the top row.
This also demonstrates that the opening move which leads to the fewest games lost by the first player is also the centre.
Finally we can consider how many games are drawn when the first player opens in the relevant cell:

5184  5184  5184
5184  4608  5184
5184  5184  5184

The first player draws least often when opening in the centre.

As such, the centre is the best position for the first player to start, as it leads to a)
the most games won, b) the fewest draws and c) the fewest games lost.

But because we are considering every possible game, this statement depends upon the players moving at random and not necessarily moving so that they win as soon as possible
or block an opponent from winning on their next turn.
For that reason, the results have been grouped so that we can also analyse what happens when a cleverer player
plays. Let us disregard all "stupid" games where a move to win instantly is missed or a move to block the opponent winning next is missed.
We are now considering a more "perfect" player.
They are only "more" perfect, rather than exactly "perfect", because the algorithm used to ignore stupid games works as follows.
First it checks to see if the player is missing a chance to win immediately. If they are, the move counts as stupid.
Second, it checks to see if the opponent could win in their next move, and if they can, and the current player does not
block them, the move counts as stupid too. Now consider the game "1234567", as it has some special properties.
After "12345" the board looks like this:

X O X
O X .
. . .

O, the second player, needs to move to position 7 or 9 in order to attempt to block the fork created when X moved to position 5.
Of course any move which O makes is futile, and as such, the move to 6 which O is about to make does not count as stupid, because
there is nothing they can do at this stage to avoid losing. Perhaps O's move to position 4 should have counted as stupid, because
it didn't block X from getting a fork in their next move. The algorithm doesn't go that far, and as such this game is marked as
being non-stupid and won with a fork by X in 7 moves. Since the filter isn't as good as it could be, I describe the players as more
perfect rather than perfect.
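
A rough sketch of such a "stupid move" check is shown below. All of the names are hypothetical. Treating an unblockable double threat as non-stupid is my reading of the "1234567" example above, not something taken from the actual source code.

```javascript
// Sketch only; every name here is hypothetical. `cells` is an array of nine
// entries (null, 'X' or 'O') describing the board BEFORE the move is made,
// indexed 0..8 from left to right, top to bottom.
const LINES_OF_THREE = [
  [0, 1, 2], [3, 4, 5], [6, 7, 8],
  [0, 3, 6], [1, 4, 7], [2, 5, 8],
  [0, 4, 8], [2, 4, 6]
];

// Empty cells where `player` would complete three in a row immediately.
function winningCells(cells, player) {
  const result = [];
  for (let k = 0; k < 9; k++) {
    if (cells[k] !== null) continue;
    const wins = LINES_OF_THREE.some(line =>
      line.includes(k) && line.every(c => c === k || cells[c] === player));
    if (wins) result.push(k);
  }
  return result;
}

// True when `player` moving to `move` misses an immediate win, or fails to
// block the opponent's single immediate threat.
function isStupidMove(cells, player, move) {
  const opponent = player === 'X' ? 'O' : 'X';
  const ownWins = winningCells(cells, player);
  if (ownWins.length > 0) return !ownWins.includes(move);  // missed a win
  const threats = winningCells(cells, opponent);
  if (threats.length === 1) return move !== threats[0];    // missed the block
  return false; // no threat at all, or a fork that cannot be blocked (assumption)
}
```

With the board after "12345", O moving to position 6 (index 5) is not flagged, because X threatens both 7 and 9 and no single move can stop both.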

So, when considering these more perfect players, the number of games won by the first player moving to the relevant cell then becomes:

1080  1240  1080
1240   480  1240
1080  1240  1080

The centre no longer leads to the most games won; instead, starting on an edge improves the chances of a victory.
The number of games lost by the first player moving to the relevant cell is:

304  608  304
608  128  608
304  608  304

The centre is still the starting position which results in the fewest losses for the first player. Although the edge has become the
starting position which results in the most wins (see above),
it is also the starting position resulting in the most games lost.
Draw statistics for these more perfect players look like this:

1668  2144  1668
2144   816  2144
1668  2144  1668

Starting in the centre still results in the fewest draws.
The results of the last three tables are a little conflicting - is it best to start in the centre and avoid a draw, or
to start on an edge, which leads to the most victories but also to the most losses? One way of answering this question is to
consolidate the three tables by combining the results using a "reward function" - a formula which sums the individual
results and can be weighted if required.

Imagine writing an artificial intelligence algorithm which had to determine the best cell to move to. It might be
trained to favour a win over a draw and a draw over a loss. The reward function could be:

reward = (100*oddsOfWinning) + (10*oddsOfDrawing) + (-1*oddsOfLosing)

The table of rewards, using the above weightings and the counts of games won, drawn and lost from the first three tables (all games) in place of odds, is:

1,509,144  1,464,864  1,509,144
1,464,864  1,605,264  1,464,864
1,509,144  1,464,864  1,509,144

The result in this case is that a first move to the centre is still the best.
Even if we change the weightings, it isn't possible to tune the reward function so that the corner becomes
the best opening move.
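
The figures in the rewards table can be reproduced from the win, draw and loss counts in the first three tables (the ones covering all games). The sketch below assumes that counts are used in place of odds, and its names are chosen purely for illustration.

```javascript
// Counts per type of opening cell, copied from the first three tables
// (all games). Names are hypothetical; counts stand in for odds.
const counts = {
  corner: { won: 14652, drawn: 5184, lost: 7896 },
  edge:   { won: 14232, drawn: 5184, lost: 10176 },
  centre: { won: 15648, drawn: 4608, lost: 5616 }
};

// Weighted sum of the outcomes for one opening.
function reward(c, weights) {
  return weights.win * c.won + weights.draw * c.drawn + weights.loss * c.lost;
}

const weights = { win: 100, draw: 10, loss: -1 };
const rewards = {};
for (const name of Object.keys(counts)) {
  rewards[name] = reward(counts[name], weights);
}
```

This gives 1509144 for a corner, 1464864 for an edge and 1605264 for the centre. Changing the weights object is an easy way to experiment with other reward functions.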

Maybe it depends on how "best" is defined. The popular webcomic xkcd posted a
complete map of optimal tic-tac-toe moves, perhaps in 2010,
which is apparently based on the work titled "Fractal Images of Formal Systems", by Paul St. Denis and Patrick Grim.
Their work shows a more complete map, because
it is not limited to the top left corner of the image, in which X starts in the top left. For some reason, only the top
left corner of their image was reproduced by xkcd. A few people have asked on the xkcd forum why this is, and the best
explanation I could find
is from someone using the name Algrokoz, who states:

Opening in the corner is optimal because of the number of chances it gives O to not play
optimally, not because of total potential win scenarios.

That means a corner opening gives the opponent fewer places to go in order to play optimally, whatever "optimally" means.
Assuming that the best counter to an opening in a corner is the centre, and assuming that the best counter to an opening in
the centre is a corner, then yes, one could argue that there are four corners and just one centre, so it is better to
start in a corner, because a random opponent has a one in eight chance of finding the best counter move. Compare this
to opening in the centre, where a random opponent has a four in eight chance of finding the best counter move.
We will examine this logic in a bit...

If played "rationally" by both sides, the game must end in a draw. The only chance of winning is to catch an unwary opponent in a "trap" where a row can be scored on the next move in two ways, only one of which can be blocked. Of the three possible opening plays - a corner, the center or a side box - the strongest opening is the corner, because the opponent can avoid being trapped at the next move only by one of the eight possible choices: the center. Conversely, center opening traps can be blocked only by seizing a corner.

The "traps" which he is talking about are what Wikipedia and I refer to as forks - plays which result in
a winning opportunity in more than one cell, forcing the opponent to lose.
In his statement Gardner is saying that the only place the opponent can avoid a fork, after an opening in
a corner, is the centre. If we analyse just games that end in a fork, and search for
what happens after two moves, we find that when X starts in a corner and O takes the centre, there are
still 24 games that X can win with a fork, so I am not sure that Gardner's statement is correct.
However, of all the possible second moves, countering with the centre results in the fewest possible
forks later in the game (this is easy to calculate by grouping game results by their two opening moves,
see the "second move analysis" shown on the statistics page after the computation completes).

His logic however does seem to be similar to that of the xkcd comment. Interestingly, if we consider what
happens when X starts in the centre and O takes a corner, there are also 24 forks by which X can win the game,
and this too is the best counter move from the point of view of forks.

If we consider only games won by a fork
(see the "fork stats" section on the page which calculates them), then we find that starting in a corner
results in 496 (1984/4) forks, compared to only 320 when starting in the centre.
Finally we have a situation where it is indeed better to start in the corner.

But is it correct to just consider games that end in a fork?
I don't believe it is possible to force a player into a situation where they will be confronted with a fork.
Think about the game presented earlier, "1234567", where a fork could have been avoided by O moving to the
centre (5) instead of 4.
If a player is clever enough to avoid a fork then the game will certainly end in a draw. And so we are talking about
players who know how to avoid losing from the start. There is no "best" place to go in order to avoid a draw when
perfect players are playing - Wikipedia and my own analysis show that when perfect players play, the only possible result is a draw.
An optimal opening position can surely only exist when playing against a player who might make a mistake or a move
which is less than ideal. In that case it makes no sense to analyse only games which end in a fork, and we
end up having to also consider games that end otherwise. As we have already seen, a corner opening is then not the best.

Let's return quickly to the argument about the number of places the second player can move to.
Maybe Gardner's argument is the same: that in total, because there are four corners and just one centre, there are
more moves available to the opponent to avoid a fork when opening in the centre. But in the same quote he
has reduced the number of opening moves to just three, because of board symmetries, so such a conclusion
wouldn't be logical.

The problem with Gardner's statement (and the graphic published by xkcd) is that the Wikipedia tic-tac-toe page
uses them to make the following claim:

The first player, who shall be designated "X", has 3 possible positions to mark during the first turn...
Player X can win or force a draw from any of these starting marks; however,
playing the corner gives the opponent the smallest choice of squares which must be played to avoid losing[16].
This makes the corner the best opening move for X, when the opponent is not a perfect player.

Reference 16 is the Gardner book that I reference above.
The last sentence in the Wikipedia quote is problematic because of the phrase "when the opponent is not a perfect player".
As argued above, if players are not perfect, we need to consider at least the games ending in a fork as well as those ending otherwise, and probably
also some games which involve stupid moves. As shown in this article, an opening in the centre is then better than one in a corner.

Why is all of this so important? I've been working on training a
machine learning algorithm to play tic-tac-toe
and I was surprised that it learned to open in the centre rather than a corner. I assumed Wikipedia was correct...
As a result of this study I shall now endeavour to get the Wikipedia page corrected, before a generation of children
wrongly learn that opening in a corner is the best move!