Writing an Othello program

Prerequisites

In order to put together an Othello program of moderate strength,
some programming experience is necessary.
Most of the algorithms and data structures used can be found
in textbooks on artificial intelligence, textbooks on algorithms
and on the web.
A good high school student or undergraduate student of computer
science should be able to grasp the algorithms well enough to
create a strong program.

Some of the more advanced techniques referred to below require some familiarity
with optimization theory and linear regression; this material
is on the level of undergraduate courses in applied mathematics.

The real difficulty with creating a strong Othello program
is the debugging.
Due to the nature of the search algorithms, bugs can lurk for quite
a long time before surfacing (taking the programmer by surprise).
The only advice I have here is to test all new modules thoroughly.

Searching

Basics

All the strong programs of today determine the best move in a
position by searching many moves ahead: "If I make that move
and he makes that move and I make that move etc., how good is
the position for me?"
Obviously, the number of positions arising even after only
a few moves grows very fast; in most midgame positions,
each players have about 10 moves each, so to look ahead
only 9 moves (a search to depth 9 in programmer lingo)
results in a billion (10^9) positions to investigate.
Fortunately, it can be proved (mathematically) that a program
need not consider all these positions in order to find the best move.
Instead a procedure called alpha-beta pruning is used
(a good description by Andy Walker can be found
here
).
It greatly reduces the number of positions which need be examined.
Variations of this algorithm are used in all strong Othello programs
(as well as programs for other board games such as chess and checkers).
A good Othello program would only have to look at 100'000-1'000'000
positions in order to look 9 moves ahead instead of a
billion without alpha-beta.
There are several variations of alpha-beta, the most widespread being
nega-scout (discovered by Alexander Reinefeld),
but they do not differ fundamentally from plain alpha-beta pruning.

Move ordering

For alpha-beta pruning, it is essential that the best moves in
a position are searched first.
This can be accomplished using several heuristics.
One is to store "killer responses" to each move - if my opponent
plays the X-square g2, I definitely should look at h1 first to see if the
corner can be captured, and if this is a wise thing to do.
Another useful heuristic is to perform shallow searches;
before starting on a depth 12 search, the program searches the
position to depth 2 and finds the move which looks best after
the shallow search, which obviously takes negligible time
compared to the full search.

Selective search

The strongest Othello programs of today
all use some form of selective search mechanism.
The idea is very simple and resembles the thinking of human players:
In most positions, many moves are obviously bad and need only be
looked at long enough to determine that they are bad.
This is usually implemented as follows:
When searching a position to depth 12 (i.e., 12 moves ahead),
the program starts by searching all moves available to depth 4
and discards the moves which seemed really bad when searched
to depth 4.
The remaining moves are searched to depth 12.
The concept of "really bad" is formalized by generating lots
of statistics of how well a search to depth 4 predicts the
result of a search to depth 12.
This procedure is called Multi Prob-Cut (invented by
Michael Buro) and the example above uses the "cut pair" 4/12.
Obviously other cut pairs can be used, and most programs use many cut pairs.

What search algorithm to use

Alpha-beta pruning can be used in any game-playing program,
also programs with very little knowledge benefit greatly from
this algorithm.
Selective search algorithms, such as Multi Prob-Cut (MPC), only work
well when the program has a good position evaluation.
The reason for this is that the selectiveness introduces errors
which can prove fatal.
It also depends on certain other properties of the way positions
are evaluated as evaluations of positions in different game stages
are compared.
For the best programs, MPC gives a real boost to the search depth
- few programs get beyond depth 15-16 without MPC, but with MPC,
depth 20-27 is reachable.

Transposition tables

Often a position can occur after several move sequences,
an example is the Tiger opening which can begin with
f5-d6-c3-d3-c4 or f5-d6-c4-d3-c3, both sequences resulting in
the same position.
This is called a transposition.
To avoid searching a position several times, the program
stores all (or at least most) positions it has encountered
during its search in a transposition table (usually implemented
as the data structure hash table).
In the midgame phase of Othello, this reduces the time
for a deep search by 20-50% (estimates).
In the endgame, where transpositions are much more common,
the gain can be much higher.

Position evaluation

This is by far the most important part of the program,
if this module is weak it matters little if the program
has great search algorithms.
I will describe three different paradigms for creating
evaluation functions.
They describe the evolution of Zebra's knowledge and I
believe that most Othello programs can be placed in one
of these categories.

Disk-square tables

The idea behind this type of evaluation function is that
different squares have different values - corners are
good and the squares next to corners are bad.
Disregarding symmetries, there are 10 different positions
on a board, and each of these is given a value for each
of the three possibilities: black disk, white disk and empty.
A more sophisticated approach is to have different values for
each position during the different stages of the game; e.g.
corners are more important in the opening and early midgame than
in the endgame.

Programs with this type of evaluation are invariably weak
(I have seen no exception to this rule).
On the other hand, it is very easy to program this evaluation
function, so many programmers start using this approach.
And for most programmers, it suffices to make the program
strong enough to beat its creator...

Mobility-based evaluation

This more sophisticated approach substitutes the extremely local
view of the board used in the disk-square tables by a more
global view of the board.
The key observation is that most human players strive to maximize
mobility (number of moves available) and minimize frontier disks
(disks adjacent to empty squares).
These measures, or at least good approximations, can be found
very quickly if coded efficiently, and they increase playing strength
a lot.

Most programs with mobility-based evaluation functions also
have some knowledge of edge and corner configurations and try to minimize
the number of disks during the early midgame, another strategy
used by human players.

Pattern-based evaluation

As mentioned above, programs of moderate strength often incorporate
some knowledge about edge and corner configurations.
Mobility maximization and frontier minimization are global features,
but they can be broken down into local configurations which can be
added together.
The same goes for disk minimization.

This leads to the following generalization:
Only use local configurations (patterns) in the evaluation function.
The usual implementation is to evaluate each row, column, diagonal
and corner configuration separately and add together the values.
For this to work out well, lots of different patterns have to
be evaluated - there are 3321 different edge configurations alone,
and when all the configurations are counted, there are tens
of thousands.
To make bad things worse, the relative values of the different
configurations are highly game-stage dependent.
Clearly, the process of determining values for all configurations
must be done automatically.
This is done by taking a large database of games played between
strong players and calculating statistics for each configuration
in each game stage from all the games.
Exactly how this is done varies from program to program and is
beyond the scope of this document anyway - but see the papers
written by Michael Buro to which there is a link at the
bottom of this page.

In the process of determining the relative values of the
different configurations, one must decide what the evaluation
function should try to predict.
The most common choice to predict the final disc difference.
I believe that Hannibal used a weighted disk difference
measure where the winning side gets a bonus corresponding
to a number of disks.

Opening knowledge

All strong programs use opening books and update their
books automatically after each game.
Most of the top programs have opening books which to some extent
are based on IOS/GGS games, so the books have a large overlap.
What differs is how the information from the games is
used.
An approach, used by all top programs, is to go through all positions from all
games in the game database and determine the best move not
played in any database game.
This is time-consuming as a deep search must be performed for each
position, but once this is done, updating the book on the fly
is easy - after each game played, all new positions (and maybe some
old) are searched for the best deviation.
Using the resulting set of positions and moves (both deviations
and moves actually played) the best book lines for both sides can
be determined using a simple minimax search.

Some other programs don't store deviations
from each position but instead calculate deviations on the fly.
The main advantage of this approach is that there is no need to
calculate the deviations (which is very time-consuming).
There are several drawbacks though; this kind of opening book
is much more vulnerable to weak moves in the game database which can
lead to dead lost positions straight out of book.

Endgame

As the number of moves available to each player decreases towards
the end of the game, the programs can search much deeper in
the endgame phase of the game.
This enables them to play the endgame much better than any human;
watching computers play the endgame often feels quite weird as
both opponents know the outcome of the game with more than 20 moves
left!
For computer programs, the endgame starts in what most humans
would refer to the late midgame; with 20-30 moves left of the game.

The search algorithms used in the midgame work well in the endgame
as well (including a variation of MPC).
The objective changes though: For a program, the endgame starts
when it can calculate who is winning, i.e., follow all variations
until no player has any move left.
This is usually with 24-30 moves left for the top programs.
When the program knows who is winning, it starts to optimize the
score (win by as much as possible).
This usually is much harder than determining who is winning
(unless the position is a draw) and can usually be accomplished
with 23-28 moves left.

Good endgame performance is all about move ordering, and it pays
off to use move ordering heuristics different from those used
in the midgame.
The reason is as follows: Most lines analyzed contain one or more
blunders, and it when finding a refutation of a bad move it
suffices to find any refutation, not necessarily the best
refutation. As we want the program to solve a position as fast as
possible, it makes sense to look for the fastest refutation
(measured in thinking time). This is accomplished by searching moves
after which the opponent has low mobility earlier than they would
be searched in the midgame.
This is commonly referred to as the Fastest-First heuristic.

Some source code

Basic endgame solver

When I started working on computer Othello I downloaded an endgame solver
created by Warren Smith and later improved by Jean-Christophe Weill.
I then improved it myself, the resulting solver is found
here.
A more sophisticated endgame solver written by Richard Delorme
can be found
here.

Bitboard trickery

Zebra's endgame solver is very fast,
maybe the fastest around. One reason for this is that it represents the
Othello board as four 32-bit words, two words for the black discs and
two words for the white discs.
Using this representation it is possible to write a very fast move generation
functions.
The fastest-first heuristics used for move ordering benefits from this as well,
here
you find a C+assembly function that computes
the number of legal moves for one player in less than 200 clock cycles on
an AMD Athlon - I have not timed it on other platforms, on a Pentium III I
expect the cycle count to be roughly the same, whereas on a Pentium IV my guess
is that it needs 300-400 cycles.

The code is used as follows: Call init_mmx() to initialize some
constants (you only have to call this function once), then
use bitboard_mobility() to get the mobility.
The code can be compiled as C code using GCC's asm extension.
It is straightforward but boring to convert it into code that can be
compiled using Microsoft Visual C++.

The main trick used by the code is to find all moves that flip discs in
a certain direction, e.g. up, in parallel.
This can be done in several different ways, this code uses a variation
first suggested by Richard Delorme which I have improved somewhat.
There are 8 possible flip directions, 6 of these are managed by the MMX
ALUs and the remaining 2 by the integer ALUs.

Currently the code averages about 2 instructions per clock cycle on an
AMD Athlon - 400 instructions in 200 cycles.
Using the processor optimally it should be possible to execute 3
instructions per clock cycle.
For this application it may be impossible to attain the theoretical optimum,
but it almost certainly is possible to improve on my code by considering
pairing and register stalls throughout - I am a novice assembly programmer and
have probably introduced lots of unnecessary stalls.
It is possible to make it slightly faster by using the psadbw
instruction which is available on Pentium III, Pentium IV, and Athlon.

Since I wrote the above texts, Toshihiko Okuhara has contributed
even faster bitboard code.
Download the Zebra source code if you are interested in it.

Jay Scott maintains this large web site covering all aspects of
game programming.
Contains many different games, not only board games but also
robotic soccer and adventure game AI.
Many techniques are reviewed and there are links to other resources
on the Internet.
Very good site.