I recently took a one-week vacation and went to my parents' village. My niece Theodora was there (she is 7 years old), spending a fortnight with her grandparents. Naturally, in my duties as her uncle, I read her fairy tales when she went to sleep; gave her my phone so she could snap photos and play mobile games; and when she got tired of running around in the garden, we played two-player board and card games.

She particularly enjoyed playing Score4, a tic-tac-toe variant in which you drop chips from the top: the goal is to align a series of 4 of the same color (horizontally, vertically or diagonally) to win. The game is also known as Connect Four in other countries.

Since I always try to find ways to "lure" my nephews and nieces towards science and engineering, I saw an opportunity here: after a number of Score4 rounds (seeing her brain adapt and learn patterns was quite a sight in itself), I told Theodora that by being an engineer, her uncle could create a "magical" program on her laptop: one that would play Score4 so well, that it would beat her, me, and every other human she knows.

She smiled and said "I'd like to see that, uncle!"... and that's when this story started.

My Score4 Windows implementation (executed under Wine)

Minimax

One of the AI algorithms extensively used in two-player games is Minimax. The Wikipedia article has all the information one needs, but let's review the main idea:

The Minimax algorithm

A "scoring function" is used: This function, given an input board, produces a number. The more positive the number, the better the board is in terms of player A; the more negative the number, the better the board is in terms of player B.

With (1) the scoring function, and (2) a way to "find the allowed moves for a particular board", one can create a tree like the one shown above. Each node represents a board state, with the root node representing the current state of the board. From the root, we recurse downwards, creating the possible boards that can happen by each of the allowed moves, and stop when we reach a specified depth.

When we reach the maximum depth level, we apply the scoring function. This creates scores for all the "leaf" nodes of the tree. We then apply a simple strategy:

if the level corresponds to a move of the B player, we "distill" the minimum of the children scores to their parent (since the B player wants as negative values as possible)

if the level corresponds to a move of the A player, we "distill" the maximum of the children scores to their parent (since the A player wants as positive values as possible)

This process creates the scores you see in the diagram above. When the recursion has calculated all the scores at depth 1 - right below the root - then the final decision is taken: the move is chosen that leads to the child with the optimum score.
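To make the min/max "distilling" concrete, here is a minimal sketch in OCaml that runs the strategy on a hand-built tree instead of real boards. The tree type, the function name and the leaf scores are all made up for illustration; they are not taken from the Score4 code.

```ocaml
(* A game tree: leaves carry the value of the scoring function,
   inner nodes carry the positions reachable by one move.
   (Illustrative type, not part of the Score4 code.) *)
type tree =
  | Leaf of int
  | Node of tree list

(* maximize = true  -> player A is to move (wants the largest score)
   maximize = false -> player B is to move (wants the smallest score) *)
let rec minimax maximize = function
  | Leaf score -> score
  | Node [] -> invalid_arg "minimax: inner node without children"
  | Node (first :: rest) ->
      let pick = if maximize then max else min in
      List.fold_left
        (fun best child -> pick best (minimax (not maximize) child))
        (minimax (not maximize) first)
        rest

(* Player A moves at the root, player B one level below. *)
let example =
  Node [ Node [ Leaf 3; Leaf (-2) ];    (* B picks -2 *)
         Node [ Leaf 5; Leaf 1 ];       (* B picks 1  *)
         Node [ Leaf (-4); Leaf 7 ] ]   (* B picks -4 *)

(* A takes the maximum of -2, 1 and -4; prints "score at the root: 1" *)
let () = Printf.printf "score at the root: %d\n" (minimax true example)
```

The real engine does not build the tree up front; it generates the children on the fly while recursing, but the min/max alternation is the same.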

First attempt: Coding Minimax in functional languages (OCaml/F#)

Functional languages (like LISP, OCaml, Haskell, etc.) are reputed to allow concise, expressive solutions to artificial intelligence problems. Having little to no experience with these languages, I decided to verify this claim with my own little experiment, using OCaml and F# on Score4 and Minimax. In "Phase 2" below, I moved on to imperative constructs and languages, and the differences are indeed striking.

But let's take this one step at a time.

As we saw in the previous section, to implement Minimax we need to be able to represent the board:

type mycell = Orange | Yellow | Barren
type boardState = mycell array array

Each cell of the board can be empty (Barren), or carry one of the two colors. The board itself is a two-dimensional array of cells.

The problem has some parameters - the size of the board, the depth we will descend into, as well as the two "magic" return values of the scoring function, which indicate one of the two players has won:
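The original listing is not reproduced here. As a rough sketch, such parameters could be declared like this. Only the two "magic" win values (1000000 and -1000000) come from the text further down; the board size and the search depth are assumptions (a standard 7x6 board, 7 plies), and emptyBoard is a hypothetical helper.

```ocaml
type mycell = Orange | Yellow | Barren
type boardState = mycell array array

let width = 7                 (* assumption: standard 7-column board *)
let height = 6                (* assumption: standard 6-row board *)
let maxDepth = 7              (* assumption: how many plies to look ahead *)
let orangeWins = 1000000      (* "magic" score: Orange has won *)
let yellowWins = -orangeWins  (* "magic" score: Yellow has won *)

(* Hypothetical helper: a fresh board with every cell Barren.
   Array.make_matrix allocates a separate array for each row. *)
let emptyBoard () : boardState = Array.make_matrix height width Barren
```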

Assuming that we have a scoreBoard function that returns a score for our board, and a dropDisk function that creates a new board from an existing one (by dropping a chip on a specified column), this is the "heart" of my Score4 minimax algorithm, in functional-style OCaml:

First, we check to see what depth we are in. In this implementation, minimax's depth parameter starts from maxDepth, and is decreased at each recursive call. This means that when the value is 0, we are at the leaf level (see diagram above) - so we invoke the scoreBoard function, passing our input board to it. The function returns an integer value, which we return inside a tuple: (None, score).

Why a tuple, you ask? Simple: minimax will not only find the optimal score - it also needs to find the optimal move, the move that attains that score. The first member of the returned tuple will therefore be the move itself, followed by the score attained by the move.

You might ask: Why do we return None, then, in the place for the move? Well, at depth 0, we don't know what move led us here (i.e. in which column we placed the chip that led to this board) - it is the parent minimax call that knows. We will see how we handle this below - keep reading.

In Score4, you can drop a chip in any column, as long as that specific column is not full. The findValidMoves function, therefore, feeds the list of integers from 0 to width-1, to a simple filter: if the top-most chip in that column is empty (Barren) then the number passes.

This means that validMoves is a list of integers: the columns whose top cell is empty.

What were these "--" and "|>"? Well, OCaml allows creation of infix operators (by placing the operator name within parentheses), and I used "--" to emulate the ".." operator that other languages have: The construct N--M generates a list of numbers, starting with number N and ending on number M. In the same vein (i.e. syntactic sugar), "|>" is the "piping" operator:

You see now the resemblance with UNIX shell pipes, in the validMoves calculation? We piped a list of 0 .. (width-1) to List.filter, and some of them "survived". Below you'll see lengthier pipes, but the premise is again the same: we pipe stuff from one "function layer" to the next. Infix operators allow us to create "chains" of processing logic, which can be thought of as factory assembly lines.
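For readers who want to try the two operators, here is a minimal sketch of how they and findValidMoves could be written. It is a reconstruction from the description above, not the original listing; the board size and the convention that row 0 is the top row are assumptions.

```ocaml
type mycell = Orange | Yellow | Barren

let width = 7    (* assumed board size *)
let height = 6

(* n -- m builds the list [n; n+1; ...; m] (empty when n > m). *)
let ( -- ) n m =
  let rec build i acc = if i < n then acc else build (i - 1) (i :: acc) in
  build m []

(* x |> f is just f x. Recent OCaml versions already ship this operator;
   it is redefined here only to show how little there is to it. *)
let ( |> ) x f = f x

(* Assumption: row 0 is the top row, so a column is playable
   while its top cell is still Barren. *)
let findValidMoves board =
  0 -- (width - 1) |> List.filter (fun column -> board.(0).(column) = Barren)

let () =
  let board = Array.make_matrix height width Barren in
  board.(0).(2) <- Orange;   (* pretend column 2 is full *)
  findValidMoves board |> List.iter (Printf.printf "%d ");
  print_newline ()           (* prints: 0 1 3 4 5 6 *)
```

Because "--" binds tighter than "|>", the range is built first and then piped into the filter, with no extra parentheses needed.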

Once we have the list of valid moves, we check to see if it is empty. If there are no valid moves, we just return the score of our current board, in a (None,score) tuple:

match validMoves with
| [] -> (None, scoreBoard board)
| _ -> ...

If there are valid moves, then we create a list of the valid boards that are instantiated from our valid moves: We pipe the validMoves list to List.map, and for each valid move, List.map creates a tuple. The first element of the tuple is the move itself (the integer pointing to the column). The second element of the tuple is the new board that is created when we drop a chip on that column, via the dropDisk function:
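A minimal sketch of this step follows. The dropDisk shown here is a hypothetical stand-in written from the description (copy the board, let the chip fall to the lowest empty cell), and validMovesAndBoards is a name chosen for illustration; the board size and the "row 0 is the top" convention are assumptions.

```ocaml
type mycell = Orange | Yellow | Barren

let width = 7    (* assumed board size *)
let height = 6

(* Hypothetical dropDisk: copy the board, then let the chip fall to the
   lowest Barren cell of the column (row 0 is taken to be the top row).
   The input board is never modified. *)
let dropDisk board column color =
  let newBoard = Array.map Array.copy board in
  let rec fall row =
    if row < 0 then ()   (* column full: return an unchanged copy *)
    else if newBoard.(row).(column) = Barren then
      newBoard.(row).(column) <- color
    else fall (row - 1)
  in
  fall (height - 1);
  newBoard

(* Pair every valid move with the board it leads to. *)
let validMovesAndBoards board validMoves color =
  validMoves |> List.map (fun move -> (move, dropDisk board move color))
```

Copying the whole board for every candidate move is what keeps this version purely functional; it is also one of the costs that the imperative rewrite later in the article removes.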

We now check if any of these boards are winning/losing boards. Depending on the level we are at, we are either maximizing or minimizing the score (i.e. we try to find the optimal move for the Orange OR for the Yellow player), so targetScore is made to point to the "magic" value that, when returned from scoreBoard, declares a victory. We then "filter" for that target score:

This is the key line in the function: it recursively descends into depth-1, toggling the color (via function otherColor), toggling the "target mode" from A player to B player and vice versa (i.e. toggling maximizeOrMinimize), and returning a tuple, containing the winning move, and its score. We pipe to List.map snd, and are therefore ignoring the returned moves - we just keep the scores of our "children" nodes in an output list.

Using myzip (which is a function that, just like F#'s List.zip, takes 2 lists as inputs, and creates an output with a single list of 2-tuples), we "pack" all our results in allData: a list of 2-tuples, of the form: (move,score)

Skipping over the debug output, there is only one thing remaining: to sort the results, based on their score, and take the largest or the smallest one, depending on which player we are optimizing for (maximizeOrMinimize).

Update: Reddit and Hacker News people pointed out that we don't really need to sort - we just need to find the largest/smallest value, so List.fold_left is perfect for the job (and tail-recursive). The benchmarks show no improvement in execution speed for either OCaml or F# with this change, probably because the lists are too short - but regardless, this is indeed the correct way to find the best value:
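The updated listing is not reproduced here; a minimal sketch of such a fold over the (move, score) list could look like this. The function name findBest and the choice to return an option (None for an empty list) are assumptions made for this example.

```ocaml
(* Pick the best (move, score) pair in a single pass, without sorting.
   maximizeOrMinimize = true keeps the largest score, false the smallest.
   On ties the earlier element wins. *)
let findBest maximizeOrMinimize allData =
  let better (_, a) (_, b) = if maximizeOrMinimize then a > b else a < b in
  match allData with
  | [] -> None
  | first :: rest ->
      Some
        (List.fold_left
           (fun best candidate ->
             if better candidate best then candidate else best)
           first rest)
```

For example, findBest true [(0, 5); (1, 9); (2, -3)] evaluates to Some (1, 9), and with false instead of true it evaluates to Some (2, -3).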

That's all. Notice that the code reasons about boards, moves, and scores - it is completely generic, and applies to any two-player game that we can code a scoring function for.

Speaking of the scoreBoard function, I tried various forms to evaluate the board. I ended up on a simple policy: measuring how many chips of the same color exist, in spans of 4 going in any direction. I do this over each of the board's cells, and then aggregate this in a table keeping the aggregates from -4 to 4:

-4 means that the cell is a part of 4 cells that contain 4 yellow chips

-3 means that the cell is a part of 4 cells that contain 3 yellow chips

...

3 means that the cell is a part of 4 cells that contain 3 orange chips

4 means that the cell is a part of 4 cells that contain 4 orange chips

If 4 is found, the board is a win for the Orange player, and the function returns orangeWins (i.e. 1000000). If -4 is found, the board is a win for the Yellow player, and the function returns yellowWins (i.e. -1000000). Otherwise, scaling factors are applied, so that the more "3"-cells found, the more positive the board's score. Correspondingly, the more "-3" found, the more negative the board's score.
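As a rough sketch of this policy, the version below walks over every span of 4 cells once, sums the cell values (Orange = 1, Yellow = -1, Barren = 0) and tallies the sums in a 9-slot table. Summing is one plausible reading of the description, and the board size and the scaling weights are assumptions chosen for illustration, not the values of the original code.

```ocaml
type mycell = Orange | Yellow | Barren

let width = 7    (* assumed board size *)
let height = 6
let orangeWins = 1000000
let yellowWins = -orangeWins

let cellValue = function Orange -> 1 | Yellow -> -1 | Barren -> 0

let scoreBoard board =
  (* counts.(s + 4) = number of 4-cell spans whose cell values sum to s *)
  let counts = Array.make 9 0 in
  let addSpan row col dRow dCol =
    let endRow = row + 3 * dRow and endCol = col + 3 * dCol in
    (* the start cell is always on the board, so checking the end suffices *)
    if endRow >= 0 && endRow < height && endCol >= 0 && endCol < width then begin
      let sum = ref 0 in
      for i = 0 to 3 do
        sum := !sum + cellValue board.(row + i * dRow).(col + i * dCol)
      done;
      counts.(!sum + 4) <- counts.(!sum + 4) + 1
    end
  in
  for row = 0 to height - 1 do
    for col = 0 to width - 1 do
      addSpan row col 0 1;      (* horizontal *)
      addSpan row col 1 0;      (* vertical *)
      addSpan row col 1 1;      (* diagonal, down-right *)
      addSpan row col (-1) 1    (* diagonal, up-right *)
    done
  done;
  if counts.(8) > 0 then orangeWins        (* some span sums to 4 *)
  else if counts.(0) > 0 then yellowWins   (* some span sums to -4 *)
  else
    (* assumed weights: spans closer to completion count for more *)
    counts.(5) + 2 * counts.(6) + 5 * counts.(7)
    - counts.(3) - 2 * counts.(2) - 5 * counts.(1)
```

Note that this scoring function is itself written with loops and mutable state, even when the surrounding minimax stays functional.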

Let me reiterate that the minimax code above is NOT problem-specific! If you define your board type, your findValidMoves and your performMoves functions, this code will play whatever game you want. Functional languages offer impressive code abstraction.

Testing with a "driver" program

Let's test our implementation - and since we intend to use multiple languages to do Minimax and eventually compare them, a simple Python program is written, which spawns the "engine", and gets back the optimal move. The board's state is passed through the command-line, e.g. ...

bash$ engine o53 y52

...means that the input board has an orange chip in cell (5,3), and a yellow one in (5,2).
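On the engine side, decoding these arguments is a few lines of work. The sketch below assumes each argument is exactly three characters (color letter, row digit, column digit), as in the example; placeChip and boardOfArgs are hypothetical names, and the board size is an assumption.

```ocaml
type mycell = Orange | Yellow | Barren

let width = 7    (* assumed board size *)
let height = 6

(* "o53" -> orange chip at row 5, column 3
   "y52" -> yellow chip at row 5, column 2 *)
let placeChip board arg =
  if String.length arg <> 3 then invalid_arg ("bad cell: " ^ arg);
  let color =
    match arg.[0] with
    | 'o' -> Orange
    | 'y' -> Yellow
    | _ -> invalid_arg ("bad color: " ^ arg)
  in
  let digit c = Char.code c - Char.code '0' in
  let row = digit arg.[1] and col = digit arg.[2] in
  if row < 0 || row >= height || col < 0 || col >= width then
    invalid_arg ("cell out of range: " ^ arg);
  board.(row).(col) <- color

(* Build a board from a list of arguments such as ["o53"; "y52"]. *)
let boardOfArgs args =
  let board = Array.make_matrix height width Barren in
  List.iter (placeChip board) args;
  board
```

In a real engine the list would come from the command line, e.g. boardOfArgs (List.tl (Array.to_list Sys.argv)).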

The engine returns...

3
bash$ ...

...meaning that the engine chose to play on column 3. The Python "driver" program makes use of this simple command line interface, and offers this "graphical" console:

Wow... Even though the code is essentially the same in both languages, and both were compiled with optimization options, the native OCaml binary plays a move almost five times faster than F#...

Native compilers have "unfair" advantages over VMs (in this case, .NET). Then again, F# is a relatively new language; its compiler will undoubtedly improve over time (or maybe I am missing some optimization option - any feedback on this is most welcome). Let's see what the more mature C# compiler can do on the same platform...

Moving to imperative style (C#/OCaml/F# and eventually, C++)

Switching to C#, we also rewrite the algorithm in imperative style. Note that C#'s two-dimensional arrays are very slow; we instead use "jagged" arrays, that is, arrays containing arrays - just as we did for OCaml and F#.

We use two "out" parameters to return the results (instead of a tuple).

We no longer use lists everywhere - we use loops instead. (Functional language tutorials tend to favor lists - loops are somewhat "tainted" by imperative thinking. Still, OCaml does offer us for loops - no break, though.)

We mutate the heck out of everything, altering the state wherever we see fit to do so.

For example, why have dropDisk return a new board? Why not have it modify the board in place and return the row the chip landed on, so that after the recursive minimax call returns we can "undo" the damage by resetting that cell to Barren?

In the same vein, why store results in lists, and sort them? Why not just keep the best combo (move,score)
as we go through the for loop in a couple of mutable variables (references)?
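The last two points can be sketched as follows, in imperative OCaml rather than C#. To stay short, this shows a single ply of search (no recursion) that always maximizes, and it takes the scoring function as a parameter. The names dropDiskInPlace and bestMove, and the board size, are assumptions for this example.

```ocaml
type mycell = Orange | Yellow | Barren

let width = 7    (* assumed board size *)
let height = 6

(* Drop a chip in place. Returns the row it landed on,
   or -1 if the column is full (row 0 is taken to be the top row). *)
let dropDiskInPlace board column color =
  let landed = ref (-1) in
  let row = ref (height - 1) in
  while !landed < 0 && !row >= 0 do
    if board.(!row).(column) = Barren then begin
      board.(!row).(column) <- color;
      landed := !row
    end;
    decr row
  done;
  !landed

(* One ply, imperative style: try each column, score the board,
   undo the move, and keep the best (move, score) in two references. *)
let bestMove scoreBoard board color =
  let bestColumn = ref (-1) and bestScore = ref min_int in
  for column = 0 to width - 1 do
    let row = dropDiskInPlace board column color in
    if row >= 0 then begin
      let score = scoreBoard board in
      board.(row).(column) <- Barren;   (* undo the move *)
      if score > !bestScore then begin
        bestScore := score;
        bestColumn := column
      end
    end
  done;
  (!bestColumn, !bestScore)
```

No board is ever copied and no list is ever built, which is where the speed comes from; the price is that the function is now tied to this particular board representation.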

And so we hack and slash - the beautiful abstract functional code is translated to a problem-specific mutant...

Significantly faster than F# - still, half as fast as OCaml. Let's go back to F# and OCaml, and perform the same... mutation on their code (i.e. write it in a state-altering-mayhem way, since both these languages allow imperative style coding as well):

Both F# and OCaml improved their times by using imperative constructs (using for loops and mutable variables): 24% for F#, 18% for OCaml.

I still feel that bitter taste in my mouth, though - there were speed gains, undoubtedly, but were they worth it?

The high-powered plasma cannon: C++

Well, since we did the dirty deed and wrote the algorithm imperatively, we might as well see how C++ fares.
Translation from C# to C++ is almost trivial, most things work as-is... and the results are...

What did I learn from this exercise? Well, a lot, considering I had (and still have) very little experience with functional-style coding.

Functional-style programming reasons in terms of higher level constructs: lists of moves, lists of boards, passing around evaluation and move-making functions, etc. Imperative code feels "dirtier", since it reasons in terms of lower-level stuff (for loops on arrays, that usually lead to state mutation...) The results in the functional-style seem more abstract, and easier to reuse. For example, the minimax of the functional implementation can be used as-is for any two-player game.

There's no such thing as a free lunch, however - imperative code is usually a lot faster. The sweet spot in terms of this balancing act (clear functional code vs speedy execution) - at least in this experiment - was OCaml, a functional language generating native binaries, whose speed resided halfway between VM-based F#/C# and native C++.

If you do need speed, try utilizing imperative style (state-changing) only in your "core" logic. In this example, scoreBoard (called hundreds of thousands of times) was written imperatively.

Switching the code to imperative style offers speed advantages, but it causes detrimental effects on quality and reusability. In this particular example, writing minimax imperatively only made sense - i.e. speed gain was significant enough to warrant the code impacts - when C++ was used.

Separating the presentation logic always helps (easy re-structuring and new features - in this case, a GUI).

The inclusion of functional constructs (lambdas, etc.) in C++0x, together with libraries like Boost, makes me eager to see some expert feedback from C++ gurus, who could write the code in a functional manner in C++. Unfortunately, every time I look at Boost my brain bleeds - any help (especially working code) would be most appreciated.

Feel free to port the code to your favorite imperative/functional language and send me tarballs or git-format-patch outputs: I will add/commit them so we can see how LISP/Scala/Clojure/etc fare on this. The code lives on Github, so you are invited to review and fix any things I did wrong. I've been programming imperatively for decades, but I am a "functional newbie" - so I am hoping people will show me ways to keep the functional way of thinking and still improve the speed under F# and OCaml.

Enjoy!

Update, July 12: Ports are coming in from all over the Web... the repository now carries versions for:

Java (Sario O. Alvey)

Python and D (Reddit/leonardo_m - who also suggested that C++ does not need a 'translation stage' in scoreBoard, since enumerants can be added - hence a 30% speedup in C++)

Haskell (Hacker News/phnguyen)

Go (HackerNews/supersillyus)

Update, July 22: Thomas and Daniel from StackOverflow helped improve F# speed to within 30% of C#. Thanks, guys!

The Windows .NET compilers and runtimes run the final, optimized code a lot faster than Mono does.

Memoization behaves worse under Cygwin's GCC than the normal algorithm - weird, but consistently reproducible on my machine. In contrast, under Linux it provides a sizeable 40% speed increase.

The winners are still C++ (for imperative style) and OCaml (for functional style) - but the difference is a lot smaller now between them and the others.

Update, November 6: The recent passing of John McCarthy reminded me of Lisp... so I decided to port score4 to Lisp as well. The tremendous power of Lisp macros allowed me to unroll (at compile-time!) the computations done in scoreBoard and only emit the actual accumulating instructions. This made Lisp the 2nd fastest language, behind only C/C++:

I have no words - I am simply amazed by what Lisp allowed me to do (it deserves a blog post of its own). This kind of functionality (complex, nested loop unrolling) is something that is either implemented in your language's compiler, or forces you to manually mess up your code (or generate code). Lisp allowed me to do this at compile-time, while maintaining the original (slower) code structure. Mind=blown.

Here's how the old (commented) and the new code look, for one of the 4 macros: