Thursday, 14 December 2017

Don't panic

The publication of a new paper by the team behind AlphaGo has really got the chess world talking. Applying the AlphaGo learning method to chess, they developed a program that was able to beat Stockfish after just four hours of self-learning. To read the headlines (and comments) about this, it would almost seem that humans are about to be replaced by computers in all facets of life.
For me, while it was an impressive result, it isn't the end of the world, or even of chess. Self-learning programs have been around for a while, and were quite strong even 15 years ago. KnightCap was one such program, with the authors describing its self-learning aspects in a paper they published in 1999 (which was cited by the authors of AlphaZero).
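To give a flavour of what "self-learning" meant back then: KnightCap adjusted the weights of its evaluation function using temporal-difference learning (a TDLeaf(λ) variant, per the 1999 paper). A minimal TD(0)-style sketch of that idea, with all names and parameters purely illustrative, might look like this:

```python
# Minimal TD(0)-style update for a linear evaluation function -- a sketch
# of the self-learning idea. KnightCap actually used a TDLeaf(lambda)
# variant driven by search-tree leaf values; everything here is
# illustrative, not the authors' code.
def td_update(weights, features_t, value_t, value_next, alpha=0.01):
    # Nudge the evaluation of the earlier position toward the
    # (better-informed) evaluation of the later position.
    error = value_next - value_t
    return [w + alpha * error * f for w, f in zip(weights, features_t)]

# Example: a position evaluated at 0.0 is followed by one evaluated at
# 1.0, so the weights of the active features are increased slightly.
weights = td_update([0.0, 0.0], [1.0, 2.0], value_t=0.0, value_next=1.0)
print(weights)  # [0.01, 0.02]
```

Repeating this over thousands of self-played games is what lets the evaluation improve without hand-tuning.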
On the other hand, what did impress me was the successful implementation of Monte Carlo Tree Search. This is an alternative to the tried-and-true Alpha-Beta search method (or its variants), and relies on a probabilistic approach to evaluating positions. Instead of assessing the various factors in a position (material, space, pawn structure), the program self-plays thousands of games from a given position, preferring the move that results in the most wins. The obvious flaw in this method (apart from computing constraints) is that while a move may lead to wins 99 times out of 100, the opponent may find the 1% reply that is a forced loss for the engine. But based on the result against Stockfish, this did not seem to occur in practice.
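The rollout idea described above can be sketched in a few lines on a toy game. This is flat Monte Carlo move selection, not the full tree search with a selection policy (let alone AlphaZero's neural-network-guided version), and the game and names are mine, but it shows the "prefer the move with the most wins" principle:

```python
import random

# Toy game: single-pile Nim. A move removes 1-3 stones; the player
# who takes the last stone wins.
def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def rollout(stones, to_move):
    # Play random moves to the end of the game; return the winner.
    player = to_move
    while stones > 0:
        stones -= random.choice(legal_moves(stones))
        if stones == 0:
            return player        # this player took the last stone
        player = 1 - player
    return 1 - player            # no stones left: the previous mover won

def monte_carlo_move(stones, player, playouts=2000):
    # For each legal move, estimate the win rate with random playouts,
    # then prefer the move with the most wins.
    best_move, best_rate = None, -1.0
    for move in legal_moves(stones):
        wins = sum(rollout(stones - move, 1 - player) == player
                   for _ in range(playouts))
        if wins / playouts > best_rate:
            best_move, best_rate = move, wins / playouts
    return best_move

# From 7 stones, taking 3 leaves 4, a lost position for the opponent.
print(monte_carlo_move(7, 0))  # 3
```

Even with purely random playouts the win counts separate the good move from the bad ones here; the flaw mentioned above is that a move can score well in random play yet lose to one precise reply.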
The other thing to point out is that this wasn't a match between AlphaZero and Stockfish, at least not in a competitive sense. Stockfish had a number of restrictions placed on it (no opening book, less powerful hardware), and I suspect the point of the exercise was to provide a measure of how successful the learning algorithm was. If the authors intend to develop the world's strongest chess program, then entering the World Computer Chess Championship would be the best way to test it.

1 comment:

Anonymous
said...

Actually, if you read the AlphaGoZero paper (the Nature one) carefully, and the current preprint, it seems there is no randomization (or even roll-outs!) in the MCTS they use (except in the learning stage, with "Dirichlet noise"). When playing a game, it's more of "book-building" according to the computer experts, as various GUIs allow. But I'm sure that the NN-based Google implementation is honed to their domain (80K new "MCTS" nodes per second).