Artificial Intelligence Starting From a Blank Slate

Watch the seminar on the main principles behind AlphaGo Zero, developed by Google DeepMind and recently covered in the media. We also present research at Chalmers that uses reinforcement learning.

A grand challenge for artificial intelligence is to develop an algorithm that learns complex concepts from a blank slate and reaches superhuman proficiency. To beat world-champion human players at the classic strategy game Go, Google DeepMind developed a program, AlphaGo, trained through a combination of supervised learning based on millions of human expert moves and reinforcement learning from self-play. This program defeated Go champion Lee Sedol in a tournament in March 2016.

A new version of the program, called AlphaGo Zero, was able to teach itself to rapidly master Go, starting from a blank slate and without human input, as reported in a new paper published in Nature on 19 October. It learns solely from the games that it plays against itself, starting from random moves, with only the board and pieces as inputs. It defeated its predecessor by 100 games to 0 after training for only 36 hours.

AlphaGo Zero uses a single neural network, which is trained to predict the program’s own move selection and the winner of its games, improving with each iteration of self-play. As the program trained, it independently discovered some of the same game principles that took humans thousands of years to conceptualize, and it also developed novel strategies that provide new insights into this ancient game. The new program runs on a single machine with 4 TPUs (specialized chips for neural network computation), whereas the previous version of AlphaGo was trained over several months and required multiple machines and 48 TPUs.
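
To make the idea of one network with two prediction targets concrete, here is a minimal sketch of a combined training loss for a single board position. It is only an illustration under stated assumptions: the function name and argument names are hypothetical, the loss form (squared error on the game outcome plus cross-entropy on the move probabilities) follows the general shape reported for AlphaGo Zero, and the weight-regularization term used in practice is omitted.

```python
import math


def combined_loss(policy_probs, search_probs, value_pred, outcome):
    """Illustrative two-part loss for one training position (hypothetical helper).

    policy_probs: move probabilities predicted by the network
    search_probs: move probabilities the program actually used when selecting its move
    value_pred:   predicted game result, between -1 and 1
    outcome:      actual game result for the player to move (+1 win, -1 loss)
    """
    eps = 1e-12  # avoids log(0) for moves the network rules out completely

    # Value part: how far the predicted winner is from the real result.
    value_loss = (outcome - value_pred) ** 2

    # Policy part: cross-entropy between the program's move selection
    # and the network's predicted move probabilities.
    policy_loss = -sum(
        target * math.log(pred + eps)
        for target, pred in zip(search_probs, policy_probs)
    )
    return value_loss + policy_loss


# Example with three legal moves (made-up numbers, for illustration only).
example = combined_loss([0.7, 0.2, 0.1], [0.8, 0.1, 0.1], 0.5, 1.0)
```

Minimizing a loss of this kind pushes the network's move probabilities toward the moves the program actually chose and its value estimate toward the eventual winner, which is how each round of self-play can improve the next.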

We will discuss the main principles and new extensions of reinforcement learning used in this work. At the end, we will briefly discuss research at Chalmers that uses reinforcement learning in autonomous driving and natural language technology.