AlphaGo Zero Shows Machines Can Become Superhuman Without Any Help

An upgraded version of the game-playing AI teaches itself every trick in the Go book, using a new form of machine learning.

AlphaGo wasn’t the best Go player on the planet for very long. A new version of the masterful AI program has emerged, and it’s a monster. In a head-to-head matchup, AlphaGo Zero defeated the original program by 100 games to none.

What’s really cool is how AlphaGo Zero did it. Whereas the original AlphaGo learned by ingesting data from hundreds of thousands of games played by human experts, AlphaGo Zero, also developed by the Alphabet subsidiary DeepMind, started with nothing but a blank board and the rules of the game. It learned simply by playing millions of games against itself, using what it learned in each game to improve.
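DeepMind's actual training recipe is far more elaborate than anything that fits here. Still, the core idea of improving from nothing but the rules, by playing against yourself, can be shown on a toy game. The sketch below assumes a simple Nim variant as a hypothetical stand-in for Go: players alternately remove one to three stones, and whoever takes the last stone wins. It uses a negamax-style variant of tabular Q-learning, which is a swapped-in technique and not AlphaGo Zero's neural network and tree search. One table plays both sides, starts with no knowledge, and improves only from the results of its own games. The names `PILE`, `train`, and `best_move` are illustrative.

```python
import random

PILE = 12          # hypothetical largest starting pile
MOVES = (1, 2, 3)  # remove one to three stones; taking the last stone wins


def legal(n):
    """Moves available when n stones are left."""
    return [a for a in MOVES if a <= n]


def train(episodes=20000, epsilon=0.3, seed=0):
    """Learn move values purely from self-play, starting with no knowledge."""
    rng = random.Random(seed)
    # q[n][a]: estimated outcome (+1 win, -1 loss) for the player to move
    # when n stones remain and they remove a stones. All start at 0.
    q = {n: {a: 0.0 for a in legal(n)} for n in range(1, PILE + 1)}
    for _ in range(episodes):
        n = rng.randint(1, PILE)  # varied starts so every position gets visited
        while n > 0:
            moves = legal(n)
            if rng.random() < epsilon:
                a = rng.choice(moves)  # explore a random move
            else:
                a = max(moves, key=lambda m: q[n][m])  # play the current best guess
            rest = n - a
            # Taking the last stone wins outright. Otherwise the opponent moves
            # next, so this move is worth the negation of their best option.
            q[n][a] = 1.0 if rest == 0 else -max(q[rest].values())
            n = rest  # the same table now plays the other side
    return q


def best_move(q, n):
    """Greedy move for the player facing n stones."""
    return max(q[n], key=q[n].get)


q = train()
print([best_move(q, n) for n in range(1, PILE + 1)])
```

Nobody tells the program the known winning strategy for this variant: always leave your opponent a multiple of four stones. After training, `best_move` should nonetheless follow that strategy from every position where a win is possible. Go is vastly too large for a lookup table like `q`, which is why AlphaGo Zero needs a neural network to generalize across positions.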

The new program represents a step forward in the quest to build machines that are truly intelligent. That’s because machines will need to figure out solutions to difficult problems even when there isn’t a large amount of training data to learn from.

“The most striking thing is we don’t need any human data anymore,” says Demis Hassabis, CEO and cofounder of DeepMind. Hassabis says the techniques used to build AlphaGo Zero are powerful enough to be applied in real-world situations where it’s necessary to explore a vast landscape of possibilities, including drug discovery and materials science. The research behind AlphaGo Zero is published today in the journal Nature.

Remarkably, during this self-teaching process AlphaGo Zero discovered many of the tricks and techniques that human Go players have developed over the past several thousand years. “A few days in, it rediscovers known best plays, and in the final days goes beyond those plays to find something even better,” Hassabis says. “It’s quite cool to see.”

DeepMind, based in London, was acquired by Google in 2014. The company is focused on making big strides in AI using game play, simulation, and machine learning; it has hired hundreds of AI researchers in pursuit of this goal. Developing AlphaGo Zero involved around 15 people and probably millions of dollars’ worth of computing resources, Hassabis says.

Both AlphaGo and AlphaGo Zero use a machine-learning approach known as reinforcement learning (see “10 Breakthrough Technologies 2017: Reinforcement Learning”) as well as deep neural networks. Reinforcement learning is inspired by the way animals seem to learn through experimentation and feedback, and DeepMind has used the technique to achieve superhuman performance in simpler Atari games.

Mastering the board game Go was especially significant, however, because the game is so complex (the number of possible configurations on the Go board is greater than the number of atoms in the universe) and because the best players make their moves so instinctively. The rules of good play, in other words, cannot easily be explained or written in code.
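A quick back-of-the-envelope calculation makes the comparison between Go configurations and atoms in the universe concrete. The sketch rests on two assumptions. First, each of the 361 points on a 19×19 board is empty, black, or white, so 3^361 is an upper bound on the number of configurations. Second, it uses the commonly cited rough figure of about 10^80 atoms in the observable universe. The exact count of legal positions is smaller, roughly 2 × 10^170, but it still dwarfs that figure.

```python
import math

POINTS = 19 * 19     # intersections on a standard Go board
ATOMS_EXPONENT = 80  # rough, commonly cited estimate: about 10**80 atoms

# Each point is empty, black, or white, so 3**361 is an upper bound on the
# number of board configurations (many of them are not legal positions).
upper_bound = 3 ** POINTS
exponent = POINTS * math.log10(3)  # log10 of the bound

print(f"3**{POINTS} is about 10**{exponent:.2f} ({len(str(upper_bound))} digits)")
print(f"that exceeds the atom estimate by a factor of about 10**{exponent - ATOMS_EXPONENT:.0f}")
```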

Reinforcement learning also shows promise for automating the programming of machines in many other contexts, including those where it would be impractical to program them by hand. It is already being tested as a way to teach robots to grasp awkward objects, for example, and as a means of conserving energy in data centers by reconfiguring hardware on the fly. In many real-world situations, however, there may not be a large number of examples to learn from, meaning machines will have to learn for themselves. That’s what makes AlphaGo Zero interesting.

“By not using human data or human expertise, we’ve actually removed the constraints of human knowledge,” says David Silver, who leads the AlphaGo project at DeepMind and is a professor at University College London. “It’s able to create knowledge for itself from first principles.”

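To achieve Go supremacy, AlphaGo Zero simply played against itself, making random moves at first. Like the original, it combined deep neural networks with a powerful search algorithm to pick the next move. But where the original AlphaGo used two separate networks, one to suggest promising moves and another to judge who was winning from a given position, AlphaGo Zero relied on a single neural network for both functions.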
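The idea of one network doing two jobs can be sketched as a shared "trunk" feeding two output "heads." One head gives a probability for each possible move, which is the move-suggesting role. The other gives a single score predicting the outcome, which is the position-judging role. The toy below assumes a tiny, untrained, fully connected network on a 3×3 board, written in plain Python. The sizes, the names `make_net` and `forward`, and the random weights are all hypothetical. AlphaGo Zero's real network is a much deeper convolutional network trained on self-play games, and its outputs guide a tree search rather than choosing moves directly.

```python
import math
import random


def make_net(n_inputs, n_hidden, n_moves, seed=0):
    """Random weights for a shared trunk plus policy and value heads (toy sizes)."""
    rng = random.Random(seed)

    def layer(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "trunk": layer(n_hidden, n_inputs),
        "policy": layer(n_moves, n_hidden),
        "value": layer(1, n_hidden),
    }


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def forward(net, board):
    """One pass through the network returns both a move distribution and a score."""
    hidden = [max(0.0, x) for x in matvec(net["trunk"], board)]  # shared ReLU features
    logits = matvec(net["policy"], hidden)
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    policy = [e / total for e in exps]  # softmax: one probability per move
    value = math.tanh(matvec(net["value"], hidden)[0])  # predicted outcome in [-1, 1]
    return policy, value


# Usage: a 3x3 board flattened to 9 inputs (1 = own stone, -1 = opponent, 0 = empty).
net = make_net(n_inputs=9, n_hidden=16, n_moves=9)
board = [0, 1, 0, -1, 0, 0, 0, 0, 1]
policy, value = forward(net, board)
print([round(p, 3) for p in policy], round(value, 3))
```

Because both heads read the same hidden features, training signals for move choice and for outcome prediction shape one shared representation. This is one plausible reading of why a merged design can be both simpler and stronger.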

Martin Mueller, a professor at the University of Alberta in Canada who has done important work on Go-playing software, is impressed by the design of AlphaGo Zero and says it advances reinforcement learning. “The architecture is simpler, yet more powerful, than previous versions,” he says.

DeepMind is already the darling of the AI industry, and its latest achievement is sure to grab headlines and spark debate about progress toward much more powerful forms of AI.

There are reasons to take the announcement cautiously, though. Pedro Domingos, a professor at the University of Washington, points out that the program still needs to play many millions of games in order to master Go, far more than an expert human player does. This suggests that the program's intelligence is somehow fundamentally different from a human's.

“It’s a nice illustration of the recent progress in deep learning and reinforcement learning, but I wouldn’t read too much into it as a sign of what computers can learn without human knowledge,” Domingos says. “What would be really impressive would be if AlphaGo beat [legendary South Korean champion] Lee Sedol after playing roughly as many games as he played in his career before becoming a champion. We’re nowhere near that.”

Indeed, both Silver and Hassabis concede that finding ways for machines to learn from much less data will be important in their ongoing quest to master intelligence. This may involve developing novel approaches to let machines transfer what they have learned in one domain to another, or to learn from observing others (both humans and other AIs).

But despite the work still to be done, Hassabis is hopeful that within 10 years AI will play a major role in solving important problems in science, medicine, or other fields. “I hope that these kinds of algorithms, and future versions, will be routinely working with us advancing the frontiers of science and medicine,” he says. “Maybe all kinds of things will have been partly designed and discovered by these kinds of algorithms, working in tandem with very smart humans.”