This More Powerful Version of AlphaGo Learns On Its Own

Noah Sheldon for WIRED

At one point during his historic defeat by the software AlphaGo last year, world champion Go player Lee Sedol abruptly left the room. The bot had played a move that confounded established theories of the board game, in a moment that came to epitomize the mystery and mastery of AlphaGo.

A new and much more powerful version of the program, called AlphaGo Zero and unveiled Wednesday, is even more capable of surprises. In tests, it trounced the version that defeated Lee, winning 100 games to nothing, and has begun to generate its own new ideas for the more than 2,000-year-old game.

AlphaGo Zero showcases an approach to teaching machines new tricks that makes them less reliant on humans. It could also help AlphaGo’s creator, the London-based DeepMind research lab that is part of Alphabet, to pay its way. In a filing this month, DeepMind said it lost £96 million last year.

DeepMind CEO Demis Hassabis said in a press briefing Monday that the guts of AlphaGo Zero should be adaptable to scientific problems such as drug discovery, or understanding protein folding. They too involve navigating a mathematical ocean of many possible combinations of a set of basic elements.

Despite its historic win for machines last year, the original version of AlphaGo stood on the shoulders of many uncredited humans. The software “learned” about Go by ingesting data from 160,000 amateur games taken from an online Go community. After that initial boost, AlphaGo honed itself to be superhuman by playing millions more games against itself.

AlphaGo Zero is so-named because it doesn’t need human knowledge to get started, relying solely on that self-play mechanism. The software initially makes moves at random. But it is programmed to know when it has won or lost a game, and to adjust its play to favor moves that lead to victories. A paper published in the journal Nature Thursday describes how 29 million games of self-play made AlphaGo Zero into the most powerful Go player on the planet.
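The self-play idea can be illustrated on a far smaller game. The sketch below is not DeepMind's code and makes several simplifying assumptions: it swaps Go for a toy stone-taking game (players alternately remove one to three stones, and whoever takes the last stone wins), and it swaps the neural network for a plain table of average outcomes. The names `train_by_self_play` and `best_move`, and every parameter value, are hypothetical choices made for illustration.

```python
import random

MAX_TAKE = 3  # assumed toy rule: remove 1, 2 or 3 stones per turn


def legal_moves(stones):
    return range(1, min(MAX_TAKE, stones) + 1)


def train_by_self_play(games=20000, max_pile=10, explore=0.2, seed=0):
    """Learn move values for the toy game using only win/loss feedback."""
    rng = random.Random(seed)
    value = {}  # (stones, move) -> average outcome for the player who moved
    count = {}  # (stones, move) -> number of times that move was played
    for _ in range(games):
        stones = rng.randint(1, max_pile)
        history = [[], []]  # moves made by player 0 and by player 1
        player = 0
        while stones > 0:
            moves = list(legal_moves(stones))
            rng.shuffle(moves)  # ties break randomly, so early play is random
            if rng.random() < explore:
                move = moves[0]  # occasionally try an arbitrary move
            else:
                move = max(moves, key=lambda m: value.get((stones, m), 0.0))
            history[player].append((stones, move))
            stones -= move
            if stones == 0:
                winner = player  # taking the last stone wins
            player = 1 - player
        # Shift each recorded move's average toward the final result:
        # +1 for the winner's moves, -1 for the loser's moves.
        for p in (0, 1):
            outcome = 1.0 if p == winner else -1.0
            for key in history[p]:
                count[key] = count.get(key, 0) + 1
                old = value.get(key, 0.0)
                value[key] = old + (outcome - old) / count[key]
    return value


def best_move(value, stones):
    """Return the move with the highest learned average outcome."""
    return max(legal_moves(stones), key=lambda m: value.get((stones, m), 0.0))


if __name__ == "__main__":
    learned = train_by_self_play()
    for pile in (5, 6, 7):
        print(pile, "stones -> take", best_move(learned, pile))
```

Both sides share one table, so every improvement immediately makes the opponent stronger too. In this toy game the table should come to prefer moves that leave the opponent a pile of four stones, a strategy nobody wrote into the program.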

“We’ve removed the constraint of human knowledge,” said David Silver, a leading researcher on the project. It’s a statement that reflects growing interest in creating AI systems that can learn without the crutch of data provided by humans. DeepMind and other leading research groups are working on software that learns from trial-and-error exploration, or even direct competition or combat. That’s seen as a route to faster progress on tough problems where human-curated data is scarce, or nonexistent, such as controlling robots.

AlphaGo Zero is simpler than its predecessors as well as smarter. The original design had two separate learning modules, built with technology known as artificial neural networks. One specialized in evaluating board positions, and the other suggested possible next moves. AlphaGo selected moves to play with input from a third module, a form of search, that simulated how the different options would play out. DeepMind says AlphaGo Zero is a better player because it has a single, more powerful neural network that learns to both evaluate board positions and suggest new moves. It uses a simpler search module to pick its moves.
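To make the "single network, two jobs" design concrete, here is a minimal sketch in plain Python. It is an assumption-laden toy, not AlphaGo Zero's actual architecture: the class name `TwoHeadedNet`, the layer sizes, the random untrained weights, and the tiny nine-point board are all invented for illustration. It shows only the shape of the idea: one shared layer feeds both a move-suggestion output and a position-evaluation output.

```python
import math
import random


class TwoHeadedNet:
    """Toy network with a shared layer and two output heads (hypothetical)."""

    def __init__(self, board_points, hidden=16, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            # Small random weights; a real system would learn these.
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.shared = matrix(hidden, board_points)       # used by both heads
        self.policy_head = matrix(board_points, hidden)  # suggests moves
        self.value_head = matrix(1, hidden)              # evaluates position

    @staticmethod
    def _apply(weights, vector):
        # Plain matrix-vector product.
        return [sum(w * x for w, x in zip(row, vector)) for row in weights]

    def forward(self, board):
        # Shared features are computed once and reused by both heads.
        features = [math.tanh(h) for h in self._apply(self.shared, board)]

        # Policy head: softmax turns scores into move probabilities.
        logits = self._apply(self.policy_head, features)
        peak = max(logits)
        exps = [math.exp(score - peak) for score in logits]
        total = sum(exps)
        move_probs = [e / total for e in exps]

        # Value head: one number in [-1, 1], read as "losing" to "winning".
        value = math.tanh(self._apply(self.value_head, features)[0])
        return move_probs, value


if __name__ == "__main__":
    net = TwoHeadedNet(board_points=9)
    board = [0.0] * 9   # 0 = empty, 1 = own stone, -1 = opponent stone
    board[4] = 1.0
    board[0] = -1.0
    move_probs, value = net.forward(board)
    print(len(move_probs), "move probabilities; position value", round(value, 3))
```

In a design like this, a search procedure could use the move probabilities to decide which options to explore and the value to judge where they lead, without needing two separately trained networks.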

Martin Müller, a professor at the University of Alberta, calls AlphaGo Zero’s new, simpler design “beautiful.” But he says its continued reliance on searching multiple possible outcomes to choose the best path shows the limitations of existing AI technology. “I think that tells us something about the nature of complex problems,” Müller says. “We can’t just have some function that knows all the answers, you need to reason, and think and look into the future.”

For computers, looking into the future of a board game defined by fixed rules is relatively easy. Engineers have made little progress in having them make sense of messier, everyday scenarios. When taking on a many-faceted challenge such as assembling an Ikea sofa or planning a vacation, humans plot a path forward by drawing on powers of reasoning and abstraction that so far elude AI software.

That doesn’t mean DeepMind’s technology can’t do useful things today. Google has already used the company’s algorithms to cut data-center cooling bills. The recent financial filing listed the company’s first revenues, £40 million from services provided to other parts of Alphabet. Hassabis says the ideas in AlphaGo Zero could be applied to work on understanding climate, or proteins in the body. Machine-learning research from Google and others has also shown promise for extracting more ad dollars from consumers.

AlphaGo Zero is also set to give back to the community DeepMind's project has shaken up. New ideas from its predecessors, like that jaw-dropping move against Lee Sedol, have invigorated the game. Fan Hui, the first professional player beaten by AlphaGo, now works with DeepMind and says AlphaGo Zero can inject further creativity into one of the world’s oldest board games.

“Its games look a lot like human play but it also feels more free, perhaps because it is not limited by our knowledge,” Fan says. He’s already christened one tactic it came up with the “zero move,” such is its striking power in the early stages of a game. “We have never seen a move like this, even from AlphaGo,” he says.
