What does it take for an OpenAI bot to best Dota 2 heroes? 128,000 CPU cores, 256 Nvidia GPUs

And a lot of smart machine-learning coding, of course

OpenAI's video-game-playing bots are getting much better at the fantasy strategy war game Dota 2, seeing off semi-pro players with ease in team matchups.

However, they can’t quite master the whole game to beat top professional teams – yet.

Last August, machine-learning software built by the OpenAI lab headquartered in San Francisco managed to best Dendi, a pro Dota 2 player, winning two matches out of three. But the victories were only in one-on-one games – a single bot against a single human – and under very limited circumstances that are not applicable in real competitions.

Fast forward about a year, and now OpenAI's bots can play in the more traditional five-versus-five setting, beating amateur and semi-pro gamers. The battles were restricted to mirror matches using Necrophos, Sniper, Viper, Crystal Maiden, and Lich, meaning that both teams, human and code, play with the same five heroes.

This is an impressive achievement, considering how complex Dota 2 is. It requires strategic planning and an intuition for when to attack or defend bases. Games are played with different types of characters, known as heroes, each with its own set of unique abilities.

The variety of skills, the number of possible actions, and the fact that it’s an imperfect-information game make it more difficult than games such as chess or Go. OpenAI’s bot team, nicknamed OpenAI Five, beat human teams in a series of informal matches over the past few months.

It won two out of three games against an amateur team with a matchmaking rating (MMR) of 4.2k, in the 93rd percentile, and did the same against a semi-pro team with an MMR of 5.5k, in the 99th percentile.

Game restrictions

Each OpenAI computer player is represented by a separate 1,024-unit long short-term memory (LSTM) network and is trained via self-play, a popular technique in reinforcement learning. The game runs at 30 frames per second, and matches last 45 minutes on average. Researchers use the Dota 2 bot API to pass information about the game’s state to each bot.

Each bot receives a series of 20,000 mostly floating-point numbers encoding vital information such as the location and health of visible units, giving it access to the same knowledge that human players can have. But this also lets the bots calculate the precise range of their attacks, something human players can’t do.

The game is constantly changing, and the bot receives an updated input every four frames in the game. “It’s like playing with your eyes closed and opening them every four frames,” Greg Brockman, cofounder and CTO at OpenAI, explained to The Register.
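To put those figures together, here is a back-of-the-envelope sketch in Python. It uses only the numbers quoted above (30 frames per second, one observation every four frames, a 45-minute average game, and roughly 20,000 input numbers per observation); the constant and function names are illustrative, not anything taken from OpenAI's code.

```python
# Rough timing for one OpenAI Five game, using only figures quoted in the article.
FPS = 30                        # game frames per second
FRAMES_PER_OBSERVATION = 4      # a bot "opens its eyes" every four frames
AVG_GAME_MINUTES = 45           # average game length
INPUTS_PER_OBSERVATION = 20_000 # approximate size of each bot's input


def observations_per_second(fps: int = FPS, stride: int = FRAMES_PER_OBSERVATION) -> float:
    """How many times per second a bot receives an updated input."""
    return fps / stride


def observations_per_game(minutes: float = AVG_GAME_MINUTES) -> int:
    """Observations a single bot receives over an average-length game."""
    total_frames = int(minutes * 60 * FPS)
    return total_frames // FRAMES_PER_OBSERVATION


def inputs_per_game(minutes: float = AVG_GAME_MINUTES) -> int:
    """Rough total of input numbers one bot processes in a game."""
    return observations_per_game(minutes) * INPUTS_PER_OBSERVATION


if __name__ == "__main__":
    print(observations_per_second())  # 7.5
    print(observations_per_game())    # 20250
    print(inputs_per_game())          # 405000000
```

At that cadence a single bot opens its eyes about 20,000 times in an average game, each time taking in an input of roughly the same length.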

“OpenAI Five is given access to the same information as humans, but instantly sees data like positions, health, and item inventories that humans have to check manually,” the OpenAI boffins added in a blog post today.

So when a bot opens its eyes, it sees everything visible on the map at once. This is a pretty big advantage for the machines compared with human players, who have to move their heroes around the map to see everything.

“Map awareness is a basic skill for humans. Players can scroll about the visible map, have an on-screen minimap summarizing lots of details, have selector keys for various units, making it easy to know the full state of the game. OpenAI Five can see all pieces of information that a human is allowed to see,” Brockman said.

Both teams do have access to the same information; the difference is that the bots get it all at once. Without the API, the researchers reckon it would take thousands more GPUs to render the game’s pixels and give the bots the same view the human players get.

A bot can also choose when to make a move. It observes the game every four frames and can schedule its action for the next frame, the second, the third, or the fourth. “On average, this means OpenAI Five will see something happen (0 + 1 + 2 + 3) / 4 = 1.5 frames after it happened, and will have an opportunity to respond as soon as the next frame, yielding an average reaction time of 33 milliseconds * 2.5 = 82.5 milliseconds,” Brockman said. That’s faster than the human players can manage.
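Brockman's arithmetic can be checked with a few lines of Python. This sketch rests on the assumptions in his quote: an event is equally likely to happen 0, 1, 2, or 3 frames before the bot's next observation, and the bot can respond one frame later. The function names are hypothetical.

```python
def average_observation_lag(stride: int = 4) -> float:
    """Mean frames between an event and the bot seeing it: (0 + 1 + ... + (stride - 1)) / stride."""
    return sum(range(stride)) / stride


def average_reaction_ms(frame_ms: float, stride: int = 4, response_frames: int = 1) -> float:
    """Average lag before the event is seen, plus the frame(s) needed to respond, in milliseconds."""
    return frame_ms * (average_observation_lag(stride) + response_frames)


if __name__ == "__main__":
    print(average_observation_lag())                 # 1.5
    print(average_reaction_ms(33))                   # 82.5, using the quote's rounded 33 ms frame
    print(round(average_reaction_ms(1000 / 30), 1))  # 83.3, using the exact 30 fps frame time
```

The quoted 82.5 ms comes from rounding one frame to 33 ms; with the exact frame time at 30 frames per second (about 33.3 ms), the same formula gives roughly 83.3 ms.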

There are other restrictions, too: abilities such as warding and turning invisible are off-limits, as are certain units that can cast certain spells. Limiting the pool to five of the game’s 113 heroes also makes matches far less complicated than the whole shebang.

Smells like team spirit

OpenAI Five bots don't communicate much with each other. Instead, teamwork is controlled by a hyperparameter called “team spirit”, which can be set from 0 to 1 and weights how much each hero should care about its own individual reward compared with the average reward of the whole team.
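The article doesn't give the exact formula, so the sketch below assumes the simplest reading of that description: a linear blend of each hero's own reward with the team average, weighted by team spirit. The function name and the sample rewards are hypothetical.

```python
from typing import List


def team_spirit_rewards(individual: List[float], team_spirit: float) -> List[float]:
    """Blend each hero's own reward with the team's average reward.

    Assumed form (a linear mix, not OpenAI's published code):
        blended_i = (1 - team_spirit) * individual_i + team_spirit * mean(individual)
    """
    if not 0.0 <= team_spirit <= 1.0:
        raise ValueError("team_spirit must lie in [0, 1]")
    team_mean = sum(individual) / len(individual)
    return [(1 - team_spirit) * r + team_spirit * team_mean for r in individual]


if __name__ == "__main__":
    rewards = [4.0, 0.0, 0.0, 0.0, 1.0]       # hypothetical per-hero rewards for one step
    print(team_spirit_rewards(rewards, 0.0))  # [4.0, 0.0, 0.0, 0.0, 1.0]
    print(team_spirit_rewards(rewards, 1.0))  # [1.0, 1.0, 1.0, 1.0, 1.0]
    print(team_spirit_rewards(rewards, 0.5))  # [2.5, 0.5, 0.5, 0.5, 1.0]
```

With team spirit at 0, each hero chases only its own reward; at 1, every hero receives the same team-average reward, so a reward earned by one hero is shared equally by all five.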

There are some positives, though. For last year's 1v1 match, researchers had to teach the bot strategies like creep block. The new bots, however, learned this on their own.

“It came up with its own strategies through self-play. It learnt to give up some of its own vulnerable territory in order to take some of its opponent’s territory,” Brooke Chan, a member of technical staff at OpenAI, told El Reg.

The bots are trained with OpenAI’s Proximal Policy Optimization (PPO) algorithm and self-play. They don’t rely on any search methods or human gameplay, and can play about 180 years’ worth of games against themselves every day; that’s a whopping 900 years per day in total for all five bots, far, far longer than any human lifetime. That takes a hell of a lot of compute, too: the bots slurped up 128,000 CPU cores and 256 Nvidia P100 GPUs on Google Cloud.
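Proximal Policy Optimization is a published reinforcement-learning algorithm, and its core idea fits in a few lines. The sketch below is the generic, textbook clipped surrogate objective in plain Python, not OpenAI Five's training code; the function name, the sample batch, and the clip range of 0.2 (a common default) are assumptions for illustration.

```python
import math
from typing import Sequence


def ppo_clipped_objective(new_logp: Sequence[float],
                          old_logp: Sequence[float],
                          advantages: Sequence[float],
                          clip_eps: float = 0.2) -> float:
    """Average PPO clipped surrogate objective (a quantity to maximise).

    For each sampled action:
        ratio = pi_new(a|s) / pi_old(a|s)   (computed from log-probabilities)
        term  = min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)
    """
    if not (len(new_logp) == len(old_logp) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # probability ratio, new policy over old
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum means the policy earns no extra credit for moving
        # the ratio outside [1 - eps, 1 + eps] in the direction the advantage favours.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Hypothetical batch of two actions: one good (advantage +1), one bad (advantage -1).
    old = [math.log(0.5), math.log(0.5)]
    new = [math.log(0.9), math.log(0.25)]
    adv = [1.0, -1.0]
    print(round(ppo_clipped_objective(new, old, adv), 3))  # 0.2
```

Clipping the probability ratio stops any single update from pushing the policy too far from the one that generated the self-play data, which helps keep training stable when games are being generated at this volume.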

“While today we play with restrictions, we aim to beat a team of top professionals at The International in late August subject only to a limited set of heroes,” OpenAI said.

The International is the biggest annual Dota 2 esports tournament, where winners can take home prizes worth up to several million dollars. “We may not succeed,” it cautioned. ®