AI masters 49 Atari 2600 games without instructions

Brain-like artificial intelligence almost as good as professional games tester.

Artificial intelligence (machines and software with some ability to reason for themselves) is used in applications ranging from military technology to everyday services like automated telephone systems. None of the systems that currently exist, however, exhibits learning abilities that match human intelligence. Recently, scientists have wondered whether an artificial agent could be given a measure of human-like intelligence by modeling its algorithms on aspects of the primate nervous system.

Using a bio-inspired system architecture, scientists have created a single algorithm that can develop problem-solving skills when presented with challenges that stump some humans. They then put it to work learning a set of classic video games.

The scientists' novel agent, which they called the Deep Q-network (Deep-Q for short), combines reinforcement learning with what's termed a "deep convolutional network," a layered system of artificial neural networks. Deep-Q can capture spatial relationships between objects in an image, such as their distance from one another, in a sophisticated enough way that it can re-envision the scene from a different viewpoint. This type of architecture was inspired by early work on the visual cortex.
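To make the idea of a convolutional network concrete, here is a bare-bones 2D convolution, its basic building block, in pure Python. This is an illustration only, not DeepMind's code: real systems stack many such layers, learn the filter weights from data, and run on GPU libraries.

```python
# Slide a small filter (kernel) across an image; each output value
# measures how strongly a local patch matches the filter's pattern.
def convolve2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            out[i][j] = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw)
            )
    return out

# A hand-made vertical-edge detector applied to a tiny image whose
# right half is bright: the filter fires only where brightness jumps.
image = [[0, 0, 1, 1]] * 4
kernel = [[-1, 1]]
edges = convolve2d(image, kernel)
```

Each row of `edges` comes out as `[0, 1, 0]`: the filter responds only at the boundary between the dark and bright halves, which is exactly the kind of local spatial structure a convolutional layer detects.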

Scientists considered tasks in which Deep-Q interacts with its environment through a sequence of observations, actions, and rewards, with the ultimate goal of choosing actions that maximize cumulative reward. Reinforcement learning sounds like a simple approach to developing artificial intelligence—after all, we have all seen that small children are able to learn from their mistakes. Yet when it comes to designing artificial intelligence, it is much trickier to ensure all the components necessary for this type of learning are actually included. In practice, artificial reinforcement learning systems have tended to be quite unstable.
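The observation–action–reward loop described above can be sketched in a few lines. The environment and reward scheme below are invented for illustration (Deep-Q's real environment is the Atari emulator, observed as raw pixels and a game score); the point is only the shape of the interaction.

```python
import random

class GuessEnv:
    """Hypothetical environment: the agent guesses heads or tails
    and is rewarded when its action matches the next observation."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def step(self, action):
        observation = self.rng.choice(["heads", "tails"])
        reward = 1 if action == observation else 0
        return observation, reward

# The agent-environment loop: act, observe, collect reward, repeat.
env = GuessEnv()
total_reward = 0
action = "heads"
for _ in range(100):
    observation, reward = env.step(action)
    total_reward += reward
    action = observation  # naive policy: repeat what was just seen
```

A reinforcement learner's job is to improve on that naive policy, using only the stream of rewards, until the cumulative reward is as high as the environment allows.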

In creating Deep-Q, the scientists addressed these earlier instability issues. One important mechanism they added was "experience replay." This element allows the system to store information about past experiences and transitions, much like our memory does. For example, if a small child leaves home to go to a playground, he will still remember what home looks like while at the playground. If he is running and trips over a tree root, he will remember that bad outcome and try to avoid tree roots in the future.
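A minimal sketch of such a replay memory, assuming a simple fixed-capacity design (the class and method names here are ours, not DeepMind's): each stored transition records what the agent saw, what it did, what reward followed, and what it saw next. Learning updates later draw *random* batches from this store, which breaks the correlation between consecutive frames that helps destabilize training.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity memory of (state, action, reward, next_state)
    transitions; the oldest memories fall out when it is full."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Random sampling decorrelates the training batch.
        return random.sample(self.buffer, batch_size)

buf = ReplayBuffer(capacity=100)
for t in range(500):
    buf.store(t, "noop", 0, t + 1)  # dummy transitions for illustration

batch = buf.sample(32)
```

After 500 stores, only the most recent 100 transitions remain, and each sampled batch is a random slice of that recent experience rather than a correlated run of consecutive frames.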

Using these abilities, Deep-Q performs reinforcement learning, using rewards to continuously establish relationships between objects and actions within the convolutional network. Over time, it identifies visual aspects of the environment that promote good outcomes.
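The update rule behind "using rewards to establish relationships between observations and actions" can be shown in tabular form. Deep-Q replaces this lookup table with its convolutional network so it can generalize across screens it has never seen, but the underlying idea (nudge each value estimate toward the observed reward plus the best estimated future value) is the same. The two-state environment below is hypothetical.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.5, 0.9    # learning rate and discount factor
Q = defaultdict(float)     # Q[(state, action)] -> expected future reward

def q_update(state, action, reward, next_state, actions=("left", "right")):
    # Target: immediate reward plus discounted best value of what follows.
    best_next = max(Q[(next_state, a)] for a in actions)
    target = reward + GAMMA * best_next
    # Move the current estimate a fraction ALPHA toward the target.
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])

# Hypothetical chain: in state 0, "right" pays off and "left" does not.
for _ in range(50):
    q_update(0, "right", 1.0, 1)
    q_update(0, "left", 0.0, 1)
```

After repeated updates, `Q[(0, "right")]` converges toward 1.0 while `Q[(0, "left")]` stays at 0: the reward signal alone has taught the table which action is worth taking.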

This bio-inspired approach is based on evidence that rewards during perceptual learning may influence the way images and sequences of events or resulting outcomes are processed within the primate visual cortex. Additionally, evidence suggests that in the mammalian brain, the hippocampus may actually support the physical realization of the processes involved in the “experience replay” algorithm.

It takes a few hundred tries, but the neural network eventually figures out a game's rules, then later discovers strategies.

Scientists tested Deep-Q's problem-solving abilities on the Atari 2600 gaming platform. Deep-Q learned not only the rules for a variety of games (49 games in total) in a range of different environments, but also the behaviors required to maximize scores. It did so with minimal prior knowledge, receiving only visual images (in pixel form) and the game score as inputs. In these experiments, the authors used the same algorithm, network architecture, and hyperparameters on every game—the same limitation a human player faces, given we can't swap brains out between games. Notably, the genres ranged from boxing to car racing, representing a tremendous variety of inputs and challenges.

Remarkably, Deep-Q outperforms the best existing systems on all but six of the games. Deep-Q also did nearly as well as a professional human games tester across the board, achieving more than 75 percent of the human's score on the majority of the games.

The scientists also examined how the agent learned from contextual information, using the game Space Invaders. Using t-SNE, a technique for visualizing high-dimensional data, they saw that situations that looked similar mapped to nearby points, as you'd expect. But Deep-Q also learned from sensory inputs in an adaptive manner: similar spatial relationships within Deep-Q's neural network were found for situations that had similar expected rewards but looked different. Deep-Q can actually generalize what it has learned from previous experiences to different environments and situations, just like we can.

Deep-Q vs. Space Invaders.

This result suggests that modeling artificial intelligence systems on the mammalian brain and nervous system could be a fruitful avenue of development. So now all that's left is to ask ourselves one question: do you think you could beat Deep-Q's scores?