We are happy to announce that the ML-Agents team is releasing the latest version of our toolkit, v0.3.

This is our biggest release yet, and many of the major features we have included are the direct results of community feedback. The new features focus on expanding what is possible with ML-Agents with the introduction of Imitation Learning, Multi-Brain Training, On-Demand Decision-Making, and Memory-Enhanced Agents. We’re also making setup and usage simpler and more intuitive with the addition of a Docker image, changes to API semantics, and a major revamp of our documentation. Read on to learn more about the major changes, check our GitHub page to download the new version, and learn all the details on the release page.

Imitation Learning via Behavioral Cloning

In ML-Agents v0.3, we include the first algorithm in a new class of methods for training agents called Imitation Learning. Unlike Reinforcement Learning, which operates primarily using a reward signal, Imitation Learning only requires demonstrations of the desired behavior in order to provide a learning signal to the agents.

We believe that, in some scenarios, simply playing as the agent in order to teach it can be more intuitive than defining a reward, and that it offers a powerful new way of creating behavior in games.

There are a variety of possible Imitation Learning algorithms that can be used, and for v0.3 we are starting with the simplest one: Behavioral Cloning. This works by collecting training data from a teacher agent, and then simply using it to directly learn a behavior, in the same way that Supervised Learning for image classification or other traditional Machine Learning tasks work.
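To make the idea concrete, here is a minimal, hypothetical sketch of Behavioral Cloning (not the ML-Agents implementation): a hand-written teacher policy for a toy 1-D task supplies (observation, action) demonstrations, and a simple logistic-regression "student" is fit to them by ordinary supervised learning.

```python
import math
import random

# Hypothetical teacher for a toy 1-D task: move right (action 1) when the
# target lies to the right (obs > 0), otherwise move left (action 0).
def teacher_policy(obs):
    return 1 if obs > 0.0 else 0

# 1. Collect demonstrations by "playing as the agent".
rng = random.Random(0)
demos = []
for _ in range(500):
    obs = rng.uniform(-1.0, 1.0)
    demos.append((obs, teacher_policy(obs)))

# 2. Behavioral Cloning is plain supervised learning on (obs, action)
#    pairs -- here, logistic regression fit by stochastic gradient descent.
w, b = 0.0, 0.0
for _ in range(50):
    for obs, act in demos:
        p = 1.0 / (1.0 + math.exp(-(w * obs + b)))
        grad = p - act                  # d(cross-entropy) / d(logit)
        w -= 0.5 * grad * obs
        b -= 0.5 * grad

# 3. The cloned policy now imitates the teacher on unseen observations.
def cloned_policy(obs):
    return 1 if (w * obs + b) > 0.0 else 0

test_obs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
accuracy = sum(cloned_policy(o) == teacher_policy(o) for o in test_obs) / 200
print(f"imitation accuracy: {accuracy:.2f}")
```

No reward function appears anywhere above; the only learning signal is the teacher's demonstrated behavior.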

As we develop this feature and collect feedback, we plan to offer methods that are both more robust and provide a more intuitive training interface. Since applying Imitation Learning to game development is new, we’d like to develop this feature with as much community feedback as possible in order to determine how best to integrate it into the developer’s workflow. If you try out our Imitation Learning feature, or have ideas about how you’d like to see it further integrated into the Editor, please share your feedback with us (ML-Agents@unity3d.com, or on our GitHub Issues page).

Multi-Brain training

One of the requests we received early on was for the ability to train more than one brain at a time. Say, for example, you have a soccer game where different players, such as offensive and defensive ones, need to be controlled differently. Using Multi-Brain Training, you can give each “position” on the field a separate brain, with its own observation and action space, and train it alongside other brains.

At the end of training, you will receive one binary (.bytes) file, which contains one neural network model per brain. This allows for mixing and matching different hyperparameters, as well as using our Curriculum Learning feature to progressively change how different sets of brains and agents interact within the environment over time.
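The core idea can be sketched in a few lines of Python. This is a hedged illustration, not the actual ML-Agents API: each brain owns its own action space and hyperparameters, and a single loop routes every agent's observation to its own brain.

```python
import random

random.seed(0)

# Illustrative only: each "brain" carries its own action space and its own
# hyperparameters, so a striker and a goalie can be tuned independently.
class Brain:
    def __init__(self, action_size, learning_rate):
        self.action_size = action_size
        self.learning_rate = learning_rate   # tunable per brain

    def decide(self, observation):
        # Placeholder policy: a random action from this brain's space.
        return random.randrange(self.action_size)

brains = {
    "StrikerBrain": Brain(action_size=5, learning_rate=3e-4),
    "GoalieBrain": Brain(action_size=3, learning_rate=1e-4),
}

# One simulation step: each agent's observation goes to its own brain,
# and all brains are stepped (and trained) side by side.
observations = {"StrikerBrain": [0.1, 0.2], "GoalieBrain": [0.9]}
actions = {name: brains[name].decide(obs) for name, obs in observations.items()}
print(actions)
```

The brain names and hyperparameter values above are made up for illustration; the point is simply that nothing forces the two policies to share an observation space, action space, or training configuration.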

On-Demand Decision-Making

Another request we received from the developer community was the ability to have agents ask for decisions in an on-demand fashion, rather than forcing them to make decisions every step or every few steps of the engine.

There are multiple genres of games, such as card games, real-time strategy games, role-playing games, and board games, all of which rely on agents being able to make decisions after variable amounts of time. We are happy to support this in ML-Agents. You can now enable or disable On-Demand Decision-Making for each agent independently with the click of a button! Simply enable it on your agent, then call a single function on the agent to request a decision from its brain.
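Here is a hedged sketch of the idea (the class and function names are invented for illustration, not the ML-Agents API): instead of the brain being polled every engine step, the agent asks for a decision only when it needs one, such as a bouncing agent that decides only upon landing.

```python
# Invented names for illustration: an agent that pulls decisions from its
# brain on demand, rather than being polled every engine step.
class Agent:
    def __init__(self, brain):
        self.brain = brain
        self.last_action = None

    def request_decision(self, observation):
        self.last_action = self.brain(observation)
        return self.last_action

def simple_brain(observation):
    return "jump" if observation["on_ground"] else "wait"

agent = Agent(simple_brain)
decisions = 0
for step in range(10):
    on_ground = (step % 4 == 0)       # the agent lands every fourth step
    if on_ground:                      # decide on demand, not every step
        agent.request_decision({"on_ground": True})
        decisions += 1

print(decisions)   # 3 decisions across 10 engine steps
```

The same pattern covers turn-based games: a card-playing agent would call for a decision only when its turn comes up, however many engine steps apart those turns fall.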

Changes to ML-Agents semantics

In order to help future-proof ML-Agents, we have made a series of changes to the semantics of the toolkit. These changes are designed to bring the terms and concepts we use within the system more in-line with the literature on Reinforcement Learning.

The biggest of these changes is that there is no longer a concept of “state.” Instead, agents receive observations of various kinds (vector, image, or text) from the environment, send these observations to their respective brain to have a decision calculated (either as a vector or as text), and then receive this decision from the brain and use it to take an action. See the table below for an overview of the changes. These semantic changes require corresponding changes to the API. To understand how they affect environments built using ML-Agents v0.2 and earlier, see here.

Old          New
State        Vector Observation
Observation  Visual Observation
(none)       Text Observation (new)
Action       Vector Action
(none)       Text Action (new)

Learning under partial observability

Part of the motivation for changing semantics from state to observation is that, in most environments, the agents are never actually exposed to the full state of the environment. Instead, they receive partial observations which often consist of local or incomplete information. It is often too expensive to provide the agent with the full state, or it is unclear how to even represent that state. In order to overcome this, we are including two methods for dealing with partial observability within learning environments through Memory-Enhanced Agents.

The first memory enhancement is Observation-Stacking. This allows an agent to keep track of up to ten previous observations within an episode, and to feed them all to the brain for decision-making. The second form of memory is the inclusion of an optional recurrent layer in the neural network being trained. These Recurrent Neural Networks (RNNs) can learn to keep track of important information over time in a hidden state. You can think of this as the agent’s memory.
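Observation-Stacking itself is simple enough to sketch in a few lines (a generic illustration, not the ML-Agents implementation): keep the last N observations in a fixed-length buffer and hand their concatenation to the brain, so the policy can infer quantities like velocity from position history.

```python
from collections import deque

STACK_SIZE = 4   # ML-Agents supports stacking up to ten observations
OBS_SIZE = 3     # size of a single observation vector (illustrative)

# Zero-pad the buffer so the stacked vector has a fixed length from step one;
# deque(maxlen=...) silently drops the oldest entry on each append.
stack = deque([[0.0] * OBS_SIZE for _ in range(STACK_SIZE)], maxlen=STACK_SIZE)

def stacked_observation(new_obs):
    stack.append(new_obs)                     # oldest observation falls off
    return [x for obs in stack for x in obs]  # flatten for the brain

first = stacked_observation([1.0, 2.0, 3.0])
print(len(first))   # 12 = STACK_SIZE * OBS_SIZE
```

The recurrent alternative trades this fixed window for a learned hidden state, which is what lets an RNN-based agent carry information for longer than any stack of recent observations could.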

Easier setup with Docker Image (Preview)

One of the more frequent issues developers faced when using ML-Agents had little to do with ML-Agents itself, and more to do with the difficulties of installing all the right prerequisites, such as Python and TensorFlow.

We want to make it as simple as possible for developers who want to stay within (and only think about) the world of Unity and C# to do so. As a first step toward this goal, we are enabling the creation of a Docker image, which contains all of the requirements necessary for training using an ML-Agents environment.

To train a brain (or brains), simply install Docker (easier than installing Python and other dependencies, we promise), build your Unity environment with a Linux target, and launch the Docker image with the name of your environment.

New and revamped environments

We are happy to be including four completely new example environments with the release of v0.3: Banana Collectors, Soccer Twos, Bouncer, and Hallway. The first two of these are multi-agent environments, where the agents within the environment interact with one another either cooperatively, competitively, or both.

Banana Collectors

In Banana Collectors, multiple agents move around an area attempting to collect as many rewarding bananas (yellow) as possible, while avoiding negatively rewarding bananas (purple). The catch is that the agents can fire lasers at one another, freezing them in place. Inspired by research from DeepMind last year, the agents can learn to either fight over the bananas, or peacefully share them, depending on the number of rewarding bananas within the scene.

Soccer Twos

The second environment, Soccer Twos, contains a 2v2 environment, where each team contains both a striker and a goalie, each trained using separate reward functions and brains.

Bouncer

The third environment, Bouncer, provides an example of our new On-Demand Decision-Making feature. In this environment, an agent can apply force to itself in order to bounce around a platform and attempt to collide with floating bananas. What makes the environment unique is that the agent only makes a new decision about where to bounce next once it has landed on the ground, which occurs at varying time intervals.

Hallway

The fourth environment, Hallway (inspired by this paper), provides a test of the agent’s memory abilities and our new support for Recurrent Neural Networks as an optional model type to be trained. In it, an agent must use local perception to explore a hallway, discover the color of the block, and use that information to go to the rewarding goal.

In addition to these four new environments, we have also significantly revamped the Push Block and Wall Jump environments, and provided a new unified look and feel to all of the example environments. We hope that these changes will make it easier for the community to train their own models in these environments, and to use the environments as inspiration for their own work.

Try it out

We encourage you to try out these new features and let us know what you think. As always, since this is a beta product that is in flux, there may be bugs or issues. If you run into one, feel free to let us know about it on our GitHub issues page. If you’d just like to provide general feedback, or discuss ML-Agents with others using the toolkit, check out our Unity Connect channel. Additionally, feel free to email us directly at ml-agents@unity3d.com with feedback, questions, or to show us what you are working on! Happy Training.

I love that Unity is working so hard on bringing ML to games, thank you!

One thing I have been struggling with is that I find it inefficient that the AI has to figure out what GOALS it has, rather than just coming up with ways to reach those goals. For example, in a racing game, wouldn’t it be more efficient if the AI knew from the beginning that it should reach the end of the track as fast as possible? Currently it is just dropped in clueless, moving around hitting everything until it starts to figure out that it gets reward when doing certain things. But it still doesn’t really know what goal it has, just that it manages to do things that give it rewards. Am I making sense? Maybe somehow combine Goal Oriented Action Planning with ML, so the ML part only needs to focus on finding ways to reach a known goal.

You could address this issue in two ways.
1) Define short-term rewards with regard to a long term goal. For instance, assigning small penalties for every step the agent takes should encourage it to reach the end of the track as fast as possible.
2) Look into hierarchical reinforcement learning: https://blog.openai.com/learning-a-hierarchy/
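Suggestion (1) can be sketched in a couple of lines; the numbers here are illustrative, not tuned values. With a small penalty charged on every step, faster runs to the goal accumulate more total reward, so "finish quickly" emerges from the reward signal itself.

```python
# Illustrative reward shaping: a small per-step penalty plus a terminal
# goal reward. Values are made up; in practice they need tuning.
STEP_PENALTY = -0.01
GOAL_REWARD = 1.0

def episode_return(steps_to_goal):
    return GOAL_REWARD + STEP_PENALTY * steps_to_goal

print(episode_return(50))    # fast run: ~0.5
print(episode_return(200))   # slow run: ~-1.0, scores worse
```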