Reinforcement Learning

Typically, artificial intelligence in video games does not have a lot to do with "intelligence." The algorithms are carefully hand-tweaked to the game's scenarios and made "good enough" to provide a decent challenge to the player. The problem with this approach is that the "intelligence" must be coded somehow. Therefore, programmers try to simplify the problem, for example by giving the AI a certain number of states.

Maybe an enemy AI has a neutral state in which it only walks around. When it hears a noise, it may switch into an alert state in which it actively searches for the origin of that noise. When it sees the player, it switches into an aggressive state and attacks. None of this has anything to do with learning or intelligence. It's a fixed set of rules defined by the programmer in an attempt to give the AI human-like behavior. Some games pull this off better than others.
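
To make the "fixed set of rules" concrete, here is a minimal sketch of such a state machine in Python. The names EnemyState and next_state are made up for this post, and the rule that an enemy keeps its current state when nothing new happens is our own assumption, not something any particular game does.

```python
from enum import Enum, auto


class EnemyState(Enum):
    NEUTRAL = auto()     # only walks around
    ALERT = auto()       # searches for the origin of a noise
    AGGRESSIVE = auto()  # attacks the player


def next_state(state: EnemyState, hears_noise: bool, sees_player: bool) -> EnemyState:
    """Pick the next state from hand-written rules; nothing is learned here."""
    if sees_player:
        # Seeing the player always wins over every other rule.
        return EnemyState.AGGRESSIVE
    if hears_noise and state is EnemyState.NEUTRAL:
        return EnemyState.ALERT
    # Assumption: with no new stimulus the enemy stays in its current state.
    return state
```

Every behavior the enemy can show has to be written into next_state by hand, which is exactly the limitation described above.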

Obviously our goal is to end up in the "better" category. Beyond that, we are actively looking into alternative ways of giving computer-controlled characters intelligent behavior. Currently, we are considering a type of machine learning called reinforcement learning. The idea behind it is to give the AI a reward for every action it takes. These rewards can be positive, negative or even zero. It's similar to how animals are trained, and even to how humans learn. A child who touches a hot coffee mug is given a negative reward by the mug ("ouch, that's hot") and will remember to be careful next time.

You can train a reinforcement learning AI by defining an appropriate reward structure. For example, you may give a +5 reward whenever the AI manages to take health points from the player and a -3 reward whenever it loses health itself. In this example we'd be teaching the AI to be aggressive. It cares more about hurting the player than it cares about maintaining its own health. We might also give a -1 reward for every second the fight goes on, to encourage the AI to win as quickly as possible.
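
As a rough illustration, the reward structure above could be written as a single function. This is only a sketch: the function name and parameters are hypothetical, and we assume both the +5 and the -3 are counted per hit point.

```python
# Reward weights taken from the example in the text.
REWARD_PER_HP_TAKEN = 5.0    # AI takes a hit point from the player
REWARD_PER_HP_LOST = -3.0    # AI loses a hit point itself
REWARD_PER_SECOND = -1.0     # every second the fight goes on


def reward(hp_taken_from_player: int, hp_lost: int, seconds_elapsed: float) -> float:
    """Total reward for one time step (assumes rewards scale per hit point)."""
    return (
        REWARD_PER_HP_TAKEN * hp_taken_from_player
        + REWARD_PER_HP_LOST * hp_lost
        + REWARD_PER_SECOND * seconds_elapsed
    )
```

For instance, reward(2, 1, 1.0) adds up 10 for the damage dealt, -3 for the damage taken and -1 for the elapsed second. Changing the three weights is how you would shape a more cautious or a more reckless AI.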

In a game like ours there are many possible actions an AI can take. In the early stages of learning, the machine will be pretty clueless about what to do. Remember, we're not directly telling it through code what to do! Let's say the player casts a fireball. The clueless AI stands there and decides to do nothing. The fireball hits it, and it loses health. This gives a -3 reward for each lost hit point, and the AI will begin to understand that the state "fireball approaching" should likely not be followed by the action "do nothing."
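
One common way to store what the AI "begins to understand" is a table of values per (state, action) pair, updated with the Q-learning rule. We have not settled on an algorithm, so treat the following as a sketch only: the learning rate, the discount factor, the action list, the "idle" follow-up state and the 4 lost hit points are all made-up values.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (assumed value)
GAMMA = 0.9  # discount factor for future rewards (assumed value)

# Estimated long-term value of each (state, action) pair; unseen pairs start at 0.
q_values = defaultdict(float)


def update(state, action, reward, next_state, actions):
    """Apply one tabular Q-learning update and return the new estimate."""
    best_next = max(q_values[(next_state, a)] for a in actions)
    target = reward + GAMMA * best_next
    q_values[(state, action)] += ALPHA * (target - q_values[(state, action)])
    return q_values[(state, action)]


ACTIONS = ["do_nothing", "dodge", "block"]  # hypothetical action set

# The fireball hits and costs 4 hit points, so the reward is -3 * 4.
new_value = update("fireball_approaching", "do_nothing", -3.0 * 4, "idle", ACTIONS)
```

After this single update, "do nothing" has a negative value in the state "fireball approaching" while the untried actions still sit at zero, so the AI has a reason to try something else next time.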

You can see that even though we never specifically tell the AI through code what to do against a fireball, it will experiment with all the possible actions. Through trial and error it will figure out the best actions to take in any given situation and learn to become good at the game. Probably too good for a human opponent, but that's a topic for another post.
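
The trial-and-error part can be sketched with an epsilon-greedy loop: most of the time the AI picks the action it currently rates best, and occasionally it tries a random one. This toy version only covers the single situation "fireball approaching", and the damage numbers, action names and training parameters are invented for illustration.

```python
import random

ACTIONS = ["do_nothing", "dodge", "block"]  # hypothetical action set
# Hypothetical hit points lost to the fireball for each action.
HP_LOST = {"do_nothing": 4, "dodge": 0, "block": 1}


def train(episodes=500, epsilon=0.1, alpha=0.1, seed=0):
    """Learn a value for each action by trial and error."""
    rng = random.Random(seed)
    values = {action: 0.0 for action in ACTIONS}
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)           # explore: try something random
        else:
            action = max(ACTIONS, key=values.get)  # exploit: best known action
        reward = -3.0 * HP_LOST[action]            # -3 per lost hit point
        # Move the estimate a small step toward the observed reward.
        values[action] += alpha * (reward - values[action])
    return values


values = train()
best_action = max(values, key=values.get)
```

Nothing in the code says "dodge fireballs". The AI starts out doing nothing, gets punished for it, and ends up preferring the action that costs it the least health.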