Just an Intro: Heuristically Accelerated Hierarchical Reinforcement Learning in RTS Games

Posted by merothehero on December 3, 2009

Introduction

In this document I will analyze the gameplay and strategies of RTS games, and then give a brief overview of how the Heuristically Accelerated Hierarchical Reinforcement Learning System (HAHRL-RTS System) will work.

The Strategy Game: An Analysis

There’s no doubt that strategy games are complex domains: a gigantic (almost infinite) set of allowed actions, a gigantic (almost infinite) set of game states, imperfect information, and nondeterministic behavior. On top of all this, real-time planning and reactions are required.

Many approaches to applying learning or planning to RTS games have treated the AI as one solid learning component, which makes the techniques very complex to apply.

So I asked myself: how can I simplify everything?

First, I listed all the primitive actions that a human player can perform:

1- Move a unit

2- Train/Upgrade a unit

3- Gather a resource

4- Make a unit attack

5- Make a unit defend

6- Build a building

7- Repair a building

NB: Upgrading units or buildings is not available in BosWars, but it is found in most RTS games.
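The seven primitive actions above can be written down as a small data type. This is only a minimal sketch in Python: the names `PrimitiveAction` and `Command`, and the `unit_id` and `target` fields, are my own illustrative assumptions and are not taken from BosWars or from the HAHRL-RTS System.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PrimitiveAction(Enum):
    """The seven primitive actions, numbered as in the list above."""
    MOVE = 1
    TRAIN_OR_UPGRADE = 2
    GATHER = 3
    ATTACK = 4
    DEFEND = 5
    BUILD = 6
    REPAIR = 7


@dataclass(frozen=True)
class Command:
    """A hypothetical order: which action, given to which unit, aimed at what."""
    action: PrimitiveAction
    unit_id: int                  # assumed integer identifier of the unit
    target: Optional[str] = None  # assumed free-form target description


# Example: tell unit 3 to gather from the nearest resource.
order = Command(PrimitiveAction.GATHER, unit_id=3, target="nearest resource")
```

Whatever the AI decides at a higher level, it eventually has to be expressed as a stream of such low-level commands.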

Any player wins by performing two types of actions simultaneously: actions that strengthen him and actions that weaken his enemy (Fig 1).
NB: We neglect useless actions here and assume the player plays perfectly.

When a human plays a strategy game, he doesn’t learn everything at the same time. He learns each of the following 6 sub-strategies separately:

1- Train/Build/Upgrade attacking units: What units does he need to train?

Will he depend on fast, cheap units to perform successive quick attacks, or on powerful, expensive, slow units to perform one or two brutal attacks that finish his enemy? Or will it be a combination of the two, which is often a better choice?

Does his enemy have weak points against a certain unit? Or does his enemy have units that can infiltrate his defenses, so that he must train their counter-units?

Does he prefer to spend his money on expensive upgrades, or on a larger number of non-upgraded units?

NB: I treat attacking buildings as static attacking units.

2- Defend: How will he use his current units to defend?

Will he concentrate all his units into one tightly packed force, or will he spread them along his borders? Or a mix of the two approaches?

Will he keep the defending units (which may be attacking buildings) around his buildings, or will he make them guard far from the base to stop the enemy early? Or a mix of the two approaches?

If he detects an attack on his radar, will he order his units to engage the attackers at once, or will he wait for the opponent to reach his base and be crushed there? Or a mix of the two approaches?

How will he defend unarmed units? Will he place armed units near them for protection, or will he prefer to use the armed units for something else? If an unarmed unit is under attack, how will he react?

What are his reactions to different events while defending?

3- Attack: How will he use his current units to attack?

Will he attack the important buildings first? Or will he prefer to crush all the defensive buildings and units first? Or a mix of the two approaches?

Will he divide his attacking force into separate small forces that attack from different places, or will he attack with one big solid force? Or a mix of the two approaches?

What are his reactions to different events while attacking?

4- Gather Resources: How will he gather the resources?

Will he train a lot of gatherers to achieve a high gathering rate? Or will he train only a limited number, because more would waste money that he needs in order to rush (attack early) at the beginning of the game? Or a mix of the two approaches?

Will he start by gathering the far resources first, since the nearby resources are more secure? Or will he be greedy and gather the nearest resources first? Or a mix of the two approaches?

5- Construct Buildings: How does he place his buildings?

Will he pack them close together in order to defend them easily? Or will he leave large spaces between them to make it harder for the opponent to destroy them? Or a mix of the two approaches?

6- Repair: How will he do the repairing?

Although it’s a minor thing, different approaches are used. Will he place a repairing unit near every building in case of an attack, or will he just order the nearest one to repair the building being attacked? Or a mix of the two approaches?

The HAHRL-RTS System

Since the 6 sub-strategies are nearly independent of each other (think about it and you’ll see), I will divide the AI system into a hierarchy, as shown in figure 1. Each child node is itself a semi-Markov decision process (SMDP) to which heuristically accelerated reinforcement learning techniques will be applied. Each child node will later be divided into further sub-nodes, which are also SMDPs.
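To make the shape of that hierarchy concrete, here is a minimal Python sketch of a tree with one child per sub-strategy. The class name `SMDPNode`, the root label "RTS AI", and the helper `build_hierarchy` are assumptions made for illustration. The learning itself is left out, and the further sub-nodes are not filled in because they have not been defined yet.

```python
from dataclasses import dataclass, field
from typing import List

# The six sub-strategies from the analysis above.
SUB_STRATEGIES = [
    "Train/Build/Upgrade attacking units",
    "Defend",
    "Attack",
    "Gather resources",
    "Construct buildings",
    "Repair",
]


@dataclass
class SMDPNode:
    """One node of the hierarchy; each node would hold its own learner."""
    name: str
    children: List["SMDPNode"] = field(default_factory=list)

    def add_child(self, name: str) -> "SMDPNode":
        child = SMDPNode(name)
        self.children.append(child)
        return child

    def leaves(self) -> List["SMDPNode"]:
        """Return the nodes that have not been subdivided further."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def build_hierarchy() -> SMDPNode:
    """Build the top of the tree: a root with one child per sub-strategy."""
    root = SMDPNode("RTS AI")  # hypothetical root label
    for name in SUB_STRATEGIES:
        root.add_child(name)
    return root


root = build_hierarchy()
```

Because each child is its own small decision problem, it can be trained and later subdivided without touching its siblings.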

So now you understand why it is hierarchical, but you may be wondering: why “heuristically accelerated”?

Well, because heuristic algorithms will be applied to guide and accelerate the learning process. I hear you asking what the heuristic algorithms are, but that’s another story, to be told later!
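Even without the actual heuristics, the general idea of a heuristic guiding a learner can be sketched. The Python below is not the HAHRL-RTS System itself. It assumes one common formulation from the heuristically accelerated RL literature, in which a heuristic value H(s, a), weighted by a factor `xi`, is added to Q(s, a) only when choosing an action, and it pairs that with the standard SMDP Q-learning update, which discounts by gamma raised to the action's duration. The class name, parameter names, and default values are all illustrative assumptions.

```python
import random
from collections import defaultdict


class HeuristicQLearner:
    """Tabular Q-learning whose action choice is biased by a heuristic table."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, xi=1.0, seed=None):
        self.actions = list(actions)
        self.q = defaultdict(float)  # learned values, keyed by (state, action)
        self.h = defaultdict(float)  # heuristic hints, keyed by (state, action)
        self.alpha, self.gamma = alpha, gamma
        self.epsilon, self.xi = epsilon, xi
        self.rng = random.Random(seed)

    def choose(self, state):
        """Epsilon-greedy over Q + xi * H; the heuristic never alters Q itself."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(
            self.actions,
            key=lambda a: self.q[(state, a)] + self.xi * self.h[(state, a)],
        )

    def update(self, state, action, reward, next_state, duration=1):
        """SMDP Q-learning step for an action that lasted `duration` time steps.

        `reward` is assumed to be the (discounted) reward accumulated while
        the action was running.
        """
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + (self.gamma ** duration) * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# Usage: with no experience yet, the heuristic alone decides the action.
learner = HeuristicQLearner(["attack", "defend"], epsilon=0.0)
learner.h[("under_attack", "defend")] = 1.0  # hypothetical hint
first_choice = learner.choose("under_attack")
```

The point of keeping H separate from Q is that a poor heuristic can only slow learning down; the learned values are still updated from real rewards.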