On games programming and AI

In my spare time I maintain a unit testing library built for Unity3D. It’s called UnTest, it’s open-sourced under the MIT license, and you can download it here.

In the Unity3D community forum thread announcing UnTest, _Shockwave asked for an example of some real-life unit tests written with this framework. UnTest is very xUnit-flavoured, so its tests follow a standard pattern, but I thought it would be a good excuse to talk about good unit testing practice.

Much of my unit testing approach comes from Roy Osherove’s Art of Unit Testing, a very readable and practical book on the subject. It’s aimed at .NET, so it’s highly applicable to Unity3D development. The Art of Unit Testing website also has some recorded conference lectures from Osherove that are worth watching. If you want to get better at writing unit tests, these are great resources.

The unit test I’m going to dissect is below. It’s a real-life test from a production behaviour tree system. It’s not really important here to understand what a behaviour tree or a selection node is, as much as the patterns and conventions I followed. Good unit tests are readable, maintainable and trustworthy. As we walk through the test, I’ll explain how these qualities apply, and how to maximise them.

To increase readability, the first thing to note is the context of the file you can’t see. It’s in a file called SelectionNodeTests.cs, so I instantly know this test applies to the SelectionNode class. There’s only one class in this file, with the same name, so there’s no chance of confusion.

The name of the function follows a consistent convention throughout the codebase: FunctionUnderTest_Context_ExpectedResult. There are many naming conventions you could follow; this is the one I use. Context is how we set up the world before running the function. In this case, we’re adding a single action node to the selection node. ExpectedResult is how we want the function to behave; here we want the selection node and the action node to be added to the path.

It’s not important how long the name of this function is, since it’s never called from anywhere. The more readable and informative you can make the function name, the easier it will be to figure out what went wrong when it fails.

The unit test is split into three sections following the AAA pattern: Arrange, Act, Assert.

The reason I keep it separate is to avoid “magic numbers”. It’s too easy to write code like Assert.IsEqual(result, 5). The writer may know what this 5 means, but it would be much better for future readers to put it in a named variable and write Assert.IsEqual(result, hypotenuseLength).
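As a language-neutral illustration (the tests in this post are C# written against UnTest; this sketch uses Python's built-in unittest instead), here is the same idea in miniature. The hypotenuse function and the test class are hypothetical and exist only to show the Arrange, Act, Assert layout with a named expected value.

```python
import math
import unittest


def hypotenuse(a, b):
    """Hypothetical function under test: hypotenuse of a right triangle."""
    return math.sqrt(a * a + b * b)


class HypotenuseTests(unittest.TestCase):
    # Name follows FunctionUnderTest_Context_ExpectedResult.
    def test_hypotenuse_three_four_triangle_returns_five(self):
        # Arrange: every input and the expected value get a name.
        side_a = 3.0
        side_b = 4.0
        hypotenuse_length = 5.0

        # Act: call only the function under test.
        result = hypotenuse(side_a, side_b)

        # Assert: compare against the named value, not a bare 5.
        self.assertAlmostEqual(result, hypotenuse_length)
```

When the assert fails, the report mentions hypotenuse_length rather than an unexplained literal, which is the point of keeping the expected value in the Arrange section.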

Now this test is as readable as possible, how did I make it maintainable too? You’ll notice that by improving readability I’ve gone some way to also helping maintainability, as something that’s easier to read is also easier to understand, and therefore is easier to maintain. But there are other things I do as well.

Check out the first line:

var actionNode = FakeAction.CreateStub();

I need an action to put into the selection node. I could use an existing IAction concrete class, but then any bugs in that concrete class might cause this test to fail. I’ll cover why that’s bad later, but for now just pretend it sucks.

I could derive a new class from IAction, which I could keep simple enough to avoid bugs, but then I’d have to maintain it whenever the IAction interface changed. It’s much easier to use a “mocking framework” to do most of the hard work for me.

A mocking framework is a library that can be used to construct a new type at runtime that derives from IAction and just does the right thing (among many other things). Then any changes are picked up for me automatically, and I have less code to maintain. If that sounds like magic, that’s because it is.

There’s a mocking framework behind that FakeAction.CreateStub() call, but since it’s such a common use case in this test suite I’ve wrapped it up in a helper function.
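To make the idea concrete without guessing at the real helper, here is a rough analogue in Python, where the standard library's unittest.mock plays the role that Moq plays in C#. The Action class, its two methods and create_stub_action are all hypothetical stand-ins, not the actual FakeAction code.

```python
from unittest import mock


class Action:
    """Hypothetical stand-in for the game's action interface."""

    def update(self, dt):
        raise NotImplementedError

    def is_finished(self):
        raise NotImplementedError


def create_stub_action():
    """Helper in the spirit of FakeAction.CreateStub() (an assumption)."""
    # create_autospec builds the fake from the class at runtime, so
    # methods added to Action later are picked up automatically.
    stub = mock.create_autospec(Action, instance=True)
    stub.is_finished.return_value = False
    return stub
```

A test can then call create_stub_action() in its Arrange section and never touch a concrete action class.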

Any mocking framework that works with Mono will work with Unity3D. I use Moq. The latest version is here. I’ve mirrored this in a unitypackage here for easy importing to Unity.

To further isolate myself from changes, I’m constructing the member variables m_selNode and m_path in a setup function (not shown). This function is run automatically before every test, and makes new SelectionNode and Path objects. This is not only handy, because they’re used in every test in the class, but also isolates the unit tests from changes to the constructor signatures. Other commonly-used functions can also be hidden behind helper functions, but it’s best not to hide your function-under-test for readability reasons.

The final thing I need to do is make the test “trustworthy”.

By going through the maintainable and readable steps, I’ve made sure this test depends on the minimum amount of game code. When this test fails, hopefully it will only be because the function under test, UpdatePath(), had an error.

The more game code you depend on, the closer your test slips along the spectrum from unit to integration test. Integration tests check how systems connect together, rather than single assumptions. They have their place in a testing plan, but here I’m trying to write a unit test. A great rule of thumb is that a single line of buggy code should cause failures in as few unit tests as possible, and ideally only one. If lots fail, that’s because the code isn’t isolated properly and you’ve ended up with some integration tests.

Some of my early unit tests, from F1 2011, created a whole world for the AI to move in and recorded the results, rather than mocking surrounding code like we have here. The end result was that a single bug in the world code could cause many many tests to fail. That makes it hard to track down the root cause of the bug, and meant I had probably written integration tests instead of unit tests.

When this test does fail, it will be deterministic. There’s no dependency here on databases, network services, or random number generators. There’s nothing worse than unit tests that fail once in a blue moon, because they erode developer trust in the test suite. That’s how you end up with swathes of tests commented out, and wasted engineering time.

—

Now you understand why I’ve written this real-life unit test in this way, and why it’s important your unit tests are readable, maintainable and trustworthy. Like any type of programming, writing good unit tests takes practice and perseverance. They’re truly the foundation of your project, giving you the freedom to restructure at will and the confidence that your game code is high quality. But like any foundation, if they’re not well engineered the whole edifice comes rapidly crumbling down. Take the time to follow up with the resources I linked above, and you will hopefully avoid that situation.

Edit: It seems I misinterpreted part of Lucas’s speech. All these features are free; they will appear in whichever version of Unity is most suitable. E.g. shadows are part of Unity3D Pro, so those improvements are available to all Pro users for free. Here’s the list:

Windows 8 Store export, to RT as well, so they work on RT tablets

Mecanim avatar creation API: no longer need skeleton at time of build, can be applied to new skeleton at run time. Helps with player-created avatars.

Anti-aliased render textures: useful for Oculus Rift because VR headsets use a render target per eye, so now they can be antialiased.

Of those new features, I’m excited by the headless player support. That’s going to be great for client-server games that want to run on AWS or something. The presets also sound interesting – I’m a huge fan of animation curves, and anything that increases their functionality is great by me. And I could have used the more detailed memory snapshot tool while optimising Sonic Dash.

I’m not sure how much more I’ll be writing about context behaviours (no, I still haven’t really decided between context steering and context behaviours), so I’ve made this post as a way to wrap everything up.

I started by discussing exactly where and why steering behaviours start to break down:

After my GDC talk, Treff on twitter sent me a link to a paper from the late 90s by a researcher called Julio K. Rosenblatt. It had some similar ideas to my context steering technique. I thought I’d discuss the differences and similarities here.

The system described in the paper, DAMN (Distributed Architecture for Mobile Navigation), asks modules (behaviours) to vote for how much they prefer each decision in a set of possible decisions. Each vote is weighted according to which behaviour it came from. Votes range from -1 (against) to 1 (for). Superficially this is similar to context steering, but it does not split the votes across an interest and danger map. Because of this, it suffers from the same lack of movement constraint that we see with steering behaviours. The paper gets around this by weighting avoidance behaviours much more highly, but this just ends up disabling some nice emergent behaviours, as we saw with the balanced vector problem.

The merging of votes doesn’t happen at the decision space. From the diagram below, it seems like there’s some metadata about the curves used to write votes. Notice how a central curve is created from the two behaviours, rather than one small peak and one large peak. This is essentially a rasterized version of steering behaviours combined through weighted averages.
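A weighted vote of this kind can be sketched in a few lines. This is my reading of the scheme rather than code from the paper: the arbitrate function, the candidate names and the weights are all assumptions made for illustration, and the sketch is written in Python for brevity.

```python
def arbitrate(candidates, weighted_votes):
    """Pick the candidate with the highest weighted vote total.

    weighted_votes is a list of (weight, votes) pairs, one per behaviour,
    where votes[i] is that behaviour's vote for candidates[i].
    """
    totals = [0.0] * len(candidates)
    for weight, votes in weighted_votes:
        for i, vote in enumerate(votes):
            # Votes are clamped to the -1 (against) .. 1 (for) range.
            clamped = max(-1.0, min(1.0, vote))
            totals[i] += weight * clamped
    best = max(range(len(candidates)), key=lambda i: totals[i])
    return candidates[best], totals


candidates = ["left", "straight", "right"]
seek = (1.0, [0.2, 1.0, 0.2])     # wants to go straight on
avoid = (5.0, [0.0, -1.0, 0.0])   # heavily weighted veto of "straight"
choice, totals = arbitrate(candidates, [seek, avoid])
```

Here the avoidance weight swamps the seek vote, so "straight" loses whatever the seek behaviour says. That is the constraint-by-weighting approach described above.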

I think this all adds up to a rather expensive way of implementing steering behaviours. This is somewhat understandable as this paper came out just as or just before steering behaviours were starting to become popular, so the author may have been deep into his research by the time he heard of them.

There are several interesting aspects to the paper. It mentions that the behaviours all update at different frequencies, and the arbiter may receive votes at any time. This is great for those behaviours that are either low-priority or don’t change a lot, and allows easy parallelisation.

DAMN uses multiple subsystems, each asking the behaviours different questions. A speed subsystem (or “arbiter”) works out how fast to go, a Turn arbiter decides on direction, and because this is originally for controlling robots, a “field of regard” arbiter for working out where to turn the cameras. In comparison, context behaviours tend to use the maps for primarily computing a heading, then speed is calculated as a secondary factor – normally from the highest magnitude of interest or danger encountered. Splitting up like this makes for better separation of concerns, at a possible redundancy cost depending on implementation. It’s an idea worth exploring.

The paper talks about structuring behaviours using a subsumption-style approach, with high-frequency basic behaviours providing a “first level of competence”, built upon with more complex, possibly lower-frequency behaviours later. I like this way of thinking about behaviours. You can build your higher-level behaviours to be allowed to fail, knowing you’ll be caught by the lower-level systems.

There are also some dense passages that discuss methods of evaluating the utility of each decision. They look interesting but are a bit over my head. If anyone’s got any further information on what they were talking about, please share it in the comments.

In summary I don’t think there’s a lot of similarity between context behaviours and DAMN behaviours, beyond the superficial. Context behaviours could take heed of DAMN’s separation of concerns and the way polling is reversed, possibly making for better structuring of code. DAMN could do with adopting some of the simplicity of steering behaviours, or if required, the constraints and predictability of context behaviours.

Last time we saw how steering behaviour systems are very useful when either behaviour integrity isn’t important, or if there are a large number of entities to help hide any irregularities.

We saw this is actually an unavoidable feature of steering behaviours. The heart of steering behaviour systems, the merging of decisions of several behaviours, is what makes it so straightforward to explain and implement. However, it is also a flaw. There is not enough information in the system for the decisions to be merged with integrity, and there never can be, as long as only a direction and magnitude are returned from each behaviour.

For many applications, this may not even be an issue. If the game needs a large collection of entities moving as a flock, the user isn’t necessarily interested if one entity occasionally makes a bad choice. The user sees the flock move as a whole, and isn’t looking at individual behaviours.

However, if the application requires a small number of entities that interact individually with the player, like a racing game, then mistakes and collisions start to become very apparent. In fact they can be game-ruining if not dealt with properly.

The only way to fix these problems without replacing the system is to make the behaviours aware of each other, so they can return decisions that are sensible in the surrounding context. This leads to stateful and complex behaviours, and increased coupling, and doesn’t scale well when adding new behaviours. Can’t this be fixed without losing simple stateless composable behaviours?

I’m going to explain my solution to this problem, which is a more advanced version of the steering system I wrote for a shipped AAA racing game. After replacing the previous behaviour system, there was a net loss of 4,000 lines of code and yet there was a massive boost in the playability and expressiveness of the AI opponents.

To enable a steering system to merge properly, behaviours need to return much more information. They need to give not a single decision, but a view of the world as it appears to them. This is the context in which the behaviour would make a decision, if it was acting alone. The context of each behaviour can be merged and then, with all the information available, the system can make a decision. A sensible decision that always respects every behaviour, never gets stuck, and still shows emergence.

A behaviour will represent its context by writing into a context map. A context map is a projection of the decision space of the entity onto a 1D array. If the application features entities moving on a 2D plane, a map could be represented by evenly spaced radials around the entity, each a direction the entity could travel in, and associated with a single slot in the array. If the application has race cars zooming around a track, each slot of the array is associated with a distinct position to the left or right of the racing line, representing where the car would like to place itself.

An entity’s view of a 2D plane projected into a 1D context map

The context behaviour system creates two of these context maps – one for danger, and one for interest. The behaviours will fill these out. A strong entry in the danger map means someone thinks going that way would be bad. A strong entry in the interest map means someone would love to go that direction. The system passes both maps to every behaviour, asking each to fill them in with its own context.

Context maps are not cumulative. When a behaviour wants to add strength to a slot, it is only written if it is stronger than the value already in the slot.
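A minimal sketch of that write rule, in Python for brevity. The ContextMap class and its method names are mine, not from any shipped code.

```python
class ContextMap:
    """A 1D array of slots, each holding the strongest value written so far."""

    def __init__(self, slot_count):
        self.slots = [0.0] * slot_count

    def write(self, slot, strength):
        # Not cumulative: a write only lands if it beats the current value.
        if strength > self.slots[slot]:
            self.slots[slot] = strength


danger = ContextMap(8)
danger.write(2, 0.5)
danger.write(2, 0.3)   # weaker than 0.5, so it is ignored
```

Taking the maximum means ten weak opinions about a slot never add up to one strong one, and the order in which behaviours run does not change the result.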

Behaviours themselves look similar to their steering behaviour counterparts. Consider a chasing behaviour that selects a target and returns a direction towards it. The context maps version would instead iterate through each target, evaluate how strongly it wanted to chase it, and write that strength into the interest map slot that points towards the target. Any criteria can be used here, just like steering behaviours. The behaviour might be more interested in dangerous targets, or have some complex utility expression tree for evaluating interest, but for this example the behaviour will be more interested the closer the target is.

A collision avoidance behaviour would work in a similar manner. It would iterate through all obstacles, decide how strongly it wants to avoid each, and write that strength into the danger map slot that points towards the obstacle. Again for this example, closer obstacles will be more dangerous.

Targets and obstacles write into the interest and danger maps respectively, with strength based on distance

These behaviours are stateless, small and easy to write. That advantage of steering behaviours has not been lost.
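Here is roughly what the two behaviours could look like for entities on a 2D plane, again sketched in Python. The function names, the linear distance falloff and the max_range parameter are assumptions for illustration. Maps are plain lists, with slot 0 pointing along the positive x axis.

```python
import math


def slot_for_direction(dx, dy, slot_count):
    """Map a direction vector to the nearest evenly spaced radial slot."""
    angle = math.atan2(dy, dx) % (2.0 * math.pi)
    return int(round(angle / (2.0 * math.pi) * slot_count)) % slot_count


def write_points(position, points, context_map, max_range):
    """Write a strength for each point into the slot that points at it."""
    for px, py in points:
        dx, dy = px - position[0], py - position[1]
        # Assumed falloff: 1 at zero distance, 0 at max_range and beyond.
        strength = max(0.0, 1.0 - math.hypot(dx, dy) / max_range)
        slot = slot_for_direction(dx, dy, len(context_map))
        context_map[slot] = max(context_map[slot], strength)


def chase(position, targets, interest, max_range=10.0):
    write_points(position, targets, interest, max_range)


def avoid(position, obstacles, danger, max_range=10.0):
    write_points(position, obstacles, danger, max_range)


interest, danger = [0.0] * 8, [0.0] * 8
chase((0.0, 0.0), [(5.0, 0.0), (0.0, 2.0)], interest)
avoid((0.0, 0.0), [(-4.0, 0.0)], danger)
```

Neither behaviour keeps any state between calls, and neither knows the other exists. The only thing they share is the shape of the map.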

In practice, writing to a single slot is not very effective. The entity shouldn’t just avoid moving directly towards an obstacle; it should avoid going anywhere near it. The same applies to the chase behaviour: if the entity can’t move directly towards a target, it might be good to move in a direction that takes it a bit closer. For this reason, when writing into the context maps it’s normally a good idea to write across a range of slots, with the strength ramping down the further the slot is from the target direction. There’s a lot of power and expression in how the strength in surrounding slots is created. Helper functions keep the behaviours small and clean while still using this expressiveness.
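One possible helper, sketched in Python: a linear ramp either side of the centre slot. The name write_with_falloff, the spread parameter and the linear shape are all assumptions; any curve would do. The map is treated as circular, which suits the radial layout but not the racing-line one.

```python
def write_with_falloff(context_map, centre_slot, strength, spread):
    """Write strength at centre_slot, ramping down over `spread` neighbours."""
    count = len(context_map)
    for offset in range(-spread, spread + 1):
        slot = (centre_slot + offset) % count   # wrap around the circle
        falloff = 1.0 - abs(offset) / (spread + 1)
        # Same non-cumulative rule as a single-slot write.
        context_map[slot] = max(context_map[slot], strength * falloff)


interest = [0.0] * 8
write_with_falloff(interest, centre_slot=0, strength=0.9, spread=2)
```

With a spread of 2 the centre slot gets the full 0.9, its neighbours get two thirds of that, and the next pair out get one third.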

Once the danger and interest context maps are fully populated, the system can process them and come up with a unique decision. The exact way the map is processed depends on the application. If there are simple entities moving on a plane, a suitable algorithm might be as follows: Find the slot with the lowest danger or, as will probably be the case, the set of slots with the equal lowest danger. Look in the corresponding slots in the interest map and pick the slot with the highest interest. For a tiebreak, pick the slot that is closest to our current heading.

The result of the system is simply the direction of that slot coupled with the interest strength. The entity interprets this as a direction to move in, and takes the strength as proportional to the speed to travel. Because of this, an entity that has nothing but low-interest things to do might move quite slowly, but an entity chasing something highly interesting or dangerous would move quickly.

Now that the whole system has been explained, consider the problem from the previous article. There are two potential targets to chase, but what we would consider the best choice is obscured by an obstacle. A naive steering behaviours system implementation would lead to deadlock as the forces balanced out, or worse, oscillation. To avoid this, the chase behaviour (or some higher-level decision-making system) had to be aware of the collision avoidance system, so it could know to ignore the obscured target. The context of the collision avoidance system had bled into the chase behaviour, and coupling had increased.

In the context behaviour system, there is interest pointing towards both targets, with more towards the best target. There is danger in the same region pointing towards the obstacle. The system evaluates the danger map first, taking only the least dangerous slots from the interest map. This leaves it with only the interest from the weaker target available, and that direction is chosen. The behaviours have remained lightweight and isolated, but the end result was a very complex decision.

Obstacle danger obscures most interesting target, so the less interesting target is chosen

Not only that but if a higher-level decision-making system is only concerned with choosing unobscured targets, it can now be removed. I found a lot of higher-level decisions in F1 – who to block, who to draft – could be left to the context behaviour system to work out without increasing coupling.

There are several ways this system can be extended to give nice results. Since the context maps are essentially one-dimensional images, they can be blurred to smooth out narrow troughs and spikes. Last frame’s context maps can be kept, and this frame’s results blended with them to provide free hysteresis to every behaviour. To do a similar thing for steering behaviours would require custom stateful code in every behaviour! The processing of the maps is ripe for vectorisation or offloading to a GPU.
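Both the blur and the frame-to-frame blend are only a few lines each. This Python sketch uses a three-tap kernel and a single blend factor, both picked for illustration rather than taken from a shipped game, and assumes a circular map.

```python
def blur(context_map):
    """Three-tap blur (0.25, 0.5, 0.25) over a circular map."""
    count = len(context_map)
    return [
        0.25 * context_map[(i - 1) % count]
        + 0.5 * context_map[i]
        + 0.25 * context_map[(i + 1) % count]
        for i in range(count)
    ]


def blend(previous, current, keep=0.5):
    """Mix last frame's map into this frame's; higher keep = more hysteresis."""
    return [keep * p + (1.0 - keep) * c for p, c in zip(previous, current)]


smoothed = blur([0.0, 0.0, 1.0, 0.0])
damped = blend([1.0, 0.0], [0.0, 1.0], keep=0.75)
```

Because the blend runs on the merged maps, no behaviour needs to remember anything about the previous frame.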

Implemented as-is, the results of the system will always be exactly the direction of one slot. In the diagrams above, the entity can only move in 8 directions. This can lead to juddery behaviour if there aren’t a lot of slots. Taking the image metaphor again, the fix for this is to implement a kind of sub-pixel rendering. By taking the strength of surrounding slots, and approximating the gradient, the between-slot direction that would have the best strength can be found.
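One way to do this, and not necessarily the way any particular game did, is to fit a parabola through the chosen slot and its two neighbours and take the position of its peak. The Python sketch below assumes a circular interest map with slot 0 at angle zero; refine_direction is a name I made up.

```python
import math


def refine_direction(interest, slot):
    """Return a heading in radians, nudged off the slot centre towards
    whichever neighbour is stronger."""
    count = len(interest)
    left = interest[(slot - 1) % count]
    centre = interest[slot]
    right = interest[(slot + 1) % count]
    # Peak of the parabola through the three samples, in slot units.
    curvature = left - 2.0 * centre + right
    offset = 0.0 if curvature == 0.0 else 0.5 * (left - right) / curvature
    offset = max(-0.5, min(0.5, offset))   # never leave the chosen slot
    return (slot + offset) * (2.0 * math.pi / count)


interest = [0.0, 0.5, 1.0, 0.75, 0.0, 0.0, 0.0, 0.0]
heading = refine_direction(interest, slot=2)
```

In this example the right-hand neighbour is stronger than the left, so the heading lands a sixth of a slot to the right of slot 2.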

Since the behaviours are providing so much more information than a steering behaviours system, this solution is in isolation unavoidably more processor intensive than a steering behaviour equivalent. The exact complexity depends on the application, but the entities-on-a-plane example above is linear in the size of the context maps, and that’s probably typical.

However unlike steering behaviours, this system lends itself well to Level Of Detail (LOD) changes. The slot count of the maps can be changed from frame to frame, ramping down as the entity is further from the camera or player. This will compromise the quality of the movement, but the integrity of movement will still be preserved. If the system is structured so the behaviours are ignorant of the context map size, they don’t even have to know about LOD changes. The ability to have this kind of granular control over LOD is very rare.
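A sketch of how that might be structured, in Python. The slot counts, the distance thresholds and both function names are made up for illustration. The important part is that the behaviour-side helper works from an angle and asks the map for its own length.

```python
import math


def slot_count_for_distance(distance, near=16, far=4, far_distance=100.0):
    """Fewer slots the further the entity is from the camera or player."""
    t = min(1.0, max(0.0, distance / far_distance))
    return max(far, int(round(near + (far - near) * t)))


def write_direction(context_map, angle, strength):
    """Behaviours write by angle, so they never see the slot count."""
    count = len(context_map)
    slot = int(round(angle / (2.0 * math.pi) * count)) % count
    context_map[slot] = max(context_map[slot], strength)


near_map = [0.0] * slot_count_for_distance(0.0)
far_map = [0.0] * slot_count_for_distance(100.0)
write_direction(near_map, math.pi, 0.8)   # same call for both maps,
write_direction(far_map, math.pi, 0.8)    # only the resolution differs
```

The same behaviour code fills a 16-slot map for a nearby entity and a 4-slot map for a distant one.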

By writing danger and interest into context maps, a context behaviour system can fix many of the problems that come from using steering behaviours, leading to small, stateless and decoupled behaviours that are still just as emergent and expressive.

The problem I explained in my last blog post is essentially the “why” of a talk I’m giving at GDC at the end of the month. There I’ll be explaining my solution as well as showing some demos. The follow-up blog post will appear after the talk, but that’s like skipping the cinema release for the DVD; it’s just not the same!