Published

Stepping Through the Looking Glass: Test-Driven Game Development (Part 2)

Part 1 of this article provided just a glimpse of what was behind the looking glass. Now we’re ready to dive in all the way and look at how we can apply test-driven development to games. Be warned: the looking glass is very much one-way only. After you try this, you might become test infected and may never be able to go back and write code the way you’ve done up until now.

What to test

By now you have a pretty good idea of what TDD is and how to apply it. But what exactly do we test? How do we go about testing it?

Depending on the code in question, I use one of three main approaches to test that the code is doing what it should.

Check output

This is the easy one. You make a function call, and check the result value. Everything you need to know is there at your fingertips.
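A minimal sketch of this kind of test, assuming a hypothetical `ApplyDamage()` function (it is not from any real engine) and using plain `assert` as a stand-in for whatever check macro your unit-test framework provides:

```cpp
#include <cassert>

// Hypothetical function under test: returns the health left after a hit,
// never dropping below zero.
int ApplyDamage(int health, int damage) {
    const int remaining = health - damage;
    return remaining < 0 ? 0 : remaining;
}

// Each test makes one call and checks the returned value. Nothing else is needed.
void TestDamageIsSubtractedFromHealth() {
    assert(ApplyDamage(100, 30) == 70);
}

void TestHealthNeverDropsBelowZero() {
    assert(ApplyDamage(10, 30) == 0);
}
```

There is no setup and no state to inspect afterwards, which is why this is the easiest category to test.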

Check state

In this situation, you’re checking that a particular operation changes some state. You might get a success/failure return code, but that doesn’t tell you enough (you still want to test that separately though). For example, applying a speed powerup to an entity returns true or false depending on whether the entity consumed that powerup, but we still need to test that the entity’s state actually changed the way it should.
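The two tests for that powerup might look something like the sketch below. The `Entity` and `SpeedPowerup` types, the doubling multiplier, and the one-powerup-at-a-time rule are all assumptions made for illustration, and `assert` stands in for a test framework's check macro.

```cpp
#include <cassert>

// Hypothetical powerup: multiplies the entity's speed.
struct SpeedPowerup {
    float multiplier;
};

class Entity {
public:
    explicit Entity(float speed) : m_speed(speed), m_boosted(false) {}

    // Returns true if the powerup was consumed. Assumption: an entity
    // accepts only one speed powerup at a time.
    bool Apply(const SpeedPowerup& powerup) {
        if (m_boosted) return false;
        m_speed *= powerup.multiplier;
        m_boosted = true;
        return true;
    }

    float GetSpeed() const { return m_speed; }

private:
    float m_speed;
    bool m_boosted;
};

// Test 1: the return code reports that the powerup was consumed.
void TestApplyingPowerupReportsSuccess() {
    Entity entity(10.0f);
    const bool consumed = entity.Apply(SpeedPowerup{2.0f});
    assert(consumed);
}

// Test 2: the state actually changed, which the return code alone can't tell us.
void TestApplyingPowerupDoublesSpeed() {
    Entity entity(10.0f);
    entity.Apply(SpeedPowerup{2.0f});
    assert(entity.GetSpeed() == 20.0f);
}
```

Note that both tests repeat the same two lines of setup, which is the kind of duplication a fixture would remove.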

If I had to add a third test related to that, I would refactor the tests into a fixture. More on that in a minute.

What if the entity class didn’t have a GetSpeed() method? Ideally you want to test things by using them the way you expect them to be used in the game. I suppose you could test the speed by the slightly roundabout method of letting the entity move for one frame and see the difference, but that’s a bit too convoluted.

In a case like this, I’d say that adding a GetSpeed() method is justified. You want to stay clear of always falling back into this pattern and littering your interfaces with GetXXX() methods though. Remember, testing is supposed to make your code more robust and easier to understand, not get in the way and complicate interfaces.

What if you ever come up against something that seems untestable? Hard-core TDDers will claim that if you can’t test something, you shouldn’t implement it. That’s a sentiment I agree with, but over time I have encountered just a handful of things that I considered too much of a pain to test, and I skipped the tests. I suspect the more experience one acquires with TDD, the less common that situation is. But in general, if you can’t test it, then think about how to implement it in a different way. Chances are the new design is going to be better. It’s yet another case of TDD in action affecting the final design.

Check interaction between objects

A lot of the code in a game is more than simple functions that return values or set states. In a complex game engine, objects communicate with each other, sending messages, calling methods, and performing a sequence of actions. Testing that is just as important as testing the individual state in an object.

To test interactions between objects, you use mock objects. Mock objects are objects that look like the regular object to the code, but are nothing more than a pretty face with testing code behind it for us to verify that things worked as expected. Most mock objects I work with are extremely simple, and are limited to increasing a counter when something happens, or storing a copy of some argument passed to a function.

One of the mock objects I end up using quite frequently is one to verify that object lifetimes are managed correctly. For example, imagine we have an entity system that uses components. The entity owns the components, so if the entity is ever destroyed, it should destroy all the components. How would we implement that? Pretty easy: in the destructor we write a loop that…. No, wrong answer. Before we write anything, we need to write a test for it. Something like this:

So we create an entity, we add a component, and we destroy the entity. How can we verify that the component was destroyed along with it? I suppose we could try to access the component we allocated and see if that memory is valid, but that’s a really ugly and non-portable solution. It would be much cleaner to use a mock object that simply keeps track of how many instances of that class are currently allocated.
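Here is a rough sketch of that test together with an instance-counting mock. The `Entity`, `Component`, and `MockComponent` classes are assumptions for illustration (a real entity system will look different), and `assert` stands in for a framework check macro.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

class Component {
public:
    virtual ~Component() = default;
};

// Hypothetical entity that owns its components.
class Entity {
public:
    void AddComponent(std::unique_ptr<Component> component) {
        m_components.push_back(std::move(component));
    }

private:
    std::vector<std::unique_ptr<Component>> m_components;
};

// Mock component: all it does is count how many instances are alive.
class MockComponent : public Component {
public:
    MockComponent() { ++s_liveInstances; }
    ~MockComponent() override { --s_liveInstances; }
    static int s_liveInstances;
};
int MockComponent::s_liveInstances = 0;

void TestDestroyingEntityDestroysItsComponents() {
    {
        Entity entity;
        entity.AddComponent(std::unique_ptr<Component>(new MockComponent));
        assert(MockComponent::s_liveInstances == 1);
    }  // entity goes out of scope here
    assert(MockComponent::s_liveInstances == 0);
}
```

In this sketch ownership is expressed with `std::unique_ptr`, so the destructor loop comes for free; with raw pointers the same test would be what forces you to write that loop.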

TDD and Games

All right, how about using TDD for game development? The great majority of the code in a game is code just like in any other type of application. You have objects that are doing some processing and communicating with other objects. We can test that very easily. So, what’s different about games?

Graphics

There’s a misconception that games are all about graphics. That’s just plain wrong. It’s true that games make heavy use of graphics, and games interact with the user primarily through the use of graphics (along with audio and force feedback). But the bulk of the code in a game is not doing low-level graphics operations. Instead, it’s running AI, figuring out what things are visible this frame, sending objects down the pipeline, moving entities in the world, updating physics simulations, etc.

The best thing you can do is forget about low-level graphics. Wrap it up neatly at a really low level in a little library and push it aside. Then you can concentrate on testing everything that uses graphics instead of getting bogged down with the graphics themselves.

For example, we can easily develop through TDD the code that will determine what’s in view given a camera in our world. We can also use TDD to come up with a solution to correctly sort the meshes sent to the graphics renderer to optimize performance. All those are higher-level operations that are not involved with the low-level graphics API or hardware, so nothing is stopping us from testing them.

But it is even possible to test fairly low-level operations. As an example, say you’re about to send to the hardware an object that is lit with two point lights and a directional light, and you want to make sure the right shader is selected (or the right states are set). In this situation, you’re making a function call (Render()), and you can either check the state of the graphics renderer directly, or you can insert a mock object between your code and the graphics API and detect that the correct functions were called.
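As a sketch of the mock-object route: the `GraphicsDevice` interface, the `Lighting` struct, and the shader naming scheme below are all invented for illustration and do not correspond to any real graphics API. The mock only records what it was asked to do.

```cpp
#include <cassert>
#include <string>

// Hypothetical thin interface sitting between game code and the graphics API.
class GraphicsDevice {
public:
    virtual ~GraphicsDevice() = default;
    virtual void SetShader(const std::string& name) = 0;
};

// Mock device: stores the last shader requested and counts the calls.
class MockGraphicsDevice : public GraphicsDevice {
public:
    void SetShader(const std::string& name) override {
        lastShader = name;
        ++setShaderCalls;
    }
    std::string lastShader;
    int setShaderCalls = 0;
};

struct Lighting {
    int pointLights;
    int directionalLights;
};

// Code under test. The "lit_pN_dM" naming scheme is an assumption.
void Render(GraphicsDevice& device, const Lighting& lighting) {
    device.SetShader("lit_p" + std::to_string(lighting.pointLights) +
                     "_d" + std::to_string(lighting.directionalLights));
}

void TestTwoPointAndOneDirectionalLightSelectMatchingShader() {
    MockGraphicsDevice device;
    Render(device, Lighting{2, 1});
    assert(device.setShaderCalls == 1);
    assert(device.lastShader == "lit_p2_d1");
}
```

No window, device, or video mode is created anywhere, so a test like this runs on any machine.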

To make the tests more effective, try to make sure you can run most of your game code without having to create a graphics device, initialize it, create a window, set a video mode, etc. Not only will that decouple your code from the graphics library (which is a good thing), but it will let you run the tests much more quickly and on a variety of platforms. Otherwise, any machine that runs the tests might be required to have DirectX 9.0c, a fancy graphics card, etc. If there’s really no way around it, you can go ahead and do any graphics initialization at the beginning of the test run for the whole graphics library and shut it down when the tests complete, but ideally you shouldn’t have to do that at all.

What about verifying that the right calls and data are sent to the graphics hardware? That’s up to you. TDD is not a religion or an absolute mandate. It’s a tool that I happen to think can be applied to most situations and give great results. If you’re writing a driver for some hardware you should definitely do it. If you’re writing graphics middleware, you might want to consider that too. Otherwise, it’s really not worth it. If you’ve tested everything up until the call where you just set the right render states and push the data down, you can safely trust the graphics hardware to do the rest.

Middleware

Middleware is a big topic in game development (shameless plug), and chances are good you’re using some form of middleware in your projects. Maybe it’s something as simple as Bink, or something as complex as RenderWare. In any case, it is true that working with an external library (which may not even come with source code) makes things a bit more difficult.

You should treat middleware very much like what we did with graphics. Assume it works correctly, but test everything leading up to the interface with the middleware library.

The key idea is to remember that with TDD, you want to test that the code you’re about to implement is doing the right thing. For example, imagine you’re integrating some physics middleware into your engine. You might start with a simple test like this:
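Such a first test might be as simple as checking that a body with nothing acting on it stays put. In this sketch, `PhysicsBody` is a hypothetical wrapper class; a real one would forward `Step()` to the middleware, whereas here a trivial integrator stands in so the example is self-contained.

```cpp
#include <cassert>

struct Vec3 {
    float x, y, z;
};

// Hypothetical engine-side wrapper around a physics body.
class PhysicsBody {
public:
    explicit PhysicsBody(const Vec3& position)
        : m_position(position), m_velocity{0.0f, 0.0f, 0.0f} {}

    // In a real engine this would call into the physics middleware.
    // A simple position update stands in for it here.
    void Step(float dt) {
        m_position.x += m_velocity.x * dt;
        m_position.y += m_velocity.y * dt;
        m_position.z += m_velocity.z * dt;
    }

    const Vec3& GetPosition() const { return m_position; }

private:
    Vec3 m_position;
    Vec3 m_velocity;
};

void TestBodyAtRestStaysAtRest() {
    PhysicsBody body(Vec3{1.0f, 2.0f, 3.0f});
    body.Step(1.0f / 60.0f);
    assert(body.GetPosition().x == 1.0f);
    assert(body.GetPosition().y == 2.0f);
    assert(body.GetPosition().z == 3.0f);
}
```

The test only talks to your own interface, so it reads the same whether the work behind `Step()` is done by your code or by a third-party library.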

Next you might want to check that if you apply a force it moves in the right direction. Then you check for friction, collisions, bouncing, deformation, etc. The fact that you’re using an external middleware library to solve those problems should not matter as far as the tests are concerned.

Now let me get on my soapbox for a second. If you’re a middleware provider, I hope you’re using unit tests (and of course, I’d recommend that they be developed before the code). If you do that, please, please, make sure you also release your unit tests to source code licensees. The unit tests will serve both as up-to-date documentation and as a safety net in case they make any changes. And who needs more safety nets than the people who didn’t write the code in the first place?

Hardware

A lot of games are developed on a PC but run on different platforms (PS2, Xbox, GameCube, custom arcade hardware, handhelds, etc). How do we test them? I would recommend keeping as much of the code as possible platform-independent. Not only will it help with testing, but it’s probably a good business and engineering decision. At that point, you can test it just like any other code, hopefully in the same step as you build your code, without any delays.

Additionally, it’s a good idea to run the unit tests on the target platforms themselves. This is something you could run more infrequently, such as right before you’re about to check in code, and, of course, by the automated build machine itself. That way you’ll catch any platform differences, like subtle changes introduced by different endianness, etc.

Large amounts of data

Of all the things that make game development different, this is probably one of the most distinctive. Games, and especially modern games, often use many gigabytes of data. What does that mean for unit testing and TDD?

Fortunately, not very much. These tests we’re writing are unit tests. They test the functionality inside one small part of a class. They shouldn’t have to deal with any data, and especially not with many gigabytes worth of it. They should test that the code does the right thing, period.

Let’s do some examples

In case you’re not fully convinced yet, let’s run through some real-world examples. Thanks to Ivan-Assen Ivanov for providing the specific examples (taken out of some real-world tasks he had to work on).

Example #1: Computing a passability grid

Compute a passability grid for use by the pathfinder code. The passability values are computed based on terrain features such as slope, road textures, water depth, etc.

Where do we start? Before we start thinking of algorithms, let’s just create a test that checks the obvious. I should be able to create an empty grid with some default values.
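A sketch of what those first tests could look like. The `PassabilityGrid` interface, the choice of `Impassable` as the default value, and the rule that anything outside the grid is impassable are assumptions made for illustration; `assert` stands in for a framework check macro.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class PassabilityGrid {
public:
    enum Passability { Impassable, Passable };

    // Assumption: a freshly created grid is impassable everywhere.
    PassabilityGrid(int width, int height)
        : m_width(width),
          m_height(height),
          m_nodes(static_cast<std::size_t>(width) * height, Impassable) {}

    // Assumption: anything outside the grid counts as impassable.
    Passability GetNode(int x, int y) const {
        if (x < 0 || y < 0 || x >= m_width || y >= m_height) return Impassable;
        return m_nodes[static_cast<std::size_t>(y) * m_width + x];
    }

private:
    int m_width;
    int m_height;
    std::vector<Passability> m_nodes;
};

void TestGridIsCreatedWithDefaultValues() {
    PassabilityGrid grid(1, 1);
    assert(grid.GetNode(0, 0) == PassabilityGrid::Impassable);
}

void TestNodesOutsideGridAreImpassable() {
    PassabilityGrid grid(1, 1);
    assert(grid.GetNode(-1, 0) == PassabilityGrid::Impassable);
    assert(grid.GetNode(0, -1) == PassabilityGrid::Impassable);
    assert(grid.GetNode(1, 0) == PassabilityGrid::Impassable);
    assert(grid.GetNode(0, 1) == PassabilityGrid::Impassable);
    assert(grid.GetNode(100, 100) == PassabilityGrid::Impassable);
}
```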

Notice the tests aren’t even providing full coverage or anything. I simply sampled five points outside of the grid. If I ever find there’s a bug there, I’ll go back, write a specific test, see it fail, and then fix it. For now, this is good enough.

Next I would check that our tests work with a slightly larger grid since I could have easily hardwired a 1×1 grid in the grid class.

What next? On a totally flat terrain without any roads, the resulting passability grid is uniform and the nodes are passable. By this time you’ll probably already have a terrain class (which should have hopefully been developed using TDD), so I’ll just assume we have a Terrain class.
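The flat-terrain test might be sketched like this. The `Terrain` class here is a bare placeholder with only a size, and the grid constructor does the simplest thing that passes (it marks every node passable); slopes and roads would only enter once later tests demand them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Placeholder terrain: just a size. A real Terrain would carry heights,
// textures, water depth, and so on.
class Terrain {
public:
    Terrain(int width, int height) : m_width(width), m_height(height) {}
    int GetWidth() const { return m_width; }
    int GetHeight() const { return m_height; }

private:
    int m_width;
    int m_height;
};

class PassabilityGrid {
public:
    enum Passability { Impassable, Passable };

    // New constructor driven by the test: build the grid from a terrain.
    // Simplest implementation that passes: every node is passable.
    explicit PassabilityGrid(const Terrain& terrain)
        : m_width(terrain.GetWidth()),
          m_height(terrain.GetHeight()),
          m_nodes(static_cast<std::size_t>(m_width) * m_height, Passable) {}

    Passability GetNode(int x, int y) const {
        if (x < 0 || y < 0 || x >= m_width || y >= m_height) return Impassable;
        return m_nodes[static_cast<std::size_t>(y) * m_width + x];
    }

private:
    int m_width;
    int m_height;
    std::vector<Passability> m_nodes;
};

void TestFlatTerrainProducesUniformPassableGrid() {
    const Terrain flat(4, 3);
    const PassabilityGrid grid(flat);
    for (int y = 0; y < 3; ++y)
        for (int x = 0; x < 4; ++x)
            assert(grid.GetNode(x, y) == PassabilityGrid::Passable);
}
```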

Notice that we just created a new constructor for the PassabilityGrid. It wasn’t part of a grand master plan or a fancy UML design. It simply felt like the right thing to do when using PassabilityGrids this way. That’s the beauty of TDD: the code design follows the use of the code.

I think you see where this is heading. Next I would try creating a terrain with one mountain cell, and seeing that the corresponding grid node was impassable. Then create a road and see that those cells are more easily passable than others.

Example #2: Creating a thin layer around an audio library

One particular feature of this layer, which is not simply reordering parameters and calling another function, is multi-sample sounds: when the sound engine receives an order to play “UnitWalk”, it chooses randomly with specific probabilities between UnitWalk1.wav, UnitWalk2.wav and UnitWalk3.wav; this mapping, along with probabilities and some other parameters needed by the sound system, comes from a bunch of XML files.

That’s a pretty big task, so we need to break it down into something smaller just to even start thinking about it. If it’s really a thin layer, it’s possible that some functions might call directly into the audio library. Those I would just do without a test (whenever I needed them). For example, it’s possible that our library might have an Initialize() call that simply calls SoundLibrary::Initialize(). That’s fine. No tests there.

Let’s concentrate on the mapping of sound events to specific sounds with specific probabilities. Where do we start? Reading XML files? Figuring out random numbers? No, let’s take it from the top. Let’s write a really simple test of a trivial case and make it pass. How about this?
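A sketch of that trivial first test, with the hardcoded implementation that makes it pass. The class and method names follow the ones used in the text; everything else is an assumption, and `assert` stands in for a framework check macro.

```cpp
#include <cassert>
#include <string>

class GameSoundSystem {
public:
    // Simplest thing that passes: ignore the event and always "play" the same file.
    void PlayEvent(const std::string& /*eventName*/) {
        m_lastSoundPlayed = "UnitWalk1.wav";
    }

    // Exists only so the test can observe what happened.
    const std::string& GetLastSoundPlayed() const { return m_lastSoundPlayed; }

private:
    std::string m_lastSoundPlayed;
};

void TestPlayingEventPlaysCorrespondingSound() {
    GameSoundSystem soundSystem;
    soundSystem.PlayEvent("UnitWalk");
    assert(soundSystem.GetLastSoundPlayed() == "UnitWalk1.wav");
}
```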

OK, that’s not exactly a pretty-looking test, but it’s a start. It gives us a rough guide of where to go. Forget about XML files and probabilities, or anything. I play an event, and I want to see a particular wav file played out the other end. How do I implement that? I hardcode a “UnitWalk1.wav” in the sound library, of course. Test passed. Yeah, I know, the horror! But it took 20 seconds and we made some progress (all tests are passing). Besides, we already started making decisions about how we want to be using this sound system, so we definitely made some progress.

One thing I don’t like about that previous test is that it relies on the function GetLastSoundPlayed(). That’s not a function that’s necessary to work with the GameSoundSystem, and the only reason we introduced it was to help us test it. It’s OK to do that from time to time, but I’d rather keep my class interfaces as simple and uncluttered as possible. So instead, we’ll go ahead and refactor that test to use a mock object representing the real system sound library that will collect the name of the last sound played.
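The refactored test could be sketched as follows. The `SoundLibrary` interface is a hypothetical stand-in for the real system sound library, and the production class still hardcodes its answer at this stage.

```cpp
#include <cassert>
#include <string>

// Hypothetical interface for the low-level system sound library.
class SoundLibrary {
public:
    virtual ~SoundLibrary() = default;
    virtual void Play(const std::string& wavName) = 0;
};

// Mock library: records the name of the last sound it was asked to play.
class MockSoundLibrary : public SoundLibrary {
public:
    void Play(const std::string& wavName) override { lastSoundPlayed = wavName; }
    std::string lastSoundPlayed;
};

class GameSoundSystem {
public:
    explicit GameSoundSystem(SoundLibrary& library) : m_library(library) {}

    // Still hardcoded. GetLastSoundPlayed() is gone from the interface.
    void PlayEvent(const std::string& /*eventName*/) {
        m_library.Play("UnitWalk1.wav");
    }

private:
    SoundLibrary& m_library;
};

void TestPlayingEventPlaysCorrespondingSound() {
    MockSoundLibrary library;
    GameSoundSystem soundSystem(library);
    soundSystem.PlayEvent("UnitWalk");
    assert(library.lastSoundPlayed == "UnitWalk1.wav");
}
```

The observation point has moved out of `GameSoundSystem` and into the mock, so the production interface stays free of test-only methods.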

This will totally fail because we know that our sound system is playing “UnitWalk1.wav” no matter what the event is. Time to fix that. For that we probably want to add the concept of mapping. For now, let’s just do a one-to-one mapping.
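A sketch of the one-to-one mapping, assuming a hypothetical `AddMapping()` method and a made-up second event, `UnitShoot`. Doing nothing for unmapped events is also an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>

class SoundLibrary {
public:
    virtual ~SoundLibrary() = default;
    virtual void Play(const std::string& wavName) = 0;
};

class MockSoundLibrary : public SoundLibrary {
public:
    void Play(const std::string& wavName) override { lastSoundPlayed = wavName; }
    std::string lastSoundPlayed;
};

class GameSoundSystem {
public:
    explicit GameSoundSystem(SoundLibrary& library) : m_library(library) {}

    // Hypothetical API: associate one event with exactly one wav file.
    void AddMapping(const std::string& eventName, const std::string& wavName) {
        m_mappings[eventName] = wavName;
    }

    // Assumption: events without a mapping are silently ignored.
    void PlayEvent(const std::string& eventName) {
        const auto it = m_mappings.find(eventName);
        if (it != m_mappings.end()) m_library.Play(it->second);
    }

private:
    SoundLibrary& m_library;
    std::map<std::string, std::string> m_mappings;
};

void TestEachEventPlaysItsOwnSound() {
    MockSoundLibrary library;
    GameSoundSystem soundSystem(library);
    soundSystem.AddMapping("UnitWalk", "UnitWalk1.wav");
    soundSystem.AddMapping("UnitShoot", "UnitShoot1.wav");

    soundSystem.PlayEvent("UnitWalk");
    assert(library.lastSoundPlayed == "UnitWalk1.wav");

    soundSystem.PlayEvent("UnitShoot");
    assert(library.lastSoundPlayed == "UnitShoot1.wav");
}
```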

Everything is passing again. Now let’s add some probabilities to it. This part gets a bit trickier because it involves some randomness. Randomness is one of those tricky things when it comes to testing because you don’t want your results to change from run to run. You also don’t want to get into the situation where you’re running the same test multiple times to verify that you get the correct sound played 20% of the time.

At the same time, randomness plays an important part in game development, so it’s crucial to know how to deal with it correctly. I decided to ask on the testdrivendevelopment mailing list, looking for some good insights. The best solution seemed to be to turn the randomness factor into an input. In the production code, that input will be filled by the game random number generator. In our tests, we can pass whatever value we want and we know what results we’re going to get.

Here’s the next test that includes some different probabilities of different sounds being played.
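A sketch of such a test, with the random value passed in as a second argument. The `WeightedSound` struct, the `AddMapping()` signature, the 20/50/30 split, and the cumulative-probability selection are assumptions for illustration. In this sketch the two-argument `PlayEvent()` is simply public.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct WeightedSound {
    std::string wavName;
    float probability;  // probabilities for one event are expected to sum to 1
};

class SoundLibrary {
public:
    virtual ~SoundLibrary() = default;
    virtual void Play(const std::string& wavName) = 0;
};

class MockSoundLibrary : public SoundLibrary {
public:
    void Play(const std::string& wavName) override { lastSoundPlayed = wavName; }
    std::string lastSoundPlayed;
};

class GameSoundSystem {
public:
    explicit GameSoundSystem(SoundLibrary& library) : m_library(library) {}

    void AddMapping(const std::string& eventName,
                    const std::vector<WeightedSound>& sounds) {
        m_mappings[eventName] = sounds;
    }

    // roll is the "random" input in [0, 1). Production code would pass the
    // game RNG's output; tests pass fixed values.
    void PlayEvent(const std::string& eventName, float roll) {
        const auto it = m_mappings.find(eventName);
        if (it == m_mappings.end() || it->second.empty()) return;
        float cumulative = 0.0f;
        for (const WeightedSound& sound : it->second) {
            cumulative += sound.probability;
            if (roll < cumulative) {
                m_library.Play(sound.wavName);
                return;
            }
        }
        // Guard against rounding error in the probability sum.
        m_library.Play(it->second.back().wavName);
    }

private:
    SoundLibrary& m_library;
    std::map<std::string, std::vector<WeightedSound>> m_mappings;
};

void TestEventPlaysSoundSelectedByRoll() {
    MockSoundLibrary library;
    GameSoundSystem soundSystem(library);
    soundSystem.AddMapping("UnitWalk", {{"UnitWalk1.wav", 0.2f},
                                        {"UnitWalk2.wav", 0.5f},
                                        {"UnitWalk3.wav", 0.3f}});

    soundSystem.PlayEvent("UnitWalk", 0.1f);
    assert(library.lastSoundPlayed == "UnitWalk1.wav");
    soundSystem.PlayEvent("UnitWalk", 0.5f);
    assert(library.lastSoundPlayed == "UnitWalk2.wav");
    soundSystem.PlayEvent("UnitWalk", 0.9f);
    assert(library.lastSoundPlayed == "UnitWalk3.wav");
}
```

Because the roll is an argument, each assertion is deterministic and no test has to be repeated to sample a distribution.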

Unfortunately, I don’t want the public interface of GameSoundSystem::PlayEvent() to take an event name and a number between 0 and 1. That’s only for internal consumption and shouldn’t be exposed through the interface. On the other hand, I really need that function to test it (and it cleans things up considerably from an implementation point of view). Here’s a case where I think giving the test access to a private function is fine.

If you feel uncomfortable with that idea, you would have to create a new class that sits between the GameSoundSystem and the SoundSystem itself and takes care of doing the mapping. That class can have the number between 0 and 1 as part of its interface because it won’t be exposed outside of the library. That might actually not be a bad idea since the mapping between events and sounds could be the full-time job for a class.

Just because the first implementation of the mapping of events could have been hardwired (after all, do the simplest thing that could possibly work), I would add another similar test with different values:
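Such a second test could look like the sketch below. It is written against the separate mapping class described in the previous paragraph, which lets the roll be part of a normal public interface. `SoundEventMap`, `ChooseSound()`, and the `UnitDie` files are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct WeightedSound {
    std::string wavName;
    float probability;
};

// Hypothetical class whose full-time job is mapping events to sounds.
class SoundEventMap {
public:
    void AddMapping(const std::string& eventName,
                    const std::vector<WeightedSound>& sounds) {
        m_mappings[eventName] = sounds;
    }

    // Returns the wav file chosen for a roll in [0, 1), or an empty string
    // if the event has no mapping.
    std::string ChooseSound(const std::string& eventName, float roll) const {
        const auto it = m_mappings.find(eventName);
        if (it == m_mappings.end() || it->second.empty()) return std::string();
        float cumulative = 0.0f;
        for (const WeightedSound& sound : it->second) {
            cumulative += sound.probability;
            if (roll < cumulative) return sound.wavName;
        }
        return it->second.back().wavName;  // rounding guard
    }

private:
    std::map<std::string, std::vector<WeightedSound>> m_mappings;
};

void TestDifferentProbabilitiesSelectDifferentSounds() {
    SoundEventMap eventMap;
    eventMap.AddMapping("UnitDie", {{"UnitDie1.wav", 0.75f},
                                    {"UnitDie2.wav", 0.25f}});
    assert(eventMap.ChooseSound("UnitDie", 0.0f) == "UnitDie1.wav");
    assert(eventMap.ChooseSound("UnitDie", 0.74f) == "UnitDie1.wav");
    assert(eventMap.ChooseSound("UnitDie", 0.75f) == "UnitDie2.wav");
    assert(eventMap.ChooseSound("UnitDie", 0.99f) == "UnitDie2.wav");
}
```

With a different event, a different number of sounds, and different thresholds, a hardwired implementation can no longer pass.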

I would then refactor the tests to come up with a better way to express the mapping between events and sounds than passing huge lists of parameters, check for valid probabilities (can’t add up to anything different than 1.0), etc. We never got around to the XML part, but that’s something totally orthogonal to this. First let’s worry about getting the data to the object, then we can load that data in any way we want (but we’ll test that too when we get to it, of course).

That’s enough for one day. The next (and final!) part of this article will deal with specific tips for doing TDD that I found particularly useful, especially dealing with C++ and games. I’ll also cover what are some of the immediate consequences you can expect from doing TDD, including build times, development speed, etc. In the meanwhile, get your unit-test framework ready and give it a whirl.


My problem with TDD has always been that it conflicts with some of my other code practices. I’m a big fan of KISS in its usual form and in keeping the raw amount of code to a minimum. I believe that less code (to a point) and less complexity (to a point) is almost always best. TDD adds more code and more complexity and that is where my conflict lives. With TDD you not only have to maintain a normal codebase, but also a secondary one which may be just as large.

However... After reading this article I’ll be a sport and give it another go (5th time’s a charm?). Maybe with my previous attempts I have just missed something. I totally “get” why TDD could be so awesome. I just have not been able to achieve enough “awesomeness” to compromise/modify my programming morals.

Tom Plunket

TDD adds code, but removes complexity. Try it in earnest for a few days and I’m sure you’ll see for yourself.

The stunningly wonderful thing about TDD is that you often end up with code that is far simpler than you previously envisioned.

TDD has helped me discover algorithms, has helped me learn how to be a better programmer, and has shown me bugs in middleware. I’ve been sold on it for a while…

Good luck, though. It’s a bit tricky to get into for a number of reasons, not the least of which is feeling less productive at first. This feeling soon goes away, and is replaced by the complementary feeling that you’re on a tightrope without a net if you’re “required” to work without tests…

After you have possibly made a bit of a mess of your code because you are trying to make it pass the test you just made. And now it passes. And you are ready to check in the code. Check it in because it works. 🙂 But then go back and look at your code and see if you can refactor and make it clearer. ( e.g. see if those accessors are actually needed. etc )

The same goes for tests. For me, my tests usually get the most messy. Making test objects, making test data, making “correct data” / mock objects of systems to run against, check various conditions, checking boundary cases, etc. Refactoring those is a necessity.

But once you have them. Oh so sweet. No longer do you fear making significant changes to the code. And when other people need to add something or fix something in the code you wrote. Boom just run the unit tests and see if they still work. No more “oh I changed the defaults for how some data is loaded and now the mobs in the game don’t move.”

For me, one key insight for using TDD was: Okie I have to make this object. It has a bunch of requirements. When I am coding that object I am either using the debugger to test out those requirements or I am writing the object and then loading up the game and testing all of the requirements in the game. So maybe I spend maybe 2x as much time up front writing tests compared to loading up the game and testing. (that time goes down once writing unit test becomes second nature). But now when an issue arises I can run the tests and quickly say: “well the code is working let’s look at the data.” (NOTE: TDD purists will say TDD helps you “explore” the solution space or what not. Yes it does, but if you are going to code something you have to have some user story or requirements or idea of what you are making. TDD doesn’t really help with that part. Once you have that part you can start doing TDD.)

Also with tests you have a nice way to check individual components easily. Example: some texture or game object is causing a crash or is not being loaded correctly or the behavior is wrong. Usually you load up some test level, find that object, load it up and then try to debug what is going wrong (in some cases many steps are involved to place only that object). With TDD there should be a test somewhere where you can point to the offender instead of the mock object you were using before and have a small working set to debug. And if there is not, well time to make one!

Also another key insight for me was: you don’t need to test EVERYTHING. That way lies pure madness. When you are writing an object you can probably guess / know which parts are going to be “tricky” / “open to errors”. So test those. (i.e. replace debugger time with test writing time to make certain you coded it correctly) Don’t fret over not testing whether or not the setName( const std::string& p_newName ) actually set the name when the code for that is something like:

I think _PURE TDD_ would require that a test be written, have it fail, then go write the above and smile when it worked. And for some of the crazy string classes out there you maybe would want to actually test that! But to me that is going to be in the 20% in the 80/20 rule of what needs to be tested. Of course if this manages to fail at some point, a test shall be created with all due haste!

But something like:

std::string getFullName() const;

// returns the full name of the player. First and last name are capitalized and the title is prepended to the name iff there is a title. Additionally the guild affiliation is attached to the end also.

// NOTE: we probably don’t want the guild name included in this but that is what design wants at this time.

That probably gets some tests where you test all of the conditions.