When creating a system like an AI, which can branch down many different paths very quickly (or really any algorithm with several different inputs), the possible result set can contain an enormous number of permutations.

What approach should one take to use TDD when creating a system that outputs many, many different permutations of results?

The overall goodness of an AI system is usually measured by a precision-recall test against a benchmark input set. Such a test is roughly on par with "integration tests". As others have mentioned, this is more like "test-driven algorithm research" than "test-driven design".
– rwong, Oct 21 '11 at 8:12

Please define what you mean by "AI". It's a field of study more than any particular type of program. For certain AI implementations, you generally can't test for some types of things (e.g. emergent behaviour) via TDD.
– Steve Evers, Oct 21 '11 at 19:48

@SnOrfus I mean it in the most general, rudimentary sense, a decision-making machine.
– NickC, Oct 21 '11 at 20:58

5 Answers

To take a more practical approach than pdr's answer: TDD is all about software design rather than testing. You use unit tests to verify your work as you go along.

So at the unit-test level you need to design the units so they can be tested in a completely deterministic fashion. You can do this by taking anything that makes a unit nondeterministic (such as a random number generator) and abstracting it away. Say we have a naïve example of a method deciding whether a move is good or not:
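Something along these lines, sketched here in Java; the names (`Decider`, `isGoodMove`) and the scoring threshold are assumed for illustration only:

```java
import java.util.Random;

// Naive version: the randomness is buried inside the method.
class Decider {
    private final Random random = new Random();

    // Nondeterministic: two calls with the same score can disagree,
    // because the result depends on a hidden random roll.
    boolean isGoodMove(int score) {
        return score + random.nextInt(100) > 120;
    }
}
```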

This method is very hard to test; the only thing you can really verify in unit tests is its bounds, and even that requires a lot of tries. So instead, let's abstract away the randomizing part by creating an interface and a concrete class that wraps the functionality:
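A minimal sketch of that abstraction, with the interface name chosen to match the "IRandom" contract mentioned later in this answer:

```java
import java.util.Random;

// Abstraction over the source of randomness.
interface IRandom {
    int nextInt(int bound); // a value in [0, bound)
}

// Concrete class that simply wraps java.util.Random.
class DefaultRandom implements IRandom {
    private final Random random = new Random();

    @Override
    public int nextInt(int bound) {
        return random.nextInt(bound);
    }
}
```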

The Decider class now needs to use the concrete class through its abstraction, i.e. the interface. This technique is called dependency injection (the example below uses constructor injection, but you can do this with a setter as well):
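A sketch of what that constructor injection might look like; the interface is repeated so the snippet stands alone, and the names and threshold are assumed:

```java
// Abstraction over the source of randomness.
interface IRandom {
    int nextInt(int bound);
}

class Decider {
    private final IRandom random;

    // Constructor injection: the dependency is passed in,
    // not created inside the class.
    Decider(IRandom random) {
        this.random = random;
    }

    boolean isGoodMove(int score) {
        return score + random.nextInt(100) > 120;
    }
}
```

Production code passes in a wrapper around `java.util.Random`; tests can pass in something deterministic instead.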

You might ask yourself why this "code bloat" is necessary. Well, for starters, you can now mock the behavior of the random part of the algorithm, because the Decider now has a dependency that follows the IRandom "contract". You could use a mocking framework for this, but this example is simple enough to code yourself:
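A hand-rolled mock might look like this (the interface and Decider are repeated so the snippet stands alone; all names are assumed):

```java
interface IRandom {
    int nextInt(int bound);
}

class Decider {
    private final IRandom random;
    Decider(IRandom random) { this.random = random; }
    boolean isGoodMove(int score) {
        return score + random.nextInt(100) > 120;
    }
}

// Hand-rolled mock: always returns a fixed value,
// making every test deterministic.
class FixedRandom implements IRandom {
    private final int value;
    FixedRandom(int value) { this.value = value; }
    @Override public int nextInt(int bound) { return value; }
}
```

In a test, `new Decider(new FixedRandom(50))` lets you pin down the exact threshold: a score of 71 is always a good move (71 + 50 > 120) and 70 never is.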

Strict TDD does tend to break down a bit for more complex systems, but that doesn't matter too much in practical terms - once you get beyond being able to isolate individual inputs, just pick some test cases that provide reasonable coverage and use those.

Doing this well does require some knowledge of what the implementation will be, but that is mostly a theoretical concern: you are highly unlikely to be building an AI that was specified in detail by non-technical users. It's in the same category as passing tests by hardcoding the test cases: officially the tests are the spec, so such an implementation is both correct and the fastest possible solution, but in practice it never actually happens.

It's not possible to test every permutation of a computation with many variables. But that's nothing new; it has always been true of any program above toy complexity. The point of tests is to verify a property of the computation. For instance, sorting a list of 1000 numbers takes some effort, but any individual solution can be verified very easily. Now, although there are 1000! possible (classes of) inputs for that program and you can't test them all, it's completely sufficient to generate 1000 inputs randomly and verify that the output is, indeed, sorted. Why? Because it is nearly impossible to write a program that reliably sorts 1000 randomly generated vectors without also being correct in general (unless you deliberately rig it to recognize certain magic inputs...)
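The check described above can be sketched as: generate random inputs, run the sort, and assert the sortedness property rather than enumerating inputs (a fixed seed keeps the run repeatable; the class name is made up):

```java
import java.util.Arrays;
import java.util.Random;

class SortPropertyCheck {
    // The property: every element is <= its successor.
    static boolean isSorted(int[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[i - 1] > a[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Random rng = new Random(42); // fixed seed so the check is repeatable
        int[] input = new int[1000];
        for (int i = 0; i < input.length; i++) {
            input[i] = rng.nextInt();
        }
        Arrays.sort(input); // stand-in for the solution under test
        if (!isSorted(input)) {
            throw new AssertionError("output must be nondecreasing");
        }
    }
}
```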

Now, in general, things are a bit more complicated. There really have been bugs where a mailer would not deliver email to users who had an 'f' in their username when the day of the week was Friday. But I consider it wasted effort to try to anticipate such weirdness. Your test suite should give you steady confidence that the system does what you expect on the inputs you expect. If it does funky things in certain funky cases, you will notice soon enough after the first funky case arises, and then you can write a test specifically against that case (which will usually also cover an entire class of similar cases).

Far from falling apart with complexity, TDD excels in these circumstances. It drives you to consider the larger problem in smaller pieces, which leads to a better design.

Do not set out to try to test every permutation of your algorithm. Just build test after test, write the simplest code to make the test work, until you have your bases covered. You should see what I mean about breaking the problem down because you will be encouraged to fake out parts of the problem while testing other parts, to save yourself having to write 10 billion tests for 10 billion permutations.

Edit: I wanted to add an example, but didn't have time earlier.

Let's consider an in-place-sort algorithm. We could go ahead and write tests which cover the top end of the array, the bottom end of the array and all sorts of weird combinations in the middle. For each one, we would have to build a complete array of some kind of object. This would take time.

Or we could tackle the problem in four parts:

1. Traverse the array.
2. Compare selected items.
3. Swap items.
4. Coordinate the above three.

The first is the only complicated part of the problem but by abstracting it out from the rest, you have made it much, much simpler.

The second is almost certainly handled by the object itself, at least optionally; in many statically typed frameworks there is an interface indicating whether that functionality is implemented. So you don't need to test this.

The third is incredibly easy to test.

The fourth just handles two pointers, asks the traversal class to move the pointers around, calls for a comparison, and, based on the result of that comparison, calls for the items to be swapped. If you've faked out the first three parts, you can test this very easily.
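Under those assumptions, the fourth part might look like this sketch, with the traversal behind an interface so a fake can stand in during tests (all names here are invented for illustration):

```java
import java.util.Iterator;
import java.util.List;

// Part 1 behind an interface: yields index pairs (i, j) to compare.
interface Traversal {
    Iterator<int[]> pairs(int length);
}

// Part 4, the coordinator: compare each suggested pair, swap when out of order.
class Sorter {
    private final Traversal traversal;

    Sorter(Traversal traversal) {
        this.traversal = traversal;
    }

    <T extends Comparable<T>> void sort(T[] items) {
        Iterator<int[]> it = traversal.pairs(items.length);
        while (it.hasNext()) {
            int[] p = it.next();
            if (items[p[0]].compareTo(items[p[1]]) > 0) { // part 2: compare
                T tmp = items[p[0]];                      // part 3: swap
                items[p[0]] = items[p[1]];
                items[p[1]] = tmp;
            }
        }
    }
}

// A fake traversal for tests: a fixed list of pairs, no real strategy.
class FakeTraversal implements Traversal {
    private final List<int[]> fixed;

    FakeTraversal(List<int[]> fixed) {
        this.fixed = fixed;
    }

    @Override
    public Iterator<int[]> pairs(int length) {
        return fixed.iterator();
    }
}
```

With `new FakeTraversal(List.of(new int[]{0, 1}))` you can verify that the coordinator swaps exactly when the first item compares greater, without writing any real traversal logic; later, a production Traversal (a bubble pass, or something faster) can be swapped in without touching the coordinator.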

How have we led to a better design here? Let's say that you've kept it simple and implemented a bubble sort. It works but, when you go to production and it has to handle a million objects, it is far too slow. All you have to do is write new traversal functionality and swap it in. You don't have to deal with the complexity of handling the other three problems.

This, you will find, is the difference between unit testing and TDD. The unit tester will say that this has made your tests fragile: had you tested simple inputs and outputs, you wouldn't now have to write more tests for your new functionality. The TDDer will say that the concerns have been suitably separated so that each class does one thing and does it well.