Let's say you're writing a Yahtzee game TDD style. You want to test the part of the code that determines whether or not a set of five die rolls is a full house. As far as I know, when doing TDD, you follow these principles:

- Write a failing test first
- Write the simplest thing that works to make the test pass
- Refactor

Should you unit test every possible valid combination (both of values and positions) for a full house? That looks like the only way to be absolutely sure that your IsFullHouse code is completely tested and correct, but it also sounds quite insane to do that.

How would you unit test something like this?

Update

Erik and Kilian point out that using literals in the initial implementation to get a green test might not be the best idea. I'd like to explain why I did that, and that explanation does not fit in a comment.

My practical experience with unit testing (especially using a TDD approach) is very limited. I remember watching a recording of Roy Osherove's TDD Masterclass on Tekpub. In one of the episodes he builds a String Calculator TDD style. The full specification of the String Calculator can be found here: http://osherove.com/tdd-kata-1/

Thank you for summarizing Erik's answer, albeit in a less argumentative, more civilized way.
– Kristof Claes, Feb 3 '13 at 20:33


"Write the simplest thing that works", like @Carson63000 says, is actually a simplification. It's dangerous to think like that; it leads to the infamous Sudoku TDD debacle (google it). When followed blindly, TDD is indeed braindead: you cannot generalize a non-trivial algorithm by blindly doing "the simplest thing that works"... you have to actually think! Unfortunately, even alleged masters of XP and TDD sometimes follow it blindly...
– Andres F., Feb 3 '13 at 22:26


@AndresF. Note your comment has appeared higher in Google searches than much of the commentary about the "Sudoku TDD debacle" after less than three days. Nevertheless, How to not solve a Sudoku summed it up: TDD is for quality, not correctness. You need to solve the algorithm before starting coding, especially with TDD. (Not that I'm not a code-first programmer too.)
– Mark Hurd, Feb 6 '13 at 2:22

6 Answers

There are already lots of good answers to this question, and I've commented and upvoted several of them. Still, I'd like to add some thoughts.

Flexibility isn't for novices

The OP clearly states that he's not experienced with TDD, and I think a good answer must take that into account. In the terminology of the Dreyfus model of skill acquisition, he's probably a Novice. There's nothing wrong with being a novice - we are all novices when we start learning something new. However, what the Dreyfus model explains is that novices are characterized by

rigid adherence to taught rules or plans

no exercise of discretionary judgement

That's not a description of a personality deficiency, so there's no reason to be ashamed of that - it's a stage we all need to go through in order to learn something new.

This is also true for TDD.

While I agree with many of the other answers here that TDD doesn't have to be dogmatic, and that it can sometimes be more beneficial to work in an alternative way, that doesn't help anyone just starting out. How can you exercise discretionary judgement when you have no experience?

If a novice accepts the advice that sometimes it's OK not to do TDD, how can he or she determine when it's OK to skip doing TDD?

With no experience or guidance, the only thing a novice can do is to skip out of TDD every time it becomes too difficult. That's human nature, but not a good way to learn.

Listen to the tests

Skipping out of TDD any time it becomes hard is to miss out on one of the most important benefits of TDD. Tests provide early feedback about the API of the SUT. If the test is hard to write, it's an important sign that the SUT is hard to use.

This is the reason why one of the most important messages of GOOS is: listen to your tests!

In the case of this question, my first reaction when seeing the proposed API of the Yahtzee game, and the discussion about combinatorics that can be found on this page, was that this is important feedback about the API.

Does the API have to represent dice rolls as an ordered sequence of integers? To me, that smells of Primitive Obsession. That's why I was happy to see the answer from tallseth suggesting the introduction of a Roll class. I think that's an excellent suggestion.

However, I think that some of the comments to that answer get it wrong. What TDD suggests is that once you realize that a Roll class would be a good idea, you suspend work on the original SUT and start TDD'ing the Roll class.

While I agree that TDD is more aimed at the 'happy path' than it's aimed at comprehensive testing, it still helps to break the system down into manageable units. A Roll class sounds like something you could TDD to completion much more easily.

Then, once the Roll class is sufficiently evolved, you would go back to the original SUT and flesh it out in terms of Roll inputs.

The suggestion of a Test Helper doesn't necessarily imply randomness - it's just a way to make the test more readable.

Another way to model input in terms of Roll instances would be to introduce a Test Data Builder.
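To make the idea concrete, here is a minimal Python sketch of a Roll class plus a Test Data Builder. The names (Roll, RollBuilder, with_group) are hypothetical, not taken from any answer's actual code, and the full-house rule shown is the usual "a triple plus a distinct pair" interpretation:

```python
from collections import Counter

class Roll:
    """A set of five dice, expressed in domain terms rather than raw integers."""
    def __init__(self, dice):
        self.dice = list(dice)

    def counts(self):
        # Multiset of face values, e.g. [1, 1, 1, 2, 2] -> {1: 3, 2: 2}
        return Counter(self.dice)

    def is_full_house(self):
        # Exactly a triple plus a pair of a different value.
        return sorted(self.counts().values()) == [2, 3]

class RollBuilder:
    """Test Data Builder: accumulates dice fluently, then builds a Roll."""
    def __init__(self):
        self._dice = []

    def with_group(self, value, count):
        self._dice.extend([value] * count)
        return self

    def build(self):
        return Roll(self._dice)

# A test can now say what it means:
full_house = RollBuilder().with_group(3, 3).with_group(5, 2).build()
```

The builder keeps the test readable without implying randomness: each test states exactly which groups of dice it cares about.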

Red/Green/Refactor is a three-stage process

While I agree with the general sentiment that (if you are sufficiently experienced in TDD), you don't need to stick to TDD rigorously, I think it's pretty poor advice in the case of a Yahtzee exercise. Although I don't know the details of the Yahtzee rules, I see no convincing argument here that you can't stick rigorously with the Red/Green/Refactor process and still arrive at a proper result.

What most people here seem to forget is the third stage of the Red/Green/Refactor process. First you write the test. Then you write the simplest implementation that passes all tests. Then you refactor.

It's here, in this third stage, that you can bring all your professional skills to bear. This is where you are allowed to reflect on the code.

However, I think it's a cop-out to state that you should only "Write the simplest thing possible that isn't completely braindead and obviously incorrect that works". If you (think you) know enough about the implementation beforehand, then everything short of the complete solution is going to be obviously incorrect. As far as advice goes, then, this is pretty useless to a novice.

What really should happen is that if you can make all tests pass with an obviously incorrect implementation, that's feedback that you should write another test.

It's surprising how often doing that leads you towards an entirely different implementation than the one you had in mind first. Sometimes, the alternative that grows like that may turn out to be better than your original plan.

Rigour is a learning tool

It makes a lot of sense to stick with rigorous processes like Red/Green/Refactor as long as one is learning. It forces the learner to gain experience with TDD not just when it's easy, but also when it's hard.

Only when you have mastered all the hard parts are you in a position to make an informed decision on when to deviate from the 'true' path. That's when you start forming your own path.

'nother TDD novice here, with all the usual misgivings about trying it. Interesting take on "if you can make all tests pass with an obviously incorrect implementation, that's feedback that you should write another test". Seems like a good way to tackle the perception that testing the "braindead" implementations is needless busywork.
– shambulator, Jan 20 '14 at 12:02


Wow, thank you. I'm really scared by the tendency of people to tell beginners in TDD (or any discipline) "don't worry about the rules, just do what feels best". How can you know what feels best when you have no knowledge or experience? I'd also like to mention the Transformation Priority Premise, or the idea that code should become more generic as tests become more specific. The most die-hard TDD supporters, like Uncle Bob, wouldn't stand behind the notion of "just add a new if-statement for every test".
– sara, Jun 14 '16 at 12:09

This is significant not because of some TDD practice, but because hard-coding all of those literals isn't really a good idea. One of the most difficult things to wrap your head around with TDD is that it isn't a comprehensive testing strategy -- it's a way to guard against regressions and mark progress while keeping the code simple. It's a development strategy, not a testing strategy.

The reason I mention this distinction is that it helps guide what tests you should write. The answer to "what tests should I write?" is "whatever tests you need to get the code the way you want it." Think of TDD as a way to help you tease out algorithms and reason about your code. So given your test and my "simple green" implementation, what test comes next? Well, you've established something that is a full house, so when isn't it a full house?

Now you have to figure out some way to differentiate between the two test cases that's meaningful. I personally would tack on a bit of clarifying information to the "do the simplest thing to make the test pass" and say "do the simplest thing to make the test pass that furthers your implementation." Writing failing tests is your pretext to alter the code, so when you go to write each test, you should be asking yourself "what doesn't my code do that I want it to do and how can I expose that deficiency?" It can also help you make your code robust and handle edge cases. What do you do if a caller inputs nonsense?

To sum up, if you're testing every combination of values, you're almost certainly doing it wrong (and likely to wind up with a combinatorial explosion of conditionals). When it comes to TDD, you should write the minimum amount of test cases necessary to get the algorithm you want. Any further tests that you write will start green and thus become documentation, in essence, and not strictly part of the TDD process. You'll only write further TDD test cases if the requirements change or a bug is exposed, in which case you'll document the deficiency with a test and then make it pass.

Update:

I started this as a comment in response to your update, but it started getting pretty long...

I'd say that the problem isn't with the existence of literals, period, but with the 'simplest' thing being a 5-part conditional. When you think about it, a 5-part conditional is actually pretty complicated. It will be common to use literals during the red-to-green step and then to abstract them to constants in the refactor step or else generalize them in a later test.

During my own journey with TDD, I came to realize that there is an important distinction to be made -- it's not good to confuse "simple" and "obtuse". That is, when I started out, I watched people do TDD and I thought "they're just doing the dumbest thing possible to make the tests pass" and I mimicked that for a while, until I realized that "simple" was subtly different from "obtuse". Sometimes they overlap, but often not.

So, apologies if I gave the impression that the existence of literals was the problem -- it isn't. I'd say the complexity of the conditional with the 5 clauses is the problem. Your first red-to-green can just be "return true" because that is truly simple (and obtuse, by coincidence).

The next test case, with (1, 2, 3, 4, 5), will have to return false, and this is where you start to leave "obtuse" behind. You have to ask yourself "why is (1, 1, 1, 2, 2) a full house and (1, 2, 3, 4, 5) isn't?" The simplest thing you could come up with might be that one has last sequence element 5 or second sequence element 2 and the other doesn't. Those are simple, but they're also (needlessly) obtuse. What you're really wanting to drive at is "how many of the same number do they have?"

So you might get the second test to pass by checking to see whether or not there is a repeat. In the one with a repeat, you have a full house, and in the other you don't. Now the test passes, and you write another test case that has a repeat but isn't a full house to further refine your algorithm.

You may or may not do this with literals as you go, and it's fine if you do. But the general idea is growing your algorithm 'organically' as you add more cases.
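A minimal Python sketch of that progression, with hypothetical function names, might look like this -- each step is the simplest non-obtuse response to the test that just failed:

```python
from collections import Counter

# Test 1: (1, 1, 1, 2, 2) is a full house.
# Simplest green: just `return True` (truly simple, and obtuse by coincidence).

# Test 2: (1, 2, 3, 4, 5) is not a full house.
# "Has a repeat" is the simplest non-obtuse distinction between the two cases:
def is_full_house_step2(dice):
    return len(set(dice)) < len(dice)

# Test 3: (1, 1, 2, 3, 4) has a repeat but is not a full house,
# which finally forces the "how many of each number?" question:
def is_full_house_step3(dice):
    return sorted(Counter(dice).values()) == [2, 3]
```

Each failing test is the pretext that grows the algorithm; step 2's false positive is exactly what test 3 exposes.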

Thank you very much for your thoughtful and well-explained answer. It actually makes a lot of sense now that I think about it.
– Kristof Claes, Feb 3 '13 at 20:43


Thorough testing doesn't mean testing every combination... That is silly. For this particular case, take a particular full house or two and a couple of non-full-houses. Also any special combinations that could cause trouble (e.g., 5 of a kind).
– Schleis, Feb 4 '13 at 14:46

Testing for five particular literal values in a particular combination is not "simplest" to my fevered brain. If the solution to a problem really is obvious (count whether you have exactly three and exactly two of any value), then by all means go ahead and code that solution, and write some tests that would be very, very unlikely to pass accidentally with that amount of code (i.e. different literals and different orders of the triples and doubles).
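For illustration, the "exactly three and exactly two of any value" solution is only a few lines in Python (the function name is hypothetical):

```python
def is_full_house(dice):
    # A full house has exactly three of one value and exactly two of another.
    counts = [dice.count(v) for v in set(dice)]
    return 3 in counts and 2 in counts
```

The tests then only need a handful of cases with different literals and different orderings of the triple and the pair.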

TDD maxims really are tools, not religious beliefs. Their point is to get you to write correct, well-factored code rapidly. If a maxim obviously stands in the way of that, just skip it and go on to the next step. There will be plenty of non-obvious bits in your project where you can apply it.

If you take this approach, you will need to improve the log messages in your Assert.That statements. The developer needs to see which input caused the failure.
– Bringer128, Feb 3 '13 at 23:20

Does this not create a chicken-or-egg dilemma? When you implement AnyFullHouse (using TDD as well), wouldn't you need IsFullHouse to verify its correctness? Specifically, if AnyFullHouse has a bug, that bug could be replicated in IsFullHouse.
– waxwing, Feb 4 '13 at 8:43

AnyFullHouse() is a method in a test case. Do you typically TDD your test cases? No. Also, it is much simpler to create a random exemplar of a full house (or any other roll) than it is to test for its existence. Of course, if your test has a bug, it could be replicated in the production code. That's true of every test, though.
– tallseth, Feb 4 '13 at 12:44

AnyFullHouse is a "helper" method in a test case. If they are general enough, helper methods get tested too!
– Mark Hurd, Feb 6 '13 at 2:09

Add "some" more test cases (~5) of valid full-house sets, and the same number of expected negatives ({1, 1, 2, 3, 3} is a good one. Remember that five ones, for example, could be recognized as "3 of the same plus a pair" by an incorrect implementation). This method assumes the developer is not just trying to pass the tests, but actually trying to implement it correctly.
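As a sketch of that handful of cases (names and the reference implementation here are hypothetical), a table-driven test makes the positives, the negatives, and the tricky combinations explicit:

```python
FULL_HOUSES = [
    (1, 1, 1, 2, 2),
    (2, 2, 5, 5, 5),
    (6, 6, 6, 3, 3),
    (4, 4, 1, 1, 1),
    (3, 3, 3, 6, 6),
]
NOT_FULL_HOUSES = [
    (1, 1, 2, 3, 3),   # two pairs
    (1, 1, 1, 1, 1),   # five of a kind, not "three plus a pair"
    (2, 2, 2, 2, 5),   # four of a kind
    (1, 2, 3, 4, 5),   # straight
    (6, 6, 6, 1, 2),   # three of a kind only
]

def reference_impl(dice):
    # Hypothetical implementation under test.
    counts = sorted(dice.count(v) for v in set(dice))
    return counts == [2, 3]

def check_all(impl):
    # True only if impl classifies every table entry correctly.
    return (all(impl(r) for r in FULL_HOUSES)
            and all(not impl(r) for r in NOT_FULL_HOUSES))
```

A naive shortcut like "exactly two distinct values" fails this table on the four-of-a-kind row, which is exactly the kind of incorrect implementation the special cases are there to catch.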

Test all possible sets of dice (there are only 252 distinct ones). This of course assumes you have some way to know what the expected answer is (in testing, this is known as an oracle). This could be a reference implementation of the same function, or a human being. If you want to be really rigorous, it could be worth it to manually code each expected outcome.
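A Python sketch of the exhaustive approach, using a reference implementation as the oracle (all names here are hypothetical):

```python
from itertools import combinations_with_replacement

def oracle(dice):
    # Reference implementation serving as the oracle:
    # a full house is exactly a pair plus a triple.
    counts = sorted(dice.count(v) for v in set(dice))
    return counts == [2, 3]

# Every distinct multiset of five six-sided dice: C(10, 5) = 252 of them.
all_rolls = list(combinations_with_replacement(range(1, 7), 5))

def exhaustively_matches(impl):
    # True if impl agrees with the oracle on every possible roll.
    return all(impl(r) == oracle(r) for r in all_rolls)
```

Because order doesn't matter for a full house, enumerating multisets rather than all 7776 ordered rolls keeps the exhaustive test small.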

As it happens, I once wrote a Yahtzee AI, which of course had to know the rules. You can find the code for the score-assessing part here; note that it implements the Scandinavian version (Yatzy), and our implementation assumes the dice are given in sorted order.

The million dollar question is, did you derive the Yahtzee AI using pure TDD? My bet is that you can't; you have to use domain knowledge, which by definition isn't blind :)
– Andres F., Feb 3 '13 at 22:31

Yeah, I guess you're right. This is a general problem with TDD, that the test cases need expected outputs unless you only want to test for unexpected crashes and unhandled exceptions.
– ansjob, Feb 4 '13 at 7:35

This example really misses the point. We're talking about a single straightforward function here, not a software design. Is it a bit complicated? Yes, so you break it down. And you absolutely do not test every possible input from 1, 1, 1, 1, 1 to 6, 6, 6, 6, 6. The function in question doesn't require order, just a combination, namely AAABB.

You don't need 200 separate logic tests. You could use a set, for example. Nearly any programming language has one built in:
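For instance, a Python sketch of the set-based check (the function name is hypothetical):

```python
def is_full_house(dice):
    # An AAABB combination has exactly two distinct values, and the value
    # of the first die must occur either two or three times. That last
    # condition rules out four of a kind (counts of 4 and 1), since
    # whichever group the first die belongs to, its count is 4 or 1.
    return len(set(dice)) == 2 and dice.count(dice[0]) in (2, 3)
```

A handful of cases (a couple of full houses, a straight, four of a kind, five of a kind) is then enough to pin the function down.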