Triggered by this thread, I am (again) thinking about finally using unit tests in my projects. A few posters there say something like "tests are cool, if they are good tests". My question now: what are "good" tests?

In my applications, the main part is often some kind of numerical analysis, depending on large amounts of observed data and resulting in a fit function that can be used to model this data. I found it especially hard to construct tests for these methods, since the number of possible inputs and results is too large to just test every case, and the methods themselves are often quite long and cannot easily be refactored without sacrificing performance. I am especially interested in "good" tests for this kind of method.

Any good unit test should only test one thing - if it fails you should know exactly what went wrong.
– gablin, Nov 24 '10 at 11:47

When you have large amounts of data, a good approach is to write generic tests that can take data files as input. The data files should typically contain both the input and the expected result. With xunit test frameworks you can generate test cases on the fly - one for each data sample.
– froderik, Nov 5 '11 at 8:43
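The data-file approach froderik describes can be sketched in plain Python with `unittest`, generating one test method per sample on the fly. The `average` function and the sample values here are invented for illustration; a real version would parse the (input, expected) pairs from data files:

```python
import unittest

def average(a, b):
    """Hypothetical function under test."""
    return (a + b) / 2

# In practice these (name, inputs, expected) triples would be loaded
# from data files; they are hard-coded so the sketch is self-contained.
SAMPLES = [
    ("two_positives", (1.0, 3.0), 2.0),
    ("zeros", (0.0, 0.0), 0.0),
    ("mixed_signs", (-2.0, 2.0), 0.0),
]

class DataDrivenTest(unittest.TestCase):
    pass  # test methods are attached below, one per sample

def _make_test(inputs, expected):
    def test(self):
        self.assertAlmostEqual(average(*inputs), expected)
    return test

# Generate one test case per data sample, as suggested above.
for name, inputs, expected in SAMPLES:
    setattr(DataDrivenTest, f"test_{name}", _make_test(inputs, expected))
```

Each sample then shows up as its own pass/fail entry in the test runner, so a failing data file row points directly at the offending input.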

@gablin "If it fails you should know exactly what went wrong" would suggest that tests with multiple possible failure causes are okay, as long as you can determine the cause from the test's output...?
– immibis, Jun 14 at 9:03

Roy Osherove's book The Art of Unit Testing defines a good unit test as an automated piece of code that invokes the unit under test and checks an assumption about its behaviour, and then later adds that it should be fully automated, trustworthy, readable, and maintainable.

I would strongly recommend reading this book if you haven't already.

In my opinion, all of these are very important, but especially the last three (trustworthy, readable, and maintainable): if your tests have those three properties, then your code usually has them as well.

+1 for the link. Interesting material to be found there.
– Joris Meys, Feb 11 '11 at 14:01

"Run quickly" has big implications. It is one reason why unit tests should run in isolation, away from external resources such as the database, file system, web service, etc. This, in turn, leads to mocks/stubs.
– Michael Easter, Apr 7 '14 at 18:09

As a greatly simplified example, suppose you have a function that returns the average of two ints. The most comprehensive test would call the function and check that the result is in fact the average. This doesn't make any sense at all: you are mirroring (replicating) the functionality you are testing. If you made a mistake in the main function, you will make the same mistake in the test.

In other words, if you find yourself replicating the main functionality in the unit test, it's a likely sign that you are wasting your time.

Create tests for corner cases, such as a test set containing only the minimum number of inputs (possibly 1 or 0), plus a few standard cases. These unit tests are not a replacement for thorough acceptance tests, nor should they be.
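As a hedged sketch of what such corner-case tests might look like (the truncating `average` implementation and the chosen cases are made up here), note that the expected values are hard-coded constants rather than recomputed with the same formula as the implementation:

```python
import unittest

def average(a, b):
    """Hypothetical function under test: integer average, truncating."""
    return (a + b) // 2

class AverageCornerCases(unittest.TestCase):
    def test_equal_inputs(self):
        self.assertEqual(average(5, 5), 5)

    def test_both_zero(self):
        self.assertEqual(average(0, 0), 0)

    def test_mixed_signs(self):
        self.assertEqual(average(-3, 3), 0)

    def test_large_values(self):
        # In C or Java, (a + b) could overflow; Python ints are unbounded,
        # but the corner case is still worth pinning down explicitly.
        self.assertEqual(average(2**31 - 1, 2**31 - 1), 2**31 - 1)
```

None of these tests re-derive the answer; each one would catch a plausible implementation mistake without replicating the function.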

I've seen lots of cases where people invest a tremendous amount of effort writing tests for code that is seldom entered, and not writing tests for code that is entered frequently.

Before sitting down to write any tests, you should be looking at some kind of a call graph, to make sure you plan adequate coverage.

Additionally, I don't believe in writing tests just for the sake of saying "Yeah, we test that". If I'm using a library that is dropped in and will remain immutable, I'm not going to waste a day writing tests to make sure an API that will never change works as expected, even if certain parts of it score high on a call graph.

But what about a later date, when the library gets a newer version with a bug fix?
– user1249, Nov 24 '10 at 11:47

@Thorbjørn Ravn Andersen - It depends on the library, what changed, and their own testing process. I'm not going to write tests for code that I knew worked when I dropped it in place and that I never touch. So, if it works after updating, out of mind it goes :) Of course there are exceptions.
– Tim Post♦, Nov 24 '10 at 15:28

If you depend on your library, the least you can do is write tests that show what you expect said library to actually do.
– user1249, Nov 24 '10 at 22:01

Not quite TDD, but after you have gone into QA you can improve your tests by setting up test cases that reproduce any bugs that come up during the QA process. This is particularly valuable when you move into longer-term support and reach a point where you risk people inadvertently reintroducing old bugs; having a test in place to catch that is worth a lot.
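A sketch of such a regression test, with a made-up `parse_port` function and bug number standing in for a real QA finding:

```python
import unittest

def parse_port(value):
    """Hypothetical function under test. Bug #1234 (found in QA): it used
    to crash on input with surrounding whitespace; .strip() is the fix."""
    return int(value.strip())

class RegressionTests(unittest.TestCase):
    def test_bug_1234_port_with_surrounding_whitespace(self):
        # Reproduces the exact failing input from the QA report. If the
        # old bug is ever reintroduced, this test fails immediately.
        self.assertEqual(parse_port(" 8080 "), 8080)
```

Naming the test after the bug report gives anyone who breaks it later a direct pointer to the original failure.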

For TDD, "good" tests test features that the customer wants. Features do not necessarily correspond to functions, and test scenarios should not be created by the developer in a vacuum.

In your case - I'm guessing - the 'feature' is that the fit function models the input data within a certain error tolerance. Since I have no idea what you're really doing, I'm making something up; hopefully it is analogous.

Example story:

As a [X-Wing Pilot] I want [no more than 0.0001% fit error] so that [the targeting computer can hit the Death Star's exhaust port when moving at full speed through a box canyon]

So you go talk to the pilots (and to the targeting computer, if sentient). First you talk about what is 'normal', then talk about the abnormal. You find out what really matters in this scenario, what is common, what is unlikely, and what is merely possible.

Let's say that normally you'll have a half-second window over seven channels of telemetry data: speed, pitch, roll, yaw, target vector, target size, and target velocity, and that these values will be constant or changing linearly. Abnormally, you may have fewer channels and/or rapidly changing values. So together you come up with some tests such as:

//Scenario 1 - can you hit the side of a barn?
Given:
all 7 channels with no dropouts for the full half-second window,
When:
speed is zero
and target velocity is zero
and all other values are constant,
Then:
the error coefficient must be zero
//Scenario 2 - can you hit a turtle?
Given:
all 7 channels with no dropouts for the full half-second window,
When:
speed is zero
and target velocity is less than c
and all other values are constant,
Then:
the error coefficient must be less than 0.0000000001/ns
...
//Scenario 42 - death blossom
Given:
all 7 channels with 30% dropout and a 0.05 second sampling window
When:
speed is zero
and position is within enemy cluster
and all targets are stationary
Then:
the error coefficient must be less than 0.000001/ns for each target

Now, you may have noticed that there's no scenario for the particular situation described in the story. It turns out, after talking with the customer and other stakeholders, that the goal in the original story was just a hypothetical example. The real tests came out of the ensuing discussion. This can happen. The story should be rewritten, but it doesn't have to be [since the story is just a placeholder for a conversation with the customer].
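A scenario like the first one translates almost mechanically into an xunit-style test. Here is a hedged Python sketch; `fit_error`, its trivial implementation, and the channel layout are all invented stand-ins for whatever the real fit routine looks like:

```python
import unittest

def fit_error(channels):
    """Invented stand-in for the fit routine under test: returns the
    error coefficient of the fitted model. It returns 0.0 for perfectly
    constant channels so that this sketch actually runs."""
    if all(len(set(samples)) == 1 for samples in channels.values()):
        return 0.0
    return 0.5  # placeholder for a real fit-error computation

CHANNELS = ("speed", "pitch", "roll", "yaw",
            "target_vector", "target_size", "target_velocity")

class TargetingScenarios(unittest.TestCase):
    def test_scenario_1_hit_the_side_of_a_barn(self):
        # Given: all 7 channels, no dropouts, the full half-second
        # window (50 samples at an assumed 100 Hz).
        data = {name: [0.0] * 50 for name in CHANNELS}
        # When: speed and target velocity are zero, all values constant.
        # Then: the error coefficient must be zero.
        self.assertEqual(fit_error(data), 0.0)
```

The Given/When/Then lines survive as comments, so the test stays traceable back to the scenario it encodes.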

I try to have every test only test one thing. I try to give each test a name like shouldDoSomething(). I try to test behaviour, not implementation. I only test public methods.

I usually have one or a few tests for success, and then maybe a handful of tests for failure, per public method.

I use mock-ups a lot. A good mocking framework, such as PowerMock, would probably be quite helpful, although I'm not using one yet.

If class A uses another class B, I'd add an interface, X, so that A doesn't use B directly. Then I'd create a mock-up, XMockup, and use it instead of B in my tests. It really helps speed up test execution, it reduces test complexity, and it also reduces the number of tests I write for A, since I don't have to cope with the peculiarities of B. For example, I can test that A calls X.someMethod() instead of testing for a side effect of calling B.someMethod().
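A minimal sketch of that A/B/X arrangement in Python, where duck typing plays the role of the interface X and all the names are invented:

```python
class B:
    """The real collaborator: imagine slow or external work here."""
    def someMethod(self):
        ...  # e.g. a database round-trip

class A:
    """Class under test; depends only on 'something with someMethod()'."""
    def __init__(self, x):
        self._x = x
    def run(self):
        self._x.someMethod()

class XMockup:
    """Hand-rolled mock: records calls instead of doing real work."""
    def __init__(self):
        self.calls = 0
    def someMethod(self):
        self.calls += 1

# In a test, assert that A called X.someMethod() - an interaction -
# rather than checking for a side effect of the real B.someMethod().
mock = XMockup()
A(mock).run()
assert mock.calls == 1
```

A mocking framework automates exactly this recording-and-asserting pattern; the hand-rolled version just makes the mechanics visible.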

Keep your test code clean as well.

When using an API, such as a database layer, I'd mock it and enable the mock-up to throw an exception on command at every possible opportunity. I then run the test once without throwing, and then in a loop, each time throwing an exception at the next opportunity, until the test succeeds again. A bit like the memory tests available for Symbian.
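That exception-injection loop might look like the following sketch, where `FlakyDb` and `save_all` are invented stand-ins for the mocked database layer and the code under test:

```python
class FlakyDb:
    """Mock database layer that raises on its n-th call (0 = first);
    fail_at=-1 means never fail."""
    def __init__(self, fail_at):
        self.fail_at = fail_at
        self.calls = 0
    def write(self, record):
        call = self.calls
        self.calls += 1
        if call == self.fail_at:
            raise IOError("injected failure")

def save_all(db, records):
    """Code under test: must stop cleanly at the first failure and
    report how many records were actually saved."""
    saved = 0
    for r in records:
        try:
            db.write(r)
            saved += 1
        except IOError:
            break
    return saved

records = ["a", "b", "c"]
# First, one run with no injected failure at all...
assert save_all(FlakyDb(fail_at=-1), records) == 3
# ...then the loop: inject the exception one opportunity later each
# time, until a run finally succeeds end to end.
fail_at = 0
while True:
    saved = save_all(FlakyDb(fail_at=fail_at), records)
    if saved == len(records):
        break  # every injection point has now been exercised
    # Exactly the records before the injected failure were saved.
    assert saved == fail_at
    fail_at += 1
```

With `unittest.mock`, the same effect can be had by setting `side_effect` on the mocked method, but the loop structure stays the same.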

I see that Andry Lowry has already posted Roy Osherove's unit test metrics, but it seems no one has presented the (complementary) set that Uncle Bob gives in Clean Code (pp. 132-133). He uses the acronym FIRST (here with my summaries):

Fast (they should run quickly, so people won't mind running them)

Independent (tests should not do setup or teardown for one another)

Repeatable (should run on all environments/platforms)

Self-validating (fully automated; the output should be either "pass" or "fail", not a log file)

Timely (write them just before the production code they test)