Is it possible to have too much data in a unit test?

I recently came across a unit test that looked like this (note that this code has been reconstructed by me to protect the guilty):

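Something along these lines (a hypothetical sketch in Python/unittest rather than the original language; every value here is invented):

```python
import statistics
import unittest

class StatisticalAnalysisTests(unittest.TestCase):
    # A long, hand-copied series of production values. The real list
    # ran to around 75 items; these numbers are made up for illustration.
    SAMPLES = [
        102.4, 101.9, 103.7, 99.2, 98.6, 104.1, 105.3, 101.0,
        97.8, 96.5, 102.2, 108.9, 110.4, 107.6, 103.3, 100.1,
        # ...dozens more lines of data elided...
    ]

    def test_mean_of_production_series(self):
        # The expected value had to be worked out by hand from the dump.
        self.assertAlmostEqual(statistics.mean(self.SAMPLES), 102.6875, places=4)
```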
I have cut the list short, but there were around 75 such test data items. Crikey!

I can understand why the particular set of test samples was chosen: it was dumped from the production system and chosen to give a representative set of test data.

But this test code was in a unit test. And the behaviour being tested (in this case it was some statistical analysis) didn’t strictly require that many samples. Three samples would have sufficed.

So why is having the additional test data a problem? In general, it’s a productivity issue. Tests with lots of data take longer to write, and working out what the correct answer should be takes longer too. They are harder to maintain, and harder to debug.

It even makes the test class harder to browse: in this case, I needed to scroll past five separate sets of such data before finding the set I was looking for! (No, I don’t like regions either, before you ask…)

I think the test data for a unit test should be the minimum required to fulfill the test: no more, no less.

Good spot (I recognise my guilty hand!), although in my defence it was there to test that the stats were worked out correctly on a long series of data that rose and fell many times, rather than being a simple “test that this is worked out in this very specific situation” test.

I realise that there were several copies of the data that weren’t required, but I would argue that at least one such test was needed. It falls into an unnamed area between unit and integration tests: you don’t want to link it in with other parts of the system, so it’s not technically an integration test, but you do want to test with real data, so, as you say, it’s not really a unit test either. Do we need a new name?

Another problem is that with files this big, the test runner has to recompile the whole class from scratch, and ReSharper is extremely slow to re-parse the class with every change you make.

I think the decision to include a larger range, perhaps including live data, to make the test more realistic depends precisely on what is being tested, but with a definite bias towards “fewer is better if the result is the same”.

As a very simple example, if the purpose of a class were to calculate the mean of a set of samples, three samples would be enough for me to be confident that the code was indeed calculating the mean. I would not need any additional data to be more sure of that, unless the test was specifically concerned with, say, rounding.
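To make that concrete, here is a minimal sketch (Python/unittest; the standard library’s `statistics.mean` stands in for whatever the real class under test would be):

```python
import statistics
import unittest

class MeanCalculationTests(unittest.TestCase):
    def test_mean_of_three_samples(self):
        # Three samples are enough to show the sum is divided by the count.
        self.assertEqual(statistics.mean([1.0, 2.0, 6.0]), 3.0)

    def test_mean_with_fractional_result(self):
        # A separate, deliberately awkward pair of values only if rounding
        # behaviour is the specific concern of the test.
        self.assertAlmostEqual(statistics.mean([0.1, 0.2]), 0.15, places=10)
```

Anything beyond that adds cost without adding confidence.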

The decision to include more also raises the question of what makes data ‘real’: the system (and its data) will evolve over time, and it may be a greenfield system where no live data exists at the time of development.

As a general rule of thumb, I think the number of samples chosen should be the minimum needed to show beyond doubt that the result is correct as specified by the test. Beyond that, the law of diminishing returns kicks in and productivity starts to suffer in many ways – with ReSharper even slowing down, as you point out!