Friday, 15 January 2016

Man Cannot Live by Unit Testing Alone

Way back in May 2012 I wrote a blog post titled “Beware the Complacency Unit Testing Brings”. This was a reaction to a malaise that I began to see developing as the team appeared to rely more heavily on the feedback it was getting from unit tests. This in turn appeared to cause some “trivial” bugs that should also have been picked up early, to be detected somewhat later.

This post looks at a couple of other examples I’ve seen in the past of problems that couldn’t have be solved by unit testing alone.

Unit Tests Are Self-Reinforcing

Myself and a colleague once had a slightly tortuous conversation with a project manager about our team’s approach to testing. There was a “suggestion” that as the organisation began to make more decisions based on the results of the system we had built, the more costly “a mistake” could become. We didn’t know where this was coming from but the undertone had a suggestion about it of “work harder”.

Our response was that if the business was worried about the potential for losses in millions due to a software bug, then they should have no problem funding a few tens of thousands of pounds of hardware to give us the tools we need to automate more testing. To us, if the risks were high, then the investment should be too, as this helps us to ensure we keep the risks down to a minimum. In essence we advocated working smarter, not harder.

His response was that unit tests should be fast and easy to run, and therefore he questioned why we needed any more hardware. What he failed to understand about unit testing was its self-reinforcing nature [1]. Unit tests are about a programmer verifying that the code they wrote works as they intended it to. What it fails to address is that it meets the demands of the customer. In the case of an API “that customer” is possibly just another developer on the same team providing another piece of the same jigsaw puzzle.

As if to prove a point this scenario was beautifully borne out not long after. Two developers working on either side of the same feature (front-end and back-end) both wrote their parts with a full suite of unit tests and pushed to the DEV environment only to discover it didn’t work. It took a non-trivial amount of time of the three of us (the two devs in question and myself) before I happened to notice that the name of the configuration setting which the front-end and back-end were using was slightly different. Each developer had created their own constant for the setting name, but the constant’s value was different and hence the back-end didn’t believe it was ever being provided.

This kind of integration problem is common. And we’re not talking about junior programmers here either, both were smart and very experienced developers. They were also both TDD-ers and it’s easy to see how this kind of problem occurs when your mind-set is focused around the simplest thing that could possibly work. We always look for the mistake in our most recent changes and both of them created the mismatched constant right back at the very beginning, hence it becomes “out of mind” by the time the problem is investigated [2].

Performance Tests

Unit tests are about verifying functional behaviour, so ensuring performance is not in scope at that point. I got a nice reminder of this not long afterwards when I refactored a stored procedure to remove some duplication, only to send performance through the roof. The SQL technique I used was “slightly” less performant (I later discovered) and it added something like another 100 ms to every call to the procedure.

Whilst all the SQL unit tests passed with flying colours in it’s usual timescale [3], when it was deployed into the test environment, the process it was part of nosedived. The extra 100 ms in the 100,000 calls [4] that the process made to the procedure started to add up and a 30 minute task now took over 8 hours!

Once again I was grateful to have “continuous” deployments to a DEV environment where this showed up right away so that I could easily diagnose and fix it. This just echoes what I wrote about recently in “Poor Performance of log4net Context Properties”.

A Balance

The current backlash against end-to-end testing is well justified as there are more efficient approaches you can take. But we must remember that unit testing is no panacea either. Last year we had these two competing views going head-to-head with each other: Why Most Unit Testing is Waste and Just Say No to More End-to-End Tests. It’s hard to know what to do.

As always the truth probably lies somewhere in between, and shifts either way depending on the kind of product, people and architecture you’re dealing with. The testing pyramid gets trotted out as the modern ideal but personally I’m still not convinced about how steep the sides of it should be for a monolith versus a micro-service, or a thick client versus a web API.

What I do know is that I find value in all different sorts of tests. One size never fits all.

[1] This is one of the things that pair and mob programming tackles because many eyes help make many kinds of mistakes less common.

[2] Yes, I could also go on about better collaboration and working outside in from a failing system test, but this didn’t deserve any massive post mortem.

[3] Database unit tests aren’t exactly speedy anyway so they increased the entire test suite time by an amount of time that could easily have been passed off as noise.

I certainly second "I find value in all different sorts of tests." No single testing approach is going to give you everything you want from testing--reliable components, conformance with business requirements, suitable for users, performance, avoiding costly rework cycles, finding mistakes--confidence in our product. What one approach misses, we use another to cover the gap.

I'm a proponent of testing everything at the lowest level possible. Yes, that means relying on comprehensive unit tests to make sure the code does what the programmer intends it to do. It also means checking at the component level that the component, as a whole, works according to its API and usage. This is simplified when we're not simultaneously checking boundary conditions, having done that in the unit tests.

It also means checking the integration boundaries. I prefer to do much of this testing at the boundary, itself. I go into more detail on how I do this in the External Dependencies chapter of Evolutionary Anatomy of Test Automation Code (https://leanpub.com/EvolutionaryAnatomy/read_sample). This makes sure that the integration, itself, is functional. Then a test or two encompassing both components (or perhaps the entire system) can verify that the integration fits the usage expectations.

I find this thought process helpful for producing reliable systems on which I can have confidence. Start with a test for the expectation that's driving the work on the system, whether it's new functionality or fixing a defect. Test as low as possible. Consider the aspects that are not covered at that level, and add wider-scale tests to cover those gaps. When I'm done, I can visualize my work with the testing pyramid. I don't, however, find the pyramid very helpful while working to decide what test I need to add next. Having the "right number" of tests at a particular level is more important than having "the right tests."

So watch those system boundaries! I haven't watched your video to see what you mean by "database unit tests." I do write tests around the data access layer, often using the External Dependencies model mentioned above. These I consider integration tests, as the database is, in my view, an external dependency and most of my tests work through the adapter I've built to translate to from domain language to the language of database access. Sometimes I work at the unit level within the data access layer.