If at first you don't succeed, call it version 1.0


Unit tests, as we all know, are A Good Thing™. It stands to reason, then, that 100% unit test coverage must also be A Good Thing™, and therefore something that should be striven for at all costs. I’d argue this is wrong though.

100% unit1 test coverage can be exceptionally difficult to achieve in real-world applications without jumping through a whole load of hoops that end up being counterproductive. The edges of your system often talk to other systems which can be difficult to isolate or mock out. You can often be left chasing those last few percent and making counterintuitive changes to achieve them. At this point I’d say it’s better to leave the clean, readable, untested code in and just accept that 100% coverage isn’t always possible.

This leads to another problem though. Once you’re not hitting 100% coverage you need to be sure that the code that isn’t covered is actually code you can’t cover. As your code base gets bigger, the effect a single missed line of code has on your coverage percentage gets smaller, making genuinely untested code harder to spot.

PHPUnit takes a pragmatic approach to this issue: it allows you to mark blocks of code as untestable. The net result is that it simply ignores these blocks in its coverage calculations, allowing you to get 100% coverage of testable code.

Quite a few people who I’ve told about this have declared it to be ‘cheating’; however, let’s look at a very real issue I have in one of my bits of Java code. I have code that uses the list of locales that can be retrieved by Java. It uses the US as the default, since it’s reasonable to assume that the US locale will be set up on a system. While highly improbable, it’s not impossible for a system to lack the US locale, and the code handles this gracefully. Unit testing this code path is impossible as it involves changes to the environment. I could conceivably handle this in functional tests, but it’s not easy. I could remove the check and just let the code fall over in a crumpled heap in this case, but then if it ever does happen someone is going to have a nasty stack trace to deal with rather than a clear and concise error message.
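To make the check concrete, here’s a minimal sketch of the kind of guard I mean. The class name LocaleCheck and the exact error message are mine, purely for illustration, not the real code:

```java
import java.util.Arrays;
import java.util.Locale;

public class LocaleCheck {

    /**
     * Returns the US locale if it is installed on this system. In the
     * highly improbable case that it isn't, fail fast with a clear
     * message rather than letting later code blow up obscurely.
     */
    static Locale defaultLocale() {
        boolean hasUs = Arrays.asList(Locale.getAvailableLocales()).contains(Locale.US);
        if (!hasUs) {
            // This branch is effectively impossible to reach from a unit
            // test without changing the environment the JVM runs in.
            throw new IllegalStateException("The US locale is not installed on this system");
        }
        return Locale.US;
    }

    public static void main(String[] args) {
        System.out.println(defaultLocale());
    }
}
```

The `throw` branch is exactly the untestable block I’d like to exclude from coverage calculations.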

If I could mark that single block of code as untestable I would then be able to see if the rest of my code was 100% covered at a glance. As it is I’ve got 99 point something coverage and I need to drill into the coverage results to ensure the missing coverage comes from that one class. Royal pain in the behind.

I am willing to concede that the ability to mark code as untestable can be abused, but then that’s what code reviews are for. If someone knows how to unit test a block of code that’s marked as untestable they can fail the review and give advice to the developer that submitted the code.

1 People seem to get unit and functional tests muddled up. A unit test should be able to run in isolation, with no dependency on the environment or other systems. Functional tests, on the other hand, can make assumptions about the environment and can require that other systems are running and correctly configured. The classic example is a database. Unit tests should mock out the database connection and use canned data within the unit tests. Functional tests can connect to a real database pre-populated with test data. Typically functional tests also test much larger blocks of code than individual unit tests do.
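As a sketch of that distinction, a unit test can swap the real database out for canned data behind an interface. All the names here (UserStore, UserService) are hypothetical, just for illustration:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// A seam between the code under test and the database.
interface UserStore {
    Optional<String> findName(int id);
}

class UserService {
    private final UserStore store;

    UserService(UserStore store) {
        this.store = store;
    }

    String greet(int id) {
        return store.findName(id).map(n -> "Hello, " + n).orElse("Hello, stranger");
    }
}

public class UserServiceUnitTest {

    public static void main(String[] args) {
        // Unit test: canned data, no database, no environment assumptions.
        Map<Integer, String> canned = new HashMap<>();
        canned.put(1, "Alice");
        UserStore fake = id -> Optional.ofNullable(canned.get(id));

        UserService service = new UserService(fake);
        check(service.greet(1).equals("Hello, Alice"));
        check(service.greet(2).equals("Hello, stranger"));
        System.out.println("ok");
        // A functional test, by contrast, would construct a UserStore over
        // a real JDBC connection to a pre-populated test database.
    }

    private static void check(boolean ok) {
        if (!ok) throw new AssertionError();
    }
}
```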

For impartiality here I should probably point out (if you hadn’t worked it out already) that I’m a rabid Apple Fanboi and love most1 things they do.

Falling as I do into the anti-Windows camp, it’s little wonder I haven’t done any .Net development2. Personally I don’t have anything against .Net; from what I understand, C# is just Java written with the benefit of hindsight, which can only be A Good Thing™, plus there is plenty you can learn from the .Net camp that applies to the broader world of programming. It does amuse me somewhat, however, that the two .Net-centric talks we’ve had so far at NorDev have been given on Macs – albeit running Windows.

Yesterday’s talk was by Simon Elliston Ball3 on Glimpse, a very funky-looking debugging tool for .Net web developers, which I really, really wish existed for Java developers as I could seriously do with a tool like that. Glimpse is open source and well documented, so I would recommend you go check it out. It’s also very extensible, so if you fancy writing a Java port of it I’d be eternally grateful.

Our second speaker, Phil Nash, also used a Mac, but that’s hardly surprising as he was giving a talk on TDD and iOS development, something that’s not going to work on anything else. After a brief introduction to Objective-C, which is a funny old language, we were shown some techniques for using TDD effectively when writing an iOS (or in fact any Objective-C) app, with some live coding examples – something I always enjoy watching. Interestingly, 100% of all NorDev talks have ended with someone called Phil live coding on a Mac. You may argue that a sample size of 2 is not statistically significant, but that still doesn’t stop it being fact 🙂

1 I’m not a complete fanatic and will admit there are some things they’ve done wrong, for example: mice. Apple are a company that seem incapable of making a good mouse. Trackpads they can do; mice, they suck at. I get my mice from Razer. They know how to make mice.

2 Yes, I know there are things like Mono which mean I can code and run it on other platforms, but… faff.

3 Elliston Ball is a double-barrelled non-hyphenated surname – can your code cope with that? Not entirely sure all of ours can. There’s a lesson to be learned there 🙂

I’ve run into a bit of a brick wall with Tumbler, insofar as I think I’m using it at far too low a level. I’ve got a fairly simple object at the moment, little more than a bean, which I’m using as a working example to try these new techniques out. While my first story and group of scenarios were easy to write, I started running into issues with the second group. The issues are twofold. Firstly, I’m having to learn to rethink how I group my tests to fit into stories and scenarios. Secondly, while working this through I started butting into problems with long class and method names, which I can’t really shorten as there will, eventually, be literally hundreds of tests and I need to be able to distinguish between them.

After fiddling about with different ways of framing the stories and scenarios, I discovered that Tumbler’s annotations aren’t picked up by the JUnit plugin for Eclipse, so I can’t rely on them to make readable tests; I have to use readable class and method names.

Then I discovered that Tumbler isn’t hierarchical. Stories are listed on the index page, then you can drill down into a story and see its scenarios. That’s it. If I had 100 stories I’d have to wade through all of them on the front page. What I need is epics.

This all rather makes sense for a tool that’s going to be used at a higher level, detailing the 20 or 30 user stories that constitute an application, but I want the ability to test at different levels. After all, as I understand it, BDD can be considered fractal in nature and is as easily applied to a user’s interaction with a save dialog box as it is to the save method call on some object somewhere. Yes, the players change, and yes, the granularity and precision of the inputs and outputs change, but it’s the same fundamental thought process when developing the tests.

In order to shed some light on the issue I took a look at the Tumbler source, specifically the test cases, but they were all at the user level. Tumbler itself isn’t that complicated a program, so it may be that these user stories suffice for testing the majority of the code, but I want to know at an object level that they do what they say on the tin.

Sadly, the majority of this discovery has been performed on the train, with its connectivity issues, so performing research into alternative tools is proving hard. That said, given Tumbler isn’t massively complex, I may just put my current project to one side, fork it and get it to work at both the low and high levels. In the meantime it seems like I need to do more research.

I should probably start by pointing out that this post is more to help me get something straight in my head than anything else. It’s also covering a subject that I’m not sure I fully understand, so I may have completely missed the point.

One of the things that I was most interested in at Sync Conf was Behaviour Driven Development (BDD). I’ve never been great at following Test Driven Development (TDD), mainly because I couldn’t make the shift required to fully get my head round it. What I practiced, if it has a name, was Javadoc Driven Development; each method would have descriptive Javadoc that defined what happened on various inputs. I found that by doing this I built up a narrative of how a class would work and that provided me with concrete test examples.

This method of testing only really works if you write decent Javadoc. The following is next to useless, and on many levels:

/**
* Sets the name.
*
* @param name the name
*/
public void setName(String name);

What happens if I pass in null? What happens if I call the method twice? Does passing a blank string cause issues? I’ve heard so many arguments that Javadoc is pointless and you should just read the code; after all, it’s just going to get out of date. I find that attitude utterly reprehensible; the behaviour of the method could rely on something that it’s calling, and that could go more than one level deep. I don’t want to have to dive through reams of code to understand if what I’m calling does what I expect, nor do I want to have to guess. For me the Javadoc provides a contract for the method in narrative form, which is then implemented by the code. Change the logic of the code and you have to change the Javadoc, and I’d actually change the Javadoc first.

The mindset I try and put myself in is not “how will this method work logically”, but more “how will this method be used and how does it fit in with what the object is trying to do”. In the above example we’re setting a name. Does it make sense for a name to be null, or blank? If we can have an empty name how is that represented? null, empty string, or both? Is a blank string treated the same as an empty string? Should we trim the string? Are there other things we’re going to limit? You then document the answers to those questions:

/**
* Sets the name to the given value, overwriting any previous value that was
* set. Passing <code>null</code> into this method will have the effect of
* clearing the name. Passing a blank or empty string will have the effect
* of clearing the name by setting it to <code>null</code>.
*
* @param name the value to set the name to
*/
public void setName(String name);

All of a sudden you’ve got a load of things you need to test. I’d be writing tests to ensure that:

Given a null value, providing a value A sets the value to A.

Given a null value, providing a null value retains the null value.

Given a null value, providing a blank value retains the null value.

Given the value A, providing the value A retains the value A.

Given a value A, providing a value B sets the name to B.

Given a value A, providing null sets the name to null.

Given a value A, providing a blank string sets the name to null.
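Those scenarios translate almost mechanically into tests. Here’s a minimal sketch; the class name NamedThing is hypothetical, and I’m using plain assertions rather than JUnit to keep it self-contained:

```java
public class NamedThing {

    private String name;

    /**
     * Sets the name, overwriting any previous value. Null, blank and
     * empty strings all clear the name by setting it to null.
     */
    public void setName(String name) {
        this.name = (name == null || name.trim().isEmpty()) ? null : name;
    }

    public String getName() {
        return name;
    }

    public static void main(String[] args) {
        NamedThing thing = new NamedThing();

        thing.setName("A");
        check("A".equals(thing.getName()), "given null, providing A sets A");

        thing.setName("B");
        check("B".equals(thing.getName()), "given A, providing B sets B");

        thing.setName(null);
        check(thing.getName() == null, "given a value, providing null clears it");

        thing.setName("A");
        thing.setName("   ");
        check(thing.getName() == null, "given a value, providing blank clears it");

        System.out.println("all scenarios pass");
    }

    private static void check(boolean ok, String scenario) {
        if (!ok) throw new AssertionError(scenario);
    }
}
```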

Those of you familiar with BDD may already be seeing something familiar here. What I have been doing is something very similar to BDD, albeit in an informal and haphazard way and at a very low level. I was defining functionality as a narrative in the Javadoc with my user in mind (another developer), and then using that to discover the scenarios that I should test. I was actually so used to this method of working that when I used Tumbler to test one of the classes I was working on, it already felt natural and the tests practically wrote themselves. Interestingly enough, the conventions used by BDD and Tumbler freed me from one of the constraints I was facing with testing: that of my test classes getting too big. I was still structuring my test classes as the mirror of my code, so for any object Foo there was a FooTest. By thinking in stories for the tests too, and grouping scenarios that belong to a story, I could break out of this habit and have as many classes as I needed testing the behaviour of Foo.

Happy with my new test scenario, and the output that Tumbler had produced, I proceeded to run Sonar against it. Sonar did not like the result. Most of what it complained about no longer matters. I can turn off the requirement for Javadoc on all methods because my test methods contain the scenario in the code and don’t require it elsewhere. The need for a descriptive string in my assertions can also be turned off as the way the tests are displayed by Tumbler provide a much more natural way of reading the test results. One critical marker stood out though: 0% code coverage.

It took me all of 30 seconds to work out what was going on. Tumbler uses its own JUnit test runner, produces its own reports and isn’t spitting out the files that Sonar is looking for and so there’s nothing for it to report on. This may or may not be something I can fix, although Google is yielding nothing on the subject. This got me to thinking: Do I need to know my coverage? Surely if I’m defining the behaviour of the class, then writing the code for that then I’ve pretty much got 100% coverage since I shouldn’t be writing code that produces behaviour that hasn’t been defined. This is where I got stuck.

Liz Keogh, who gave the Sync Conf BDD talk, mentioned that BDD didn’t work so well for simple or well-understood scenarios. Should I be using TDD here? That way I’d get my test coverage back, but I’d lose my new way of working. Finally, after much Googling, I came across this blog and came to the realisation that I’m coming at this all wrong: there is no spoon. What I think Liz meant was that BDD at the level of the business helping to write the scenarios isn’t useful for well-understood scenarios, because they’re well understood and anyone can write them, not that we just give up on BDD and go do something else… or maybe she did, but if I’m understanding Hadi correctly then using TDD instead of BDD is just a semantic shift and I could just as easily use BDD in TDD’s place.

We all know that 100% test coverage means nothing, and that chasing it can end up an exercise in pointlessness. Then there’s the farcical scenario where you have 100% test coverage, all tests running green, and software that doesn’t work. So why the panic over not knowing my test coverage? I think it boils down to the fact that the Sonar reports let me browse the code, see what’s not tested and then think up tests to cover that code. In other words, chasing 100% (or 90%, or 80%) and writing tests for testing’s sake. If I’m doing BDD properly then I’ll be thinking up the narratives, determining the scenarios and then writing the code to meet those scenarios. If my code coverage isn’t above 80% (which is a level I consider to be generally acceptable) then I’m doing something wrong, as there is code, and there are code paths, not covered by scenarios, which is, in theory, pointless code.

So how do I solve my Sonar problem? Simple: unhook the alerts on test coverage, remove the test coverage widget and keep my eye out for a plugin for Tumbler reports. In the meantime I can just use the reports generated by Tumbler to keep an eye on my tests and make sure they’re running clean, and read up on getting my Maven builds to fail when Tumbler has a failed test.