Monday, 27 September 2010

I’ve decided to be brave and submit a session proposal for next year’s ACCU Conference. I know if I get accepted that means I’ll have to curtail the relentless late nights in the bar – at least until I’ve given my presentation. Anyway, the catalyst for my proposal was my blog back in March titled Integration Testing with NUnit. Not long after posting that I watched Paul Grenyer give a presentation to the London branch of the ACCU that touched on similar ground. My session is intended to take the idea further still and cover use of the xUnit style in other areas where perhaps custom test scripts and tools might have been written such as API exploration and verifying external systems.

Chris started out as a bedroom coder in the 80s writing assembler on 8-bit micros; these days it’s C++ and C# on Windows in big plush corporate offices. His career has covered both shrink wrapped applications and in-house systems with the past 5 years focusing on grid-based distributed systems in the Finance industry. When not attached to a keyboard and screen he has a wife and four children to entertain, dips his toe in the local swimming pool and provides the commentary for the annual Godmanchester Gala Day Duck Race.

Session Description

Modern Unit Testing practices act as a conduit for improved software designs that are more amenable to change and can be easily backed by automation for fast feedback on quality assurance. The necessity of reducing external dependencies forces us to design our modules with minimum coupling which can then be leveraged both at the module, component and subsystem levels in our testing. As we start to integrate our units into larger blocks and interface our resulting components with external systems we find ourselves switching nomenclature as we progress from Unit to Integration testing. But is a change in mindset and tooling really required?

The xUnit testing framework is commonly perceived as an aid to Unit Testing but the constraints that it imposes on the architecture mean that it is an excellent mechanism for invoking arbitrary code in a restricted context. Tests can be partitioned by categorisation at the test and fixture level and through physical packaging leading to a flexible test code structure. Throw in its huge popularity and you have a simplified learning curve for expressing more that just unit tests.

Using scenarios from his current system Chris aims to show how you can use a similar format and tooling for unit, component and integration level tests; albeit with a few liberties taken to work around the inherent differences with each methodology.

Friday, 10 September 2010

How often have you come across this kind of code in a stored procedure:-

select abc from xyz where Status = 3

…or if you’re lucky (or perhaps unlucky if the comment is out-of-date):-

select abc from xyz where Status = 3 --Failed

One of the downsides of SQL (at least with T-SQL) is a lack of support for constants, and by extension enumerations.

No Magic Numbers

It’s long been a best practice that you avoid the use of so called Magic Numbers like the ‘3’ in the examples above in favour if some more symbolic name. This aids both readability and maintainability because you don’t have to go and lookup what the number represents (in the case of an enumeration) and if the enumeration value needs to change you only have to change the relationship with the symbol and not the code where the symbol is referenced.

Local Constants

If you only need a constant for use within a single stored procedure then you can just use a normal variable:-

This will make the latter SQL more readable and hopefully more maintainable - it should be far easier to grep for “Failed” than “3” if the numeric value does need to change[*].

Global Constants

Sadly the need for one-off constants is pretty rare. What is more likely is the need for a globally defined constant so that we don’t violate the DRY/SPOT principle by defining local constants in every procedure where it’s used. The best solution I could come up with[#] was to use a User Defined Function:-

create function Failed() returns int as return 3 go

This means that you can use it like so:-

select abc from xyz where Status = Failed()

However it’s not always possible to use a UDF in the same place that you can a constant or variable (I’ve yet to read up on why) and so sometimes in a stored procedure you still need to use a local variable as well:-

Although some databases have support for schemas, they are often quite simplistic and not designed for nesting like you would in C++ and C#. This means that to avoid conflicts we need to resort to the old style of using some form of prefix or suffix on the name:-

select abc from xyz where Status = TaskStatus_Failed()

This becomes a little less distracting with enumerations as in C# you are forced to prefix the enumeration value with it’s type name anyway (unlike C++, although may people nest C++ enumerations within a separate struct to get the same effect) so it’s not too unnatural to read. Of course you could still create a separate namespace, called say ‘constants’, in which you put all these UDFs if you felt that it helped matters.

An Alternative - Lookup Tables

The main thing I like about the UDF solution is that the function name means the symbol appears as code rather than data. One alternative I’ve seen is to use a small lookup table that has the integer value and a string for the symbol. This makes the numbers less ‘magic’ but it means you have invoke a query to get the enumeration value:-

I’ll be honest and admit that I’ve done no performance comparisons between this and the UDF idea. It also still feels to me like you’re expressing the enumeration with data rather than code though but that’s probably nonsense.

Client-Side Use

The other benefit of using a UDF is that you can also use it client side should you happen to need to embed queries within your code so your magic numbers don’t leak further afield. We use proper enumerations in our C# code that are kept in sync manually but tested automatically as part of a build. Ideally we should generate the UDF’s and C# enum’s from a single source, but for the moment the automated tests acts as a nice safety net.

[*] Of course in the real world you would never rely on the consistent use constants like this and would actually search for all references to the table and/or column name instead, but hey this is theory :-)

[#] Remembering that I’m not a SQL expert by any stretch of the imagination.

“Exceptions are often overused. Because they distort the flow of control, they can lead to convoluted constructions that are prone to bugs. It is hardly exceptional to fail to open a file; generating an exception in this case strikes us as over-engineering. Exceptions are best reserved for truly unexpected events, such as file systems filling up or floating-point errors.”

My gut reaction to that was “No it’s not – it depends on context”…

External Resources Always Fail

Messer’s Kernighan & Pike are obviously far more knowledgeable and experienced than me so why did they make such a bold statement? Generalising somewhat, I presume that they were suggesting that any interaction with an external resource should be handled with due care as you should expect it to fail – after all we always encounter problems with files, sockets, databases etc. And as a general rule this makes perfect sense.

Full Environmental Control

But… In the world of in-house server-side software you are often completely in control of the entire environment in which your code runs. Even for in-house desktop software the environment may be a little more hostile but you can still ensure certain invariants. Taking the “opening a file” example again and given the following conditions, would failing to open a file still be considered unexceptional?

The file exists (say we just created it and control its lifetime)

The permissions on the folder are correct (our server installation script set them)

The server has plenty of internal resources (memory, handles etc)

The process has not suffered any unrecoverable errors

In this situation I would be on the side of it being exceptional. In my experience failures under these kinds of scenarios usually point to a system configuration error [1 or 2], which means it would never have worked, or the server/process itself has become unstable [3 or 4] due to some earlier (and hence unexpected) issue.

Recoverability

Of course the issue is perhaps moot as all errors fall into two categories – recoverable and unrecoverable. In the scenario described above the recovery is effectively manual[*] – fix the system configuration or restart the process/server. In the real world I can’t see how you could justify the cost of designing a system to automatically cope with some/all of those potential edge cases[+]. Even the last one (process instability) can be hard to handle fully via process recycling because too much badly written code masks nasty errors like Out Of Memory or Access Violations by sinking everything so you don’t get to react to it.

In contrast the times where I can see you might be able to recover from failing to open a file are when the choice of file is controlled by the user, such as in a desktop application, or when the contents of the file can be considered optional, such as user settings.

Applying the issue wider afield to databases and remote services we end up at clustering/replication as the obvious recovery option. Even so, given the general reliability of in-house networks and (non-blade) hardware these days, I would still consider it pretty exceptional for a failure to occur. By-and-large misconfiguration is the most common reason I see for failures involving resources.

Do Or Do Not (Or There May Be a Try)

The chapter of The Practice of Programming in question was about designing interfaces so do we assume that this message is aimed more at the writers of libraries rather than applications? Given that you don’t know the context in which your code is being called do you have to choose between error codes or exceptions? Would your interfaces be better or worse if you somehow supported both so that the caller could decide? I’m sure there are many developers out there right now wincing as they stare at their legacy COM heavy codebases that have a mixture of wrappers generated by Visual C++’s #import feature where some methods throw and others return HRESULTs, some have a Chk prefix and others don’t. What I’m suggesting has more moderation to it…

In the .Net world there is a common pattern where methods by default succeed or throw an exception and a parallel set of methods named TryXxx that return a bool and use an output parameter instead. The latter ‘overloads’ allow you to avoid the cost[#] of handling an exception such as when parsing numbers and DateTimes that you can expect to be iffy (format wise) but they will still throw other exceptions which are orthogonal to the task such as Out of Memory.

Was It Just a Poke At Java?

Or maybe I’m just being too literal? The book was written over 10 years ago (back when exceptions where still not exactly mainstream) and it follows an example in Java so perhaps they were just having a pop at the choice Java had made. After all, at that time Java was still popular as a client-side technology – even outside the browser.

The fact that people like Matthew Wilson are still regurgitating this argument a decade later means there is still much to learn. I think the TryXxx pattern is a nice compromise as you get the flexibility of handling a domain failure without resorting to an unnecessary try/catch block and you also don’t accidentally mask orthogonal failures by ignoring the return code or catching an exception of too generic a type. I wonder if this also satisfies Kernighan & Pike or do they find it muddies the waters even further and still stand by their original statement?

[*] I say manual because what normally seems to happen is that the system monitoring software gets bombarded with errors and someone has to make sense of them. Once that’s done it usually involves bouncing something…

[+] In the embedded world, where lives may be at risk, there are forces way outside the normal risk/reward trade offs of an in-house corporate system.

[#] I’m referring to both the performance and maintenance costs of using exceptions.