Monday, 28 November 2011

Null String Reference vs Empty String Value

One of the first things I found myself in two minds about when making the switch from C++ to C# was what use to represent an empty string - a null reference or an empty string (i.e. String.Empty).

C++ - Value Type Heaven

In C & C++ everything is a value type by default and you tend to avoid dynamically allocated memory as a matter of course, especially for primitive types like an int. This leads to the common technique of using a special value to represent NULL instead:-

int pos = -1;

Because of the way strings in C are represented (a pointer to an array of characters) you can use a NULL pointer here instead, which chalks one up for the null reference choice:-

int pos = -1; const char* name = NULL;

But in C++ you have a proper value type for strings - std::string - so you don’t have to delve into the murky waters of memory management[*]. Internally it still uses dynamic memory management[+] to some degree and so taking the hit twice (i.e. const std::string*) just so you can continue to use a NULL pointer seems criminal when you can just use an empty string instead:-

int pos = -1; std::string name(“”);

I guess that evens things up and makes the use of a special value consistent across all value types once again; just so long as you don’t need to distinguish between an empty string value and “no” string value. But hey, that’s no different to the same problem with -1 being a valid integer value in the problem domain. The c++ standard uses this technique too (e.g. string::npos) so it can’t be all bad…

If you’re less concerned about performance, or special values are a problem, you can happily adopt the use of the Swiss-army knife that is shared_ptr<T>, or the slightly more discoverable boost::optional<T> to handle the null-ness consistently as an attribute of the type:-

boost::shared_ptr<int> pos; boost::optional<std::string> name;

if (name) { std::string value = *name; . . .

C# - Reference Type Heaven

At this point in my transition from C++ to C# I’m still firmly in the empty string camp. My introduction to reference-based garbage-collected languages where the String type is implemented as a reference-type, but has value-type semantics does nothing to make the question any easier to answer. Throw the idea of “boxing values” into the equation (in C# at least) and about the only thing that you can say is that performance is probably not going to matter as much as you’re used to and so you should just let go. But, you’ll want something a little less raw than doing this though for primitive types:-

object pos = null; string name = null;

if (pos != null) { int value = (int)pos; . . .

If you’re using anything other than an archaic[$] version of C# than you have the generic type Nullable<T> at your disposal. This, with its syntactic sugar provided by the C# compiler, provides a very nice way of making all value types support null:-

int? pos; string name;

if (pos.HasValue()) { int value = pos.Value; . . .

So that pretty much suggests we rejoin the “null reference” brigade. Surely that’s case closed, right?

Null References Are Evil

Passing null objects around is a nasty habit. It forces you to write null-checks everywhere and in nearly all cases the de-facto position of not passing null’s can be safely observed with the runtime picking up any unexpected violations. If you’re into ASSERTs and Code Contracts you can annotate your code to fail faster so that the source of the erroneous input is highlighted at the interface boundary rather than as some ugly NullReferenceException later. In those cases where you feel forcing someone to handle a null reference is inappropriate, or even impossible, you you can always adopt the Null Object pattern to keep their code simple.

In a sense an empty string is the embodiment of the Null Object pattern. The very name of the static string method IsNullOrEmpty() suggests that no one is quite sure what they should do and so they’ll just cover all bases instead. One of the first extension methods I wrote was IsEmpty() because I wanted to show my successor that I was nailing my colours to the mast and not “just allowing nulls because I implemented using a method that happens to accept either”.

Sorry, No Answers Here… Move Along

Sadly the only consistency I’ve managed to achieve so far is in and around the Data Access Layer where it seems sensible to map the notion of NULL in the database sense to a null reference. The Nullable<T> generic fits nicely in with that model too.

But in the rest of the codebase I find myself sticking to the empty string pattern. This is probably because most of the APIs I use that return string data also tend to return empty strings rather than null references. Is this just Defensive Programming in action? Do library vendors return empty strings in preference to null references because their clients have a habit of not checking properly? If so, perhaps my indecision is just making it harder for them to come to a definitive answer…

[*] Not that you would anyway when you have the scoped/shared_ptr<T> family of templates to do the heavy lifting and RAII runs through your veins.