Nullable types in C#

One of the “late breaking” features in C# 2.0 is what is known as “Nullable Types”. The details can be found in the C# 2.0 language spec.

Nullable types address the scenario where you want to be able to have a primitive type with a null (or unknown) value. This is common in database scenarios, but is also useful in other situations.

In the past, there were several ways of doing this:

A boxed value type. This is not strongly typed at compile time, and involves a heap allocation for every value.

A class wrapper for the value type. This is strongly typed, but still involves a heap allocation, and you have to write the wrapper.

A struct wrapper that supports the concept of nullability. This is a good solution, but you have to write it yourself.

To make this easier, in VS 2005, we’re introducing a new type named “Nullable”, that looks something like this (it’s actually more complex than this, but I want to keep the example simple):

struct Nullable<T>
{
    public bool HasValue;
    public T Value;
}

You can use this struct directly, but we’ve also added some shortcut syntax to make the resulting code much cleaner. The first is the introduction of a new syntax for declaring a nullable type. Rather than typing:

Nullable<int> x = new Nullable<int>(125);

I can write:

int? x = 125;

which is much simpler. Similarly, rather than needing to write a null test as:

if (x.HasValue) {…}

you can use a familiar comparison to null:

if (x != null) {…}
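To make these equivalences concrete, here is a minimal sketch using the System.Nullable<T> type from the framework. The names ShorthandDemo and Describe are made up for illustration and are not part of the post.

```csharp
using System;

public static class ShorthandDemo
{
    // Hypothetical helper: reports whether a nullable int holds a value.
    public static string Describe(int? x)
    {
        // For a nullable value type, `x != null` is the same check as `x.HasValue`.
        if (x != null)
            return "value: " + x.Value;
        return "no value";
    }

    public static void Main()
    {
        Nullable<int> longForm = new Nullable<int>(125);
        int? shortForm = 125;                        // same type, shorter spelling

        Console.WriteLine(longForm == shortForm);    // True
        Console.WriteLine(Describe(shortForm));      // value: 125
        Console.WriteLine(Describe(null));           // no value
    }
}
```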

Finally, we have support to make writing expressions easier. If I wanted to add two nullable ints together and preserve null values, if I didn’t have language support, I would need to write:
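The code that originally followed is missing from this copy. As a rough sketch only (not necessarily what the post showed), a hand-written null-preserving addition could look like the following; the ManualAdd class and its Add method are assumed names.

```csharp
using System;

public static class ManualAdd
{
    // Adds two nullable ints without relying on lifted operators:
    // the result is null unless both operands have a value.
    public static Nullable<int> Add(Nullable<int> x, Nullable<int> y)
    {
        return (x.HasValue && y.HasValue)
            ? new Nullable<int>(x.Value + y.Value)
            : new Nullable<int>();   // default struct value, HasValue == false
    }

    public static void Main()
    {
        Nullable<int> x = new Nullable<int>(125);
        Nullable<int> y = new Nullable<int>(33);

        Console.WriteLine(Add(x, y).HasValue);                     // True
        Console.WriteLine(Add(x, new Nullable<int>()).HasValue);   // False
    }
}
```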

At least I think that’s what I’d have to write – it’s complex enough that I’m not sure this code works. This is ugly enough that it makes using Nullable without compiler support a whole lot of work. With the compiler support, you write:
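The snippet that followed is also missing here. Judging from the three lines a reader quotes further down (int? x = 125; int? y = 33; int? z = x + y;), it was presumably close to this sketch, which relies on the lifted + operator the compiler provides for nullable operands. The LiftedAdd wrapper is an assumed name.

```csharp
using System;

public static class LiftedAdd
{
    // The compiler "lifts" + over int?: the sum is null if either operand is null.
    public static int? Add(int? x, int? y)
    {
        return x + y;
    }

    public static void Main()
    {
        int? x = 125;
        int? y = 33;
        int? z = x + y;

        Console.WriteLine(z);                       // 158
        Console.WriteLine(Add(x, null).HasValue);   // False
    }
}
```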

What would be a whole lot MORE useful [request!!] – non-nullable types.

that is, maybe a keyword notnull or something that checks at the compiler level.

For example:

// generates a compile error
static notnull MyClass x;

// is fine
static notnull MyClass x = new MyClass();

// This class fails to compile, with "X" wasn’t initialized
public class blah
{
    MyClass notnull x;
    blah() {}
}

// this class compiles fine
public class blah2
{
    MyClass notnull x;
    blah2() { x = new MyClass(); }
}

// generates compile error
private void DoSomething()
{ MyClass notnull x; }

// works fine
private void DoSomethingElse()
{ MyClass notnull x = new MyClass(); }

// now imagine we have this function
public void DoSomethingWith( MyClass notnull thevariable )
{ Debug.Print( thevariable.ToString() ); }

// works – because new always returns a notnull type!
DoSomethingWith( new MyClass() );

// doesn’t work
MyClass y = new MyClass();
DoSomethingWith( y );

// works
MyClass notnull z = new MyClass();
DoSomethingWith( z );

// works
MyClass notnull a = new MyClass();
MyClass b = a;

// doesn’t work
MyClass c = new MyClass();
MyClass notnull d = c;

// works
MyClass e = new MyClass();
MyClass notnull f = (MyClass notnull) e;

// or
MyClass notnull f = e as MyClass notnull;

Now, obviously, I could get most of this functionality by defining a template notnull<t>… but then I would only get runtime checking – there is no reason why this shouldn’t be done at compile time. (and then once it was in, I’d be arguing that the default should be notnull, and we should be explicitly specifying those few variables which we actually WANT nullable :))

The main thing is, it should be almost trivial to implement from the compiler standpoint, and would provide a quantum leap in program robustness [well, for us, program readability, as our code standards ensure that every single public function, constructor, or property setter checks that each argument is not null – our code would halve in size! :)]

I hope you and the other language designers know what you are doing for the long run. The current C# language is a thing of beauty with very few warts. By adding the Big Four plus this new feature, I worry that you are starting to move the language from a simpler, easier-to-use language into the realm of C++ hell. I’ve worked with C# for years now (since beta 1) and find the language has been perfect. The language is so explicit about what it is doing; adding new funky operators makes the language more confusing, as it isn’t immediately obvious what is being done.

Please do a lot of usability testing with the syntax of this feature. I can see a need but I really think it should be done as a keyword and not an operator. For example,

nullable int a;

nullable int b;

nullable int c = a+ b;

It is immediately obvious what we are doing. The int? syntax is alien and is painful to read and understand. Since this is a feature that isn’t going to be used all the time, people are going to have to look it up to understand how it works when they encounter it.

How can a bool have three values! It can’t, that’s why you need a new type: nullable bool. In fact I’d keep it as Nullable<bool> but provide operator overloading so that the operators work correctly to return null types.

Are there already any guidelines how to use this with the two primitive types that already inherently provide a similar feature, albeit in a completely different way? I am talking about Double and Single, which support NaN values, which can also be interpreted as ‘value unknown’, but without the ‘nullable’ overhead.

I can imagine that a designer designing a new system that intensively uses floating point numbers, and having the need to represent ‘value is missing’ somehow, now has two implementation options. And it may not be clear whether a ‘nullable’ or ‘NaN’ based approach is best. There will be obvious cases where one or another will be better, but did someone contemplate the effects of choosing one option over another already?
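One way to see the trade-off is to compare how the two representations behave under equality and arithmetic. The sketch below is only an illustration; MissingValueDemo and its helper names are assumptions, not an established guideline.

```csharp
using System;

public static class MissingValueDemo
{
    // NaN-based convention: "missing" is encoded inside the double itself.
    public static bool IsMissingNaN(double d) { return double.IsNaN(d); }

    // Nullable-based convention: "missing" is carried by the HasValue flag.
    public static bool IsMissingNullable(double? d) { return !d.HasValue; }

    public static void Main()
    {
        double a = double.NaN;
        double b = double.NaN;
        double? none = null;

        // NaN never compares equal, not even to another NaN.
        Console.WriteLine(a == b);                   // False
        // A null nullable compares equal to null.
        Console.WriteLine(none == null);             // True

        // Both representations propagate through arithmetic.
        Console.WriteLine(double.IsNaN(a + 1.0));    // True
        Console.WriteLine((none + 1.0).HasValue);    // False
    }
}
```

Note that a double? can itself hold NaN, so the two notions are not mutually exclusive.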

Btw, I did not have time yet to read the spec carefully, if the answer is there, please tell me.

Was there a big demand for this? I see two ways a language designer would come up with a feature – 1) a missing feature people ask for a lot 2) overdesign error – "let’s add something just for the sake of having it"

Features like this give a BIG gun to the hand of those who like to shoot their own foot.

We’ve heard a lot of positive feedback on the Nullable types. Just so everyone’s aware, this is NOT a C# specific feature. VB 2005 has Nullable types and all other generics as well. See the VB Blog at: http://blogs.msdn.com/vbteam/

OK, after skimming the MS Research paper, I see there is a lot more going on with these special operators (int?/+/*/!) than I realized. I’m not sure I fully grasp the implications or possibilities yet but it does look interesting. Although I will say that for ex-C/C++ programmers, int* has a much different meaning than what is intended with this new feature in C#.

1) The Smalltalk way of reducing complexity, simplify everything and use the language to construct new, powerful features.

2) The opposite way. Add every feature as a language feature and apply lots of rules to keep all this complexity from being abused.

Is C# moving rapidly down the 2nd path?

Beware, there is a real-life counterpart to this: the laws that regulate social security benefits in almost any country. These laws are so complex that nobody understands them. Every attempt to simplify them has failed, at least in my country. Talk about unmaintainable legacy code!

that’s very cool – where’s Nice for .Net? 🙂 Actually, if you look, the .net team has just added almost all of the functional features to C# too – put together iterators, anonymous methods, generics, you almost have a functional language – which is a _very_ good thing..

Anyway, since writing the request above, I realised you _could_ step a bit of the way with templates. I did it for a few days, and it was a bit annoying in its syntax, so I relaxed the rules a bit, and dropped down from compile-time checking to runtime checking, but I love it – I can now say

NonNull<Employer> FindEmployer( NonNull<string> EmployerName )

{

…

}

and be able to propagate those non-nulls through – so I reduce the checking I have to do, and guarantee that everything is checked…

However, it’s not as pretty as I would like, and not checked at compile time – a very simple change to the language would allow that.

Here’s the code if anyone wants it:

/// <summary>
/// This class specifies that we are using a non-null version of the type
/// </summary>
/// <typeparam name="TheType">the type we are wrapping</typeparam>
public class NonNull<TheType>
{
    /// <summary>
    /// the value we are wrapping
    /// </summary>
    private TheType _value;

    /// <summary>
    /// Construct this with this value
    /// </summary>
    /// <param name="value">the value to construct with</param>
    public NonNull( TheType value )
    {
        if (value == null)
            throw new ArgumentNullException("value", "an attempt was made to construct a non-null value from a null");
        _value = value;
    }

    /// <summary>
    /// Convert this to a string – we just pass this off to the value we are wrapping,
    /// who we know is not null 🙂
    /// </summary>
    /// <returns>the value to string</returns>
    public override string ToString()
    {
        return _value.ToString();
    }

    /// <summary>
    /// this is a bit of a kludge, but allows us to call methods on the base type.. ideally the
    /// compiler would support this directly!!!
    /// eg myVar.Call.MyFunction
    /// </summary>
    /// <value></value>
    public TheType Call
    {
        get { return _value; }
    }

    /// <summary>
    /// we make sure that our non-null version of the type can always be used just as the type
    /// itself would be, by making an implicit casting operator
    /// </summary>
    /// <param name="theValue">the non-null value</param>
    /// <remarks>it is important to remember that we have SPECIFICALLY NOT created an implicit
    /// operator the other way around – it would be easy to do, and would still give us runtime
    /// checking, but we always want to make sure the developer is thinking, when they go from
    /// nullable to non-null… so DON’T ADD IT!!</remarks>
    /// <returns>the original value</returns>
    public static implicit operator TheType(NonNull<TheType> theValue)
    {
        return theValue._value;
    }

    /// <summary>
    /// This shouldn’t exist, but the code was a nightmare otherwise – guess I’ll have to live

I agree, Darren, I’d love to see something like string!, which formed a non-nullable reference type. (I like this type of syntax; in the long run it’s easier to deal with than 100 keywords or generic types on a line. Dictionary<Guid,Nullable<int>> is as bad as it should get, IMHO.) However, I think it would require a runtime change (perhaps a NotNull<T> type like you have above, where the verifier would balk at an assignment if null could be on the stack); otherwise you could never be confident in passing a non-nullable type to a ref or out parameter, or accepting one as a return value. (What good is it if the code checks at run time? We want compile-time checks, right?)

While I can only approve the introduction of generics and iterators in C#, two things that are necessary and useful, I must say that I have serious doubts about anonymous methods and nullable types.

The one thing I liked about C# as it emerged was its relative simplicity. Many keywords (too many, some say), and yet a few relatively simple concepts. I’ve always placed C# between C++ and Java, with the full arsenal of C++, but without the monstrous constructs backwards compatibility imposed on C++ and the clumsy interpretation of "everything is an object" in Java.

This doesn’t mean that I don’t approve the introduction of these features. However, with the multi-language support the .NET platform has, I would have expected new languages or new class libraries instead of new features to existing languages. Off topic, I think the C++ tweaks Microsoft is working on are already pushing the limits too far (in the wrong direction). Not to mention VB, but that’s a different story. Anyway, regarding C#, "if it works, don’t fix it", right?

I can only hope the design team has considered all implications and that the new features will follow the principle "pay as you go" – developers not using the new features should not be subjected to performance or other kinds of penalties. I guess that won’t be true for the runtime, anyway.

Of course, perhaps I just don’t share your view on these things, so my opinion might be irrelevant after all 🙂 I’m only curious how many of the developers using C# out there actually requested or approved this particular feature.

PS:

int? x = 125;

int? y = 33;

int? z = x + y;

doesn’t work in the March preview release of Whidbey. "Operator ‘+’ cannot be applied to operands of type ‘System.Nullable<int>’ and ‘System.Nullable<int>’". Will it work in the release version? If not, then the ‘int?’ stuff might be just useless syntactic sugar over a framework construct 🙁

Isaac: I wasn’t saying C# is now a fully functional programming language or anything – but I figure if I have list operations [iterators] lambda functions [anonymous methods] pattern matching [generics] then I can program functionally, and that’s good enuff for me!

– I made a generic map function the other day, and it was basically just "for each item in blah yield return f(item)" – we aren’t that far away from the functional world anymore… we just have all that other junk littered around as well 🙂
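A map function along those lines might look like the sketch below. It is an illustrative reconstruction, not the commenter's actual code: the Functional class name is assumed, and it uses the framework's Converter<TInput, TOutput> delegate with an anonymous method.

```csharp
using System;
using System.Collections.Generic;

public static class Functional
{
    // Lazily applies f to every item; nothing runs until the result is enumerated.
    public static IEnumerable<TResult> Map<T, TResult>(
        IEnumerable<T> items, Converter<T, TResult> f)
    {
        foreach (T item in items)
            yield return f(item);
    }

    public static void Main()
    {
        int[] numbers = { 1, 2, 3 };

        // Anonymous method used as the mapping function.
        foreach (int n in Map<int, int>(numbers, delegate(int x) { return x * x; }))
            Console.WriteLine(n);   // prints 1, 4, 9 on separate lines
    }
}
```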

I’m not an "any-paradigm" purist, but I now have enuff for me to shift more of my thinking into functional, less into OO, and to me that’s a good thing!

I must say that C# is a nice language but it misses one of the key things that makes for widespread adoption of it by developers… cross platform compatibility.

The reason why Java became so popular was that you write code once and then use it anywhere. The whole idea behind OOP is that you plan on reusing code. And what good is it if I can only reuse that code on one platform?

Sure. It’s great for desktop apps. Fantastic. You have hit your target demographic. But to get web developers to use it when Apache is the most popular web server and to get system administrators or application developers for portable devices to use it is not very likely.

A lot of people point to Mono (which is still in beta) and say that makes C# cross platform… but no, it doesn’t really. The project is getting no support from Microsoft, who can at any time change their spec and make all that code useless.

So why, as a developer, would I ever want to use an OOP language that isn’t cross platform?

The only reason I make this whole point is that I spent the last few months making a long decision of what to use… Java or C# for my next project. I researched them both and they both have a similar level of functionality. Java is still a bit faster, but Microsoft makes up for this by integrating the virtual machine into their OS.

And what it finally came down to was that I plan to be doing web development and portable applications. And when you consider that, Java is the natural choice. My code can go anywhere.

If I was building a desktop application, I’d probably use C instead of C#; I mean after all, if you are going to be building code that works on one platform, stick with the tried and true.

Eric Gunnerson blogged last week about Nullable types in C# 2.0. This new C# feature will allow one to specify a Type like int? which will act just like a regular int, with the exception that it can now also be null. The "?" syntax comes from this Microsoft Research paper, which introduces the following Regex-inspired Type-modifiers: Type* (zero or more)…

One confusion I’m seeing is that cardinality is being confused with reference versus value (i.e. heap allocated versus stack allocated). They aren’t actually the same thing, but by default a cardinality of 0 or 1 is associated with reference variables and a cardinality of exactly 1 is associated with value types. They don’t have to be and in my opinion they shouldn’t be.

IMO, by default all objects should have cardinality exactly 1 (!). So string should be the same as string! and int should be the same as int!.

As a programmer, I shouldn’t really care how variables are stored (i.e. stack versus heap). That’s really the compiler’s problem. But I always care about the structure of my data (i.e. its types and those variables’ cardinalities).

Dunno about anyone else, but ‘int?’ certainly doesn’t read as well as ‘nullable int’. I can see where they are coming from, but int* already has another well recognized meaning, so using that family of notation may be a bit ambiguous. Then again, just an opinion. Are there any statistics, or polls out there which present the options? It might be a pain to type the ‘nullable’ keyword, but then again, the ‘return’ keyword isn’t intellisensed either, and that doesn’t hurt anyone.

I kinda think this is just syntactical sugar. I don’t think this solves some really pressing problem, and it only appeals to the language junkies who like to express things in terse syntax. The syntax can hide far more complexity and make things difficult down the line.

To the down-in-the-trenches programmer, this kind of syntax will become very confusing. It is simply not needed. If you want a multivariate return type, then define something simple, but something that expresses the nature of what you’re doing… don’t make syntax look too similar to existing syntax. And I personally don’t want to have to worry about whether my ints are ‘0’ OR ‘null’.

One of the features I value most about C# is simplicity. While it is currently a little more complex than I think it should be, adding this type of complexity on something as simple as a null seems a step backward. It will cost more in debugging and code research due to incompatible types along with cluttering up your code with symbols that have meaning on their own beyond this use.

I do not understand why other value types cannot address null as simply as "string" does today. While I do have a problem with a null string versus string.Empty, it would be nice to see int == null, and it would keep things simple.

Although I see the need for generics in some form, it appears more every day that C# is stepping backward to become another C++ with garbage collection. Code will become more cryptic and debugging time will increase.

For me it is simplicity, which equates to less development time, easier to maintain and less debugging. You increase any of those, C# suffers along with .NET.

Nathan: <tedious off-topic rubbish about how C# is unsuitable for anything because it isn’t ‘cross-platform’>

I guess you just wanted to regurgitate some turgid off-topic received ‘wisdom’ and this was the blog article that you happened to be browsing when the muse took you. Anyway, thanks for that iconoclastic bombshell of a comment.

Rocky, the reason the value types can’t do this as easily as string is because string is a reference type. So null is a natural, valid state for a reference type. I like the addition of the "simplified" generics which are quite different from C++ templates. The one thing I am not sure I like is the function syntax changes like:

int? i = null;

int j = i ?? 5;

Actually I am getting used to "??" but I still don’t like:

int?

Is this an int that can’t make up its mind as to whether or not it’s actually an int? I just don’t like the whole application of regular expressions syntax in a programming language where the vast majority of C# programmers take this:

I’m really looking forward to this. I don’t need nullable types every day, but when I do, I find it pretty annoying to have to work around it. At least in Java there were corresponding wrapper types, so you could do something like:

public void Register(String email, Boolean optIn)

where optIn would be null if the user did not answer the question. I like the idea of this in C#:

public void Register(String email, bool? optIn)

Seems like the best of all worlds to me. If you are afraid of null values, don’t use the nullable type. But I say, better to have clearly nullable values than "magic" values (like Stream.ReadByte() returning -1 when the stream is done) which are too easily ignored.
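A small sketch of how such a three-state parameter might be consumed. The Registration class and DescribeOptIn helper are made-up names; only the bool? optIn parameter comes from the comment above.

```csharp
using System;

public static class Registration
{
    // Hypothetical helper: maps the three possible states of bool? to a label.
    public static string DescribeOptIn(bool? optIn)
    {
        if (optIn == null)
            return "not answered";
        return optIn.Value ? "opted in" : "opted out";
    }

    // Signature taken from the comment; the body is illustrative only.
    public static void Register(string email, bool? optIn)
    {
        Console.WriteLine(email + ": " + DescribeOptIn(optIn));
    }

    public static void Main()
    {
        Register("a@example.com", true);    // a@example.com: opted in
        Register("b@example.com", false);   // b@example.com: opted out
        Register("c@example.com", null);    // c@example.com: not answered
    }
}
```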

With the current beta, a null int? is non-null when boxed (for obvious reasons).

I think that when a nullable object is boxed, the compiler needs to generate a check to see if the nullable object is null and if so, it needs to not box and just pop & ldnull. The extra check shouldn’t be a performance issue (since boxing is expensive anyway).

Currently, as it stands, a nullable int suddenly becomes non-null when it is boxed, which I don’t think is the normal expectation.
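For comparison, the released .NET Framework 2.0 and later runtimes do not behave like the beta described here: boxing a null nullable value yields a null reference, and boxing a non-null one yields a boxed underlying value. A minimal check, with BoxingDemo as an assumed name:

```csharp
using System;

public static class BoxingDemo
{
    // Boxing happens on the implicit conversion from int? to object.
    public static object Box(int? value)
    {
        return value;
    }

    public static void Main()
    {
        Console.WriteLine(Box(null) == null);   // True: a null int? boxes to a null reference
        Console.WriteLine(Box(5) is int);       // True: a non-null int? boxes as a plain int
        Console.WriteLine(Box(5).GetType());    // System.Int32
    }
}
```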

Generics are not the same as pattern matching. On the other hand, I think that the commonly accepted definition of a "functional programming language" is a language that has higher order functions (which C# has with delegates).

However, common functional programming languages have more stuff like anonymous closures (C#’s anonymous delegates) and pattern matching (C# does NOT have an equivalent). OCaml and Haskell also have type inference (C# really could have easily done type inference for local variables…what a shame). Interestingly, C# generator functions are a form of list comprehension, something that I used to consider one of the more exotic properties of functional languages.

Ovidiu:

C# does have improvements over Java, but nothing particularly novel. My favorite thing about C# is the way method overriding is handled (and of course, anonymous closures).

Though value types seem nice at first, they cause nothing but problems. The biggest one is that there’s no difference in syntax between pass-by-reference and pass-by-value. This wouldn’t be a problem if value types were immutable because then it wouldn’t really make a difference (which is why you don’t get bit by this in Java). The worst part is that the performance benefit of value types can be achieved automatically by the JIT compiler and so all you get in the end is confusing semantics.

Also, the ‘out’ and ‘ref’ parameter passing mechanism uses aliasing semantics instead of copy-in copy-out semantics (which, 99% of the time, is what most people need). This means that you can’t pass properties as parameters (shattering the illusion that properties are regular fields).

About nullable types…sure it’s a nice features but it doesn’t fit in with the way the type system is currently handled.

As people have already pointed out, the Nice programming language has built-in syntax for "option types". This cleans things up by never allowing a reference to be null. If you want a null reference, then you have to make it an option type (which is syntactic sugar for "Option" in ML, "Maybe" in Haskell and Nullable<T> in C#).

The advantage of having compiler support for this stuff is that you don’t need ArgumentNullException. The compiler will do this for you. And what’s more, it’ll be done at compile time. You don’t have to write all the pain-in-the-ass argument checking and you don’t have to document it in the function parameter documentation using the excessively verbose and unreadable XML format (which braindead XML-zombie made that decision, anyway?).

I’d just like to add a +1 on the ‘nullable int i’ syntax and a -1 on the ‘int? i’ syntax. Keywords are easier to understand, and not much harder to write. I think the ugly < and > introduced with Generics is enough special syntax to introduce to the language.

The simpler, the better. If C# continues to add new syntactical tokens to the language, it will be more difficult to understand, it will be more difficult to read, and just simply look uglier. We want C# to be beautiful, don’t we? At least, I do.
