Nullable reference types in C#, a design

In my previous post, I outlined a list of requirements for non-nullable (and explicitly-nullable) reference types in C#. In this post we’ll dive into some further design decisions. Subsequent posts will look at the impact on generic types, plus backward-compatibility and some corner cases.

Expanding the type system

We add two new main concepts to the type system:

non-null reference types T, denoted as ‘T!’, and

explicitly-nullable reference types, ‘T?’.

For example, strings:

string! notNullable;
string? explicitlyNullable;

These we’ll call ‘null-aware’ types. The traditional, legacy, old-fashioned references we’re all used to, which can implicitly be null, we’ll call ‘implicitly-null’ types. (Occasionally I’ll denote them here as ‘T|null’ just to be unambiguous; for example, string|null denotes the implicitly-nullable string type.)

The main rule is that a slot of reference type T! cannot have the value ‘null’. (Whereas reference type T|null can of course have the value ‘null’ in addition to any value of T!.)

As you can see in the example of ‘c’ above, you can pass an implicitly-null type (T|null) to a function (or property) expecting a T!. (This rule is important for backward-compatibility.) However, if the caller tries to pass a null value at runtime, the environment will automatically throw an ArgumentNullException on behalf of the method, so that the null is never actually assigned to the parameter.

This allows legacy APIs to have their parameters changed from implicitly-null (|null) to non-null (!) without breaking source compatibility of old code. (In addition, the compiler could do some basic flow analysis to check that no obvious nulls will be passed to the function, and raise warnings as appropriate.)
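The call rule can be approximated in today's C# with a hand-written guard. This is only a sketch: `Greeter.Greet` is a hypothetical method, and in the proposal the check would be inserted by the environment at the call boundary rather than typed out by the programmer.

```csharp
using System;

static class Greeter
{
    // Stand-in for a parameter declared as 'string!' in the proposed syntax.
    // The guard is roughly what would be generated on the method's behalf.
    public static string Greet(string name)
    {
        if (name == null)
            throw new ArgumentNullException(nameof(name));
        return "Hello, " + name;
    }
}
```

Legacy callers that pass an ordinary string keep compiling unchanged; only a caller that actually passes null at runtime sees the ArgumentNullException.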

Apart from this rule for method calls, T|null values cannot be assigned to T! variables. (The other way around is fine, though.)
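The direction of the assignment rule can be imitated in current C# with a wrapper struct. A minimal sketch, assuming a hypothetical `NonNull<T>` type (it is not part of .NET, and the real proposal would enforce this in the type system rather than through conversions):

```csharp
using System;

// Hypothetical wrapper standing in for 'T!'.
public readonly struct NonNull<T> where T : class
{
    private readonly T value;

    private NonNull(T value) { this.value = value; }

    // T|null -> T! is not implicit: it needs a cast, which checks for null.
    public static explicit operator NonNull<T>(T value)
    {
        if (value == null)
            throw new ArgumentNullException(nameof(value));
        return new NonNull<T>(value);
    }

    // T! -> T|null is always safe, so it can be implicit.
    public static implicit operator T(NonNull<T> wrapped) => wrapped.value;
}
```

With this in place, `string s = (NonNull<string>)"Bob";` compiles, while `NonNull<string> n = someString;` does not without a cast. Note that `default(NonNull<string>)` still wraps a null, which is the struct problem described under "Missing values".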

Implicitly-typed local variables

Implicitly-typed variables become slightly trickier, for reasons of backward compatibility with older versions of C#:

var d = "Bob"; // Is this inferred as a ‘string|null’ or ‘string!’ ?

The inferred type of ‘d’ above should generally be (non-nullable) string!. However, this could break legacy code if the code expects it to be inferred as an implicitly-nullable string and subsequently assigns a null to it (d = null;). Therefore the type inference looks at all the places which assign to ‘d’. If any of them could assign a null, or an implicitly-null value, ‘d’ is typed as (implicitly-null) ‘string|null’, otherwise it’s inferred as string!.

This type inference rule may occasionally be too complex for the compiler, in which case it will infer as ‘implicitly null’ by default. If this happens, you can help the compiler by declaring it explicitly non-null like this:

var d = "Bob"!;

(That’s not doing anything very special; just using the ! operator to ensure that an unambiguously not-null result type is inferred for the string expression.)
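The inference rule described above can be modelled as a small function. This is a sketch of the decision only, not of a real compiler: the `Assigned` and `Inferred` enums and the `VarInference` class are hypothetical names invented for the illustration.

```csharp
using System;
using System.Collections.Generic;

// What the (imagined) compiler knows about one value assigned to the variable.
enum Assigned { NonNull, MaybeNull, TooComplex }

// The type chosen for the 'var' local.
enum Inferred { NonNull, ImplicitlyNull }

static class VarInference
{
    public static Inferred Infer(IEnumerable<Assigned> assignments)
    {
        foreach (var assigned in assignments)
        {
            // A possible null, or an assignment the analysis cannot follow,
            // falls back to the legacy implicitly-null type.
            if (assigned != Assigned.NonNull)
                return Inferred.ImplicitlyNull;
        }
        return Inferred.NonNull;
    }
}
```

For `var d = "Bob";` followed later by `d = null;`, the inputs would be `NonNull` and `MaybeNull`, so the result is `ImplicitlyNull` and the legacy code keeps compiling.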

In a later article we’ll suggest a way to tell the compiler that the code is null-aware, and that such implicitly-typed literals should be inferred as not-nullable.

default(T)

The C# ‘default’ operator takes a type name and returns the default value of that type.

What is the default value of a non-nullable reference type?

For example, what value should ‘default(Stream!)’ return? By definition it cannot be ‘null’. And the language can’t just return new Stream() (not least because Stream is an abstract type). Without null, reference types simply do not have sensible default values.

We disallow default() for non-nullable reference types; it is a compile-time error to write default(Stream!).
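For comparison, this is what default() does in C# as it stands today. The `DefaultDemo` class is just a hypothetical holder for the two expressions.

```csharp
using System;
using System.IO;

static class DefaultDemo
{
    // For any reference type, including an abstract one, the default is null.
    public static bool ReferenceDefaultIsNull() => default(Stream) == null;

    // For a value type, the default is the all-zero value.
    public static int ValueDefault() => default(int);
}
```

`ReferenceDefaultIsNull()` returns true and `ValueDefault()` returns 0. Once null is excluded, the first expression has nothing left to return, which is why the proposal rejects it at compile time.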

(This creates interesting issues for generic code, but I’ll cover that in a future article.)

A related issue is:

Missing values

Unfortunately, there are some situations in which non-nullable reference type variables cannot have a value.

Consider the case when you allocate an array with: “streams = new Stream![11];”. That’s an array with 11 elements, each of which is a non-nullable Stream reference. What should the .NET runtime do when you ask for streams[2] (before you’ve assigned a value to it)? It can’t be null, because we’ve excluded nulls from the array; it can’t be default(Stream!) because default is not defined for non-nullable reference types. And we can’t (in general) force the programmer to assign it before reading it.

There are a few situations where something similar can happen:

1. A non-nullable reference instance variable in a class, where that variable has not (yet) been assigned a value by the constructor.

2. A special case of 1. is where the this reference has ‘escaped’ from the constructor before the constructor has finished executing. (This is bad practice, but it’s quite possible to do accidentally or on purpose.)

3. A non-nullable reference instance variable in a struct: it’s perfectly possible to instantiate a struct without its constructor ever being called. (In fact, it’s really easy.)

4. Arrays of non-nullable reference types (as per the example): initially all of the array elements will be unassigned.
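All of these situations can be reproduced in C# as it exists now, where the "missing" value simply shows up as null. A short sketch, with hypothetical types (`Base`, `Derived`, `Labelled`, `MissingValues`) made up for the demonstration:

```csharp
using System;

class Base
{
    // Calling a virtual method here lets 'this' escape before Derived's constructor body runs.
    protected Base() { Describe(); }
    protected virtual void Describe() { }
}

class Derived : Base
{
    private readonly string name;
    public string SeenDuringBaseCtor = "not called";

    public Derived() { name = "assigned"; }

    // Runs from Base's constructor, when 'name' has not been assigned yet.
    protected override void Describe() { SeenDuringBaseCtor = name ?? "null"; }
}

struct Labelled
{
    public readonly string Label;
    public Labelled(string label) { Label = label; }
}

static class MissingValues
{
    public static string FieldSeenByBaseCtor() => new Derived().SeenDuringBaseCtor;

    // default(...) produces a struct value without running any constructor.
    public static bool StructFieldIsNull() => default(Labelled).Label == null;

    public static bool ArrayElementIsNull()
    {
        var names = new string[11];
        return names[2] == null;   // elements start out unassigned
    }
}
```

`FieldSeenByBaseCtor()` returns the string "null", and the other two methods return true. With non-nullable references, each of these reads would need some other outcome.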

The solution is to accept that some values in a live program may be ‘unassigned’. The language cannot realistically prevent programs from dereferencing ‘unassigned’ fields. The best we can do is to throw a ‘FieldUnassignedException’ if the program attempts to read from an unassigned field. In fact, not only is that the ‘best’ we can do; it’s the only thing we can do without significantly changing the language.

How is this actually better than allowing uncontrolled nulls, and ending up with NullReferenceExceptions in the first place?

We seem to have just replaced the old familiar NullReferenceExceptions with newfangled FieldUnassignedExceptions, but there is an important difference: unexpected null values can propagate far through a program before they eventually trigger a NullReferenceException. With nulls, the error (if it occurs) is far removed from the fault in the code which accessed the unassigned value. A FieldUnassignedException happens as soon as you try to read the unassigned variable, so the program fails faster.
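The fail-fast behaviour can be imitated today with a holder object that checks on every read. This is a sketch under assumptions: `Slot<T>` is a hypothetical helper, and `FieldUnassignedException` is defined here only because the exception named in the text does not exist in .NET.

```csharp
using System;

// Hypothetical exception type, named after the one proposed in the text.
class FieldUnassignedException : InvalidOperationException
{
    public FieldUnassignedException(string message) : base(message) { }
}

// Hypothetical holder emulating a 'T!' field that may not have been assigned yet.
class Slot<T> where T : class
{
    private T current;

    public T Value
    {
        get
        {
            // Fail at the read itself, instead of handing out a null
            // that blows up somewhere far away.
            if (current == null)
                throw new FieldUnassignedException("read before assignment");
            return current;
        }
        set
        {
            if (value == null)
                throw new ArgumentNullException(nameof(value));
            current = value;
        }
    }
}
```

Reading `Value` on a fresh `Slot<string>` throws immediately, at the line that performed the read, so the stack trace points at the faulty access rather than at some later dereference.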

Further, it is easy to prove which bits of code could be prone to FieldUnassignedExceptions. Programmers can eliminate them totally by ensuring that:

Constructors assign all non-nullable reference fields before returning and before calling any method which reads these fields, and before calling any virtual methods, and before passing ‘this’ as a parameter to another method, and before making ‘this’ visible to any other thread. (Phew!) These things are good practice anyway.

Structs containing reference types should either make them (explicitly) nullable references, or include a flag to say whether the constructor was called.

Array code should be carefully audited, and most code should avoid using arrays at all (which is good practice: there are better collection types available), or at least avoid arrays of non-null reference types.

All of these cases can easily be identified and violations flagged as warnings by static analysis tools.

(It is very much easier to avoid a FieldUnassignedException than a NullReferenceException!)
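The struct guideline above (include a flag to say whether the constructor was called) might look something like this in present-day C#. The `Person` struct is a hypothetical example, and InvalidOperationException stands in for the proposed FieldUnassignedException.

```csharp
using System;

struct Person
{
    private readonly string name;
    private readonly bool constructed;   // stays false whenever the constructor was skipped

    public Person(string name)
    {
        if (name == null)
            throw new ArgumentNullException(nameof(name));
        this.name = name;
        this.constructed = true;
    }

    public bool IsConstructed => constructed;

    public string Name
    {
        get
        {
            if (!constructed)
                throw new InvalidOperationException("Person was never constructed");
            return name;
        }
    }
}
```

`default(Person)` and the elements of a freshly allocated `Person[]` report `IsConstructed == false`, so callers can test for the unassigned case instead of tripping over it.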

Summary

We’ve covered an approach to introducing non-nullable and explicitly-nullable reference types to C# and .NET, including how to declare them, some of the new rules of assigning values and calling methods, and the consequences of uninitialised values in the face of non-nullable references.

7 thoughts on “Nullable reference types in C#, a design”

Spec# has non-null and explicitly-null reference types (and a bunch of unrelated stuff too). However, Spec# is fundamentally a research system which doesn’t need to be compatible with the large amount of existing C# code. It also puts some heavy constraints on the programmer to guarantee that non-nullable references always have an assigned value when they’re read.

My proposal is designed to work with existing C# code and provide the advantages of programmer control over nullability, without entirely changing the language, breaking any existing code or putting an undue burden on the programmer.

(Note too that there are other proposals for adding explicit nullability and/or non-nullable reference types to C#, not just Spec#. Many of them start with ! to mean ‘non-null’ and ? to mean ‘explicitly nullable’, but then quickly diverge from Spec# and from my proposal.)