Inferring from “is”, part one

Specifically, why the cast was illegal if the variable tested was of generic parameter type. Today I want to take a bit of a different tack and examine the question “why do we need to insert a cast at all?” After all, the compiler should know that within the consequence of the if, the variable animal is guaranteed to be of type Dog. (And is guaranteed to be non-null!) Shouldn’t the code simply be:

if (animal is Dog)
    animal.Bark();

This issue has been posed to the C# design committee a number of times over the years. I thought today I might describe how I’d push back on the proposal, and counter with some proposals that have a better chance of being actually implemented.

Throughout I’ll assume that animal is of type Animal and that there are the obvious relationships between this type and its derived types.

The first thing I would note is that the is operator operates on an expression that has a value, not only a variable. Automatically we would wish to restrict the feature to variables:

if (foo.Bar(DateTime.Now) is Dog)
    foo.Bar(DateTime.Now).Bark();

The supposition that foo.Bar(DateTime.Now) will be Dog both times seems unwarranted; the compiler has no reason to believe that two calls with potentially two different arguments will return an object of the same type consistently. The value of those two expressions can be different.

Fortunately, the value of a variable never changes, right? Oh, wait, variables are called variables because they vary:

if (this.animal is Dog)
{
    this.M();
    this.animal.Bark();
}

If animal is a field then M() might change the value of the field. But M() could be any method, including… Bark()!

if (this.animal is Dog)
{
    this.animal.Bark();
    this.animal.Bark();
}

How do we know that the first call does not change the type of animal, rendering the second an error? Remember, if the method is virtual then we may not even have the code that will ultimately be called when the program is compiled, so analyzing it is a non-starter. This problem alone would be enough for me to reject the feature completely, but let’s soldier on.
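
To make the hazard concrete, here is a minimal sketch. The Owner type and the side effect inside Bark() are assumptions made up for illustration; they are not from the original post. Bark() reaches back into its owner and replaces the field, so a second test of the same field no longer succeeds:

```csharp
using System;

// Hypothetical types for illustration only.
class Animal { }
class Goldfish : Animal { }

class Dog : Animal
{
    private readonly Owner owner;
    public Dog(Owner owner) { this.owner = owner; }

    // Barking has a side effect: it swaps the owner's pet for a goldfish.
    public void Bark() { owner.animal = new Goldfish(); }
}

class Owner
{
    public Animal animal;

    // Returns true if the field still holds a Dog after one Bark() call.
    public bool StillDogAfterBark()
    {
        animal = new Dog(this);
        if (animal is Dog)
        {
            ((Dog)animal).Bark();   // the explicit cast is required today
            return animal is Dog;   // the field was rewritten by Bark()
        }
        return false;
    }
}

static class FieldDemo
{
    static void Main()
    {
        Console.WriteLine(new Owner().StillDogAfterBark()); // prints False
    }
}
```

With the proposed inference, a second this.animal.Bark() at the point of the return statement would have compiled without a cast and then had nothing sensible to call.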

What if there is no intervening call? Unfortunately, other things can also give an opportunity to change a field:

if (this.animal is Dog)
{
    yield return 123;
    this.animal.Bark();
}

The method returns at the yield and then continues when MoveNext() is called again, but by that time the caller may have done something to modify the field. The same applies when an await sits between the test and the call: the await can return control to the caller immediately, giving ample time for someone else to change the value of animal.
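
The await case can be sketched as follows. This is an illustration under assumptions, not the original snippet: the Owner type, the CheckThenAwait method, and the use of a TaskCompletionSource as the awaited task are all hypothetical, chosen so that the caller controls exactly when the method resumes.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical types for illustration only.
class Animal { }
class Dog : Animal { public void Bark() { } }
class Goldfish : Animal { }

class Owner
{
    public Animal animal = new Dog();

    // Returns true if the field still holds a Dog after the await.
    public async Task<bool> CheckThenAwait(Task gate)
    {
        if (animal is Dog)
        {
            await gate;             // control returns to the caller here
            return animal is Dog;   // re-test after the method resumes
        }
        return false;
    }
}

static class AwaitDemo
{
    static void Main()
    {
        var owner = new Owner();
        var gate = new TaskCompletionSource<bool>();
        Task<bool> pending = owner.CheckThenAwait(gate.Task);
        owner.animal = new Goldfish();      // caller runs while the method is suspended
        gate.SetResult(true);               // let the method resume
        Console.WriteLine(pending.Result);  // prints False
    }
}
```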

And of course none of this considers the problem of the field being modified on another thread, but if you have two threads sharing memory, one reading and one writing, and no locks, of course you already have a bug, so I’m not super concerned about that problem.

Also of course this same set of problems applies to elements of arrays as much as to fields.

So what to do here? Remember, the point of a type system is to detect potential problems and determine at compile time that they will not crop up at runtime. If the type system ever allows a giraffe to bark then we have a failure of the type system. We can’t just ignore the problem. Restricting what can come between the condition and the usage of the variable seems difficult. And just imagine the error messages the compiler team would have to come up with to explain why animal.Bark() is legal, but animal.Bark(); animal.Bark(); is illegal.

It seems like restricting the operand to be a variable is not enough. What if we restricted it to a local variable or formal parameter? Of course those can be modified:

if (animal is Dog)
{
    animal = new Goldfish();
    animal.Bark();
}

So the proposed feature would necessitate a flow analysis to discover if the variable is written before it is used, but the compiler already has such an analyzer, for definite assignment.

Unfortunately, if the variable is a closed-over outer variable of a lambda then we have some of the same problems as before:
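
A minimal sketch of the closed-over case, assuming hypothetical Animal, Dog, and Goldfish types and a delegate named action (none of these names are confirmed by the text above). The lambda writes to the captured local, so invoking it between the test and the use changes what the local holds:

```csharp
using System;

// Hypothetical types for illustration only.
class Animal { }
class Dog : Animal { public void Bark() { } }
class Goldfish : Animal { }

static class ClosureDemo
{
    // Returns true if the local still holds a Dog after invoking the lambda.
    public static bool StillDogAfterAction()
    {
        Animal animal = new Dog();
        Action action = () => { animal = new Goldfish(); }; // captures the local
        if (animal is Dog)
        {
            action();               // writes to the captured local
            return animal is Dog;
        }
        return false;
    }

    static void Main()
    {
        Console.WriteLine(StillDogAfterAction()); // prints False
    }
}
```

A flow analysis that only looks for direct assignments inside the if body would miss this write, because it happens inside the delegate.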

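Then there is the question of breaking changes. Suppose D derives from B and hides B’s method M with a new M of its own, and a variable b of compile-time type B is tested with b is D before b.M() is called. Today, without the feature, this calls B.M. That is almost certainly wrong; it seems highly likely that the author of the code expected D.M to be called. But nevertheless, introducing the feature would cause a different method to be called tomorrow than would have been called in yesterday’s code, and that’s a subtle breaking change.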
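
The hiding scenario can be sketched as follows. The class bodies are assumptions: B declares a non-virtual M() and D hides it with new. The explicit cast in CallIfNarrowed stands in for what the proposed feature would infer.

```csharp
using System;

// Assumed shapes for B and D: D hides B.M rather than overriding it.
class B { public string M() { return "B.M"; } }
class D : B { public new string M() { return "D.M"; } }

static class HidingDemo
{
    // What the code does today: b has compile-time type B throughout.
    public static string CallToday(B b)
    {
        return (b is D) ? b.M() : "not a D";
    }

    // What it would do if b were treated as D inside the test,
    // simulated here with an explicit cast.
    public static string CallIfNarrowed(B b)
    {
        return (b is D) ? ((D)b).M() : "not a D";
    }

    static void Main()
    {
        B b = new D();
        Console.WriteLine(CallToday(b));       // prints B.M
        Console.WriteLine(CallIfNarrowed(b));  // prints D.M
    }
}
```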

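Similarly, suppose there are overloads N(Animal) and N(Dog), and the body of if (animal is Dog) calls N(animal). Again, the original code is probably a bug, and N(Animal) probably does the right thing regardless, but if we go from calling N(Animal) yesterday to N(Dog) tomorrow, that could be characterized as a breaking change.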
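
A sketch of the overload case, assuming two hypothetical overloads N(Animal) and N(Dog) that report which one was chosen. As before, the cast in CallIfNarrowed simulates the stronger type the feature would infer:

```csharp
using System;

// Hypothetical types for illustration only.
class Animal { }
class Dog : Animal { }

static class OverloadDemo
{
    public static string N(Animal x) { return "N(Animal)"; }
    public static string N(Dog x) { return "N(Dog)"; }

    // Today: overload resolution sees an argument of type Animal.
    public static string CallToday(Animal animal)
    {
        return (animal is Dog) ? N(animal) : "not a Dog";
    }

    // With narrowing (simulated by a cast): resolution sees an argument of type Dog.
    public static string CallIfNarrowed(Animal animal)
    {
        return (animal is Dog) ? N((Dog)animal) : "not a Dog";
    }

    static void Main()
    {
        Animal animal = new Dog();
        Console.WriteLine(CallToday(animal));       // prints N(Animal)
        Console.WriteLine(CallIfNarrowed(animal));  // prints N(Dog)
    }
}
```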

Similarly for scenarios that I won’t spell out in detail, where working code suddenly becomes not-compiling code because the additional type information introduces an ambiguity that previously did not exist for overload resolution.

So that’s lots of points against this proposed solution of inferring the type to be stronger within the body of the if.

25 thoughts on “Inferring from “is”, part one”

Now there are ways to make that feature work. Ceylon for example shows that, though their approach probably doesn’t fit C# very well.
It boils down to allowing this flow typing only for immutable variables which solves all these ‘that field/array entry/local could change’ problems. That works, as they, like many more recent languages, have a focus on having most variables immutable.
For mutable variables, they have something similar, but that requires you to create a new, immutable local variable that is assigned the result of the cast. It’s really just a shortcut to check-then-cast.
Also, they don’t have that overload-resolution problem because they don’t have overloads and as a language that has had that feature from the start, they don’t have to care about backwards compatibility.

I’d be in favour of some kind of syntax that declares a variable that exists in scope of the `if` block. For example:

if (var dog when animal is Dog)
{
    dog.Bark();
}
dog.Bark(); //Compiler error: The name “dog” does not exist in the current context

The “if” expression syntax design is bad/wrong, but hopefully it carries its meaning. The benefit of this is that it does the single check and keeps the “dog” scoped within the if. Traditional “as” checks do the single check, but require the “!= null” check and have the typed variable polluting the outer scope.

Theoretically, you can achieve a similar effect now with “for” loop abuse:
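
One way this trick might look, as a sketch with hypothetical Animal and Dog types and a made-up BarkIfDog helper. The for statement declares dog in its initializer, so the variable is scoped to the statement, and the iterator sets it to null so the body runs at most once:

```csharp
using System;

// Hypothetical types for illustration only.
class Animal { }
class Dog : Animal { public string Bark() { return "Woof"; } }

static class ForAbuseDemo
{
    public static string BarkIfDog(Animal animal)
    {
        string result = "silent";
        // The "loop" runs zero or one times: "as" yields null for non-dogs,
        // and "dog = null" ends the loop after the first pass.
        for (Dog dog = animal as Dog; dog != null; dog = null)
        {
            result = dog.Bark();
        }
        // dog is not in scope here.
        return result;
    }

    static void Main()
    {
        Console.WriteLine(BarkIfDog(new Dog()));     // prints Woof
        Console.WriteLine(BarkIfDog(new Animal()));  // prints silent
    }
}
```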

Because “someCheck()” could return true and short-circuit the if expression, dog is never assigned. Considering some of the complexity one can throw into if expressions, I’m not sure if this is the best either. (Maybe “dog” could be “null” by default in the case of short-circuiting? But then you’re still left with a null check within the “if” block.)

The type check and variable binding are done in a single step, so none of the problems mentioned in this article can occur. Having said that, I don’t think I like type casting that much. Depending on what you want to do, a visitor pattern might be better…

My strawman attempts at syntax for this feature have tended to be something like:

Animal animal;
if is (Dog dog from animal) {
    dog.Bark();
}

Equivalent to:
Dog dog = animal as Dog;
if (dog != null) dog.Bark();
(except tweaked to also work on struct types where “as” is illegal).

It’d be tempting to also allow “if is (Dog animal)” if animal is a local variable or type parameter, but that has all the gotchas mentioned above – essentially it’d redeclare animal as a separate Dog-typed local within the block, and if the original animal variable is captured by any lambdas, that’s going to be weird.

I’ve seen pushback on this proposal because of the possibility that multiple cases could match. That’s true, but I don’t think it’s a problem. In a sense, normal switch statements have the possibility that multiple cases could match, because of the “default” statement that matches everything. It’s understood that “default” only matches if none of the prior cases match[1]. Specifying that the cases in a “switch typeof” statement are evaluated in order and the first matching statement is applied would be perfectly reasonable, in my opinion.

[1] After checking the C# language specification, turns out it is permissible for default not to be last in a switch statement, but I’ve never seen it done in real code, and I’ve been writing C# and interested in the intricacies of the language spec ever since C# 1.0. I’d actually consider it a minor specification bug that this is even permitted – there’s no reason why code with default anywhere but last would be better or clearer, and because it’s virtually never used, many programmers develop a mental model of switch as if it were just a more terse way of writing an “if/else if” ladder, and would be confused about what a “default” in the middle would actually do.

There are some situations in which it makes sense for the default to not go last but I will maybe talk about those in a future episode rather than cram it into this comment.

The idea of switching on types is commonly raised; the problem of there being multiple matches is a tricky one. Suppose there are interfaces in there? It could be difficult to find a best match. And people have an expectation that switch statement sections can be re-ordered, so relying on the order seems bad. And besides, one of the main reasons to use a switch is to say “I don’t want to go through this long process of checking a bunch of individual conditions; the value is one of a discrete set, so go directly to the case which handles it.” If a switch is just an “if” behind the scenes it feels wrong to me.

I do know that switch statements are implemented with an optimization to jump straight to the correct branch, but it’s vanishingly rare for me to be writing code where it’s conceivable that the performance difference between a sequence of ifs and a switch would matter. I’m not actually aware of any time that I’ve ever chosen between switch and if/else for performance reasons. I know that there are times where it matters, and I can guess that those times pop up far more often in the kind of code you write (compilers and suchlike with tight performance-critical loops) than in the kind of code I write (database/web applications), so the difference in our outlook on it makes sense.

My usual reason for deciding between switch and if/else if is nothing to do with performance, and just to do with whether the operation I’m doing “feels like” the semantics associated with switch, which is to say, there are more than two or three branches, and which one I want to take depends on equality-like comparisons on a single value evaluated upfront.

The advantage of switch in that case (no pun intended) is that the person reading the code can tell in advance that all the conditions have the same “shape” and there’s not some other weird condition halfway down the ladder. The syntax used conveys extra meaning about what the intention behind the code is, even when the actual behavior is effectively identical.

Of course, the fact that I think of switch statements this way doesn’t negate the fact that you see them a different way that’s (at least) equally valid – and that’s enough to counter my type-switch proposal.

One of the nice things in TypeScript is that it understands constructs like `typeof ==` and `instanceof` and is able to automatically treat variables as that type within if blocks. Particularly handy for Union types.

One feature I miss in C# is pattern matching, or, at least, a way to perform multiple tests like we do in catch blocks with exceptions. Imagine a better syntax for this:
try { throw animal; }
catch (Dog dog)
{
    dog.Bark();
}
catch { }

What would typeof(animal) be inside the body of the if? Is it Animal or Dog? If Dog, then you are in the weird case where you wrote “Animal animal;” yet typeof(animal) is not Animal! Instead, the type of a variable changes during the lifetime of a function. On the other hand, if typeof(animal) remains Animal throughout, then the meaning of E.M depends not only on the type of E but also whether it’s inside an “if” statement. And does this propagate into inferred types?

Hey Raymond thanks for the comment. Fortunately C# does not allow typeof(variable), only typeof(Type), but your point is well taken; there are many rules of the language that rely on the property that an expression has a specific compile-time type.

Consider the case of “if thing is IDisposable then dispose it”. A fair number of performance-critical structures implement IDisposable with a do-nothing method (e.g. List<T>.Enumerator). If such a structure is received as a generic type, casting to IDisposable in order to call the do-nothing Dispose method boxes the structure, which will negate the benefits of having used a structure in the first place.
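
The boxing cost can be made visible with a small sketch. Everything here is an assumption for illustration: Counter is a hypothetical mutable struct whose Dispose() counts calls, and the two helper methods are made up. Converting a struct to IDisposable boxes it, so Dispose() runs on a heap copy and the caller's struct is unchanged; a call through a generic constraint is made on the struct itself.

```csharp
using System;

// Hypothetical struct for illustration: Dispose() records that it ran.
struct Counter : IDisposable
{
    public int Disposals;
    public void Dispose() { Disposals++; }
}

static class BoxingDemo
{
    // Cast-based: "as IDisposable" boxes a struct, so Dispose() runs on a copy.
    public static void DisposeViaCast<T>(ref T thing)
    {
        IDisposable d = thing as IDisposable;
        if (d != null) d.Dispose();
    }

    // Constraint-based: the call goes directly to the struct, with no box.
    public static void DisposeViaConstraint<T>(ref T thing) where T : IDisposable
    {
        thing.Dispose();
    }

    static void Main()
    {
        var a = new Counter();
        DisposeViaCast(ref a);
        Console.WriteLine(a.Disposals); // prints 0

        var b = new Counter();
        DisposeViaConstraint(ref b);
        Console.WriteLine(b.Disposals); // prints 1
    }
}
```

The constrained version only applies when T is known at compile time to implement IDisposable, which is exactly what is not known in the “if thing is IDisposable” scenario; the sketch only shows where the box comes from.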