Nullable micro-optimizations, part seven

We’ve been talking about how the Roslyn C# compiler aggressively optimizes nested lifted unary operators and conversions by using a clever technique. The compiler realizes the inner operation as a conditional expression with a non-null nullable value on the consequence branch and a null nullable value on the alternative branch, distributes the outer operation to each branch, and then optimizes the branches independently. That then gives a conditional expression that can itself be the target of further optimizations if the nesting is deeper.
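To make the technique concrete, here is a sketch of what that realization might look like for a doubly nested lifted unary minus (the temporary name tempX is illustrative, not the compiler's actual naming):

```csharp
using System;

int? X() => 7;

// The inner -X() is realized as a conditional: a non-null nullable on the
// consequence branch, a null nullable on the alternative branch. The outer
// minus is then distributed onto each branch, so -(-X()) needs only one
// HasValue test and the alternative branch stays a plain null.
int? tempX = X();
int? r = tempX.HasValue
    ? new int?(-(-tempX.GetValueOrDefault()))  // both negations fused here
    : new int?();                              // null stays null

Console.WriteLine(r); // prints 7
```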

This works great for lifted conversions and unary operators. Does it also work for binary operators? It seems like it would be a lot harder to make this optimization work for a lifted binary operator where both operands are themselves lifted operations. But what if just one of the operands was a lifted operation, and the other operand was guaranteed to be non-null? There might be an opportunity to optimize such an expression. Let’s try it. Suppose X() and Y() are expressions of type int? and that Z() is an expression of type int:

int? r = X() * Y() + Z();

We know from our previous episodes that operator overload resolution is going to choose lifted multiplication for the inner subexpression, and lifted addition for the outer subexpression. We know that the right operand of the lifted addition will be treated as though it were new int?(Z()), but we can optimize away the unnecessary conversion to int?. So the question is: can the C# compiler legally code-generate that as though the user had written:
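A sketch of the candidate codegen, with illustrative temporary names: all three operands are hoisted into temporaries in order, only the two nullable operands are tested, and the non-null Z() is folded directly into the consequence branch:

```csharp
using System;

int? X() => 2;
int? Y() => 3;
int Z() => 4;

// Candidate codegen: evaluate the operands once, left to right, then test
// only the two nullable operands. The int operand needs no test at all.
int? tempX = X();
int? tempY = Y();
int tempZ = Z();
int? r = tempX.HasValue & tempY.HasValue
    ? new int?(tempX.GetValueOrDefault() * tempY.GetValueOrDefault() + tempZ)
    : new int?();

Console.WriteLine(r); // prints 10
```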

If you think the answer is “yes” then the follow-up question is: can the C# compiler legally make such an optimization for all nullable value types that have lifted addition and multiplication operators?

If you think the answer is “no” then the follow-up questions are: why not? and is there any scenario where this sort of optimization is valid?

13 thoughts on “Nullable micro-optimizations, part seven”

The optimization is legal only if the multiplication operator has no side effects and doesn’t throw exceptions.

So in the case of integers, it’s only valid in an unchecked context. In a checked context, the multiplication might throw an OverflowException, so the compiler mustn’t generate code that evaluates Z() before the multiplication that might throw.
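The ordering concern can be demonstrated directly. In a checked context the original expression evaluates the multiplication, which may throw, before the addition’s right operand is ever touched, so hoisting Z() into a temporary ahead of the conditional would be an observable change:

```csharp
using System;

bool zCalled = false;
int? X() => int.MaxValue;
int? Y() => 2;
int Z() { zCalled = true; return 1; }

try
{
    // Original semantics: the checked lifted multiplication overflows and
    // throws before Z() is ever evaluated.
    int? r = checked(X() * Y() + Z());
    Console.WriteLine(r);
}
catch (OverflowException)
{
    // Had the compiler hoisted Z() into a temp before the multiplication,
    // zCalled would now be true -- an observable difference.
    Console.WriteLine(zCalled); // prints False
}
```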

That should have been “is legal if”, not “only if”. There are other scenarios where this optimization can be valid, e.g. if the compiler can show that Z() has no side effects, doesn’t throw exceptions, and doesn’t depend on state changed by the multiplication operator. (The easiest case: Z() is a compile-time constant.)

You are nitpicking the details of the optimization implementation; the core idea is still valid. If you don’t use a local:
new int?(tempX.GetValueOrDefault() * tempY.GetValueOrDefault() + Z())
it looks to me like this is a valid transform: the optimization we wanted to apply is there, and I use one fewer temporary than you do.

You can also emit an if() rather than using the ternary ?: for more flexibility in choosing at which point you evaluate your locals.

Of course, this goes to show that _any_ transform, as simple as it looks, may be wrong for very subtle reasons. I sometimes wonder how C++ compilers can perform any optimisations at all.

You could fix that with int left = x.Value * y.Value, but as we learned in part 1 of this series, GetValueOrDefault() is faster, and it’s also legal since you’ve already checked that x and y have values with if (x.HasValue && y.HasValue).
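A minimal sketch of the pattern the comment describes, with illustrative locals x and y:

```csharp
using System;

int? x = 6, y = 7;
int left = 0;

if (x.HasValue && y.HasValue)
{
    // Safe to use GetValueOrDefault() here: HasValue was just checked, and
    // GetValueOrDefault() skips the redundant null test that .Value performs
    // before it throws for a null receiver.
    left = x.GetValueOrDefault() * y.GetValueOrDefault();
    Console.WriteLine(left); // prints 42
}
```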

> I sometimes wonder how C++ compilers can perform any optimisations at all.

There really is no difference between C# and C++ compilers here. As long as your program doesn’t do weird stuff, the compiler has exactly the same knowledge in the two languages: function calls might do pretty much anything to the global state and to their by-ref arguments; everything else is pretty much known.
The only difference is that C++ has larger memory-safety holes, so you can do more weird stuff. But that generally leads to undefined behavior, which means the compiler can do whatever it wants *anyway*, and will generally just assume it doesn’t happen.

Instead of calculating Z() and assigning it to a temp, it should be possible to move the calculation of Z() into each branch (you’d just ignore the return value before returning null in the alternative branch).

Sure. But that is then duplicating the code. What if it was more complex than just “Z()”? The point of the optimization is to make the code smaller and simpler; duplicating code usually works against that. As I’ll discuss next week, Roslyn uses a very simple heuristic: the expression is only optimized if the right hand side is a constant. We then know that it doesn’t need to be replicated on the alternative branch!
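Under that heuristic the shape of the generated code is simple: with a constant right-hand side there is nothing to evaluate and no side effect to preserve, so the consequence branch folds the constant in and the alternative branch stays a plain null. A sketch (locals are illustrative):

```csharp
using System;

int? X() => 5;
int? Y() => null;

// Constant right-hand side: nothing needs replicating on the alternative
// branch, because null + 1 is null regardless and a constant has no side
// effects whose evaluation order could be observed.
int? tempX = X();
int? a = tempX.HasValue ? new int?(tempX.GetValueOrDefault() + 1) : new int?();

int? tempY = Y();
int? b = tempY.HasValue ? new int?(tempY.GetValueOrDefault() + 1) : new int?();

Console.WriteLine(a);          // prints 6
Console.WriteLine(b.HasValue); // prints False
```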

I was thinking about generalizing the construction I showed above to more operators, which turned out to be quite easy (at least if you use gotos instead of ifs), when that very argument came to my mind.

As often, it is a case of a memory vs. CPU trade-off. My solution above would indeed duplicate each operand expression once, except for the first two. On the other hand, I test only one condition per operand and I create a single int? for the final result. So what do you optimize for? Given that memory is cheap and those expressions are unlikely to be really big anyway, I’d say go for the CPU.

Maybe if you really want to avoid degenerate cases, use a heuristic that disables the optimization based on expression size?

BTW, here’s how I see the optimization for more than 2 operators (e.g. x * y + z / w):
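A reconstruction consistent with the description that follows (one HasValue test per operand, two reused int locals, gotos, and a single int? constructed only for the final result); the helper name Eval is illustrative:

```csharp
using System;

// x * y + z / w with one test per operand. The two int locals t1 and t2 are
// reused for every unwrapped operand; a nullable is built only at the end.
int? Eval(int? x, int? y, int? z, int? w)
{
    int t1, t2;
    if (!x.HasValue) goto Fail;
    t1 = x.GetValueOrDefault();
    if (!y.HasValue) goto Fail;
    t1 *= y.GetValueOrDefault();   // t1 now holds x * y
    if (!z.HasValue) goto Fail;
    t2 = z.GetValueOrDefault();
    if (!w.HasValue) goto Fail;
    t2 /= w.GetValueOrDefault();   // t2 now holds z / w
    return new int?(t1 + t2);
Fail:
    return new int?();
}

Console.WriteLine(Eval(2, 3, 8, 4));    // prints 8
Console.WriteLine(Eval(2, null, 8, 4)); // null: prints an empty line
```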

I took the liberty of using return instead of assigning to a result; that doesn’t change the flow. Also, I’m reusing the same two locals again and again; obviously, if there are some type conversions and everything is not int?, you would need a few more. Since their lifetimes don’t overlap, it’s likely that they share the same stack space after codegen anyway.