Eric Lippert's blog

Main menu

Post navigation

Monads, part twelve

Holy goodness, this series got a lot longer than I expected. I was hoping to get to some actual category theory, but let’s just talk a bit about LINQ and then wrap up. Maybe I’ll do a series on the basics of category theory in the future.

I’ve said several times that SelectMany is the bind operation on the sequence monad, but of course there are several overloads of the SelectMany extension method of IEnumerable<T>. The one that is bind is as we previously discussed: (Error checking removed for clarity.)

Why not? Because we want outer to still be in scope when the final select happens!

from outer in items
from inner in function(outer)
select new { outer, inner } // outer had better still be in scope!

How do we get two parameters in scope at the same time? This is a hint that there is some sort of nested operation going on here. Suppose we have our unit function as an extension method: (Yes, it is super bad style to make an extension method on everything, but it is pedagogically and syntactically useful so I’m going to do so several in this episode.)

static IEnumerable<T> Singleton<T>(this T t)
{
yield return t;
}

The query above then is actually two SelectManys, one nested inside the other!

(It is at least NP-HARD because it is equivalent to finding a solution of a 3-SAT problem; I suspect that overload resolution in C# is not actually harder than that. I see no reason, for instance, why it ought to be undecidable. But I don’t have a proof of this assertion. UPDATE: Overload resolution depends on convertibility, and convertibility in languages with contravariance and nominal generic subtyping has been shown to be undecidable! But I still do not know if overload resolution in C# is undecidable even if convertibility is restricted to a decidable subset of the convertibility rules. I still suspect it is not.)

That’s all very highfalutin; what we discovered in practice was that the compiler and IDE essentially hung indefinitely and chewed up all the address space when the nestings got about eight or nine deep, which is not at all unreasonable. Clearly something had to be done. What we did was split the problem into two parts. First, the simple case; if all you have in your query is:

from inner in items
from outer in function(items)
select projection(inner, outer)

The complicated case is the scenario where there’s something other than a select immediately following the two from clauses. In that case, in order to avoid deeply nested lambdas the C# compiler does an even more complicated transformation involving “transparent identifiers” and anonymous types. Getting into all the details of that would take us far afield; I want to finish up this series. We’ll talk about transparent identifiers some other time.

So let’s drag this back to general monads. Suppose we have a monad M<T> and extension methods:

And sure enough, the query comprehension binds the AsSmall function to the maybe monad. We get the (short) 1 out of the first call, and nulls out of the second and third.

Whew. I’m glad that worked after six weeks of buildup.

So what’s the upshot of this whole thing?

First: the monad pattern is an amazingly general pattern for “amplifying” types while maintaining the ability to apply functions to instances of the underlying types.

Second: because of that ability, the “bind” operation is the Swiss Army Knife of higher-order programming. The C# and VB teams could have built LINQ out of nothing other than functions that combine the unit and bind operations; the result would not have been as efficient as the optimized special-purpose helper functions we actually did build, but it can be done.

Third: LINQ in C# and VB were designed to be very comfortable to use on only one monad, the sequence monad. But as we’ve seen, the LINQ “select many” syntax is actually just a roundabout way to bind a new monadic workflow onto an existing one.

InvalidOperationException would probably be appropriate in this case, but that wouldn’t always be true. In general, I think it is better to throw System.Exception than to hastily throw something that might not accurately reflect the nature of the exception. Obviously, you’d need to replace it with something more specific before it goes to production, especially if it’s going into a reusable library, but in throwaway code, why not?

The problem with throwing System.Exception is that it makes it much harder for calling code to respond appropriately. There’s at least one code path in the BCL which, when passed invalid input, catches a FormatException and wraps it in a System.Exception, making it extremely painful to call. I reported it as a bug in .NET 1, but was told it couldn’t be fixed because it might break backwards compatibility.

Throwing a System.Exception with the intention of going back to change it to a more appropriate exception later is a bad idea, because nine times out of 10 you’ll never go back and change it. You’ll either forget to do it, miss it, or run out of time.

Throwing a System.Exception in a throwaway code sample is bad, because it encourages other people to do the same thing in their production code. After all, if Eric does it, it must be the right thing to do!

Same goes with sample code not disposing Disposables or leaking event handlers (especially when using lambda construct += (sender, e) => …). For what it’s worth, I see some overhead if each code snippet someone puts on his blog should follow all good coding practices…

Regarding footnote 4: I learned in University (and the Wikipedia article on NP hardness says the same) that undecidability is included in the class of NP hard problems. Or am I confusing decision problems (Wikipedia cites an undecidable *decision problem* as NP hard) with optimisation problems?

If a problem is NP-HARD that means it is AT LEAST as hard as the HARDEST problem in NP. Since the halting problem is harder than any NP problem, by definition it is NP-HARD.

So it was silly of me to say that the problem is “at least NP-HARD”, since the definition of NP-HARD already includes an “at least”. I should have been more clear; what I meant to say is that the problem is definitely NP-HARD, and I believe it to be NP but I don’t have a proof of that.