Saturday, January 19, 2008

Extending the work in my last post, I've developed a way to express an almost type-safe, general monad in C#. Similar to my module translation, the single monad object becomes a pair of co-operating objects, only one of which the monad implementor must define. Since C# cannot abstract over type constructors, I had to exploit the only feature that could accomodate the flexibility I needed: C#'s dynamic typing.

Looks a bit complicated, I know, but every piece is well-motivated. IMonadOps is fortunately the only interface that new monads must implement. Note the type constraints. The interface and the general monad both have a "phantom" or "witness" type M, constraining the type of the monad operations to the same IMonadOps type that created the monad. This means that the monad is entirely closed to extension and inspection.

Each IMonadOps is effectively a stateless singleton. The constraint to a struct is merely an optimization. Every IMonadOps implementor is actually very similar to the body of an ML module. If I want to invoke Identity.Zero, I can do it thusly:

default(Identity).Zero<int>();

This reveals the magic I used to define the Monad type. Whenever the Monad<M,T> type needs to invoke the underlying monad operations, it invokes the operation on the corresponding singleton M just like the above. Indexing the Monad by an IMonadOps implementation M, is like a linking step between Monad and M.

You can see the dispatching at work here. All monad operations are dispatched to the "linked" methods of M. In a sense, we have succeeded in abstracting over the concrete implementation of Monad.

Now comes the catch: it's not fully type-safe, because the encapsulated state of the monad must be stored as 'object'. This means that each monad body must ensure it properly casts to and from the appropriate type. This is again due to the type constructor abstraction limitation.

Your first thought might be, why can't the encapsulated state simply be 'T'? Well, if all you wanted was an Identity monad, then that would be fine. But consider the Maybe monad:

The encapsulated state is an Option of type T, not a T. As another example, the List monad encapsulates a list of T's. Since we can't abstract over the type constructor for the encapsulated state, we thus need to resort to dynamic typing.

Now comes a slightly bizarre part: what are the injection and projection functions for? Well, despite the fact that IMonadOps is the "internal implementation" of Monad, it doesn't have direct access to the monad's internals. Unfortunately, sometimes that access is needed. Consider the List monad:

ListM needs access to the private state of the returned list of Monads in order to flatten the list, but that access is not permitted since Monad is a separate, encapsulated type. There is no way to make this state available using inheritance or access modifiers, without also permitting the state to escape inadvertently.

Instead, the Monad provides an injection/projection pair, which are used to construct monad instances when given private state, or read out the private state of a monad instance, respectively. Note that encapsulation is maintained since this ability is granted only to the implementor of a given monad, which is already trusted with its own state.

I suspect there's a more efficient way to share the monad's state, but I'm a little tired from standing on my head all day for C#, so if anyone has any ideas, I welcome them. :-)

While this encoding is less efficient than the one described in my previous post, it's safer in some ways for users monad implementors alike, and I proved to myself that C#'s type system is powerful enough to encode the monad interface if you contort yourself appropriately. This technique for abstracting over type constructors might even be usable in my tagless Orc implementation.

Friday, January 18, 2008

I've found other monad tutorials very frustrating. They are typically written in expressive languages with type inference, which permits concise descriptions, but obscures the underlying type structure. I've been struggling with writing something close to a monad in C# for quite some time, simply because none of these tutorials give a sufficiently complete description of a monad's structure. Suprisingly, the Wikipedia page on monads helped clarify what I was missing.

-- return is a type constructor that creates monad instances return :: a → m a

-- bind is a function that combines a monad instance m a with a-- computation that produces another monad instance m b from a's-- to produce a new monad instance m bbind :: m a → (a → m b) → m b

So the monad type 'm', has a function 'return' that constructs instances of that type, and 'bind' which converts an instance of that monad into another instance of a monad by passing its private state to the provided function. For instance, meet Haskell's Identity monad:

What's not at all obvious from any of the above signatures, or any of the existing tutorials, is that the monad that the function f returns, must be the same monad type. If you're in the list monad, f must return a new instance of the list monad. If you're in the identity monad, f must return a new instance of the identity monad. This simple fact eluded me for quite some time, and explains why the monad interface cannot be expressed in C#. We can still program using monads, but the interface can't be enforced by the type system.

Note that all of these monads in C# share a common structure. They have at least one constructor used to wrap a value (called 'return' earlier), and they all have a Bind method which operates on the value encapsulated in the monad, and maps it to a new instance of the monad. Since we have such a similar structure, is there a way to declare an interface or an abstract base class declaring the signature for the Bind method?

Note the two emphasized sections: the class constraint on Monad, and the type parameter Identity provides to Monad when inheriting from it. They are both used without a type argument. This is illegal in C#/.NET, but it's perfectly legal in languages with more powerful type systems, such as those with "kinds". Identity<int> has kind *, while Identity without a type argument has type *⇒*, ie. a function that constructs a type of kind * when given a type of kind *. This is why monads are not so easily translated into languages like C#.

While quite simple, it's not as efficient as it should be since C#/.NET doesn't support proper tail calls. You can see in that source file that I derived FoldLeftDerived from SuperFold. Deriving FoldRight is a bit trickier, so I have to think about it. The simple, inefficient, answer is to simply reverse the list.

I'll probably remove SuperFold from my purely functional list, since I don't think it will end up being useful in C#. The iterated FoldLeft I defined is more efficient due to the lack of tail recursion, and the iterated version also supports early termination.

[1] It's very frustrating to see how close MS gets to a truly general and useful abstraction, only to lock it down for no apparent reason. What good reason is there for Nullable to be restricted to struct types? If you give it more than a second's thought, I think you'll realize that there is no good reason. Nullable is the option type if it weren't for this restriction!

Tuesday, January 8, 2008

As Chung-chieh Shan pointed out, my encoding of modules in C# is somewhat limited. In particular, I cannot abstract over type constructors, which is to say, C# is missing generics over generics. Consider the Orc.NET interpreter:

This is the result of my translation, which was necessitated by the "Where" method. Where introduces a dependency which currently cannot be expressed with ordinary C# constraints, so the module encoding is necessary.

The above interface is a direct, faithful implementation of the Orc semantics. The implementation I currently have is an interpreter for those semantics. What if I want to provide a compiler instead? The interface should remain the same, but the implementation should differ. This is a classic abstraction problem.

It's clear that the implementations can be changed at compile-time, but providing a different Orc implementation at runtime, something very natural for OO languages, does not seem possible. The reason is that C# lacks the requisite polymorphism.

A generic type is a type constructor, which means that from an abstract type definition, Exp<T>, you can construct a concrete type by providing a concrete type T, such as Exp<int>. But, if Exp<T> is not a concrete type until you provide it a concrete type T, what is the type of the abstract definition Exp<T>?

Reasoning about type constructors requires the introduction of something called "kinds". As types classify values, so kinds classify types. The set of values, or concrete types, is of kind *. The set of type constructors is of kind *⇒*, which is to say they are "compile-time functions" accepting a concrete type and producing a concrete type.

Now consider that we have multiple Exp<T> implementations, say Interpreter.Orc.Exp and Compiler.Orc.Exp, all with different innards, and all define the same operations and thus are theoretically interchangeable. We would somehow like to abstract over the different implementations of Exp<T> so that we can use whichever one is most appropriate. In other words, we want to make our code polymorphic over a set of generic types Exp<T>.

This necessitates type constructor constructors, the kind *⇒*⇒*, which accepts a type such as Orc.Compiler.Exp, and produces a type constructor Exp<T>.

At the moment, I can't think of a way to encode such polymorphism in C#. Functional languages provide this level of polymorphism, and some even provide higher-order kinds, meaning type constructors can be arguments to other type constructors, thus achieving entirely new levels of expressiveness.