Tuesday, August 26, 2014

C++14 has opened the door to truly mind-bending uses of lambdas, more specifically, generic lambdas. This blog post is the third installment in the "Fun with Lambdas: C++14 Style" series. Check out part 1 and part 2 if you have not already.

This post is about "monadic tuples".

A monad is a simple but powerful abstraction; however, it is considered quite difficult to understand in imperative circles. We will look into what's known as the "continuation monad". As it turns out, in C++14 you need just a couple of lines of code to create an instance of a continuation monad.

I'm fairly new to the world of monads, so things did not begin with great clarity for me. It all started with an intriguing question on Stack Overflow. As it turns out, the same "trick" is also used in Boost.Hana and discussed on the Boost mailing list here.

What you see below is more or less how I came to understand the idiom as an instance of a monad. Some background in functional programming may be helpful in reading this post. A good understanding of nested generic lambdas is a must. If you are wondering whether you should read part 1 first, you probably should.

list is a generic lambda that accepts a variable number of arguments and returns a closure (an instance of the inner lambda) that captures the arguments by value. The inner lambda accepts a parameter (called access) that must be callable with an arbitrary number of arguments. The inner lambda simply expands the parameter pack while calling the callable. That way it provides "access" to the captured parameter pack.
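A minimal sketch of list as described above, plus two accessors to show how the captured pack is reached. The accessor names size_of and sum_of are hypothetical helpers for illustration only.

```cpp
#include <cstddef>

// list: a generic lambda that captures its arguments by value and returns a
// closure. The closure takes an "access" callable and calls it with the
// captured parameter pack expanded.
auto list = [](auto... xs) {
  return [=](auto access) { return access(xs...); };
};

// Hypothetical accessors, used only to illustrate "access".
auto size_of = [](auto... xs) -> std::size_t { return sizeof...(xs); };
auto sum_of = [](int a, int b, int c) { return a + b + c; };
```

With `auto l = list(1, 2, 3);`, the call `l(size_of)` yields 3 and `l(sum_of)` yields 6: the closure never exposes the pack directly, it only forwards it.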

If you squint a little, you will probably realize that list is like a constructor of a tuple. As a matter of fact, if you were to implement the inner lambda using a good old class template, you would most likely resort to a std::tuple member.
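For comparison, here is a sketch of that hand-written class template. ListClosure and make_list are hypothetical names; the pack is stored in a std::tuple and unpacked with std::index_sequence (C++14), since std::apply only arrived in C++17.

```cpp
#include <cstddef>
#include <tuple>
#include <utility>

// A hand-written counterpart of the closure returned by `list`:
// the captured pack lives in a std::tuple member.
template <class... Ts>
class ListClosure {
 public:
  explicit ListClosure(Ts... xs) : values_(xs...) {}

  // Same interface as the lambda: hand every stored value to `access`.
  template <class Access>
  auto operator()(Access access) const {
    return call_with(access, std::index_sequence_for<Ts...>{});
  }

 private:
  // Expand the tuple into individual arguments via compile-time indices.
  template <class Access, std::size_t... I>
  auto call_with(Access access, std::index_sequence<I...>) const {
    return access(std::get<I>(values_)...);
  }

  std::tuple<Ts...> values_;
};

// Factory playing the role of the `list` lambda.
template <class... Ts>
ListClosure<Ts...> make_list(Ts... xs) {
  return ListClosure<Ts...>(xs...);
}
```

The lambda version gets the same effect in two lines because the compiler generates the equivalent of this class, pack capture included.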

head, tail, and length are examples of operations that you may perform on a list. head returns the first element, tail returns the list excluding the first element, and length returns the size of the parameter pack. For example, a three-element list is passed to the length lambda. As every list is itself a closure, it is called with an "accessor" function. The accessor simply does a sizeof... and returns the result, which propagates all the way out.
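A sketch of the three operations, assuming the list lambda shown earlier. Each one opens the list by passing it an accessor lambda whose parameter list pattern-matches the pack (first element versus the rest).

```cpp
auto list = [](auto... xs) {
  return [=](auto access) { return access(xs...); };
};

// head: open the list and return its first element.
auto head = [](auto xs) {
  return xs([](auto first, auto...) { return first; });
};

// tail: open the list and re-wrap everything except the first element.
auto tail = [](auto xs) {
  return xs([](auto, auto... rest) { return list(rest...); });
};

// length: open the list and report the size of the captured pack.
auto length = [](auto xs) {
  return xs([](auto... z) { return sizeof...(z); });
};
```

For instance, `length(list(1, '2', 3.0))` is 3 and `head(tail(list(1, '2', 3.0)))` is `'2'`. Note that tail preserves the list structure, while head and length return plain values.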

It is probably immediately apparent that this idiom adds life to otherwise drab variadic parameter packs. Don't get me wrong, variadic parameter packs are cool, and we wouldn't have other cool things like std::tuple without them. However, the point is that the language allows very few operations on a parameter pack. In general, you can't "store" one. Pretty much, you can expand a parameter pack, ask for its size, and unwind it using the car/cdr recursive style. And that's about it. Until now, to store a parameter pack you had to put it in a std::tuple.

But now there is an alternative. You can capture it using a lambda and provide access to it as done in the list lambda. As it turns out, this seemingly innocuous and perhaps needlessly convoluted approach to "accessing" parameter packs is phenomenally powerful.

Why? Because the list lambda and the closure inside it are special. Together, they form an implementation of a continuation monad.

A great introduction to the continuation monad for C++ programmers is here. In essence, the list lambda above takes a value (a variadic parameter pack) and returns a simple "continuator" (the inner closure). This continuator, when given a callable (called access), passes the parameter pack into it and returns whatever that callable returns.

Borrowing from the FPComplete blog post, a continuator is more or less like the following.
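A minimal continuator sketch in that spirit (our own version, not copied from the FPComplete post): it stores one value of type A and hands it to access when andThen is called, returning whatever access returns.

```cpp
// SimpleContinuator: holds exactly one value and passes it on, on demand,
// to the continuation supplied to andThen.
template <class A>
class SimpleContinuator {
 public:
  explicit SimpleContinuator(A a) : a_(a) {}

  // Call the continuation with the stored value and return its result.
  template <class Access>
  auto andThen(Access access) const {
    return access(a_);
  }

 private:
  A a_;
};
```

For example, `SimpleContinuator<int>(20).andThen([](int x) { return x + 1; })` evaluates to 21.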

The SimpleContinuator accepts one value of type A and passes it on to access when andThen is called. The closure returned by the list lambda is conceptually the same, only more general: instead of a single value, the inner closure captures a parameter pack and passes it to the access function. Neat!

Hopefully that explains what it means to be a continuator. But what does it mean to be a monad? Here is a good introduction using pictures.

The inner closure returned by the list lambda is also a list monad, implemented as a continuation monad. Note that the continuation monad is the mother of all monads, i.e., you can implement any monad with a continuation monad. Of course, the list monad is not out of reach.

As a parameter-pack is quite naturally a "list" (often of heterogeneous types), it makes sense for it to work like a list/sequence monad, where operations can be chained one after another. The list lambda above is a very interesting way of converting C++ parameter-packs to a monadic structure.

The head and length lambdas above, however, are a bit disappointing because they break the monad and the nested lambda inside simply returns a non-monadic value (something you can't chain more operations to). There is arguably a better way to write a chain of "processing" operations as shown below.

Functor

Before we can say that the list lambda is a monad constructor, we have to show that it is a functor. That is, fmap must be written for the inner closure. Note that "functor" is a category-theoretic term. It has no direct correlation with a C++ functor (i.e., a function object).

The list lambda above serves as the creator of the functor from a parameter pack; essentially, it plays the role of "return" in Haskell. The created functor keeps the parameter pack with itself (by capturing it) and allows access to it, provided you give it a callable that accepts a variable number of arguments. Note that the callable is called exactly once.
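A sketch of fmap under these assumptions: fmap takes an ordinary function func (a to b) and returns an accessor. Passing that accessor to a list applies func to every element and re-wraps the results with list, so the result is again a list and the chain can continue.

```cpp
auto list = [](auto... xs) {
  return [=](auto access) { return access(xs...); };
};

// fmap lifts func (a -> b) into an accessor: applied to list(x1, ..., xn),
// it produces list(func(x1), ..., func(xn)).
auto fmap = [](auto func) {
  return [=](auto... z) { return list(func(z)...); };
};
```

For example, `list(1, 2, 3)(fmap([](int i) { return i * i; }))` is a list holding 1, 4, and 9, and another `(fmap(...))` can be appended to it directly.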

A monad additionally needs flatmap (known as bind, or >>=, in Haskell). Given a function that maps a to a list-of-b, and a list-of-a, flatmap returns a list-of-b. Essentially, it calls func on each element of list-of-a, receives a (potentially empty) list-of-b for each one, concatenates all of those lists, and returns the concatenated list-of-b.
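One way to sketch flatmap, assuming the list lambda from before. The helpers concat and flatten are hypothetical names: concat opens two lists and re-packs their contents into one, and flatten walks the pack in car/cdr style, concatenating func's result for the head with the flattened rest.

```cpp
#include <vector>

auto list = [](auto... xs) {
  return [=](auto access) { return access(xs...); };
};

// concat: open l1, then l2, and re-pack both packs into a single list.
auto concat = [](auto l1, auto l2) {
  return l1([=](auto... p) {
    return l2([=](auto... q) { return list(p..., q...); });
  });
};

// flatten: base case, nothing left to map, so the result is the empty list.
template <class Func>
auto flatten(Func) { return list(); }

// flatten: map the first element to a list, then append the rest.
template <class Func, class A, class... Rest>
auto flatten(Func func, A a, Rest... rest) {
  return concat(func(a), flatten(func, rest...));
}

// flatmap: an accessor that applies func (a -> list-of-b) to every element
// and concatenates the resulting lists.
auto flatmap = [](auto func) {
  return [=](auto... a) { return flatten(func, a...); };
};

// Terminal accessor (hypothetical): copy the elements into a vector.
auto to_vector = [](auto... xs) { return std::vector<int>{xs...}; };
```

For example, `list(1, 2, 3)(flatmap([](int i) { return list(-i, i); }))` yields the list -1, 1, -2, 2, -3, 3.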

The count function is a monad-preserving operation because it returns a list of a single element. If you really want to get the length (not wrapped in a list), you have to terminate the monadic chain and get the value as follows.
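A sketch of count and of terminating the chain. count follows the description above; unwrap is a hypothetical terminal accessor that simply returns the single element it receives.

```cpp
auto list = [](auto... xs) {
  return [=](auto access) { return access(xs...); };
};

// count: monad-preserving, the size of the pack is wrapped in a
// one-element list, so further accessors can be chained.
auto count = [](auto... xs) { return list(sizeof...(xs)); };

// unwrap (hypothetical): leaves the monadic chain by returning the
// one element it is handed.
auto unwrap = [](auto x) { return x; };
```

`list(1, 2, 3)(count)` is still a list, so `list(1, 2, 3)(count)(count)` is legal (a list holding 1); only `list(1, 2, 3)(count)(unwrap)` produces the raw value 3.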

If done right, the collection pipeline pattern (e.g., filter, reduce) can now be applied to C++ parameter packs. So let's try to do that.

You might have noticed that we're doing only one operation per line and giving names to each intermediate result (i.e., l1, l2, l3, etc.). Naming the intermediate results is not required, but without names, readability goes out the window.
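Here is a sketch of such a named pipeline. The concrete steps (negate each element, then pair it with its absolute value, then print) are our own hypothetical example chosen to produce 1, 1, -2, 2, -3, 3; print_all, to_vector, and named_pipeline are illustrative names. print_all prints in pack order because the elements of a braced-init-list are evaluated left to right, and it returns the list so the chain can continue.

```cpp
#include <cstdlib>
#include <iostream>
#include <vector>

auto list = [](auto... xs) {
  return [=](auto access) { return access(xs...); };
};
auto fmap = [](auto func) {
  return [=](auto... z) { return list(func(z)...); };
};
auto concat = [](auto l1, auto l2) {
  return l1([=](auto... p) {
    return l2([=](auto... q) { return list(p..., q...); });
  });
};
template <class Func>
auto flatten(Func) { return list(); }
template <class Func, class A, class... Rest>
auto flatten(Func func, A a, Rest... rest) {
  return concat(func(a), flatten(func, rest...));
}
auto flatmap = [](auto func) {
  return [=](auto... a) { return flatten(func, a...); };
};

// Monad-preserving accessor: print every element in order, return the list.
auto print_all = [](auto... xs) {
  int order[] = {0, (std::cout << xs << ' ', 0)...};
  (void)order;
  std::cout << '\n';
  return list(xs...);
};

// Terminal accessor: leave the monad by copying the elements into a vector.
auto to_vector = [](auto... xs) { return std::vector<int>{xs...}; };

// Hypothetical pipeline: one operation per line, every result named.
auto named_pipeline() {
  auto l1 = list(-1, 2, 3);
  auto l2 = l1(fmap([](int i) { return -i; }));         // 1, -2, -3
  auto l3 = l2(flatmap([](int i) {                       // 1, 1, -2, 2, -3, 3
    return list(i, std::abs(i));
  }));
  auto l4 = l3(print_all);                               // prints 1 1 -2 2 -3 3
  return l4;
}
```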

Let's try to rewrite the previous program where we print 1, 1, -2, 2, -3, 3.
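Without names, the same hypothetical pipeline collapses into one chained expression in which each step is the result of the previous one, called with the next accessor. With more steps and bigger inline lambdas, this style quickly becomes hard to follow. The helpers are repeated so the block compiles on its own; unnamed_pipeline, print_all, and to_vector are illustrative names.

```cpp
#include <cstdlib>
#include <iostream>
#include <vector>

auto list = [](auto... xs) {
  return [=](auto access) { return access(xs...); };
};
auto fmap = [](auto func) {
  return [=](auto... z) { return list(func(z)...); };
};
auto concat = [](auto l1, auto l2) {
  return l1([=](auto... p) {
    return l2([=](auto... q) { return list(p..., q...); });
  });
};
template <class Func>
auto flatten(Func) { return list(); }
template <class Func, class A, class... Rest>
auto flatten(Func func, A a, Rest... rest) {
  return concat(func(a), flatten(func, rest...));
}
auto flatmap = [](auto func) {
  return [=](auto... a) { return flatten(func, a...); };
};
auto print_all = [](auto... xs) {
  int order[] = {0, (std::cout << xs << ' ', 0)...};  // left-to-right
  (void)order;
  std::cout << '\n';
  return list(xs...);
};
auto to_vector = [](auto... xs) { return std::vector<int>{xs...}; };

// The same steps as the named version, written as a single expression.
auto unnamed_pipeline() {
  return list(-1, 2, 3)(fmap([](int i) { return -i; }))(flatmap(
      [](int i) { return list(i, std::abs(i)); }))(print_all);
}
```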

The above code is pretty much incomprehensible and at this point you probably want to click away. But bear with me for just one moment. There's a pattern here and we can factor that out. I'm going to use C++ operator overloading so that the code looks significantly more readable.

Operator > accepts our special list as its left-hand-side argument and a function from a->b as its right-hand-side argument. It uses fmap internally. Operator >= is similar, but it takes a function from a->List[b] and uses flatmap internally. Remember, both operators return the special list (monadic tuple).
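A sketch of the two operators under the assumptions above, reusing the same hypothetical pipeline. Because > and >= have equal precedence and associate left to right, the chain reads in order. Caveat: these templates are unconstrained and would match any operands involving a class type, so a real implementation would want to restrict them (for example with SFINAE). operator_pipeline and to_vector are illustrative names.

```cpp
#include <cstdlib>
#include <vector>

auto list = [](auto... xs) {
  return [=](auto access) { return access(xs...); };
};
auto fmap = [](auto func) {
  return [=](auto... z) { return list(func(z)...); };
};
auto concat = [](auto l1, auto l2) {
  return l1([=](auto... p) {
    return l2([=](auto... q) { return list(p..., q...); });
  });
};
template <class Func>
auto flatten(Func) { return list(); }
template <class Func, class A, class... Rest>
auto flatten(Func func, A a, Rest... rest) {
  return concat(func(a), flatten(func, rest...));
}
auto flatmap = [](auto func) {
  return [=](auto... a) { return flatten(func, a...); };
};

// l > func : func maps a -> b; applied element-wise via fmap.
template <class List, class Func>
auto operator>(List l, Func func) { return l(fmap(func)); }

// l >= func : func maps a -> list-of-b; results concatenated via flatmap.
template <class List, class Func>
auto operator>=(List l, Func func) { return l(flatmap(func)); }

auto to_vector = [](auto... xs) { return std::vector<int>{xs...}; };

// Parsed as ((list(-1, 2, 3) > negate) >= pair_with_abs).
auto operator_pipeline() {
  return list(-1, 2, 3)
      > [](int i) { return -i; }
      >= [](int i) { return list(i, std::abs(i)); };
}
```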

Suddenly, you can read the program from left to right and all the fmap/flatmap boilerplate is hidden inside the overloaded operators. You are looking at a tiny Domain-Specific Language (DSL) for piping operations on collections. The chain can be arbitrarily extended to the right.

Although the above list lambda is provably a monad and shares characteristics of the proverbial list monad, it is quite unpleasant to work with as a collection pipeline, especially because the behavior of filter (a.k.a. where), a common collection-pipeline combinator, does not meet common expectations.

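The reason lies in how C++ lambdas work. Each lambda expression produces a function object of a unique type, and because the inner lambda sits inside a generic lambda, every distinct set of argument types yields a distinct closure type. Therefore, list(1,2,3) produces a type that has nothing to do with the type of list(1) or of an empty list, which in this case would be list().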

The straightforward implementation of `where` fails to compile because in C++ a function cannot return two different types.

In the above implementation, func returns a boolean: it's a predicate that says true or false for each element. The ?: expression does not compile because the types of list(i) and list() (the empty list) are different.
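The type mismatch can be checked at compile time. This sketch uses std::is_same to confirm that list(1) and list(2) share a closure type while list(1) and list() do not; the commented-out where is one plausible form of the straightforward version and is shown only to mark what the compiler rejects.

```cpp
#include <type_traits>

auto list = [](auto... xs) {
  return [=](auto access) { return access(xs...); };
};

// Same argument types: same instantiation of the inner lambda, same type.
constexpr bool same_for_same_types =
    std::is_same<decltype(list(1)), decltype(list(2))>::value;

// One int versus no arguments: two unrelated closure types.
constexpr bool differs_for_empty =
    !std::is_same<decltype(list(1)), decltype(list())>::value;

static_assert(same_for_same_types, "list(1) and list(2) share a type");
static_assert(differs_for_empty, "list(1) and list() do not");

// Hence a version like this is rejected: the branches of ?: have
// unrelated closure types and no common type exists.
//
//   auto where = [](auto func) {
//     return flatmap([=](auto i) { return func(i) ? list(i) : list(); });
//   };
```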

So a different trick can be used to let the collection pipeline continue. Instead of actually filtering out the elements, they are simply flagged, and that's what makes it unpleasant.
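A sketch of the flagging idea, assuming each element is paired with the predicate's verdict via std::pair (our choice of representation; the original may differ). Since every element survives, every call produces the same closure type and the chain continues, but every downstream step must now honor the flag itself, as the hypothetical count_kept accessor does.

```cpp
#include <utility>

auto list = [](auto... xs) {
  return [=](auto access) { return access(xs...); };
};
auto fmap = [](auto func) {
  return [=](auto... z) { return list(func(z)...); };
};

// where: keep every element, but tag it with pred's verdict.
auto where = [](auto pred) {
  return fmap([=](auto x) { return std::make_pair(pred(x), x); });
};

// count_kept (hypothetical): a downstream step that has to check the
// flag, counting only elements that passed the filter.
auto count_kept = [](auto... flagged) {
  int kept = 0;
  int order[] = {0, (kept += flagged.first ? 1 : 0, 0)...};
  (void)order;
  return kept;
};
```

With `list(1, 2, 3, 4)(where([](int i) { return i % 2 == 0; }))`, the result still holds four pairs; only consumers like count_kept see that two of them were "filtered".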

So far the discussion was about homogeneous tuples. Now let's generalize it to true (heterogeneous) tuples. Note that fmap, flatmap, and where each take only one callback lambda. To provide multiple lambdas, each working on one type, we can overload them.
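One common C++14 way to overload lambdas is to inherit from all of them and pull in their call operators. Overload, overload, and describe are hypothetical names; with the combined object, fmap can process a list mixing int and std::string, and overload resolution picks the right lambda per element.

```cpp
#include <string>

auto list = [](auto... xs) {
  return [=](auto access) { return access(xs...); };
};
auto fmap = [](auto func) {
  return [=](auto... z) { return list(func(z)...); };
};

// Overload: inherits from every lambda and exposes all call operators.
template <class F, class... Fs>
struct Overload : F, Overload<Fs...> {
  Overload(F f, Fs... fs) : F(f), Overload<Fs...>(fs...) {}
  using F::operator();
  using Overload<Fs...>::operator();
};

// Recursion ends with a single lambda.
template <class F>
struct Overload<F> : F {
  Overload(F f) : F(f) {}
  using F::operator();
};

template <class... Fs>
Overload<Fs...> overload(Fs... fs) { return Overload<Fs...>(fs...); }

// One callback per element type; both return std::string.
auto describe = overload(
    [](int i) { return "int:" + std::to_string(i); },
    [](const std::string& s) { return "string:" + s; });
```

For example, `list(42, std::string("hi"))(fmap(describe))` yields a list holding "int:42" and "string:hi".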

P.S. Why is the order of output not the same across compilers? The standard defines the order of a variadic pack expansion: it corresponds to the original order of the pack. The order of evaluating function argument expressions, however, is unspecified. For example, check out the implementation of fmap: func(z) is called as many times as there are arguments, but the order in which those calls are evaluated is not guaranteed. As the calls to func print the values to the console, the output is unpredictable across compilers. See more discussion on reddit/r/cpp.
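To make the difference visible, this sketch records the order of calls (record, calls, and record_in_order are hypothetical names). With fmap, the calls happen inside a function-call argument list, so only their number is guaranteed. Inside a braced-init-list, the elements are evaluated strictly left to right, so record_in_order always sees 1, 2, 3.

```cpp
#include <vector>

auto list = [](auto... xs) {
  return [=](auto access) { return access(xs...); };
};
auto fmap = [](auto func) {
  return [=](auto... z) { return list(func(z)...); };
};

// Records every call, in the order the calls actually happen.
std::vector<int> calls;
auto record = [](int x) {
  calls.push_back(x);
  return x;
};

// Calls record in pack order: braced-init-list elements are sequenced
// left to right, unlike the arguments of an ordinary function call.
auto record_in_order = [](auto... xs) {
  int results[] = {0, record(xs)...};
  (void)results;
  return list(xs...);
};
```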