Wednesday, September 22, 2010

The four stages of (functional) programming

There seems to be a series of stages of abstraction in functional programming - actually, I'd say in all programming, but it's more obvious in FP:

Stage 0: no functions, and thus, no abstraction

Stage 1: functions, but no higher-order functions

Stage 2: higher-order functions, combinator libraries

Stage 3: typeclasses, abstracting over combinator libraries

Stage 4: abstracting over typeclasses

Each stage lets you remove a class of code duplication. If you were to code in a stage 0 language, your only means of reuse would be copy/paste. Stage 1 introduces the ability to abstract over functionality, but in a limited way, since the structure of the computation is fixed. (I think many programs in a language like Java are stage 1 programs, or close to it.) Stage 2 introduces HOFs, which remove a class of duplication by letting us abstract over control flow (this is where I put most Lisp/Scheme programs). From this we get the usual combinator library style. But at this point, we are still dealing with concrete types. Common patterns across these libraries are not abstracted over.
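To make the stage-1-to-stage-2 jump concrete, here is a minimal Haskell sketch (the names are mine, not from the post): in stage 1 every traversal repeats the same recursion, while in stage 2 a single higher-order combinator captures that control flow once.

```haskell
-- Stage 1: the traversal structure is duplicated in every function.
sumInts :: [Int] -> Int
sumInts []     = 0
sumInts (x:xs) = x + sumInts xs

-- Stage 2: a higher-order function abstracts the control flow once...
fold :: (a -> b -> b) -> b -> [a] -> b
fold _ z []     = z
fold f z (x:xs) = f x (fold f z xs)

-- ...and concrete operations become one-liners built from the combinator.
sumInts' :: [Int] -> Int
sumInts' = fold (+) 0

concatAll :: [String] -> String
concatAll = fold (++) ""
```

Note that `fold` is still tied to one concrete type, lists - which is exactly the limitation the next stage addresses.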

Stage 3 is the generalization of combinator libraries - code that is generic across multiple such libraries - think of the generalized combinator library we obtain from a monad, or a traversable. This removes the duplication found due to common patterns in collections of higher-order functions. I wonder - were Haskell's type system and commitment to purity prerequisites to developing this stage of programming? In principle stage 3 is possible in any dynamic language with first-class modules and first-class functions.
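A small illustration of stage 3 (my example, not the post's): a combinator written once against the Applicative interface, which then works unchanged across every library that provides an instance.

```haskell
-- Stage 3: generic over any Applicative -- lists, Maybe, parsers,
-- IO, and so on all get this combinator for free.
pairWith :: Applicative f => f a -> f b -> f (a, b)
pairWith fa fb = (,) <$> fa <*> fb
```

So `pairWith (Just 1) (Just 'a')` is `Just (1, 'a')`, while `pairWith [1,2] "xy"` enumerates all four pairs - the same code, interpreted by two different combinator libraries.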

And stage 4... well, we're only starting to see what stage 4 looks like. In stage 3 programming, we can write code polymorphic to any instance of the typeclass, but there could be duplication in the implementations of different typeclass instances - for instance, you might create a data type that is essentially a list of trees, but in giving the Applicative instance for your type, you start from scratch. In stage 4 programming, you'd assemble your type and appropriate typeclass implementations from combinators. This is done a little bit here and there in FP - you might compose Applicatives or take their product, but in stage 4 this style is pervasive and taken to its logical limit.
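The "compose Applicatives" move mentioned above can be sketched in today's Haskell with `Data.Functor.Compose` (the `votes`/`summed` names are mine): the Applicative instance for the composite type is derived from the pieces rather than written from scratch.

```haskell
import Data.Functor.Compose (Compose (..))

-- "A list of optional values", assembled from existing types.
votes :: Compose [] Maybe Int
votes = Compose [Just 1, Nothing]

-- The Applicative instance for Compose [] Maybe comes from the
-- combinator itself -- no hand-written instance for the composite.
summed :: Compose [] Maybe Int
summed = (+) <$> votes <*> Compose [Just 10, Just 20]
-- getCompose summed == [Just 11, Just 21, Nothing, Nothing]
```

Stage 4, as described here, would make this the default way of building types, not an occasional trick.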

Stage 4 requires some serious rewiring of your programming brain. Forget concrete types or concrete type implementations. You don't implement a tree datatype. You assemble the type and all its useful functions from a catalog of typeclass combinators. Category theory would become an essential tool for programming in this style ("oh, there's a natural isomorphism between this type and blah, just gotta establish this isomorphism to get all that functionality"). Would we then see category theory and this style of programming taught as part of the standard computer science curriculum? Maybe...

I don't know of any language that supports stage 4 programming adequately. It seems to require, among other things, first-class and higher-order modules. Probably more powerful type systems, too. Ed Kmett has an interesting language he's been working on, called Kata. Based on what he's told me about it, it looks specifically designed to make stage 4 programming easy. Interestingly, I think he's given up on static typing.

Hi Paul, very nice read! It is interesting seeing your thoughts take shape, going from "pull libraries" over to "four" levels of programming. I wouldn't have called the post "THE four levels" because, as others suggested, there might (will?!) be more. I'm looking forward to your promised example of level 4! Regards, Andreas S.

@Joeri - datatype generic programming is not totally what I have in mind for stage 4, but maybe it is just approaching it from a different angle. Generic programming seems more about deriving typeclass implementations from the internal structure of the datatype. I don't think this is the full story. This is a very vague thought though...

@Tom - I think parametric polymorphism is more of an enabler - it is what makes HOFs so useful. By itself, it is not a new level of abstraction. Likewise, higher-kinded types are an enabler for stage 3 programming - but it's the generalized combinator libraries that form the new level of abstraction.

I don't give up static typing completely. I just use it to prove a very different set of properties and guarantees than is provided by Hindley-Milner-style type systems. This lets me avoid the difficulties of "hill climbing" in the solution space near HM, which is important since HM provides such a good price/performance trade-off that few minor variations provide objective improvement in any one area without disproportionate costs in others.

@Paul Chiusano - I disagree, parametric polymorphism is an important level of abstraction independent of HOFs. Java has had it since 1.5, and this alone allows you to avoid lots of code duplication type-safely (List&lt;A&gt; instead of IntList, StringList, etc.), even in the absence of HOFs. So I would insert parametric polymorphism at level 2.

While the first 3 stages are fairly general, the last 2 are specific to Haskell and languages that borrowed type classes from Haskell. Module systems and higher-order modules (such as those present in ML variants) are, I would say, more general and better examples of the last 2 stages (given my interpretation of what you were saying, of course). And dependent types are an even better (and again more general) example, seeing how with dependently typed records you get such a module system for free.

In short, I use them to check that your definitions are consistent, but I provide little machinery to ask about provenance.

This gives you enough to use OOP as a code reuse mechanism, but the design deliberately removes the ability to ask certain questions that would let you violate parametricity (such as "is this object an instance of a given class?"). This makes it harder for code behind the scenes to violate Liskov substitutability, and favors the needs of the library maintainer, allowing them to refactor in many ways without regard to the downstream consumer. When you combine that with a language that is lazy and immutable, where 'new' ties a knot, a lot of the issues with multiple inheritance drop away, and since you can't ask 'isa?' questions I can include language features that would make that question considerably more expensive.

The net result is a system in which it is easy to express small incremental refinements in laws or in code reuse. This lets you build, say, a deep, algebraically motivated numerical tower without boilerplate, and without the white-box inheritance issues of traditional OOP designs or the 'abstraction sweet spot' issues imposed by the 'interface'-like design of Haskell class hierarchies, which place a lot of burden on people who instantiate classes deep in a class hierarchy.

It isn't perfect, but I think it is an interesting point in the design space.

I think SRI SPECWARE was aiming at Level 4. While there are a few papers and user manuals available, it's a pity they never released anything about their internal algorithms (e.g. limits/colimits), or open-sourced any of the code - it's not like the project is going anywhere now.