Thursday, January 19, 2006

Eleven Reasons to use Haskell as a Mathematician

As a mathematician, what programming language should you use when you need to do anything algorithmic? For some specific domains there are languages tailored for the task, for example Macaulay2 is great for some types of algebra. For general problems many turn to Mathematica or Maple. But although they are universal (in the sense of Turing) they're not great platforms for general purpose algorithms and can perform very poorly unless the feature set of these languages just happens to marry well with the problem you're trying to solve. These tools are also like the proverbial sledgehammer - you're using a giant engine to solve what can sometimes be a simple problem and you'll pay the price in terms of memory and CPU usage. If you want a general purpose programming language the answer is simple. There are eleven reasons why you should use Haskell. So if you're a mathematician who occasionally needs to write code, get started on a tutorial right away!(1) Haskell is Functional

Functions are the bread and butter of mathematics so it makes sense to use a programming language in which functions are first class objects. It's not at all unusual for a mathematician to take a function of two arguments, say f(.,.), and construct from it a function of a single variable defined by f_x(y) = f(x,y). Computer scientists call such a thing a closure - it's a function constructed by supplying values for some, but not all, of the arguments of another functions. It's one of the most natural operations in the world and yet it's actually quite horrible to construct such a thing in languages like Fortran, C or C++. It's slightly cleaner in C++ because you can use objects and templates to simulate closures, but the syntax is still pretty hideous. (I know there are libraries that make this even easier - but they do so by implementing a functional language as a 'domain specific language' within C++, essentially making my point for me.) Here's how you do this in Haskell: you write "f x". The way it works is quite neat. Given sets A, B and C there is a natural bijection between the set of functions AxB->C and the set of functions A->C^B. So rather than write functions of multiple arguments Haskell function always take one argument and return the function that operates on the rest of the arguments. At first this might seem like a bad idea because functions of multiple arguments are now, in some sense, a second class object. But it's not a problem at all. To apply the function f to the arguments 1 and 2 we simply write "f 1 2".

To get an idea of how horrible it can be to work with functions without functional programming consider the code in Numerical Recipes in C. Most of their code is 'functional' in character in the sense that it is intended to perform operations on functions. But look at how you have to make use of horrible void * pointers to pass in those extra arguments to simulate closures.(2) Haskell Insulates You

As a mathematician you're more interested in solving the problem than dealing with the idiosyncracies of the hardware and issues like memory management. Haskell does a great job of insulating you from the underlying hardware so that you don't have to worry about these things. Languages like Java aren't too bad at this either. C++ is terrible, if you decide you don't want to manage memory yourself you can use shared_ptr<> objects and turn your code into a horribly unreadable mess.

(3) Haskell Performs Well

It's surprising how well Haskell peforms. You might think "if every time I add a to b it has to construct a temporary intermediate object that represents the function that adds a to a number and then apply that function to b it's going to be slow". But that is not the way to look at things. Haskell doesn't simply execute code by working top to bottom, left to right through your source code. The compiler (itself written in Haskell) is pretty sophisticated and frequently performs comparably to C. It's generally not as fast as C, but it's usually in the same league. The Haskell compiler rips apart your code and reconstitutes it as an executable in ways that are quite surprising. In Haskell you can often write code using idioms that feel natural, but are generally considered slow in other languges, without performance impact. One example is in list handling. Often you want to return a temporary list of objects and perform some operation on them. In many languages there is an overhead in making such a list, only to take it apart again when you want to operate on the actual elements so you instead have to restructure your code so that no list is constructed. In Haskell the compiler will often figure out that your list is temporary and do the restructuring behind the scenes for you.(3) Haskell is Declarative

Mathematicians like to spend their time saying what is the case rather than thinking in terms of sequences of operations. If b = f(a) and c = f(b) then it doesn't matter if you say c = f(b) and then b = f(a). Order is irrelevant. In an imperative language like C it matters to a large extent what order your code is written in because a statement like b = f(a) is an instruction. In Haskell you aren't giving a list of instructions so much as stating what is true. This fits much better with the way that mathematicians work. Haskell can take care of some of the details of precisely what order the operations are going to be performed in and hide those details from you.(4) Haskell is Lazy

This goes hand-in-hand with Haskell being declarative. Haskell only computes what it needs to compute in order to get the answer you require. It tries not to do unnecessary work. Languages that aren't lazy like this are called 'strict' languages. In a strict language, say, if you required the first component of the cross product of a vector, you would typically compute all three components of the vector and throw away the second and third components. If your code needed to be efficient you'd probably end up writing special purpose code to compute just the first component of the cross product. You'd be unable to reuse your cross product code. Mathematicians don't want to do this. They want to say that x is the first component of the cross product because it's true statement and not worry about what exactly gets implemented when. This laziness is ubiquitous throughout all of Haskell.

One place that laziness really pays off is in working with lists. Haskell has absolutely no problem working with infinite lists as long as you don't actually try to look at all of the elements. Instead of evaluating all of the elements it simply retains enough information to construct as many elements as are needed. Computing the elementwise sum, say, of two infinite lists is completely straightforward. And performing computations with power series, say, where the later elements are functions of the earlier elements, is absolutely straightforward. You don't have to check to see if you've computed enough of the earlier elements to compute the later one. Haskell does all that for you. If you want to see the first ten elements of the final infinite list you can just ask for them. You don't need to decide in advance how many elements you want and build that into the code from the beginning.(5) Haskell is good at Datastructures

Haskell is good at dealing with datastructures. Often mathematicians don't realise that they are using datastructures all the time. But they are. For example a group is a set equipped with functions to perform multiplication and inversion as well as a singled out element, the identitiy. Haskell makes it very easy to work with structures that are tuples of objects like this. It's also very easy to build one datastructure out of another and even specify rules about datastructures. For example a matrix is a two-dimensional array of objects. But if the objects form a ring then so does the matrix. This kind of statement is easy to express in Haskell making it well suited to working with algebraic structures. If you've defined formal power series over a ring and you've defined the ring of polynomials over another ring it's trivial to build the ring of power series over polynomials over the integers, say. What's more, the compiler can tell you when the structure you're trying to create doesn't make sense, for example if you were to try to multiply two matrices whose underlying elements can't be multiplied. A programming language like Mathematica is very poor at this as it doesn't really distinguish between datatypes. You'll only discover your error when the program is actually running and it tries to multiply the underlying objects. Haskell also uses type inference. You don't have to explicitly tell it the type of something because it can often figure it out for itself. If you see the number '1' in a textbook what does it mean? Is it an element of N, Z, R, Q, R[sqrt(2)], C or some other completely different algebraic structure? It could be any of these things depending on the context. The same is true in Haskell, it'll figure out what you mean by '1' using completely unambiguous rules. This means that you don't have to use a different symbol for '1' in different contexts, and yet you don't have to laboriously state what you mean every time you use it.

And rather than having an ad-hoc system of datastructure types Haskell's is algebraic.(6) Haskell is Compact

Not in the topological sense. Haskell code tends to be much shorter than its counterpart written in other languages. There is little extraneous verbiage and the verbiage that does exist is very powerful. This is more than a space saving exercise. Consider the history of calculus in n-dimensions. When Maxwell first wrote down his equations of electrodynamics they took up most of a page. WIth the introduction of vector notation they were simplified. By using the notation of exterior calculus they were simplified to dF = 0, d*F = *J. This isn't just typographically simpler but conceptually simpler. You can see immediately how you might generalise to n-dimensions instead of three, you can see trivially that d*J=0 from d^2 = 0 and you can now perform countless computations in a few lines that previously would have taken dozens of pages. What's more the new notation makes it harder to accidentally write equations of motion that don't make physical sense because they automatically obey symmetries that are difficult to see in more complex notation. Haskell is just like this too. You can do a lot with a tiny amount of code. And if two blocks of code appear to do similar things you can often factor out the common part and make it a separate block of code in a way that's much harder than with just about any other language. This makes it much easier to generalise your code and have insights about it. This works well with the way mathematicians like to think about things.

(7) Haskell is Mind Expanding

Haskell draws on ideas from interesting branches of mathematics, for example Category Theory and Logic. And the logic part doesn't appear the way you expect - it appears via the Curry-Howard 'isomorphism' in the type system used to define datastructures. As mentioned above, Haskell code lends itself well to generalisations. You'll find all sorts of commonality between things that previously didn't seem similar, and you'll find ways to express that similarity in your code. One of my favourite examples is by Ralph Hinze. He doesn't just illustrate thesimilarity between binomial tree and binary arithmetic, he actually shows how merging binomials and adding binary numbers are actually instances of the very same thing. Haskell encourages this kind of abstraction. Though I've wandered into the realm of computer science with talk of trees this is still the type of abstraction that mathematicians find interesting.(8) Haskell Has No Side Effects

When you add two numbers you get the sum. You don't find that something else in the universe has happened as well (except maybe the temperature in the room goes up if it's a really hard sum). So functions in a programming language shouldn't have any side effects. In Haskell afunction does nothing other than return the value of its result. This means that it's easy to bring your mathematical experience to bear when analysing programs. (Almost) all of the usual rules for manipulating mathematical expressions apply so that, for example, if b = f(a) and c =g(b,b), then c = g(f(a),f(a)). In many other programming languages you'd have to make sure that calling f twice didn't have some side effect that made it behave differently to calling it once.(9) Haskell Eats Circular Definitions for Breakfast

In most imperative languages a line of code like x=f(x) means replace the value of x with f applied to the old value of x. In Haskell it means x equals f(x). It has no problem with you defining a variable in terms of itself and x=f(x) means that x is a fixed point of the function f. In practice it's not so simple, you can't just write x = 1-x^2 and expect it to figure out that x is the golden ratio. Haskell is a programming language not an equation solving package. But in my formal power series library I'm able to use a line like "e = 1+integrate e" to define the power series for exp x. See here for another example of using circular definitions to solve differentialequations.

(10) Haskell is Well Founded

What I mean is that Haskell is built on fairly solid mathematical foundations in lambda calculus and proof theory. It's not just an ad hoc language that has accreted features over the years. Everything is there for a reason and people have thought long and hard about the theoretical underpinnings of what went into it.(11) Haskell is Pretty

Well, that's a matter of taste. But I find it very elegant.

That's 11. I think there's a rule against making lists with prime numbers of elements but I'm ignoring it.

To be fair I ought to point out that it's not all a bowl of cherries. Haskell isn't very good at dealing with arrays. In most programming languages it takes time O(1) to access the ith element of an n element array. In Haskell you can get this kind of performace if you don't mind using the various painful to use array operators available, or you can sacrifice performance, keep the elegance, and get O(log n) performance. On the other hand, you'd be surprised just how many useful matrix algortihms don't require random access to the ith element.

Because Haskell functions don't have side effects it doesn't make sense to perform an operation more than once because it'll just do the same thing each time. So there's no notion of a 'for' loop (well, maybe there is) in Haskell and instead you must find other ways to express iterative operations - ie. using 'higher order functions' or recusrion. But sometimes you feel like you just want to write the loop and be done with it. There's always another way of expressing the problem but when the iterative approach is the first thing to pop into your head it can occasionally be frustrating. And sometimes you'd like side effects.

Haskell comes with a library of functions called the 'standard prelude'. Unfortunately, whoever designed it knew just enough mathematics to be dangerous and made a complete hash of it. Instead of using a well known hierarchy of algebraic structures such as group->ring->field they defined all kinds of bizarre structures starting with something like a ring with a norm or valuation. And while the library supports complex numbers it's not flexible enough to support the Gaussian integers (probably a consequence of forcing rings to have a norm). All things that could have been implemented correctly in a few seconds if a mathematician had been consulted early on.

And the last caveat. I found learning Haskell hard. Let me put it in perspective: I'm 39. I've been programming since I was 11. I've been writing recursive code since about 15 and I've been using whatever approximations to functional programming are avalable to me in any language I use for almost that long (including torturing myself with C++ template metaprogramming). I've even implemented a couple of declarative languages, Prolog and SASL, the latter being a functional lazy language a lot like Haskell that I thought I'd investigate before Haskell. And I still found Haskell tricky to learn. But it was worth it. And I believe that the trickiness of the language is the kind of trickiness that mathematicians should be able to handle. (And it probably helps if you don't know any C or C++...)

A friend of mine, and I, have been writing all kinds of mathematical code and finding Haskell well suited to the task. Logic (eg. building analytic tableau as in Smullyan's book on first order logic), Group Theory (eg. the Schreier-Sims algorithm), Calculus (automatic differentiation), Combinatorics (via formal power series), Algebra (eg. Grobner bases) and now we're looking into investigating relationships between elliptic functions and modular forms. (Hmmm...my friend takes most of the credit here.) Haskell is great at all of this stuff even though it's not in any way an algebra or mathematics package.

And one list thing about Haskell. If x is an integer (called an "Integer" in Haskell) then x+1>x. It's nice to work over the integers rather than a cheap and nasty 2-adic approximation (though the cheap and nasty approximation is there if you want it.)

This post is a very good resource for those interested in learning "those" details about programming languages that can make the difference and avoid missing a lot of time in the search for the appropiate one.

I've been teaching Haskell to anyone I can. With my student the other day, we wrote a screenful of one-liners (leavened by comments and type declarations) that reverse-search atomic lattices represented as bitfields. The result: Code that could find all instances on 5 atoms in a quarter second (Macaualay2 could only handle 4 atoms) and all 2^36 or so instances on 6 atoms in 14 hours. That's an instance per bit on my 8 GB linux box, all the while using a memory footprint measured instead in MB.

So Haskell can mix it up at the machine level with the best of them. With its lazy functional overhead, Haskell is often slower than C, step by step, but its conceptual clarity allows one to effortlessly combine steps to form algorithms that would be way too painful to write in procedural languages. People mired in language efficiency debates haven't moved on yet; I don't bother with the mere constant improvements of strictness annotations or overclocking my hardware, I instead look for better algorithms.

How efficiently does a language mind-meld with its user? That's the whole ball of wax. I'm in my 50s; if one could reject the language debate static and accept this principle in one's 20s, one would be way ahead.

As an engine author, I'd reverse your "giant engine" comparison. All languages and packages pick their engine and tune it. Macaulay2 optimizes Grobner bases, Prolog optimizes unification. Lisp optimizes the lambda calculus, while logicians have since moved on to merge the lambda calculus with type and category theory. This is the engine that Haskell optimizes; look under the hood of GHC Haskell, and one sees the biggest engine of them all, written by the best programmers of them all. Haskell made a better choice of domain for its engine.

Oh yeah, Integers are easy to use, but 32-bit Ints pervade the basic libraries. We're lucky we wrote "length64" on a hunch before our 14 hour run; the standard version would have overflowed.

I think I should adopt the rule of always releasing my first draft. It may have a few typos but that's fine. When you go back to fix typos you start working locally rather than globally and you introduce structural errors. Have two threes is a good example of that :-)