As I explore Haskell, I’m discovering that one of its trickiest aspects is not structuring things functionally, but the lazy evaluation. It turns out lazy evaluation comes with both great benefits, and significant difficulties. I’d like to point a few of these out, as they’re becoming clearer to me.

Benefits

One of the great benefits of lazy evaluation is that your code doesn’t need to account for the scale of an operation. Let’s take a simple example: checksumming a large file whose contents are being read, on-demand, over HTTP.

In C++, if I wanted to checksum a large file read over HTTP, I couldn’t buffer it memory because I don’t know how large it might get. Nor do I want my checksumming code to know anything about HTTP, or where the data comes from. The answer here is to use I/O streams. By passing a generic istream interface around, I can hide any knowledge of where the data came from. The checksumming algorithm just reads data from the stream as it needs to, and the HTTP layer downloads more bytes as required (typically caching to avoid constant network access).

However, there’s a downside to this: the checksumming code now knows something about I/O. In actuality, a checksumming algorithm only cares about the bytes being checksummed and little else. It shouldn’t have to know about I/O, or strings, or any of the details of where data comes from or how it’s structured. It should ideally receive a pointer to an arbitrary large sequences of 8-bit bytes, and return a fixed size checksum representing a fingerprint of those bytes.

Yet this naive approach can’t really be done in C++. If it were a file I was checksumming, I could memory map the file and pass around a byte pointer, and the OS would take care of lazily reading in the bytes for me as needed. But for a file being accessed over HTTP this would require first downloading the file and then checksumming it, when I specifically wanted to “checkum as I go”. Who knows, maybe I’ll discover a reason to stop summing beyond a certain point and I’d like to stop downloading at that point as well.

Well, just as memory mapping gives me lazy access to the contents of a file, a language with lazy evaluation gives me lazy access to the results of any algorithm – including downloading data over HTTP. With Haskell, I can indeed write my checksum algorithm as if it receives a giant byte buffer, and the language takes care of downloading only as much data as I’ve accessed (plus caching). This simplifies my checksumming code, and reduces the amount of knowledge that has to be passed around, such as a “stream” as opposed to a generic, 8-bit pointer.4

This simplification lets you design your algorithm as if in an ideal world. You want to process a bunch of numbers? Work on a list. What you say, the numbers are coming in from a socket and you don’t know when it will end? Doesn’t matter, just work on a list. In C++ I’d have to switch from passing a vector to passing an istream iterator, but in Haskell, I don’t care what algorithm is populating my list, only that it is a list, and that I know how to work it.

Detriments

For all its beauty, laziness has three costs I’ve run into so far. The first is that it lets you very easily write functioning algorithms with horrible performance characteristics. This happens because laziness causes a promise5 to be constructed, which takes memory and time to do. Sometimes, the cost of the underlying operation is far less than the memory cost of carrying a promise around to do that operation at a later point. This isn’t true of a slow operation like reading from a socket, but it’s certainly true of something trivial like summing two integers. It means one has to be aware of promises, when they’re constructed, and when it’s more beneficial to force evaluation always versus the benefits to be had from deferred a computation whose result may never be needed.

The second is that when a poorly performing algorithm dies, it dies when its value is used, not when the promises are made. This can make it look like the consumer is to blame, when really it’s the producer. Here is a trivial example:

mysum = foldl (+) 0
main = print (mysum [1..1000000])

Although foldl is tail-recursive, so we aren’t blowing stack through recursive calls, it still blows stack because it builds up a huge, nested structure of promises that only gets evaluated once print is called to render it as a string. That is, the return value from mysum itself is no problem, it’s just a lazy computation against a large list. But then print needs the result, so it asks mysum to fulfill its promise. This in turns causes mysum to churn through the large list of integers, building up the return value as it goes.

However, and here is where the surprise comes in: foldl doesn’t actually compute those values as it walks the input list. No, even these are done lazily, because it can’t know how many of those values will actually be needed. We may know from looking at the code that it will need them all, but it doesn’t know. So it constructs something on the stack looking like this:

((((((((0+1)+2)+3)+4)+5)+6)+7)+...)

And so on, all the way to the last integer. Only when mysum is done constructing promises across the entire input list, and the promise structure is returned, will it actually get evaluated by summing the integers together and finalizing each promise. If you pick a input list large enough, there goes available memory.

The trick here is that the stack fault won’t ocur in foldl, or in mysum. It will occur in print, where the need to resolve the promise result in the call to mysum actually being made, which then calls foldl, which then starts building thunks until memory is gone. In this trivial example there’s very little code or time distance between the problem and its cause, but in real world code there may be enormous gaps between them.

In consequence of this I learned that it’s hard to get GHC to produce stack traces for you when there’s a runtime error. Your code can be going on its merry way, when suddenly there’s a stack fault. But that’s all you see: a stack fault indicating something went wrong. Where did it go wrong? Based on the behavior of the program, I’m led to believe it happened near what the code was actually doing6 – but in fact the problem may have started long, long before, except that laziness differed the trigger to a later time.

So, even though laziness can delay costs and abstract how data is determined, by the same taken it also delays errors and abstracts blame. In C++ if I pass in an I/O stream and there’s a crash reading from it, I know to look at my stream code. But in Haskell if I get a stack fault simply by processing a list, how am I to know what’s wrong? It’s not going to be in the List code, and probably not in the code walking the list, but in code which promised to produce the list potentially a long time ago.

I still think the benefits can outweight the difficulties – especially when it comes to parallelism, and avoiding unnecessary computations, and allowing code to safely traverse infinite series – but it definitely requires a level of algorithmic conciousness on the part of the engineer which seems quite a bit higher than with imperative languages.

And if I do have to include state with this raw, lazy data, but I don’t the algorithm to know anything about it? That’s where the Monad steps in. Say instead of checksumming a file, I’m parsing an expression. There are a lot of details that go along with parsing that have little to do with interpret the next bit of text, such as token position, error context, backtracking information, etc. I want to be able to write a routine that parses a number very simply, without knowing about all those details. It’s the Monad that manages this extra information. You can read more here. ↩

Promises are what get turned into real values when data is finally needed. ↩

If you use profiling libraries along with -prof -auto-all, you can get a much clearer picture of what was executing at the time of the fault. ↩

I’ve been reading Read World Haskell now, after having finished the delightful Learn You a Haskell Tutorial. I’m up to chapter 6, about to dive into Typeclasses. In the meantime, I’ve picked a toy project that also has a taste of usefulness: a script to convert the Hackage database into MacPorts Portfiles, respecting inter-package and external library dependencies. I call it HackPorts, of course.

Requirements

This translation should require two things:

The Cabal package, for read information about all packages known to it. This avoids writing a custom parser, or using HTTP to crawl the online Hackage database.

A mapping file of external dependency names to MacPorts port names. This is for dependencies on things like libbz2, where the script will need to be taught how MacPorts names that library. This is likely to be the most labor-intensive step, having nothing to do with Haskell.

Initial experiences

Haskell makes a concerted point about separating “pure” from “impure” code. Anything which talks to the outside world, such as reading and writing files, is impure. Anything which can be expressed in terms of standard data types – or compositions thereof – is pure.

Take for example a program to count lines in a file. The pure part of the code receives a giant string, splits it into lines at line boundaries, counts those lines, and returns an integer. The impure part takes a command-line argument, interprets it as a FilePath (an impure type, since it must concern itself with operating system-dependent naming conventions), and reads the contents of the file at that location. The program flows by passed the file contents as a string to the pure code, and receiving an integer to be printed on the output device.

This division into pure and impure has an interesting side-effect (no pun intended): Most of a program’s code is written in isolation of its context of usage. Take Cabal, as a case in point here. Part of Cabal deals with downloading information from the Web, reading and writing package files, and executing external commands, like make. But another part of Cabal is concerned only with the structure of package files, and determining the total set of dependencies required for building a package. These latter details can be discussed in complete isolation from what is done with that information.

As a result – and I’m not sure whether the Cabal authors designed it this way or not – Cabal is naturally part “program”, and part API. I was able to start taking apart package files almost instantly, with extremely little code. Here’s a toy program to print out a package’s maintainer, if given the path to a .cabal file:

What excites me is that Haskell uniquely encourages the separation of alogrithm and application – the isolation of context-dependent knowledge into as small a region of a program as possible.

Too many times I’ve tried to use a utility’s code as a “library”, only to find it was so caught up in its idea of how it should be used, it had never bothered to abstract its core principles into a set of “pure” function, independent from that intent. This happens, for example, with the version control system Git. Although many have wanted a libgit.a for accessing Git’s data structures directly from other languages, yet none exists. One is forced to either shell out to the git command, or write another implementation to interface with the “pure” side of what Git does.

Recently I changed how the content on this site was generated, from using the standalone OS X application RapidWeaver, to the server-side publishing platform Movable Type. During that transition I changed the site’s style to the minimalist default offered by MT, which uses its own CSS for column layout and typography.

Tonight I finally got around to switching the site back to blueprint-css, which I very much prefer. I used the superb application CSSEdit to help me massage Movable Type’s style into something that compatible with Blueprint’s own typography and layout.

I hope the result is pleasing. If anyone sees strange artifacts or display issues, please let me know. I’m aware code examples were being truncated on the right side before, but this should be corrected now. More on Haskell to come soon!

Today I need a wrapper script to drop arguments from a command-line. I instinctively reached for bash, but then thought it would be a good exercise for my infant Haskell knowledge.

The task

The task at hand is to write a wrapper script for /usr/bin/ld that drops arguments beginning with -Wl,-rpath,. Since it must deal with arguments containing spaces, and I didn’t want to get into executing external programs with Haskell just yet, I wrappered the wrapper:

#!/bin/bash
$(dirname $0)/ld-wrapper "$@" | xargs -0 /usr/bin/ld

Here ld-wrapper is expected to return its arguments separated by NUL characters so I can feed it to xargs, and from there to /usr/bin/ld. I’m sure there’s an easy, all-in-one way to do this with Haskell, I just haven’t reached that chapter yet.

Everybody talks about Monads when they mention Haskell, so I got a bit ahead of myself and wanted to see something of what they’re about. No, don’t worry, I’m not aspiring to yet another Monad tutorial. I feel I have a ways to go before I’m ready to craft my own light-saber.

I did read about 10 Monad articles on the Web, and found myself more confused when I came out than when I went in. Today’s exercise took about 5-6 hours of pure frustration, before a kind soul on IRC finally set me straight. It sure is difficult when getting past a single compiler error takes you hours.

That bedeviled cat

Most geeks know about Schrödinger’s cat, the fated beast who, when put into a box with a random source tied to a deadly gas trigger, remains in a state of quantum superposition in which he’s neither alive nor dead until someone opens the box to look.

Well, people kept saying that Monad are like “computational containers”, so I wanted to model the following:

There is a Schroedinger Monad into which you can put a Cat.

When you create the Monad, it is Unopened, and the Cat’s has no state.

You also pass in a random generator from the outside world. This involves another Monad, the IO Monad, because randomness relates to the “world outside”.

As long as you don’t use the monad object, the Cat’s is neither Dead nor Live.

As soon as you peek into the box, or use it in any calculation, the Cat’s fate is decided by a roll of the dice.

When I run the program ten times in a row, here’s what I get:

Opened (Live (Cat "Felix"))
Opened Dead
Opened Dead
Opened Dead
Opened Dead
Opened (Live (Cat "Felix"))
Opened Dead
Opened Dead
Opened (Live (Cat "Felix"))
Opened (Live (Cat "Felix"))

Let’s look at the code, and where I had troubles writing it.

A flip of the coin

The first function flips a coin and returns True or False to represent Heads or Tails:

The sugar fst $ random gen is just shorthand for fst (random gen). There is no difference, I was just playing with syntax. You do need to pass in a valid random generator, of type StdGen, for the function to work.

Cats

These two types let me make Cats out of Strings, along with a Probable type which models a Live thing or a Dead thing. It treats all Dead things as equal. I can create a Live Cat with:

felix = Live (Cat "Felix")

Following my “fun with syntax” up above, I could also have written:

felix = Live $ Cat "Felix"

It doesn’t matter which. The $ character is the same as space, but with much lower precedence so that parentheses aren’t needed around the argument. If there were no parens, it would look like I was calling Live with two separate arguments: Cat and "Felix".

Flipping a Cat

When I have a Cat, I can subject it to a coin toss in order to get back a Live Cat or a Dead one. I should probably have called this function randomGasTrigger, but hey.

The type of the function says that it expects a random generator (for flipCoin), some thing, and returns a Probable instance of that thing. The Probable means “can be Live or Dead”, according to how I defined the type above. The rest of the function is pretty clear, since it looks a lot like its imperative cousin would have.

Bringing in Schroedinger

data Schroedinger a
= Opened (Probable a)
| Unopened StdGen a deriving Show

This type declaration is more complicated. It creates a Schroedinger type which has two data constructors: an Opened constructor which takes a Probable object – that is, whose Live or Dead state is known – and an Unopened constructor which takes a random generator, and an object without a particular state, such as a Cat.

In the third case, the idea is that his fate will be determined by the random generator created with mkStdGen 100. However, I want a real random source, so I’m going to get one from the environment later.

Here comes the Monad

As complex as Monads sound on the Web, they are trivial to define. Maybe it’s a lot like binary code: nothing could be simpler than ones and zeroes, yet consider that all complexity expressable by computers, down to video, audio, programming languages, and reading this article, are contained within the possibilities of those two digits. Yeah. Monads are a little like that.

This useless Monad just illustrates how to define one, so let’s cut it apart piece by piece. By the way, I didn’t author this thing, I just started it. Much of its definition was completed by folks on IRC, who had to wipe the drool from my face toward the end.

instance Monad Schroedinger where

Says that my Schroedinger type now participates in the joy and fun of Monads! He can be discussed at parties with much auspiciousness.

Opened Dead >>= _ = Opened Dead

The >>= operator is the “bind” function. It happens when you bind a function to a Monad, which is like applying a function to it. This line says that if you apply a function to an Opened box containing a Dead thing, what you’ll get back is an Opened box with a Dead thing.

Opened (Live a) >>= f = f a

If, however, you bind a function to an Opened box with a Live thing, it will apply the function to what’s in the box – in this case, the Cat itself. The function f is assumed to return another instance of the Schroedinger type, most likely containing the same cat or some transformed version of it.

Unopened y x >>= f = Opened (flipCat y x) >>= f

Here is the meat of this example, it’s reason for being, all contained within this one line: If you bind a function to an Unopened box, it gets bound in turn to an Opened box containing a Cat whose fate has been decided by the dice. That’s all. The reason I used a Monad to do this is to defer the cat’s fate until someone actually looked inside the container.

return x = Opened (Live x)

Lastly, if someone returns a cat from a box, assume its an Opened box with a Live Cat. I don’t honestly understand why this is necessary, but it seems Opened Dead cats are handled by the binding above, as shown by the output from my program. I’ll have to figure this part out soon…

The main function

This is fairly linear: it gets a random generator from the operating system, then creates an Unopened box and returns it, which gets printed. print does its work by calling show on the Schroedinger type, since it was derived from Show earlier.

Something I still don’t understand: at exactly which point does the flipping happen? When box is returned? When show gets called? Or when print actually needs the value from show in order to pass it out to the IO subsystem?

Closing thoughts

The full version of this code is on my server. There is also a simpler version without Monads. I worked on the Monad version just to tweak my brain. At least I can say I’m closer to understanding them than when I started.

Having just begun my descent down the rabbit hole, I thought I’d try journaling about what I discover along the way, so that those who are merely curious can play the part of language voyeur. I’ve always wanted to do that: to see how someone dives into Erlang or O’Caml or Forth – or Haskell. Here’s your chance.

This is day 5 of the Haskell experience, and I’m having quite a bit of fun so far. It’s definitely twisting my head into pretzel shapes. I’ve spent hours getting less done than I could achieve with Python in moments. The hope is that all this retraining will pay off further down the road.

Fibonacci

My first attempt was a Fibonacci function, which I failed at miserably. Turns out I was unable to conceive of “lazy recursion”. When I looked up the answer, it just seemed beautiful:

fib = 1 : 1 : zipWith (+) fib (last fib)

This function starts out the list with 1, followed by 1, then it starts adding two lists together – provided by the same function before it’s even done! In imperative land this would blow the stack in a heartbeat, but in Haskell it makes sense. The recursive call to fib returns 1, 1, 2, 3, 5 and the recursive call to last fib returns 1, 2, 3, 5. Add them together, and you get the sequence.

There is also the traditional definition, which matches what you find in math textbooks:

fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

If evaluated at the interactive prompt, this function will generate numbers forever, so you have to ask for just a few, like the first 20:

take 20 fib

So, things began with my face on the ground, which was humbling, but also refreshing that such a simple problem could floor me so easily.

Splitting strings

The next problem I tried to tackle was splitting a string into substrings at each colon. That is:

"Hello:World"
=> ["Hello", "World"]

Again, fail. How shocking it was to spend over an hour on this and ultimately have to resort to Google. The answer was pretty straightforward:

What I missed was using an accumulator to collect the current string. I kept thinking it was something I had to return as I went along, not passed down to each deeper level – and then returned after I’d added to it. Here’s the breakdown:

splitAtColons :: String -> [String]

Defines the type of the function as something which takes a String and returns a list of String.

splitAtColons = sac' []

This is essentially what I missed. The definition of splitAtColons calls a sub-function, passing in an empty string (aka list) as the “accumulator”.

where sac' acc [] = [acc]

If sac' sees an empty string ([]) – the end of the string currently being processed – return the accumulated string in its own list.

Otherwise, take apart the current string into its first character, x, and the remainder, xs. If that first character is a colon, return a list with the current accumulator as the head, and recurse to process the rest of the string (and so on). Otherwise, add the non-colon character to the current accumulator, and recurse to process the rest of the string.

First reactions

Moral of my first story: prepare to be humbled. Google and IRC were a lifeline, and the people on #haskell, both helpful and patient. More soon.

In a recent entry on differences between Haskell and Lisp, one of the Lisp community’s long-time members, Daniel Weinreb, asked about my stated aversion to JVM-based languages for everyday computing (some times referred to as “scripting”). Specifically, it was asked in relation to Clojure, and why I hasn’t been immediately taken by that languages – despite it’s having so many features I respect and admire.

I wanted to respond to Daniel’s question in a separate blog entry, since this topic has come up so often, it seems, and deserves thought. The JVM is a rich, mature platform, and you get so much for free by designing new languages on top of it. The point of debate is: what are the costs, and are they always worth the asking price?

Daniel’s question was:

In your own case, you mention “tiny” and “fast-running” executables. I am not sure why “tiny” matters these days: disk space is very cheap, and the byte code used by the JVM is compact. Common Lisp programs compiled with one of the major implementations, and programs written for the Java Virtual Machine, execute at very high speed.

The fact that you distinguish between server-side and client-side applications suggests to me that what you’re really talking about is start-up latency: you’re saying that a very small program written for the JVM nevertheless has a significant fixed overhead that causes perceived latency to the user. Is that what you have in mind?[…]

As a hypothetical question just to clarify your meaning: if there were a JVM implementation that started up instantly, so that the speed of execution of a small program would be the same as the speed of the same code appearing in the middle of a long-running server process, would that answer your objections?

Hi Daniel, thank you for your in-depth reply. As always, I enjoy reading what you’ve contributed to the Net’s compendium of thought on Lisp and related languages.

Your clarification was most accurate: When I said “scripting”, I was talking about a context of usage, not a particular language paradigm. I like that Haskell seems to be just as appropriate for tiny, throw-away scripts as it is for large, long-running programs.

When it comes to the latter, I really no have objections at all to the JVM or its startup time. I’m more than willing to wait 5 minutes for something to execute, if it will run for months at high efficiency. I face this situation all the time at work, where we have a huge EJB application hosted on JBoss. It may complicate debugging sometimes, but the costs are worth the benefits. The sheer number of things that J2EE and JBoss manage on our behalf, compared the small amount of code necessary to take advantage of them, is quite amazing.

What the JVM takes away, at least in 2009, is the choice of what those costs will be, and when I have to pay them. I think one of C’s biggest attractions for a long time has been that most of its costs are a conscious decision. If you favor startup time, or a small memory footprint, or fast execution, you can pretty much decide. This makes it as appropriate for embedded apps, as it is for running an HTTP server, as it is for building operating systems and compilers. With Java, despite all the things you get for “free”, it comes at the cost of other freedoms. And sometimes, Java’s priorities are not mine.

So while I can and do use the JVM for server-side computation, it’s a bit heavy weight for small and simple tasks. Common Lisp’s answer to this problem was an ingenious one. Instead of building programs that you run over and over, it offers an “environment” in which code is iteratively evaluated, so that you actually grow and nurture a burgeoning set of functionality within a long-running VM. I like this model when appropriate, and enjoy it, for example, in Emacs, which I can leave running for days on end while at the same time extending its functionality by writing new functions and customizing variables.

To answer your query then: yes, if JVM startup time could be eliminated, it would “free my hand”. I very much respect the maturity and stability of the JVM libraries Groovy and Clojure have access to. Also, what you said about the JIT, and alternative VMs, can be supplemented by mentioning all the other JVM facilities that exist, like code coverage, performance and memory analysis, and live introspection; along with the ability to pick JVMs to run on phones, or satisfy real-time computing requirements. It’s a rich platform, no doubt.

But why do we never see complaints about languages that link to the C++ standard library, or Boost, or any other of the large frameworks that exist? Because in those worlds, you don’t pay for what you don’t use. It’s been a design philosophy behind C++ for years, and to good effect. We might complain about the language, or its APIs, but you hardly notice if other projects use it, because largely, one can pretend it’s not even there. Not so with the JVM. Every time I start a Java application on my system, I feel it. Run several of them at once, and even my 3Gb laptop starts swapping. Only with the JVM are such things a source of common complaint.

I’m hoping that some day, projects like the LLVM will start to abstract these two sides of development. I want to be able to pick my language for its type safety, clarity, expressiveness, and joy of use; while at the same time I’d like to pick my VM for its security, footprint, handling of parallelism and messaging, and run-time appropriateness. This would let me choose Lisp, Haskell, Python or C++, depending on the skillset of engineers available to me; and the JVM, .NET platform, or LLVM, depending on how I meant the code to be used. Wouldn’t that be a powerful set of tools at one’s disposal?

Someone recently asked what my issue was regarding the JVM, since at the moment it prevents me from falling too much in love with Clojure – a language with the double-benefits of functional programming, and Lisp syntax and macros.

Well, below is my reason. These may not seem like much time in the scheme of things, but psychologically it builds up on me when I have to run a particular script over and over and over again. I’ve already noticed the pain with Groovy.

As some one who has enjoyed the Lisp language (in several flavors) for about 15 years now, I wanted to express some of my reactions at recently discovering Haskell, and why it has supplanted Lisp as the apple of my eye. Perhaps it will encourage others to explore this strange, wonderful world, where it looks like some pretty damn cool ideas are starting to peek over the horizon.

First, let me say that unlike many posts on the Lisp subject, I have nothing negative to report here. It’s not that I haven’t had my share of ups and downs with Lisp, but that if you want to know about those, look around. Most of what other bloggers have to say is dead on, so there’s little need to repeat it here.

I’ll just address some of the cooler aspects of Lisp, and how Haskell compares in response.

Elegant syntax

While many dislike Lisp’s abundant parentheses, I fell in love with them. Perhaps it’s because I spend so much of my time working on compilers, and Lisp programs read like their parse trees. This “code/data equivalence” is beautiful. It makes it trivial to write DSLs, for example, since you all you need to do is model the syntax tree as a series of Lisp data structures, and then evaluate them directly. It removes the need for an intermediate parse-tree representation.

When I first approached Haskell, I was shocked at the amount of syntax I saw. Operators abounded – more even than C – like: ->, =>, ::, $, $!, etc, etc. The more I looked, the more operators there seemed to be, until I began feel as lost as when I read Perl code.

What I didn’t realize is that in Haskell, much of the syntax you see are just special function names. There is very little “true” syntax going on; the rest is built on top of a highly expressive core. Lisp looks clean because nearly all its operators are used like functions. Haskell goes for an “infix optional” style, which allows you to call anything as either prefix or infix, provided you quality the function name correctly:

Nothing can much Lisp’s rigorous purity, but once you see past the sugary veils, Haskell is pretty basic underneath as well. Almost everything, for both languages, boils down to calling functions.

Macros

Another beauty of Lisp is its macro facility. I’ve not seen its like in any other language. Because the forms of code and data are equivalent, Lisps macro are not just text substitution, they allow you to modify code structure at compile-time. It’s like having a compiler construction kit as part of the core language, using types and routines identical to what you use in the runtime environment. Compare this to a language like C++, where, despite the power of its template meta-language, it employs such a radically different set of tools from the core language that even seasoned C++ programmers often have little hope of understanding it.

But why is all this necessary? Why do I need to be able to perform compile-time substitutions with a macro, when I can do the same things at runtime with a function? It comes down to evaluation: Before a function is called in Lisp, each of its arguments must be evaluated to yield a concrete value. In fact, it requires that they be evaluated in order2 before the function is ever called.

Say I wanted to write a function called doif, which evaluates its second argument only if the first argument evaluates to true. In Lisp this requires a macro, because an ordinary function call would evaluate that argument in either case:

What about Haskell? Does it have a super-cool macro system too? It turns out it doesn’t need to. In fact, much of the coolness of Haskell is that you get so many things for free, as a result of its design. The lack of needing macros is one of those:

doif x y = if x then (Just y) else Nothing

Because Haskell never evaluates anything unless you use it, there’s no need to distinguish between macros and functions.

Closures

The next amazing thing Lisp taught me about was closures. Closures are function objects which retain information from the scope they were constructed in. Here’s a trivial example:

In calling foo, I’ve created a function object which adds two numbers: the number that was originally passed to foo, plus whatever number get passed to that closure in turn. Now, I could go on and on about the possibilities of this mechanism, but suffice it to say it can solve some really difficult problems in simple ways. It’s deceptively simple, in fact.

Does Haskell have all this closurey goodness? You bet it does, in spades.

In fact, Haskell even one-ups Lisp by making partial application something as natural to use as an ordinary function call:

foo = (+)
bar = foo 10
bar 20
=> 30

This code doesn’t just make foo an alias for add, which I could have done in Lisp as well. It says that foo returns a function object expecting two arguments. Then that bar assigns one of those arguments, returning a closure which references the 10 and expects another argument. The final call provides the 20 to this closure and sets up the addition. The fact I’m evaluating it in the interpreter loop causes Haskell to perform the addition and show me the result.

This combination of lazy evaluation with partial application leads to expressive capabilities I’ve frankly never experienced before. Sometimes it causes my head to spin a bit.

Parallelism

One thing about Common Lisp is that it harkens back to a day when computers were much simpler – before multi-threading, and multiple processor machines were both cheap and common. Since it was designed at a time when there was One Processor to Rule them All, it didn’t go to great lengths to consider how its design might effect the needs of parallelism.

Let’s take function argument evaluation, as a simple example. Because a function call in Lisp must evaluate all arguments, in order, function calls cannot be parallelized. Even if the arguments could have been computed in parallel, there’s no way to know for sure that the evaluation of one argument doesn’t cause a side-effect which might interfere with another argument’s evaluation. It forces Lisp’s hand into doing everything in the exact sequence laid down by the programmer.

This isn’t to say that things couldn’t happen on multiple threads, just that Lisp itself can’t decide when it’s appropriate to do so. Parallelizing code in Lisp requires that the programmer explicitly demarcate boundaries between threads, and that he use global locks to avoid out-of-order side-effects.

With Haskell, the whole game is changed. Functions aren’t allowed to have side-effects, and their value is not computed until needed. These two design decisions lead to situations like the following: Say I’ve just called a function and passed it a bunch of arguments which are expensive to compute. None of these operations need to be done in sequence, because none of them depend on the others for their value. If then I do something in my function which needs some of those values, Haskell can start computing the ones it needs in parallel, waiting on the completion of the whole set before returning the final result. This is a decision the language itself can make, as a by-product of its design.

Community

Lastly, the Haskell community is amazing. Newbies, you are welcome here. Their IRC channel is both a friendly and knowledgable place, where newcomers are cherished and developed.

Likewise, the web resources and books I’ve read about Haskell so far have all been top-notch. You get the feeling people are fascinated by the language, and eager to share their joy with others. What a refreshing change. Lisp may have a rich history, but I think Haskell is the one with the future.

Following on my last entry, where I built a better pre-commit hook to ensure every commit in my Git history passed make check with flying colors, I realized something today about my good friend, rebase.

git rebase is a brilliant tool for editing history you haven’t pushed yet. If I have a set of ten commits, and realize the 3rd commit has an oversight I’d like to smooth out, I can make a new commit to fix the problem and then merge it with that old commit, resulting in a nice, clean set of commits for pushing.

However, using rebase at any time invalidates the work of my pre-commit hook. Why? Because any time I use rebase, I throw away whatever I’ve confirmed in the past about those commits. A rewritten commit is really a new commit, which means it hasn’t been tested, may never have existed as a working tree, and certainly isn’t the same as the previous commit, however similar its diff output may be.

What this goes to show is that immutability is a requirement of sane integration. Not only does code go into a commit, plus a date and a description, but also the work that went into verifying that commit. All of these details are bound up in the immutable state of that commit. If the state is change, and the immutability guarantee broken, all bets are off.

Thus the only way I could use rebase in confidence would be to run the same pre-commit again on every changed commit during the rebase operation – which is something Git doesn’t do. It thinks that if you rewrite a series of commits, the final HEAD will have the same contents as the previous HEAD, which is true. But the rebased commits leading up to that head, especially if their order was changed, now represent a fictitious history behind that new HEAD.