Thursday, March 22, 2007

I'm going to be giving a talk on Haskell for FringeDC, the Washington, DC area group for programmers who like to live on the edge. ;-)

The talk is called Haskell: A Cook's Tour. It's really hard to condense what's so cool about Haskell into an hour or two. (I did a 3-hour tutorial at OSCon last year, and boy, did that feel rushed! And that just scratched the surface!) Instead, I'm going to try my best to hit the high points: the type system, monads, Parsec, and one or two more gems.

This just goes to show that too much code is needed to solve this simple problem in an imperative fashion. It's not an indication that the code is poorly written, but rather that imperative techniques are a bad fit, because they obscure the overall intent.

The nested for loop isn't the goal. The goal is finding the number of employees who have been in the company longer than their department (i.e. "old timers"). As I read this code, I kept re-reading it because I thought I had found a bug. I thought the code was counting all employees who have been in the company longer than any department, but that was my misreading. I kept glossing over this clause, which narrows the search to employees who have been with the company longer than their own department:

emp.getDepartmentId() == dept.getId()

Sadly, it's too easy to fall into this trap. There's a lot of stuff to read in this small block of code, and I certainly found it easy to gloss over an important detail. Undoubtedly, the opposite problem is also common: neglecting an important detail in the middle of a series of nested loops. Those bugs also take skill and effort to identify and fix.

Reg does a good job explaining why the goal isn't a nested for loop, and why functional programming techniques (like generating a cartesian product and filtering the result) bring the code closer to the goal -- counting the number of old timers.

Reg makes the point that what really matters are the functional techniques, not the language or the syntax used. Here's his revised solution, in Ruby (with an extension to support cartesian products on lazy lists):

Generating cartesian products just might be one of the fundamental ideas of computing. In an imperative language, they usually take the form of nested loops. In a language that supports list comprehensions (like Haskell or Python), the idea melts away with just a little syntax:

[(x, y) | x <- xs, y <- ys]

Not only are the products computed as a list, but in Haskell, they're computed lazily, so you only compute what you need.
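For example (pairs is an illustrative name), even an infinite first generator is fine, because take only forces the pairs it needs:

```haskell
-- Lazy cartesian product: the first generator never terminates,
-- but take 5 only forces the first five pairs.
pairs :: [(Integer, Char)]
pairs = [(x, y) | x <- [1 ..], y <- "ab"]

firstFive :: [(Integer, Char)]
firstFive = take 5 pairs
-- [(1,'a'),(1,'b'),(2,'a'),(2,'b'),(3,'a')]
```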

Here's a revised solution to the old timers problem using list comprehensions:
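A sketch of such a solution, with invented record accessors (empDept, deptId, empYears, deptYears) standing in for whatever the real data model provides:

```haskell
data Employee = Employee { empDept :: Int, empYears :: Int }
data Department = Department { deptId :: Int, deptYears :: Int }

-- Count the employees who have been with the company longer than
-- their own department has existed.
oldTimers :: [Employee] -> [Department] -> Int
oldTimers emps depts =
  length [ () | e <- emps
              , d <- depts
              , empDept e == deptId d
              , empYears e > deptYears d ]
```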

Note that the actual values returned in this list are inconsequential; the only thing that matters is counting the number of matches. Using length on this list comprehension highlights the fact that the goal here is to produce a count of matching elements. This intention is muddled within the nested for loops, and isn't precisely clear within the Ruby version.

What are the fundamental ideas in computing? The most basic ideas must be boolean algebra and binary numbers. From there, it's a short hop to ASCII and maybe Unicode, and then to arrays. I'm not sure what else falls into the category of 'fundamental ideas of computing', but sorting and searching must be in there somewhere.

I was thinking about what would make the list, and I couldn't come up with a complete set of topics, but two of them -- sorting and searching -- are a kind of black magic. Quicksort is a marvelously simple algorithm, but it's so easy to get wrong that the standard approach is to use a library implementation or, in the worst case, to copy code out of a good textbook (ideally Knuth). Writing a quicksort function from memory is almost certainly a way to introduce bugs into a program. Searching a sorted binary tree is less error prone, but it's still one of those algorithms that's best to reuse or copy instead of writing from memory.

Why is this? Why are there fundamental ideas in computing that are best reused from a library or copied out of a textbook?

I'm not a chemist, nor am I a physicist, but I still remember a great deal from my high school chemistry and physics classes. Certain parts of the periodic table are deeply imprinted in my brain, as are Newton's laws of motion (and the calculus that relates position to velocity to acceleration). Yet there is something so complicated about quicksort and binary search that it's best not to write them down on a cocktail napkin.

Something is wrong here. The basic ideas in a profession should be so, um, basic that any professional should be able to recite them from memory on demand.

The problem isn't so much that these ideas are so arcane that they should come from a textbook whenever they are used. The problem is that our expression of these simple ideas is overly complex.

Here is an expression of C.A.R. Hoare's quicksort that is simple and easy to understand (optimized for clarity):
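In Haskell, it might read like this (the well-known clarity-first rendition, not Hoare's in-place formulation):

```haskell
-- Quicksort, optimized for clarity: pick the head as the pivot,
-- sort everything smaller, sort everything else, and concatenate.
qsort :: Ord a => [a] -> [a]
qsort []     = []
qsort (x:xs) = qsort [a | a <- xs, a < x]
            ++ [x]
            ++ qsort [a | a <- xs, a >= x]
```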

The problem with quicksort isn't that it's a fundamentally difficult algorithm to understand. As expressed here, the algorithm is quite simple (and fits on a cocktail napkin). As it is typically expressed in a language like C, it is optimized to have an additional property: it sorts the array in place. The typical Haskell version doesn't offer this property, but at least it highlights the algorithm, not the mechanics needed to support the typical optimization.

Another classic algorithm is a search within a sorted binary tree. Binary trees are another foundation of computing, and one that frequently ends up more complicated than it should be. Here is a definition that is simple, obvious and easy to remember:

data Tree a = Empty | Node (Tree a) a (Tree a)

That is, a tree is either a node with a value and two sub-trees, or it is empty. Let's assume that the contents of a tree are always in sorted order: the left subtree contains nodes that are strictly smaller than the current node, and the right subtree contains nodes that are greater than (or equal to) the current node.

Performing a binary search on this structure should be simple, and it is:
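A sketch of that search (member is my name for it; the Tree definition is repeated so the snippet stands alone):

```haskell
data Tree a = Empty | Node (Tree a) a (Tree a)

-- Binary search: smaller values are always to the left,
-- greater-or-equal values to the right.
member :: Ord a => a -> Tree a -> Bool
member _ Empty = False
member x (Node left v right)
  | x < v     = member x left
  | x > v     = member x right
  | otherwise = True
```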

Snippets like this are one of the reasons why I like Haskell so much. As I was driving around today, I wondered how I would write a binary search, and visualized these two snippets of Haskell pretty much as you see them here. This is certainly something I would never do if I were still thinking in C, because the pointer manipulation and memory management issues obscure the intent behind a binary search so thoroughly as to render it unrecognizable.

Now I'm wondering how many fundamental, yet "complex and error prone" concepts can be expressed with a screenful of Haskell.

Another thing I like is that people seemed to really be using Python to do useful things. There wasn't nearly so much time spent talking about Why Python Is Good, or How To Sell Python To Your Boss. That's kind of silly anyway -- if you are at the conference, you don't need to be sold on it. But even so, the language seems to be in a much more self-confident place, and the users don't need to spend so much time justifying their choices.

Ian's comment reminds me of what the bad old days were like ten years ago. Java was the new kid on the block, had a huge marketing budget, and dominated the discussion on new development. Java was a net positive, because it displaced C, C++ (and, to some degree, VisualBasic) on many projects that simply needed a better language. At the same time, it was a huge bunch of hype, and it focused attention away from perfectly capable languages like Lisp, Perl, Python and Tcl.

As a result, there was a movement for 'language advocacy' to convince people that $YOUR_PET_LANGUAGE was demonstrably better than Java in at least some areas. A well-balanced argument might mention that Java was better in some areas, but in practice, those areas weren't particularly important. On a bad day, it was about promoting $YOUR_PET_LANGUAGE over Java and just about everything else.

I haven't seen much language advocacy recently, and I certainly haven't seen much of it in relation to Haskell (or its cousin, OCaml). I, too, think this is a good thing. I'm not sure why it's happening, but I've got some theories. Perhaps it's because I've grown as a programmer, and I avoid those arguments, so I don't notice them anymore. Perhaps it's because 'language advocacy' has gone out of vogue. I have a feeling it's the latter, but I can't prove anything.

Ian certainly brings up an important point. Being successful, building systems, and being confident your tools are capable certainly makes a powerful case. Convincing others that your tools are better, or at least worthy of consideration, is a losing position. Some teams are happy and wildly productive with Java (or C, or C++), and it doesn't make sense to switch languages because they might offer something nebulous like more productivity or more joy. Some teams are very risk averse, so sticking to something mainstream like Java or C# makes sense, since there's little doubt someone will be able to work on the system five to ten years from now.

Then there's the Paul Graham effect. In his essay Beating the Averages, Paul postulates that some languages are simply more powerful than others, and demonstrates that a good team can always manage to build a system, regardless of the tools they choose.

Since Paul released that essay, Java and C# have also evolved. So maybe there's an expectation of tool choice -- not only whether to use Java or C#, but also which version of the VM to use, or whether it's wiser to avoid those issues entirely and opt for a stable platform like Python or Ruby.

Tuesday, March 6, 2007

When for loops (and while loops) are the only tools at your disposal, every iteration construct is just an instance of a loop. When you have higher order functions and closures, then you start seeing different kinds of iteration constructs: maps, folds and filters, for example. As an added benefit, the higher order approach also eliminates the scaffolding that leads to bugs.

The problem is, that's not the entire story.

The difference between the for loop and the higher order functions is the same as the difference between push and pull templates in XSLT.

Push templates are more natural, and easier to write in XSLT. They work with data, instead of against it:
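A minimal example of such a template, reconstructed as a sketch from the description that follows:

```xml
<xsl:template match="para">
  <p><xsl:apply-templates/></p>
</xsl:template>
```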

XSLT is really just a Scheme dialect that uses an event-based virtual machine and an XML syntax. When the XSLT engine finds an element in the XML source document (an XML open tag, a chunk of text, a comment, whatever), it looks for a template rule to determine how to output that element. The template above matches <para> elements in a source document, and replaces them with <p> elements in the output document. The body of the <p> element on output is handled by the <xsl:apply-templates/> instruction. This instruction tells the XSLT engine to examine the children of the current <para> element in the source document, look for templates that describe how they are to be formatted in the output document, and place those outputs within a <p>...</p> block.

This leads me to one of my favorite XSLT techniques -- the null template:

<xsl:template match="double-secret-probation"/>

This template doesn't contain any content. Specifically, it doesn't contain an <xsl:apply-templates/> instruction. Therefore, whenever a <double-secret-probation> element is found in a source document that triggers this template, it and all of its children will be pruned from the output document.

Pull templates are the alternative to push templates. Instead of letting data flow through the stylesheet, they pick data out of the source document, and place it directly into the output document:
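A pull version of the same para-to-p transformation might look like this (again a sketch matching the description that follows):

```xml
<xsl:template match="body">
  <xsl:for-each select="para">
    <p><xsl:value-of select="."/></p>
  </xsl:for-each>
</xsl:template>
```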

This template actively seeks out <para> elements that are children of a <body>, replaces them with <p> tags (as above), and then hunts down the text contents of each <para> element, shoving it directly into the <p>...</p> blocks. The two constructs above, <xsl:for-each> and <xsl:value-of>, are used to pull content out of the input document and into the output document.

If it sounds complicated, it is. XSLT makes it easy to reformat XML documents by providing an implicit processing model that visits every node of an XML document. Input can be rearranged, replaced, or removed in the output. Pull templates, on the other hand, can usually perform the same transformations, but they generally take more work to write (and to understand).

In general, it's much better for XSLT stylesheets to use push templates whenever possible, and save pull templates for those rare situations when they are absolutely necessary.

As an exercise for the reader, write a stylesheet that emits this fragment and preserves all of the formatting, but eliminates the chunks labeled <double-secret-probation>:

<para>The 11<sup>th</sup> Annual Homecoming Parade will take place Thursday at <time>1100 PST</time>.
<double-secret-probation>Under no circumstances will Delta House participate.</double-secret-probation></para>

(Hint: the two push templates above will be key parts of the solution. I don't even want to think about how to write a solution using a pull template....)

A few years ago, I taught on-site XSLT training courses. They went from the basic (this is a stylesheet, this is a template) all the way through advanced techniques (how to add commas between elements in a list, but not after the final element). I found that students whose exposure to XSLT started with "view source" always started with pull templates, and translated algorithms out of a Java program into XSLT syntax. It took a while to explain push templates, and for students to appreciate how they simplified the problem domain. (Having copious examples helped, including a few that were nearly impossible with pull templates, but trivial with push templates.)

This is because loops and similar "pull techniques" are some kind of deep knowledge that many of us learn as beginning programmers. It was the only realistic approach when the only available language was some form of assembly, and it propagated through Fortran and ALGOL into every imperative language still in use today.

We can do better now. Computers are plenty fast to handle whatever meager overhead is necessary to apply functional idioms like map, fold/reduce and filter to transform and reduce aggregate values, whether we use those idioms in imperative languages like Perl, Python or Ruby, or in a purely functional language like Haskell.

The good news is that once you understand how to push data through a computation, trivial operations become, well, trivial. Here's the sum of all the odd numbers from 1 to 100:

sum (filter odd [1..100])

Perhaps this is why imperative programmers hesitate to use functional languages: pull techniques are so ingrained that it is hard to leave them behind and start using the push techniques that really make programming easier and more expressive...

The two important things to note are creating the input stream, and doing something with the input stream. Everything else is required scaffolding to make error handling work properly. This example is particularly clean, hiding all of the file operations behind a doSomething method. However, the operation could be written inline, so that instead of writing 6 lines of code to get 2 lines of work done, you see 24 lines of code to get 20 lines of work done.

The problem isn't that, in this example, the scaffolding adds 200% more code (as measured in lines, using typical Java style). The problem is that every time you want to open a file (or connect to a database, or ....), you need to write at least 4 extra lines of scaffolding to handle exceptions.

A common response in the functional programming world is to factor out this common scaffolding into reusable code. Pascal describes how to create a with-open-file macro that lets you write this:

This should be familiar if you've seen Lisp macros before, or come across a discussion of (with-open-file ...). However, the real power is in abstracting the required exception scaffolding, not in using macros to achieve that goal.

The power of (with-open-file ...) is undeniable, and has long since leaked out of the Lisp world. It's a common idiom in Ruby, available to all IO objects:

File.open("testing", "w") {|fh| fh.puts "Hi!"}

It's interesting that Ruby offers the same idiom using blocks (er, closures) and not macros. In fact, it's the same technique used in Haskell. You can find it in Control.Exception.bracket:

bracket :: IO a -> (a -> IO b) -> (a -> IO c) -> IO c

As I was writing this post, I went looking through the Haskell Hierarchical Libraries index for a definition of withOpenFile, but I didn't find it. Instead, I found a definition lurking in the documentation for bracket:
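From memory, the definition there is a one-liner over bracket (so treat the exact names as approximate):

```haskell
import Control.Exception (bracket)
import System.IO

-- Acquire the handle, run the action, and guarantee hClose runs
-- afterwards, whether the action returns normally or throws.
withOpenFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
withOpenFile name mode = bracket (openFile name mode) hClose
```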

Interestingly, bracket is a completely general function that abstracts the idea of wrapping IO operations with an acquire and release action. Due to how errors are handled within the IO monad, processing within the action function terminates at the first error, so no explicit try/finally block is necessary.

It's not absolutely necessary for a function like withOpenFile to appear in the standard library. Instead, you can define as many wrappers as you want, using the precise combination of IO actions you need. For example, you may want to define withInputFile and withOutputFile separately. You may want to use openBinaryFile, openTempFile, openFd, runInteractiveCommand or createPipe instead of openFile. It doesn't really matter; each of these functions is just a one-line partial application of bracket.

Now, I don't want this to be a "Haskell is better than Lisp" (or "Lisp is better than Haskell," etc.) kind of post. Quite the contrary, actually. Lisp offers a set of tools to simplify code, and Lispers have taken to using macros to solve this kind of problem. Haskell doesn't have macros, but it still enables these 'with...' bracketing patterns, and makes them downright trivial to write, even if you have to write them from scratch. What matters is identifying this kind of repeated scaffolding as a design pattern, giving it a name, and making it easy to use without writing it out longhand each and every time it's needed.