Wednesday, March 28, 2007

I've been working hard on Catch for the last few months, and will hopefully have something to release to the world very soon - a few people have alpha copies already. As part of that I've been playing with Yhc Core lots and lots. At once stage I introduced a bug into Yhc, by accident, due to the difference between let and letrec, so I thought I'd describe my mistake for the whole world to see.

Haskell has a keyword called let, but some functional languages have both let and letrec - Haskell's form (although called let) is actually letrec. What's the difference? Consider the fragment of code:

let x = 1 in (let x = x in x)

If we are dealing with let, the answer is 1. If we are dealing with letrec, the answer is a black hole of non-termination - _|_. The rule is that in a letrec you introduce the names you are defining before computing their values (so the x = x refers to itself), but with let you only introduce the names once you get to in (so the x = 1 is still in scope).

Often people define let's in a non-recursive way, but occasionally the recursion is fundamental:

repeat xs = let res = x : res in res

Here repeat is being defined with letrec - the res on the LHS is used on the RHS. This enables only one (:) cell to ever be created, saving memory.

Yhc.Core has let, which is recursive - unlike GHC Core which has both let and letrec. Yhc.Core also provides a method "removeRecursiveLet" which removes all recursive lets, replacing them with equivalent code.

So now we come to the bug, I recently changed Yhc.Core so that when desugaring FatBar/Failure (from complex pattern matches) we introduce let bindings, instead of inlining the let multiple times. This can lead to a substantial reduction in code size for bigger programs. The basic idea is that before when a Failure was spotted, we replaced it with an expression, now we replace it with a variable which we bind to the expression. If there are multiple failures with the same expression, we gain with less code.

As you might be able to see, there are two vfail's here which are part of a letrec, when I really was hoping for a let. Adding a simple numbering scheme to the v_fails mades these refer to different ones and the problem was solved.

Wednesday, March 21, 2007

I was reading JFP this weekend, and read the paper "Spreadsheet functional programming" by David Wakeling. As a Windows user who has done a lot of work with Excel, I thought it was great. People before have either tried to pervert Excel into a functional language, or write a new spreadsheet - and given that Open Office spreadsheet sucks totally - this seems to be a hard challenge.

Despite being a lovely concept, I thought the implementation described missed the mark in a few places. These are not academic reasons, but reasons of practicality for a production system and integration with the underlying Excel. I fully expect that the reasons stem from missing a deep understanding of Excel - something few academics would be likely to master. However, my Dad is a Chartered Accountant, and I've been developing systems based around Excel for over 10 years, so can fill in the Excel half.

Spreadsheet Functional Programming

Before describing how I would implement the ideas in the above paper, its best to recap the paper a bit, for those people who can't be bothered to read it. (Note: the above paper is very accessible, even a novice programmer should be able to get something from it - so skip this summary and read the paper!)

The basic idea is that Excel should be able to call Haskell, so:

=Haskell("1+2")

should call off to Haskell and evaluate 1+2, hopefully returning 3. Naturally that sort of thing could be done in Excel much more conveniently, so the strength should be in definition of proper Haskell functions within Excel:

f :: Int -> Int -> Intf a b = (a + b) / 2

=Haskell("f 1 2")

(imagine a more complex example for f, I'm too lazy to think of one, but the paper has one)

Obviously, =Haskell should go in the Cell you want the result to appear in, but what about the definition of f? This paper chooses to place definitions of f in comments, a neat solution that keeps them in the document, but out of the small cells, which would be inappropriate.

This is extended by allowing users to write references such as A1 in their Haskell strings. Unfortunately this shows up a weakness - now dragging formula around does not automatically update their references as it normally does in Excel, since the reference is hidden from Excel inside a string.

The final point to mention is that the evaluation is done with Hugs, by writing out a static file and running it - something I believe could end up being too slow for practical use. Whether it is or not isn't something I can comment on, having not tried it in real life.

HsExcel

Reading the paper, there are lots of things which are done spot on, and seem obvious in retrospect. The best ideas always seem good in retrospect, so I consider this a very good thing. Unfortunately I see 3 issues:

Writing code in comments looks highly unpleasant

Haskell is in strings, which enforces separation between Haskell and Excel

Not fast enough

I have come up with a design which I believe addresses all of these issues, which I have named HsExcel. I have not implemented it, so cannot guarantee it would really work, but I suspect it would not be that much of a challenge. I have no free time currently, but it is on my todo list (somewhere).

The first issue I shall tackle is writing code in comments. A much nicer solution would be to add a tab to Excel which contained Haskell code in a nice big box, perhaps also allowing Haskell code in comments if people found it preferable for small snippets. This can be done using Excel dialog panes with a bit of VB magic.

Next is the issue of Haskell in strings, instead of =Haskell("f 1 2") I would write:

=hs("f",1,2)

Here the Excel will evaluate 1 and 2, and pass them to the Haskell function f. If 1 was A1, then it would behave like a normal A1 in Excel. Perhaps allowing =Haskell as well would be nice, but I think the common case would be =hs. This also allows better flexibility, nested Excel and Haskell functions together, intermingled. I also think this will make cell dependency updating work (an issue not covered in the paper).

Once this has been done, Haskell could operate on any Excel values, including Cell Ranges etc. Unfortunately the paper is silent on the issue of marshaling Excel data to Haskell, but I suspect it could be done relatively easily.

One nice feature to add would be that =hs_f(1,2) works, just like an Excel function. I'm not sure if VBA is permitted to add primitive functions to Excel, or just addons. I'm also not sure if addons can dynamically change the available functions. If this was the case, then all entires in the Haskell source could become first class Excel functions.

The final question is the issue of speed. Currently the implementation runs the string from the beginning each time, writing out fresh values for the changed cells. I think spawning a shell and talking to an interpreter with each request would be higher performance. Once this solution was adopted, either GHCi or Hugs could be used, potentially allowing for a further speed up. The author comments that this is hard to do using VB's Shell command, and indeed it is. However, it is possible to prod standard Win32 system calls from VB, albeit with a great deal of work, which could get this working properly.

Conclusion

The paper presented a lot of good idea, I think a bit of user interface polish could bring them to the fore. HsExcel could be a useful tool - I certainly have projects I could use it for. I suspect that it would be under a weeks work for anyone already fluent in Haskell and (more importantly) Excel/VBA. Any volunteers?

Saturday, March 10, 2007

Yesterday I was asked a question about functional programming from someone in my department. This person is in the real time systems group, and their programming language is Ada (although they are not really a programmer).

The question I was asked was about cache locality, but that question was just a red herring inspired by a complete misunderstanding about what functional programming was. The impression they had was that it was just Ada without procedures. So I tried to explain "in a nutshell" what functional programming was. It's not an easy task!

I didn't go on about things like purity and referential transparency - these concepts might mean something to an experienced software engineer (they can see how it would make nicer interfaces) or a mathematician (more like maths), but to someone who does timing analysis, it doesn't.

So I wanted to try and get across a few of the core concepts:

1) No assignment (technically no destructive assignment, but that is the only type they know). The example they wanted me to show them "in a functional style" was:

x := 1;x := x + 1;

In a functional style there are a couple of ways to do this:

x = 1y = x + 1

Of course, order is irrelevant

y = x + 1x = 1

Neither really capture the idea they were looking for. However, in a purely functional style we don't write silly stuff, so thanks to the purity we can just write:

y = 2

The great thing is that we can know that the answer is 2, because beta reduction is valid.

So after (probably failing) to dismiss destructive assignment, I moved on:

2) No sequencing, no "one thing then the other". This was again quite hard to get across. I gave the example of quicksort, showing that sequence was not necessary.

I then challenged them to write quicksort in Ada, which is really quite hard to do. Obviously to do a quicksort only on Integer takes loads of lines before you even get to the sorting bit. Since I had been teaching generics earlier, this showed how superior Haskell is for generics.

I think I was quite a bit less successful at explaining this one. The only way you can show sequence is not necessary is by writing an OS, a compiler, a web server etc in Haskell - all of which have been done. Unfortunately I only had a small piece of paper so couldn't write any of those.

3) Then we got back to the question of timing analysis - if you spend all day doing timing analysis you start to believe that everything else is less important. I got asked if my PhD in "software engineering" was just writing programs. I took great pains to explain that programming language research is not writing an interface to a database in Java.

Anyway, if you want to spend all day doing timing analysis, Haskell is probably not the right language to pick :) So at this point I decided to screw with their mind:

I asked them to predict how long this would take:

sort "neil"

They guessed O(n log n), I said O(1). If you never need the result of sort, it never gets executed - take the example:

if null (sort "neil") then 1 else 0

The idea of lazy evaluation was starting to mess with their head, so they decided to try and rephrase the question. They are interested in the worst case time, so can you treat it as the time to fully evaluate all the components.

The answer is a definite no!

[1..]

The above expression takes infinite amount of time to evaluate

[1..] !! 3

Therefore under their interpretation, this takes an infinite amount of time, but it doesn't. It's much quicker than that.

I don't think they understand that many of the concepts about functional programming, or how to use it in practice, but I think they do now realise it is not just "another programming language" - its an entirely separate entity.