Tuesday, February 13, 2007

A missing Haskell feature?

Haskell 98 + Hierarchical modules is pretty much feature complete as far as I'm concerned. I realise that most people would demand to add higher ranked types and multi-parameter type classes, but the only extension I'm crying out for is pattern guards.

However, there is one operation that I think would be useful - pattern equality.

Consider the data type:

> type RegExp = [Atom]> data Atom = Atom String | Star String

Now imagine you want to check that all atoms in a regular expression are Star's. Easy enough in Haskell:

> all isStar r

Unfortunately the definition of isStar - something that is conceptually very simple - is rather verbose:

> isStar (Star _) = True> isStar _ = False

The reason is that in Haskell pattern matching is used to control program flow, not as an expression. One extension I often find myself wanting is some kind of "pattern match equality". Stealing the ~= operator for the moment, imagine:

> all (~= Star _) r

The ~= operator can be seen as pattern equality. It can be given a very simple semantics:

> x ~= y === case x of {y -> True; _ -> False}

Of course, now you've got pattern matching in there, its natural to allow fuller pattern matching:

> all (~= Star x | null x) r

Note that the original pattern match desugaring still allows this to be valid.

What do people think of this suggestion? Do people think it would be generally useful?

Disclaimer: This is not a carefully thought out feature proposal. Stealing ~= for syntax is completely unacceptable, something like

"HaRP, or Haskell Regular Patterns, is a Haskell extension that extends the normal pattern matching facility with the power of regular expressions. This expressive power is highly useful in a wide range of areas, including text parsing and XML processing. Regular expression patterns in HaRP work over ordinary Haskell lists ([]) of arbitrary type. We have implemented HaRP as a pre-processor to ordinary Haskell."

FWIW, Scala had regular expression patterns for some time, but they have since been removed due to a correctness problem in their implementation, and: "... regular expressions patterns were not so useful for XML processing as we estimated."