Strings, Algebraic Datatypes and Quasi Quotes

Gavin King (of Hibernate fame) has a very interesting design problem. He’s wondering what the best design is for an API for building queries (i.e. HQL). I find this interesting because I did something similar in a former life.

There are a couple of ways to skin this cat, and these papers define a taxonomy that serves as a good starting point.

The first way is simply to use a String representation. Although it may seem extremely crude, it is in fact the most flexible solution. The argument against it is that it’s all too easy to make a mistake, especially if you’re writing code that generates this string (incidentally, this explains why I chose to start from a meta-programming perspective). However, if you choose a String format that is conducive to correct compositional construction, you can avoid this problem. The other concern is that checking is not static, but happens at runtime.
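To make "conducive to correct compositional construction" a bit more concrete, here is a minimal sketch. The QueryText class and its methods are hypothetical names of my own (they are not part of Hibernate), and the query syntax is only HQL-like. The one rule it follows is that every combined fragment is wrapped in parentheses, so nesting fragments can never change operator precedence. Nothing is checked until the string is actually executed.

```java
// Hypothetical helper for composing query text; illustrative only, not a Hibernate API.
final class QueryText {
    private QueryText() {}

    // Parenthesize every combination so that nested fragments keep their meaning.
    static String and(String left, String right) {
        return "(" + left + " and " + right + ")";
    }

    static String or(String left, String right) {
        return "(" + left + " or " + right + ")";
    }

    // Assemble the final query string from an entity, an alias and a condition.
    static String from(String entity, String alias, String condition) {
        return "from " + entity + " " + alias + " where " + condition;
    }

    public static void main(String[] args) {
        String names = or("c.name = 'Tom'", "c.name = 'Felix'");
        System.out.println(from("Cat", "c", and("c.age > 2", names)));
    }
}
```

A generator can call these helpers in any order and depth and still produce a well-bracketed condition, which is most of what you need from a String format.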

The second mechanism is to use what’s called an “Algebraic Datatype representation”. Essentially, you build a set of classes to represent the language you’re building (in this case, the query language). Gavin King actually explores in his piece the different ways you can achieve this. He eventually settles on a style of method chaining (similar to TopLink, I might add).
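As a rough illustration of the idea, here is a tiny set of classes for a where clause. Condition, Comparison and And are names I made up for this sketch; this is not the API Gavin King proposes, nor Hibernate's Criteria API. Literal rendering is deliberately naive (no escaping).

```java
// Hypothetical mini syntax tree for conditions; illustrative only.
interface Condition {
    String render();
}

// A leaf node: property, operator, literal value.
final class Comparison implements Condition {
    private final String property;
    private final String operator;
    private final Object value;

    Comparison(String property, String operator, Object value) {
        this.property = property;
        this.operator = operator;
        this.value = value;
    }

    public String render() {
        // Naive literal rendering: strings are quoted but not escaped.
        String literal = (value instanceof String) ? "'" + value + "'" : String.valueOf(value);
        return property + " " + operator + " " + literal;
    }
}

// A composite node: the compiler guarantees both children are Conditions.
final class And implements Condition {
    private final Condition left;
    private final Condition right;

    And(Condition left, Condition right) {
        this.left = left;
        this.right = right;
    }

    public String render() {
        return "(" + left.render() + " and " + right.render() + ")";
    }
}
```

Something like new And(new Comparison("c.age", ">", 2), new Comparison("c.name", "=", "Tom")) cannot be malformed in the way a hand-written string can, because the structure is enforced by the types.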

The benefit of this style is that you leverage static checking to build correct structures. However, there are two problems. The first is that it doesn’t map to the programmer’s mental model. You see, the model is about building queries, not building syntax. I guess an earlier piece on building usable APIs tackles this. In essence, the API should represent the human mental model, and that just happens to be the original query syntax. The second problem is really more of an edge case, but I’ve seen it in my own work: this style is not uniform.

What do I mean by not uniform? First of all, it’s not side-effect free: the method chaining approach looks syntactically clean, but it exploits state. So when you add a new Criteria, the original Criteria changes. The other thing to notice is that it’s not symmetric. The side-effects make composition a bit more difficult. In general, they make generating these structures from another language more difficult. The conclusion from these observations is that human-usable APIs are actually worse for building code generators (stuff like LISP is wonderful!).
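The side-effect problem is easy to demonstrate. In the sketch below, MutableCriteria and ImmutableCriteria are hypothetical classes written for this post (neither is Hibernate's actual Criteria class). The first mimics the stateful chaining style; the second shows what a side-effect-free variant might look like.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder in the stateful method-chaining style; illustrative only.
final class MutableCriteria {
    private final List<String> restrictions = new ArrayList<String>();

    MutableCriteria add(String restriction) {
        restrictions.add(restriction); // mutates this instance
        return this;                   // and hands back the very same object
    }

    String render() {
        return String.join(" and ", restrictions);
    }
}

// Side-effect-free variant: add() returns a new value and leaves the receiver alone.
final class ImmutableCriteria {
    private final List<String> restrictions;

    ImmutableCriteria() {
        this(new ArrayList<String>());
    }

    private ImmutableCriteria(List<String> restrictions) {
        this.restrictions = restrictions;
    }

    ImmutableCriteria add(String restriction) {
        List<String> copy = new ArrayList<String>(restrictions);
        copy.add(restriction);
        return new ImmutableCriteria(copy);
    }

    String render() {
        return String.join(" and ", restrictions);
    }
}
```

With the mutable version, deriving two different queries from one shared base silently changes the base, so a generator has to track which object it has already touched. With the immutable version, the base can be reused freely.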

Finally, my recommended approach is the one called “Quasi Quote representation”. It’s more of a hybrid of the two above. Think of it as WYSIWYG with a back door for programmatic manipulation. Actually, the original Hibernate APIs employ this approach; however, there are still further opportunities for improvement, for example using named groups. Also, Query By Example (QBE) is a variant of this approach.
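Here is a minimal sketch of what I mean, under some assumptions. QuasiQuery is a hypothetical class of my own, and the :name syntax for holes is just borrowed from the look of named parameters. The query is written as plain text, and the holes are the back door through which a program splices in fragments. Note that this sketch splices raw text and would also match a colon inside a string literal, so treat it as an illustration of the shape rather than something to run against a database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical quasi-quote helper: query text with named holes; illustrative only.
final class QuasiQuery {
    private static final Pattern HOLE = Pattern.compile(":(\\w+)");

    private final String template;
    private final Map<String, String> fragments;

    QuasiQuery(String template) {
        this(template, new HashMap<String, String>());
    }

    private QuasiQuery(String template, Map<String, String> fragments) {
        this.template = template;
        this.fragments = fragments;
    }

    // Returns a new query with the hole filled; the receiver is left unchanged.
    QuasiQuery bind(String hole, String fragment) {
        Map<String, String> copy = new HashMap<String, String>(fragments);
        copy.put(hole, fragment);
        return new QuasiQuery(template, copy);
    }

    // Replaces every hole with its fragment; fails at runtime if a hole is unbound.
    String render() {
        Matcher matcher = HOLE.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String fragment = fragments.get(matcher.group(1));
            if (fragment == null) {
                throw new IllegalStateException("unbound hole: " + matcher.group(1));
            }
            matcher.appendReplacement(out, Matcher.quoteReplacement(fragment));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

The template reads like the query you have in your head, while a generator only needs bind() to do its work, and because bind() is side-effect free the same template can be specialized many times.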

Let’s learn a bit from history. Early Java frameworks for HTML employed the Algebraic approach (remember ECS?). Over time, it fell out of favor, giving way to the more natural template (i.e. Quasi Quote) approach we see today.