HaskellDB: A long tutorial

I’ve been using HaskellDB in production for about two years. I decided that I’d write a proper, up-to-date description, or tutorial, about what it is, how it works, what it can do, and my experience using it in projects.1

ORM approach

Fields

The approach for the object relational mapping is that one defines the column types and entity schemas up front. So, supposing our project is named Caturday, in a module named Caturday.Model.Fields,5 using the field macro,6 one declares fields. For example:

Speed and optimisation

But the subquery is useless in this example, so clearly the optimizer isn’t magic.

λ> ppSqlUnOpt simpleDoubleSelection
SELECT id2 as id,
       title2 as title
FROM (SELECT id as id2,
             title as title2
      FROM content as T1) as T1,
     (SELECT id as id1,
             title as title1
      FROM content as T1) as T2

In fact, subqueries are created in all cases.

Ordinary query planners, such as PostgreSQL’s, flatten the subquery so that the result is equivalent to a single query. I am not sure about MySQL; it may have trouble when joins are involved. Don’t expect good performance from HaskellDB if you’re using MySQL.10

For example, PostgreSQL treats this use of a subquery as equivalent to a direct join:

I’m not joining on any indexes so it’s a sequential scan. For people not used to PostgreSQL output, this basically means it will do a Cartesian product in both versions.

Maintenance

The great part about HaskellDB is that it is in first-class Haskell land. Fields and tables have a statically enforced membership and field-type schema.

The obvious use case is that it avoids making mistakes in naming and ending up with the wrong field type, or using a field that doesn’t exist in a given table.
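To make the idea of statically enforced membership concrete, here is a minimal sketch using phantom types. It is not how HaskellDB encodes things internally (it uses records and a HasField class); every name below (Table, Field, column, Content, Comment) is hypothetical and chosen only for illustration.

```haskell
-- A simplified sketch of statically enforced table membership.
-- All names here are hypothetical; HaskellDB's real encoding uses
-- records and a HasField class rather than phantom types.

-- | A table tagged with a phantom type t.
newtype Table t = Table String

-- | A field tagged with its owning table t and its column type a.
newtype Field t a = Field String

-- Empty types used purely as compile-time tags.
data Content
data Comment

content :: Table Content
content = Table "content"

title :: Field Content String
title = Field "title"

abstract :: Field Content (Maybe String)
abstract = Field "abstract"

commentBody :: Field Comment String
commentBody = Field "body"

-- | Build a qualified column reference. The shared type variable t
-- forces the field to belong to the table it is looked up in.
column :: Table t -> Field t a -> String
column (Table t) (Field f) = t ++ "." ++ f

-- column content title        -- accepted
-- column content commentBody  -- rejected by the type checker,
--                                because Comment does not match Content
```

The point of the sketch is only that a mismatch between a table and a field becomes a type error at compile time rather than a SQL error at run time.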

The fact that all fields are defined up front with the right type means that one really has to think about how meaningful a type is and how one will use it. For example:

field "Abstract" "abstract" "abstract" [t|Maybe String|]

This is how to encode a nullable database text field. When one encodes a database schema into the Haskell type system, one is forced to think properly about what types are actually in the database, particularly with respect to nullability.

In my day-to-day work I have to interface with database schemas that aren’t mine. Because I use HaskellDB, I end up with a lot of correctness questions about these schemas to put to their authors, if they are available for consultation.

It often comes up that I ask “why is this field nullable?” and the answer often comes back, “I don’t know.” As the PostgreSQL documentation says, in most database designs the majority of columns should be marked not null.11

Note that in Haskell nullability is not implicit. No values can be null. But you can have a choice between a value and no value, as in Maybe:

data Maybe a = Just a | Nothing

And so if we take the abstract field mentioned above and use it as a String, we get a compile error, because it is not a String but a Maybe String:

Mismatch: Demo.hs:23:32: “Maybe String” ≠ “String”
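As a small illustration of what the type checker is asking for here, the sketch below unwraps a nullable value explicitly. The Article record and the summary function are hypothetical and are not part of HaskellDB; they stand in for whatever row representation a query returns.

```haskell
import Data.Maybe (fromMaybe)

-- | Hypothetical row type. The abstract is a Maybe String because
-- the underlying column is nullable.
data Article = Article
  { articleTitle    :: String
  , articleAbstract :: Maybe String
  }

-- | Render a summary line. The nullable abstract must be unwrapped
-- explicitly; using it directly as a String would not type check.
summary :: Article -> String
summary a =
  articleTitle a ++ ": " ++ fromMaybe "(no abstract)" (articleAbstract a)
```

Choosing the fallback text is a decision the code has to make visibly, which is exactly the kind of question a nullable column should prompt.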

Another nice property is that fields named in your codebase, and their names in the database, are entirely separate and configurable. Just because Joe Master Designer chose certain names in his schema, that doesn’t mean that you have to conform to those names. Maybe they call it thetitle, and you just want title:

field "Title" "title" "thetitle" [t|String|]

Another benefit concerns changes to the underlying schema: if someone (you or someone else) changes the type or availability of a field or table in the schema, all you need do is make the necessary change in the field module or table module, and the compiler will tell you immediately which modules need updating to match the new invariants.

Suppose we change the type of the field title to Int (for example). When we recompile our examples above, we get:

Extension

Pagination and composing queries

Because the query DSL is a monad (as plenty of Haskell DSLs are), it is really nicely composable. This means it’s trivial to split up queries into discrete parts that have meaningful and generic purposes.

Take pagination, for example, which is essentially the simple problem of an offset and a count. I implemented this in HaskellDB.Database.Pagination.12
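The offset-and-count arithmetic can be sketched independently of any database. What follows is not the contents of the module mentioned above; the Pagination record and the functions pageOffset, pageLimit, limitClause and paginate are hypothetical names used only to show the calculation.

```haskell
-- | Hypothetical pagination settings; not the author's module.
data Pagination = Pagination
  { pnPage    :: Int  -- ^ 1-based page number
  , pnPerPage :: Int  -- ^ rows per page
  } deriving (Show, Eq)

-- | Rows to return; clamped so that it is never negative.
pageLimit :: Pagination -> Int
pageLimit = max 0 . pnPerPage

-- | Rows to skip before the requested page. Page numbers below 1
-- are treated as page 1.
pageOffset :: Pagination -> Int
pageOffset p = (max 1 (pnPage p) - 1) * pageLimit p

-- | Render the clause that a paginated query would end with.
limitClause :: Pagination -> String
limitClause p =
  "LIMIT " ++ show (pageLimit p) ++ " OFFSET " ++ show (pageOffset p)

-- | Apply the same window to an in-memory list, which is handy for
-- checking the arithmetic without a database.
paginate :: Pagination -> [a] -> [a]
paginate p = take (pageLimit p) . drop (pageOffset p)
```

For instance, page 3 at 10 rows per page skips 20 rows and returns the next 10. In a query DSL the same two numbers would be applied to the query rather than to a list.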

Thus the following implementation is possible. Suppose we write some functions to search the articles by title in the database, but paginated. Two things we need for this are:

Stability

The problem with HaskellDB is that the implementation can be unstable. I found that I had to patch the PostgreSQL library to handle simple stupid things like fields named “user” or “order”, by making sure to quote all fields.
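To show the kind of quoting involved, here is a minimal sketch; it is not the patch itself. It relies only on the standard SQL rule, which PostgreSQL follows, that an identifier may be wrapped in double quotes and that an embedded double quote is written twice. The names quoteIdent and quoteColumns are hypothetical.

```haskell
import Data.List (intercalate)

-- | Quote a SQL identifier with double quotes, doubling any
-- embedded double quote, so that reserved words such as user or
-- order can be used as column names.
quoteIdent :: String -> String
quoteIdent name = "\"" ++ concatMap escape name ++ "\""
  where
    escape '"' = "\"\""
    escape c   = [c]

-- | Render a quoted, comma-separated column list.
quoteColumns :: [String] -> String
quoteColumns = intercalate ", " . map quoteIdent
```

Note that quoted identifiers are case-sensitive in PostgreSQL, so quoting everything also means the names must match the schema's case exactly.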

I also had to open up some of the internal parts of the API so that I could extend it further, such as for the operator (.@@.) defined above. I’ll push these fixes and extensions to fork repos at some point.

Reading error messages

HaskellDB gets a lot of stick for hard-to-read error messages. This is true when you get things badly wrong.

In the general case the errors are quite straightforward.

For example, if I try to use a field which doesn’t exist in the table, like this:

Error: Demo.hs:39:13: No instance for (HasField F.Count RecNil)
      arising from a use of `!' at Demo.hs:39:13-27
    Possible fix:
      add an instance declaration for (HasField F.Count RecNil)
    In the first argument of `(.==.)', namely `content ! F.count'
    In the second argument of `($)', namely
        `content ! F.count .==. val 1'
    In a stmt of a 'do' expression:
        restrict $ content ! F.count .==. val 1

Which is a very useful error message: content does not have a field count.

For getting the wrong type, it merely shows “couldn’t match type A against type B,” which is straightforward.

The cases where compile errors blow up are, for example, if I wrote this:

The error actually makes sense if you understand the API well enough, but otherwise it can be very confusing and worrying. Don’t worry about it: you didn’t break something complicated, you just made a typo somewhere. It shows the offending expression; you realise you tried to use a table as a field, and you correct it.

Files

Afterwards it would seem like a good idea to get a proper comprehensive tutorial on the HaskellWiki, or much better yet, embed a tutorial in the Haddock documentation for HaskellDB. At the moment the Haddock docs are literally just an API listing, with no explanation or examples. Writing in Haddock mark-up is quite a painful, boring experience. Regardless, I believe the Haddock docs of a project should (most of the time) be sufficient to explain its use; linking to external papers, blog posts and whatnot is annoyingly terse, and the links quickly become out of date.

Embedded domain-specific language. A common notion in Haskell and Lisp languages, though implemented differently in each.

This is the convention I have chosen to use. It makes good sense and can be very helpful for all fields used in the project to be defined on a per-project basis, rather than per-entity, and of the same type.

A macro that you can get from Database.HaskellDB.TH, which I have yet to put into a library or get added to HaskellDB mainline. I don’t care to debate API decisions with the HaskellDB maintainers right now.

A macro that you can get from Database.HaskellDB.TH, which I have yet to put into a library or get added to HaskellDB mainline. I don’t care to debate API decisions with the HaskellDB maintainers right now.

When table names conflict with field names, and eventually it happens, this is useful to have. Alternatively, as F also makes sense, to be consistent.