However, it is possible to have your data shaped as a DAG (imagine any many-to-many relation), or even as a general graph (OK, maybe not). In this case, I tend to simulate the relational database by storing my data in Maps:

newtype Id a = Id Integer
type Table a = Map (Id a) a

This kind of works, but is unsafe and ugly for multiple reasons:

You are just an Id constructor call away from nonsensical lookups.

On lookup you get Maybe a, but often the database structurally ensures that there is a value.

It is clumsy.

It is hard to ensure referential integrity of your data.

Managing indices (which are very much necessary for performance) and ensuring their integrity is even harder and clumsier.
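The first two problems are easy to reproduce. A minimal sketch, assuming the containers package and two hypothetical row types (`User` and `Message` are illustrative and not part of the question):

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

newtype Id a = Id Integer deriving (Eq, Ord, Show)
type Table a = Map (Id a) a

-- Hypothetical row types, for illustration only.
data User = User { userName :: String } deriving (Eq, Show)
data Message = Message { author :: Id User, content :: String } deriving (Eq, Show)

users :: Table User
users = Map.fromList [(Id 1, User "alice")]

messages :: Table Message
messages = Map.fromList [(Id 10, Message (Id 1) "hello")]

-- Problem 1: the constructor is public, so a number that was meant as a
-- message key can be wrapped as a user key and the compiler accepts it.
bogusLookup :: Maybe User
bogusLookup = Map.lookup (Id 10) users

-- Problem 2: every message here has a valid author, yet the type still
-- forces the caller to handle Nothing.
authorOf :: Message -> Maybe User
authorOf m = Map.lookup (author m) users
```

Nothing in the types distinguishes a lookup that cannot fail (`authorOf`) from one that makes no sense (`bogusLookup`).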

Is there existing work on overcoming these problems?

It looks like Template Haskell could solve them (as it usually does), but I would like not to reinvent the wheel.

5 Answers

The ixset library will help you with this. It's the library that backs the relational part of acid-state, which also handles versioned serialization of your data and concurrency guarantees, in case you need them.

The thing about ixset is that it manages "keys" for your data entries automatically.

For your example, one would create one-to-many relationships for your data types like this:
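As a rough sketch of such a one-to-many layout, assuming the ixset package is installed (the `Date` and `TimeStamp` synonyms, the field names, the sample data, and the `messagesBy` helper are all illustrative placeholders):

```haskell
import Data.IxSet (IxSet, Indexable (..), fromList, ixFun, ixSet, toList, (@=))

-- Placeholder field types, chosen only for illustration.
type Date = String
type TimeStamp = Integer

data User = User
  { name      :: String
  , birthDate :: Date
  } deriving (Eq, Ord, Show)

-- Each message embeds its author: one user, many messages.
data Message = Message
  { user      :: User
  , timeStamp :: TimeStamp
  , content   :: String
  } deriving (Eq, Ord, Show)

-- Index messages by their author so they can be queried by User.
instance Indexable Message where
  empty = ixSet [ixFun (\m -> [user m])]

alice, bob :: User
alice = User "Alice" "1990-01-01"
bob   = User "Bob" "1985-06-15"

allMessages :: IxSet Message
allMessages = fromList
  [ Message alice 1 "hello"
  , Message bob   2 "hi"
  , Message alice 3 "bye"
  ]

-- All messages written by the given user, in ascending Ord order.
messagesBy :: User -> [Message]
messagesBy u = toList (allMessages @= u)
```

The `@=` operator picks the index by the type of its right-hand argument, which is why the index key here is the whole `User` value.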

If you don't want to have to update the users of a message or the messages of a user when adding a new user/message, you should instead create an intermediary data type that models the relation between users and messages, just like in SQL (and remove the users and messages fields):
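One possible shape for such an intermediary type, again assuming the ixset package; `UserId`, `MessageId`, `UserMessage`, and the two query helpers are hypothetical names introduced for this sketch:

```haskell
import Data.IxSet (IxSet, Indexable (..), fromList, ixFun, ixSet, toList, (@=))

-- Hypothetical key types; distinct newtypes keep the two indices apart.
newtype UserId = UserId Integer deriving (Eq, Ord, Show)
newtype MessageId = MessageId Integer deriving (Eq, Ord, Show)

-- One row per (user, message) pair, like a SQL join table.
data UserMessage = UserMessage
  { umUser    :: UserId
  , umMessage :: MessageId
  } deriving (Eq, Ord, Show)

-- Index the relation by both columns.
instance Indexable UserMessage where
  empty = ixSet
    [ ixFun (\r -> [umUser r])
    , ixFun (\r -> [umMessage r])
    ]

relation :: IxSet UserMessage
relation = fromList
  [ UserMessage (UserId 1) (MessageId 10)
  , UserMessage (UserId 1) (MessageId 11)
  , UserMessage (UserId 2) (MessageId 10)
  ]

-- Messages linked to a user.
messagesOf :: UserId -> [MessageId]
messagesOf u = map umMessage (toList (relation @= u))

-- Users linked to a message.
usersOf :: MessageId -> [UserId]
usersOf m = map umUser (toList (relation @= m))
```

Adding a link is a single insert into `relation`; neither the user nor the message value has to change.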

Creating a set of these relations would then let you query for users by message and for messages by user without having to update anything.

The library has a very simple interface considering what it does!

EDIT: Regarding your "costly data that needs to be compared": ixset only compares the fields that you specify in your index (so to find all the messages by a user in the first example, it compares "the whole user").

You regulate which parts of the indexed field it compares by altering the Ord instance. So, if comparing users is costly for you, you can add a userId field and modify the instance Ord User to only compare this field, for example.
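A minimal sketch of such an instance, using only base. The `User` record and its `profile` field are hypothetical stand-ins for data that is expensive to compare:

```haskell
import Data.Ord (comparing)

-- Hypothetical User record with a cheap key and an expensive payload.
data User = User
  { userId  :: Integer
  , name    :: String
  , profile :: String   -- stands in for data that is costly to compare
  } deriving (Show)

-- Equality and ordering look at userId only, so an index keyed on
-- User never inspects the other fields.
instance Eq User where
  a == b = userId a == userId b

instance Ord User where
  compare = comparing userId
```

The trade-off is that two users with the same `userId` are treated as equal regardless of their other fields, so this is only sound if the ids really are unique.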

This can also be used to solve the chicken-and-egg problem: what if you have an id, but neither a User, nor a Message?

You could then simply create an explicit index for the id, find the user by that id (with userSet @= (12423 :: Id)) and then do the search.
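A sketch of that two-step lookup, assuming the ixset package. It uses a hypothetical `UserId` newtype in place of the bare `Id` above, and the sample data and `messagesFor` helper are illustrative:

```haskell
import Data.IxSet (IxSet, Indexable (..), fromList, getOne, ixFun, ixSet, toList, (@=))

-- Hypothetical key type for the explicit index.
newtype UserId = UserId Integer deriving (Eq, Ord, Show)

data User = User { userId :: UserId, name :: String } deriving (Eq, Ord, Show)
data Message = Message { author :: User, content :: String } deriving (Eq, Ord, Show)

-- Users are indexed by their id.
instance Indexable User where
  empty = ixSet [ixFun (\u -> [userId u])]

-- Messages are indexed by their author.
instance Indexable Message where
  empty = ixSet [ixFun (\m -> [author m])]

alice, bob :: User
alice = User (UserId 12423) "Alice"
bob   = User (UserId 7) "Bob"

userSet :: IxSet User
userSet = fromList [alice, bob]

messageSet :: IxSet Message
messageSet = fromList [Message alice "hello", Message bob "hi"]

-- Resolve the id to a user first, then search the messages with it.
-- getOne returns Nothing unless exactly one user matches.
messagesFor :: UserId -> [Message]
messagesFor uid =
  case getOne (userSet @= uid) of
    Nothing -> []
    Just u  -> toList (messageSet @= u)
```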

Another radically different approach to representing relational data is used by the database package haskelldb. It doesn't work quite like the types you describe in your example, but it is designed to allow a type-safe interface to SQL queries. It has tools for generating data types from a database schema and vice versa. Data types such as the ones you describe work well if you always want to work with whole rows. But they don't work in situations where you want to optimize your queries by only selecting certain columns. This is where the HaskellDB approach can be useful.

I don't have a complete solution, but I suggest taking a look at the ixset package; it provides a set type with an arbitrary number of indices that lookups can be performed with. (It's intended to be used with acid-state for persistence.)

You do still need to manually maintain a "primary key" for each table, but you could make it significantly easier in a few ways:

Adding a type parameter to Id, so that, for instance, a User contains an Id User rather than just an Id. This ensures you don't mix up Ids for separate types.

Making the Id type abstract, and offering a safe interface to generating new ones in some context (like a State monad that keeps track of the relevant IxSet and the current highest Id).

Writing wrapper functions that let you, for example, supply a User where an Id User is expected in queries, and that enforce invariants (for example, if every Message holds a key to a valid User, it could allow you to look up the corresponding User without handling a Maybe value; the "unsafety" is contained within this helper function).
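The three suggestions above can be sketched together using only base and containers. This version uses a plain `Map` instead of an `IxSet` and threads the table by hand rather than through a State monad; `Table`, `insertRow`, `authorOf`, and the row types are hypothetical names:

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

-- Phantom parameter: an Id User cannot be used where an Id Message is expected.
-- In a real module the export list would omit the Id constructor,
-- leaving insertRow as the only way to obtain one.
newtype Id a = Id Integer deriving (Eq, Ord, Show)

-- A table bundles its rows with the next unused key.
data Table a = Table
  { nextId :: Integer
  , rows   :: Map (Id a) a
  }

emptyTable :: Table a
emptyTable = Table 0 Map.empty

-- Insert a row and return the fresh key together with the updated table.
insertRow :: a -> Table a -> (Id a, Table a)
insertRow x (Table n m) = (Id n, Table (n + 1) (Map.insert (Id n) x m))

data User = User { userName :: String } deriving (Eq, Show)
data Message = Message { author :: Id User, content :: String } deriving (Eq, Show)

-- The "unsafety" is contained here: this relies on the invariant that
-- every Message holds a key that exists in the user table.
authorOf :: Table User -> Message -> User
authorOf users msg =
  case Map.lookup (author msg) (rows users) of
    Just u  -> u
    Nothing -> error "authorOf: dangling user key (invariant violated)"
```

Callers of `authorOf` get a `User` directly; the invariant only holds as long as rows are never deleted while messages still refer to them, which this sketch does not enforce.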

As an additional note, you don't actually need a tree structure for regular data types to work, since they can represent arbitrary graphs; however, this makes simple operations like updating a user's name effectively impossible, because every value that refers to the user would have to be rebuilt.
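To illustrate the update problem, here is a small sketch using only base. The cyclic `User` and `Message` types are hypothetical:

```haskell
-- Hypothetical cyclic types: a user holds its messages, and each
-- message points back at its author.
data User = User { userName :: String, userMessages :: [Message] }
data Message = Message { msgAuthor :: User, msgContent :: String }

-- Laziness lets the two values refer to each other ("tying the knot").
alice :: User
alice = User "alice" [hello]

hello :: Message
hello = Message alice "hello"

-- A record update builds a new User but leaves the old graph untouched:
-- the messages reachable from it still point at the original value.
renamed :: User
renamed = alice { userName = "alicia" }
```

Following the cycle from `renamed` through its first message back to the author yields a user still named "alice", so a consistent rename would require rebuilding the whole graph.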

I've been asked to write an answer using Opaleye. In fact there's not an awful lot to say, as the Opaleye code is fairly standard once you have a database schema. Anyway, here it is, assuming there is a user_table with columns user_id, name and birthdate, and a message_table with columns user_id, time_stamp and content.