2012 February 24

Data Structure Agnostic JSON Serialization

Recently, Johan Tibell wrote a post on serialization APIs in Haskell, and I thought it might be good to mention the approach used in my own json-builder, which I haven’t promoted much until now.

In the post, Johan highlighted the Value data structure mandated by the popular aeson package, and had a little aside:

Aside: even though we’ll serialize Haskell values to and from this type, it would have been reasonable, although perhaps more cumbersome for the users of our API, to skip the Value type entirely and convert our Haskell values directly to and from JSON-encoded ByteStrings.

Skipping this data structure is exactly what json-builder does. It takes arbitrary data directly to a JSON string. It’s also efficient, capable of serializing aeson’s data structure with the same performance as aeson itself. It’s also a robust abstraction, meaning that all uses of the basic interface will result in syntactically correct JSON strings. And json-builder is just as easy to use as aeson’s ToJSON typeclass.

Unfortunately, my library does not solve the problem of parsing and processing JSON values; there is no analog of the FromJSON typeclass, though I am interested in how one might implement similarly data-structure-agnostic JSON parsing.

The basic idea looks exactly the same as aeson’s ToJSON class, though it’s currently named Value instead:

class Value a where
    toJson :: a -> Json

Now, Json is an opaque type, which is why this is a robust abstraction. All you can do with it is turn it into a string or use it to build bigger Json values. (In fact, if you look inside the Internal module, you’ll learn that Json is just a newtype wrapper around a blaze-builder Builder.)

Now, to take the example that Johan uses, let’s say you want to provide a Value instance that serializes a Person record into a JSON object. There are serialization instances for Data.Map and Data.HashMap, so you could take aeson’s approach and build one of those first. Or you could circumvent the abstraction and produce Builders yourself. But what you really want to do is use the JsObject type class:

This code is identical (modulo renaming) to the code that Johan gave to turn a Person into an aeson structure. What it does is construct an Object value, which is an opaque type that builds a JSON object. Object values have a very simple API: only a singleton constructor and a Monoid instance. And you can turn an Object value into a Json value, of course:

instance Value Person where
    toJson p = toJson (toObject p)
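To make the Object API concrete, here is a small self-contained model of the idea. It is a sketch under loud assumptions, not json-builder’s source: Json and Object wrap a ShowS difference list rather than a Builder, the singleton constructor is called row here, and the Person fields (name, age) are invented for illustration.

```haskell
{-# LANGUAGE FlexibleInstances #-}
-- Toy model only: every definition here is a hypothetical stand-in,
-- not json-builder's actual source.
import Data.Char (ord)
import Numeric (showHex)

-- Json wraps a difference list (ShowS) so appends stay cheap.
-- A real library would hide the constructor to keep the type opaque.
newtype Json = Json ShowS

render :: Json -> String
render (Json f) = f ""

class Value a where
  toJson :: a -> Json

instance Value Int where
  toJson = Json . shows

instance Value String where
  toJson = Json . quote

-- Quote a string, escaping quotes, backslashes and control characters.
quote :: String -> ShowS
quote s = showChar '"' . foldr (\c k -> esc c . k) id s . showChar '"'
  where
    esc '"'  = showString "\\\""
    esc '\\' = showString "\\\\"
    esc c
      | c < ' '   = showString "\\u" . showString (pad (showHex (ord c) ""))
      | otherwise = showChar c
    pad h = replicate (4 - length h) '0' ++ h

-- An object body is either empty or a run of comma-separated fields.
data Object = NoFields | Fields ShowS

instance Semigroup Object where
  NoFields <> y = y
  Fields f <> y = Fields (f . rest)
    where
      -- Only add a comma if something actually follows.
      rest = case y of
        NoFields -> id
        Fields g -> showChar ',' . g

instance Monoid Object where
  mempty = NoFields

-- The singleton constructor: one key/value pair.
row :: Value a => String -> a -> Object
row k v = case toJson v of
  Json f -> Fields (quote k . showChar ':' . f)

instance Value Object where
  toJson NoFields   = Json (showString "{}")
  toJson (Fields f) = Json (showChar '{' . f . showChar '}')

class JsObject a where
  toObject :: a -> Object

-- Hypothetical record; the field names are chosen for illustration.
data Person = Person { name :: String, age :: Int }

instance JsObject Person where
  toObject p = row "name" (name p) <> row "age" (age p)

instance Value Person where
  toJson p = toJson (toObject p)

main :: IO ()
main = putStrLn (render (toJson (Person "Ada" 36)))
-- prints {"name":"Ada","age":36}
```

Because the only ways to make an Object in this model are row and the Monoid operations, every rendered result has balanced braces and correctly placed commas, which is the sense in which the abstraction is robust.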

Unlike aeson, you have full control over the order in which the object’s fields appear. Unfortunately, json-builder will also happily produce JSON objects with duplicate field names, whereas aeson ensures that field names are unique. Neither issue is likely to be a very big deal in practice.

Now, json-builder has a couple of potentially interesting advantages over aeson. Let’s look at the serialization code for Haskell lists:

Unlike Aeson’s list serialization, this is a good consumer, and thus can fuse with good producers. So for example, when compiled with optimization, toJson [1..10^9] shouldn’t create a list at all, but rather directly produce a Json list of integers.

Also, this code is incremental even if it doesn’t fuse. It doesn’t need the entire list to start producing the Json string. Aeson, by contrast, marshals the entire list into a Vector before it produces anything.
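As a rough illustration of both points, here is a minimal self-contained sketch of a list instance written as a foldr. The names Array and element and the ShowS representation are assumptions made for this toy model; it is not the library’s actual code, and whether GHC really fuses a given call is up to the optimizer.

```haskell
-- Toy model only: hypothetical stand-ins, not json-builder's source.
newtype Json = Json ShowS

render :: Json -> String
render (Json f) = f ""

class Value a where
  toJson :: a -> Json

instance Value Integer where
  toJson = Json . shows

-- An array body is either empty or a run of comma-separated elements.
data Array = NoElems | Elems ShowS

instance Semigroup Array where
  NoElems <> y = y
  -- The right operand is inspected lazily, only once rendering reaches it.
  Elems f <> y = Elems (f . rest)
    where
      rest = case y of
        NoElems -> id
        Elems g -> showChar ',' . g

instance Monoid Array where
  mempty = NoElems

-- The singleton constructor: a one-element array body.
element :: Value a => a -> Array
element v = case toJson v of
  Json f -> Elems f

instance Value Array where
  toJson NoElems   = Json (showString "[]")
  toJson (Elems f) = Json (showChar '[' . f . showChar ']')

-- Written as a foldr, so the list is consumed one cell at a time and
-- the definition is at least eligible for foldr/build fusion.
instance Value a => Value [a] where
  toJson = toJson . foldr (\x acc -> element x <> acc) mempty

main :: IO ()
main = do
  putStrLn (render (toJson [1, 2, 3 :: Integer]))
  -- prints [1,2,3]
  -- Laziness check: a prefix of an infinite list's encoding is available.
  putStrLn (take 10 (render (toJson [1 :: Integer ..])))
  -- prints [1,2,3,4,5
```

The second line of main only terminates because the append is lazy in its right operand; a version that pattern matched on both sides up front would force the whole list before emitting anything.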

Whether either of these advantages means much to real-world applications remains to be seen. I would guess that for most such applications, the structure-agnostic aspects are a bigger win.

This generality doesn’t cost anything over aeson in either serialization speed or ease of use; for example, here’s an instance for Aeson’s data structure:

This turns out to be almost exactly equal in performance to the serialization code in aeson. (Perhaps I should add an instance for vector to json-builder.) Take note that you don’t need to use such a simple recursion in either aeson or json-builder. You can easily tweak the serialization of any part of a data structure by calling something other than toJson. For example, say you have a map of Maybe values, and you don’t want to include keys associated with Nothing. (These would normally be rendered as null.) Then you can use this code:
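A self-contained sketch of that idea follows. It uses toy stand-ins for Json, Object and row (assumptions for illustration, not json-builder’s real definitions), takes Data.Map from the containers package, and introduces a hypothetical helper noNulls that folds over the map and simply skips the Nothing entries.

```haskell
import qualified Data.Map as Map

-- Toy model only: hypothetical stand-ins, not json-builder's source.
newtype Json = Json ShowS

render :: Json -> String
render (Json f) = f ""

class Value a where
  toJson :: a -> Json

instance Value Int where
  toJson = Json . shows

-- An object body is either empty or a run of comma-separated fields.
data Object = NoFields | Fields ShowS

instance Semigroup Object where
  NoFields <> y = y
  Fields f <> y = Fields (f . rest)
    where
      rest = case y of
        NoFields -> id
        Fields g -> showChar ',' . g

instance Monoid Object where
  mempty = NoFields

instance Value Object where
  toJson NoFields   = Json (showString "{}")
  toJson (Fields f) = Json (showChar '{' . f . showChar '}')

-- Simplification: Haskell's own string quoting stands in for JSON
-- escaping, which is only adequate for plain ASCII keys.
row :: Value a => String -> a -> Object
row k v = case toJson v of
  Json f -> Fields (shows k . showChar ':' . f)

-- Hypothetical helper: leave out keys whose value is Nothing
-- instead of rendering them as null.
noNulls :: Value a => Map.Map String (Maybe a) -> Json
noNulls = toJson . Map.foldrWithKey field mempty
  where
    field _ Nothing  acc = acc
    field k (Just v) acc = row k v <> acc

main :: IO ()
main = putStrLn (render (noNulls example))
  where
    example = Map.fromList [("a", Just (1 :: Int)), ("b", Nothing), ("c", Just 3)]
-- prints {"a":1,"c":3}
```

The point is that the fold builds an Object directly, so the choice to drop a key is made per field without first constructing a filtered map or an intermediate JSON tree.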


What you’re doing is basically making aeson’s Value type virtual. Just like you can allow users to “create” values of that virtual type by defining functions analogous to each constructor, you can allow them to “consume” a value of that type by offering a fold operator for it:

Actually, I’ve played around with a similar idea; it just doesn’t have the interface I want. I wouldn’t consider it a convenient way to express deserializers, at least, not without something sitting on top of it hiding all the arcana. But maybe you know how to do that?