Thursday, December 27, 2007

Since it looks like Haskell web development might start to take off soon, I've been thinking about how to write a good JSON serializer library. The obvious way to start is to define a simple type to represent JSON data structures:
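The original type definition isn't reproduced here. Below is a minimal sketch of what such a type might look like, with a small renderer so the values can be turned into JSON text. The constructor names (JNull, JBool, JNumber, JString, JArray, JObject) and the `render` and `quote` helpers are assumptions for illustration, not the library's actual definitions.

```haskell
import Data.List (intercalate)

-- Hypothetical JSON value type; the constructor names are assumptions.
data JsonData
  = JNull
  | JBool Bool
  | JNumber Double
  | JString String
  | JArray [JsonData]
  | JObject [(String, JsonData)]  -- a list keeps fields in order
  deriving (Eq, Show)

-- Render a JsonData value as JSON text.
render :: JsonData -> String
render JNull = "null"
render (JBool b) = if b then "true" else "false"
render (JNumber n)
  | n == fromInteger (round n) = show (round n :: Integer)  -- print 7.0 as 7
  | otherwise = show n  -- NaN and Infinity are not handled in this sketch
render (JString s) = quote s
render (JArray xs) = "[" ++ intercalate "," (map render xs) ++ "]"
render (JObject kvs) =
  "{" ++ intercalate "," [quote k ++ ":" ++ render v | (k, v) <- kvs] ++ "}"

-- Quote a string, escaping only double quotes, backslashes and newlines.
quote :: String -> String
quote s = "\"" ++ concatMap esc s ++ "\""
  where
    esc '"' = "\\\""
    esc '\\' = "\\\\"
    esc '\n' = "\\n"
    esc c = [c]
```

Keeping object fields in a list, rather than a map, preserves the field order of the record being serialized.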

It's easy enough to convert other data structures to the JsonData type, but serializing more complex types takes a lot of boilerplate code. So, maybe the solution is to create a new class containing a ‘toJson’ function:
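The class definition isn't shown in the original. A minimal sketch, assuming `toJson :: a -> JsonData` and a cut-down copy of the hypothetical JsonData type, with a few basic instances:

```haskell
data JsonData = JNull | JBool Bool | JNumber Double | JString String
              | JArray [JsonData] | JObject [(String, JsonData)]
  deriving (Eq, Show)

-- Types that know how to convert themselves to JsonData.
class Json a where
  toJson :: a -> JsonData

instance Json Bool where
  toJson = JBool

instance Json Int where
  toJson = JNumber . fromIntegral

instance Json Char where
  toJson c = JString [c]

-- Lists become JSON arrays. Because String is just [Char], a String
-- also ends up here, as an array of one-character strings.
instance Json a => Json [a] where
  toJson = JArray . map toJson

instance Json a => Json (Maybe a) where
  toJson = maybe JNull toJson
```

The list instance already shows the String problem described next: `toJson "ab"` yields an array, not a JSON string.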

Once instances have been added for other basic datatypes, this works nicely enough. However, there are two significant problems. First, since Haskell's String type is just a synonym for [Char], there is no way to handle Strings any differently from normal lists. Second, if you use this approach for serializing simple record types, you end up with a lot of boilerplate instance declarations like the following:
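The original example instance isn't reproduced here. The sketch below shows the kind of hand-written instance meant, for a hypothetical `Point` record, using the same assumed JsonData constructors as above.

```haskell
data JsonData = JNull | JBool Bool | JNumber Double | JString String
              | JArray [JsonData] | JObject [(String, JsonData)]
  deriving (Eq, Show)

class Json a where
  toJson :: a -> JsonData

instance Json Int where
  toJson = JNumber . fromIntegral

-- A hypothetical record type used for illustration.
data Point = Point { px :: Int, py :: Int }

-- The boilerplate: every field name is spelled out by hand as a string,
-- and this instance must be kept in sync with the record definition.
instance Json Point where
  toJson p = JObject [("px", toJson (px p)), ("py", toJson (py p))]
```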

Ideally, we would like to be able to give some sort of default procedure for serializing a record, so that unless we specify otherwise, the record will be converted into a JSON object with appropriate field names. The default should work correctly for nested records, and using it should require at most a single line of code per record type. This ideal can certainly be achieved in a dynamic, reflective language such as Common Lisp, Python or Ruby, but it is absolutely out of the question in Haskell 98. Surprisingly, with the addition of a number of generic programming extensions to GHC, it is now possible to implement a generic JSON serializer for Haskell too. As you might expect, this involves some pretty hairy manipulation of the type system, so I'd like to set the bait before getting into the details. Here goes. Once we're done, the following code works:

import Json -- Generic Json serialization library (to be written)

data OuterRecord = OuterRecord { x :: Int, y :: String, r :: InnerRecord }

data InnerRecord = InnerRecord { z :: Int }

$(derive [''OuterRecord])
$(derive [''InnerRecord])

> toJson (OuterRecord { x=1, y="foo", r=(InnerRecord { z=7 })})

=> {"x":1,"y":"foo","r":{"z":7}} :: JsonData

If you have the luxury of designing your record types with JSON in mind, you just need to add one such line of code for every type. If you do need to customize serialization of a particular type, it's just a matter of creating an instance:
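The original custom instance isn't reproduced here. As a hedged illustration, the hypothetical `Colour` record below overrides the default object encoding with a compact "#rrggbb" string. It uses the same assumed JsonData constructors as the earlier sketches.

```haskell
import Numeric (showHex)

data JsonData = JNull | JBool Bool | JNumber Double | JString String
              | JArray [JsonData] | JObject [(String, JsonData)]
  deriving (Eq, Show)

class Json a where
  toJson :: a -> JsonData

-- Hypothetical record; components are assumed to lie in 0..255.
data Colour = Colour { red :: Int, green :: Int, blue :: Int }

-- Custom instance: emit "#rrggbb" instead of the default JSON object.
instance Json Colour where
  toJson (Colour r g b) = JString ('#' : concatMap hex2 [r, g, b])
    where
      -- Two lowercase hex digits, zero-padded.
      hex2 n = let s = showHex n "" in replicate (2 - length s) '0' ++ s
```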

(In fact, the finished implementation of the serializer provides an easier way of excluding fields from serialization.) The code uses the Scrap Your Boilerplate approach to generics. Although GHC comes with the Data.Generics module, which implements the system described in the first SYB paper, my code is based on the third SYB paper (Scrap Your Boilerplate With Class). For this reason, you will need to install the syb-with-class package from Hackage in order to compile it. The current version (0.3) does not compile on ghc-6.8 out of the box, but only because of a trivial build config issue that's easily fixed (see below).

SYBWC makes it possible to define extensible generic functions. An ordinary generic function operates over instances of GHC's Data.Generics.Basics.Data class. For example, the ‘gshow’ function has the following type:

gshow :: (Data a) => a -> String
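To make the idea of an ordinary generic function concrete, here is a tiny gshow-like function written against base's Data.Data. It is a sketch for illustration, not the real `gshow`, and the names `myGShow` and `Shape` are hypothetical.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data

-- A gshow-like function built only from base's Data.Data:
-- print the constructor, then recurse into its arguments.
myGShow :: Data a => a -> String
myGShow v = case gmapQ myGShow v of
  []   -> con
  args -> "(" ++ unwords (con : args) ++ ")"
  where
    con = showConstr (toConstr v)

-- Any type with a derived Data instance works, with no extra code.
data Shape = Circle Int | Rect Int Int deriving (Data)
```

The catch is that this behaviour is fixed. A user of `myGShow` has no way to plug in a Shape-specific version from outside, and a String comes out as a chain of cons cells rather than as quoted text.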

‘gshow’ works for the most common Haskell datatypes, but there is no way to customize it for your own datatypes. With SYBWC, generic functions can easily be extended using instance declarations (hence “with class”). Suppose we want to define our own generic show function using SYBWC. We begin, ironically enough, with a small amount of boilerplate code:

import Data.Generics.SYB.WithClass.Basics

class MyShow a where myShow :: a -> String

-- Note that the type of the myShowD field is identical
-- to the type of 'myShow' in the MyShow class.
data MyShowD a = MyShowD { myShowD :: a -> String }

This is all rather involved, and I refer you to the SYBWC paper for the gory details. The good thing is that this code follows exactly the same template whatever the type signature of your generic function, so you don't really need to understand any of it.

With the boilerplate out of the way, the rest is pretty straightforward. First, we need to define a generic implementation of myShow that will be called if there isn't a more specific implementation for a given type. Focusing on the instance declaration alone, we need the following:

instance Data MyShowD a => MyShow a where ...

Note that the ‘Data’ class in Data.Generics.SYB.WithClass takes two type arguments. This is crucial to the increased flexibility of the SYBWC implementation of generics. In effect, the declaration above adds a new superclass to the Data class, telling GHC that every type that is an instance of ‘Data MyShowD’ is also an instance of ‘MyShow’. This is a very powerful technique. Applied to JSON, it lets us treat any field of a record passed to ‘toJson’ as an instance of ‘Json’, and that is what will let us serialize record types recursively. Defining a good generic show function is a project in itself, so I'll move on from this example and get straight to the JSON serializer. Here's the JSON boilerplate:
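The listing itself isn't reproduced here. Following the same template as MyShowD, a sketch might pair the Json class with a JsonD dictionary record and a Sat instance. Here `Sat` is declared locally so the block runs with base alone. That is an assumption: syb-with-class exports its own Sat, along with the Proxy and Data machinery left out of this sketch. The `viaDict` helper is hypothetical.

```haskell
{-# LANGUAGE FlexibleContexts #-}

data JsonData = JNull | JBool Bool | JNumber Double | JString String
              | JArray [JsonData] | JObject [(String, JsonData)]
  deriving (Eq, Show)

-- Local stand-in for the Sat class that syb-with-class exports.
class Sat a where
  dict :: a

class Json a where
  toJson :: a -> JsonData

-- Dictionary record: the toJsonD field has the same type as toJson.
data JsonD a = JsonD { toJsonD :: a -> JsonData }

-- Whenever t has a Json instance, Sat can build its JsonD dictionary.
instance Json t => Sat (JsonD t) where
  dict = JsonD { toJsonD = toJson }

instance Json Int where
  toJson = JNumber . fromIntegral

-- Code that only knows Sat (JsonD a) can still reach the right toJson.
viaDict :: Sat (JsonD a) => a -> JsonData
viaDict = toJsonD dict
```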

Now we have to decide how our generic serialization function is going to work. For record types, it should recursively serialize each value in the record and then create a JSON object with the appropriate field names. For other types, it's less obvious what should be done; I've chosen the following behavior. For algebraic types, the serializer outputs a list containing the name of the type followed by all of the arguments to the type constructor (if any). For primitive types (e.g. ‘Int’), a runtime error is signaled, since no sensible generic behavior can be defined. Here's the definition for the generic version of ‘toJson’:
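The definition isn't reproduced here. For comparison, here is a sketch of the same behaviour using only base's Data.Data instead of SYBWC. This is a swapped-in, non-extensible technique with the following differences:

* Plain Data.Data functions can't be extended, so the Int and String cases are hard-wired with `cast`.
* There is no proxy argument.
* Recursion calls `genericToJson` directly.
* Algebraic values are labelled with their constructor name (`showConstr`), which is an assumption about what "the name of the type" means above.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data

data JsonData = JNull | JBool Bool | JNumber Double | JString String
              | JArray [JsonData] | JObject [(String, JsonData)]
  deriving (Eq, Show)

-- Closed approximation of the generic serializer (base Data.Data only).
genericToJson :: Data a => a -> JsonData
genericToJson v
  | Just i <- cast v = JNumber (fromIntegral (i :: Int))  -- hard-wired Int
  | Just s <- cast v = JString s                          -- hard-wired String
  | isAlgType ty =
      let c    = toConstr v
          args = gmapQ genericToJson v  -- serialize each constructor argument
      in case constrFields c of
           []     -> JArray (JString (showConstr c) : args)  -- plain algebraic type
           fields -> JObject (zip fields args)               -- record type
  | otherwise = error ("genericToJson: no rule for primitive type " ++ dataTypeName ty)
  where
    ty = dataTypeOf v

data InnerRecord = InnerRecord { z :: Int } deriving (Data)
data OuterRecord = OuterRecord { x :: Int, y :: String, r :: InnerRecord } deriving (Data)
data Shape = Circle Int | Rect Int Int deriving (Data)
```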

The ‘dataTypeOf’ function returns a DataType value representing a Haskell data type. Its first argument is a “proxy”, which fixes the superclass of the Data class that we're dealing with (in our case, the ‘Json’ class). Once the DataType of the value has been obtained, we check that it is an algebraic type, signaling an error if it is not. Now we call the ‘getFields’ function, which returns [] for ordinary algebraic data types and a list of field names for record types. In either case, we use the ‘gmapQ’ function (which also takes a proxy argument) to map ‘toJson’ over all the constructor arguments. Crucially, however, there is no direct recursive call to ‘toJson’. Instead, the following expression is used to obtain a suitable function:

(toJsonD dict)

This allows the most specialized instance of ‘toJson’ to be selected for each field of the record. If ‘toJson’ were called directly, the code would still type-check and compile, but the specialized instances of ‘toJson’ would not be called, leading to the wrong behavior at runtime. Again, the SYBWC paper gives the details of how this works.
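The failure mode is easy to reproduce with a closed, Data.Data-based serializer like the one sketched earlier, where the recursion goes straight back to the generic function. In this sketch the hypothetical `Celsius` type has a custom instance, but it is bypassed when a `Celsius` value appears as a field. The types and instances are illustrative assumptions.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data

data JsonData = JNull | JBool Bool | JNumber Double | JString String
              | JArray [JsonData] | JObject [(String, JsonData)]
  deriving (Eq, Show)

class Json a where
  toJson :: a -> JsonData

-- Closed generic serializer: the recursive call never consults Json.
genericToJson :: Data a => a -> JsonData
genericToJson v
  | Just i <- cast v = JNumber (fromIntegral (i :: Int))
  | isAlgType (dataTypeOf v) =
      let c    = toConstr v
          args = gmapQ genericToJson v
      in case constrFields c of
           [] -> JArray (JString (showConstr c) : args)
           fs -> JObject (zip fs args)
  | otherwise = error "genericToJson: unsupported primitive type"

-- Hypothetical types: Celsius has its own custom serialization.
newtype Celsius = Celsius Int deriving (Data)
data Reading = Reading { temp :: Celsius } deriving (Data)

instance Json Celsius where
  toJson (Celsius n) = JString (show n ++ "C")

-- Reading opts in to the generic serializer.
instance Json Reading where
  toJson = genericToJson
```

Serialized on its own, a `Celsius` gives "21C". Inside a `Reading`, the same value comes out in the generic ["Celsius",21] form. Fetching the function through a dictionary at each field, as `(toJsonD dict)` does, is what avoids this.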

At this point, we have a working generic serialization function for record types. All we need to do now is add instances of the ‘Json’ class for common Haskell types. For example:

instance Json Int where toJson i = show i

With this definition, we can now serialize nested records containing integers:

The ‘$(derive ...)’ lines invoke some Template Haskell code that automatically derives an instance of SYBWC's ‘Data’ class for the given datatype. That's basically it. We have a generic, extensible JSON serializer, and for the most part, we haven't sacrificed any type safety.[1] The only problem that remains is that of treating strings differently from other lists. This is easily accomplished in the instance declaration for lists with the help of GHC's ‘cast’ function:
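The instance itself isn't reproduced here. Below is a minimal sketch of the idea, assuming the element type carries a Typeable constraint, which every Data instance provides as a superclass. `cast` from Data.Typeable succeeds exactly when `[a]` is `[Char]`. The JsonData constructors are the same assumptions as in the earlier sketches.

```haskell
import Data.Typeable (Typeable, cast)

data JsonData = JNull | JBool Bool | JNumber Double | JString String
              | JArray [JsonData] | JObject [(String, JsonData)]
  deriving (Eq, Show)

class Json a where
  toJson :: a -> JsonData

instance Json Int where
  toJson = JNumber . fromIntegral

instance Json Char where
  toJson c = JString [c]

-- One list instance; at runtime, cast checks whether [a] is really String.
instance (Typeable a, Json a) => Json [a] where
  toJson xs = case cast xs of
    Just s  -> JString s               -- the list was a String
    Nothing -> JArray (map toJson xs)  -- any other list
```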

[It has been pointed out to me that the use of a cast here is not required. An alternative is to use the same trick that the Prelude uses to ensure that ‘show’ treats strings properly (namely, addition of a ‘showList’ method to the Show class).]
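A sketch of that alternative, which needs no extensions beyond Haskell 98. The method name `listToJson` is hypothetical: it plays the role of `showList`, and only the Char instance overrides it.

```haskell
data JsonData = JNull | JBool Bool | JNumber Double | JString String
              | JArray [JsonData] | JObject [(String, JsonData)]
  deriving (Eq, Show)

class Json a where
  toJson :: a -> JsonData
  -- How to serialize a list of this type; defaults to a JSON array.
  listToJson :: [a] -> JsonData
  listToJson = JArray . map toJson

instance Json Bool where
  toJson = JBool

-- Char overrides listToJson, so [Char] (String) becomes a JSON string.
instance Json Char where
  toJson c = JString [c]
  listToJson = JString

-- The list instance delegates to the element type's listToJson.
instance Json a => Json [a] where
  toJson = listToJson
```

Nested lists still behave sensibly. A list of strings becomes an array of JSON strings, because only the innermost `[Char]` uses the Char override.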

The finished library adds an ‘exclude’ member to the ‘Json’ class. This can be used to exclude fields of a record from serialization without having to write a full custom instance. For example:
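The example isn't reproduced here, and the real signature of ‘exclude’ isn't shown. The one below (`a -> [String]`, defaulting to `[]`) is an assumption. This Data.Data-based sketch filters record fields by name and only handles flat records with Int and String fields. The `User` type and the `fieldToJson` helper are hypothetical.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data

data JsonData = JNull | JBool Bool | JNumber Double | JString String
              | JArray [JsonData] | JObject [(String, JsonData)]
  deriving (Eq, Show)

class Json a where
  toJson :: a -> JsonData
  -- Hypothetical signature: names of record fields to leave out.
  exclude :: a -> [String]
  exclude _ = []

-- Serialize one field; this sketch only knows Int and String.
fieldToJson :: Data d => d -> JsonData
fieldToJson d
  | Just i <- cast d = JNumber (fromIntegral (i :: Int))
  | Just s <- cast d = JString s
  | otherwise = error "fieldToJson: unsupported field type"

-- Record-only generic serializer that honours 'exclude'.
genericToJson :: (Data a, Json a) => a -> JsonData
genericToJson v =
  JObject [ (f, j) | (f, j) <- zip fields (gmapQ fieldToJson v)
                   , f `notElem` exclude v ]
  where
    fields = constrFields (toConstr v)

data User = User { name :: String, password :: String, age :: Int }
  deriving (Data)

instance Json User where
  toJson = genericToJson   -- the default has to be chosen explicitly
  exclude _ = ["password"]
```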

I have not found a way of providing a default implementation for ‘toJson’ which is compatible with Haskell's type system, so if you want the default serialization behavior, you must manually set ‘toJson’ to ‘genericToJson’ (a function provided by the library).

Installing syb-with-class

In order to compile syb-with-class successfully for ghc-6.8, you need to make the following changes to the file ‘syb-with-class.cabal’. (Disclaimer: this is just a quick hack, not a proper fix.)

In the line beginning ‘Build-Depends:’, add ‘array’ to the dependencies.

In the line beginning ‘GHC-Options:’, add the option ‘-fglasgow-exts’.

You can now use the following commands to build and install the package:

1. The one exception is the possibility of a runtime error if we try to serialize a primitive type for which no instance has been defined. In practice, given instances for all the basic Haskell String/numeric types, this sort of error is very unlikely to occur.