
Tuesday, May 15, 2007

modelled serialization

Too many times I've seen programmers write web services that generate their
output 'by hand'. Worse, the structured input to those services (XML or JSON)
gets parsed into objects by hand. Or maybe not parsed, but the DOMs and JSON
structures get walked. Manually. Egads! Folks, we're using computers! Let the
computer do some work fer ya!

In my previous project, we used Eclipse's EMF to model the data we were sending
and receiving via RESTy, POXy web services. For an example of what I'm referring
to as 'modelling', see this EMF overview and scroll down to "Annotated Java". For
us, modelling our data meant adding EMF comments to our code, and then running
some code to generate the EMF goop. What the goop ended up giving you was a
runtime version of the model that you could introspect on. Basically just like
Java's introspection and reflection calls, which let you examine the shape of
classes and the state of objects dynamically. Only with richer semantics. And
frankly, just easier, if I remember correctly.
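EMF's real runtime API is much richer than this, but as a rough, hypothetical stand-in (plain Python; none of these names come from EMF), here is what "a runtime version of the model you can introspect on" boils down to: each class carries a list of its features, and generic code reads that list to examine both the shape of a class and the state of an object.

```python
from dataclasses import dataclass

@dataclass
class Feature:
    """One modelled feature (attribute/property) of a class."""
    name: str
    type: str           # a primitive type name, or the name of another modelled class
    many: bool = False  # True if the feature holds an array of values

@dataclass
class ModelClass:
    """Runtime description of a modelled class: its name and its features."""
    name: str
    features: list

# Hypothetical model for a 'Person' document
PERSON = ModelClass("Person", [
    Feature("name", "string"),
    Feature("age", "number"),
    Feature("emails", "string", many=True),
])

class Person:
    def __init__(self, name, age, emails):
        self.name, self.age, self.emails = name, age, emails

def describe(model):
    """Examine the *shape* of a class: one line per modelled feature."""
    return [f"{f.name}: {f.type}{'[]' if f.many else ''}" for f in model.features]

def feature_values(model, obj):
    """Examine the *state* of an object, guided by its model."""
    return {f.name: getattr(obj, f.name) for f in model.features}
```

Unlike raw reflection, the model lists only what you chose to model, and it can record semantics (like `many`) that the language itself doesn't track.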

Anyhoo, for the web services we were writing, we constrained the data being
passed over the wire to instances of modelled classes. Want to send some data
from the server to the client? Describe it with a modelled class. Because the
structure of these modelled classes was available at runtime, it was (relatively)
easy to write code that could walk over the classes and generate a serialized
version of an object (XML or JSON). Likewise, we could take an XML or JSON stream
and turn it into an instance of a modelled class fairly easily. Generically.
For all our modelled classes. With one piece of code.
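A minimal sketch of that "one piece of code", assuming a hypothetical hand-rolled model (a dict mapping each Python dataclass to its features) rather than EMF itself. `to_plain` and `from_plain` walk the model recursively; the standard `json` module does the actual text formatting.

```python
import json
from dataclasses import dataclass

@dataclass
class Address:
    street: str
    city: str

@dataclass
class Person:
    name: str
    age: int
    addresses: list  # list of Address

# Hypothetical runtime model: class -> list of (feature, type, is_array).
# A type that is itself a key of MODEL names another modelled class;
# anything else is treated as a JSON primitive and passed through.
MODEL = {
    Address: [("street", "string", False), ("city", "string", False)],
    Person: [("name", "string", False), ("age", "number", False),
             ("addresses", Address, True)],
}

def to_plain(obj):
    """Walk obj's model and build plain dicts/lists that json can dump."""
    out = {}
    for name, typ, many in MODEL[type(obj)]:
        value = getattr(obj, name)
        conv = to_plain if typ in MODEL else (lambda v: v)
        out[name] = [conv(v) for v in value] if many else conv(value)
    return out

def from_plain(cls, data):
    """Inverse of to_plain: rebuild an instance of cls, guided by its model."""
    kwargs = {}
    for name, typ, many in MODEL[cls]:
        raw = data[name]
        conv = (lambda v, t=typ: from_plain(t, v)) if typ in MODEL else (lambda v: v)
        kwargs[name] = [conv(v) for v in raw] if many else conv(raw)
    return cls(**kwargs)

def serialize(obj):
    return json.dumps(to_plain(obj))

def deserialize(cls, text):
    return from_plain(cls, json.loads(text))
```

Adding a new document type means adding a dataclass and a `MODEL` entry; `serialize` and `deserialize` stay unchanged.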

Automagic serialization.

One simplification that helped was that we greatly constrained the types of 'features'
(what EMF non-users would call attributes or properties) a class could have; it turned
out to be basically what's allowed in JSON objects: strings, numbers, booleans,
arrays, and (recursively) objects. We had a few other primitive types, like
dates and UUIDs, but amazingly, we were able to build a large, complex system
from a pretty barebones set of primitives and composites. Less is more.
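Keeping the primitive set tiny keeps the serializer tiny too: each primitive needs only a type check, an encoder and a decoder. Here is a sketch of such a table. The JSON encodings chosen for dates (ISO 8601 strings) and UUIDs (canonical hex strings) are assumptions for illustration, not necessarily what we used.

```python
import uuid
from datetime import date

# Hypothetical primitive set: the JSON types plus date and uuid.
# type name -> (accepted Python type(s), encode to JSON value, decode from JSON value)
PRIMITIVES = {
    "string":  (str, lambda v: v, lambda v: v),
    "number":  ((int, float), lambda v: v, lambda v: v),
    "boolean": (bool, lambda v: v, lambda v: v),
    "date":    (date, lambda v: v.isoformat(), date.fromisoformat),
    "uuid":    (uuid.UUID, str, uuid.UUID),
}

def encode(type_name, value):
    """Check value against its declared primitive type and convert it for JSON."""
    accepted, enc, _ = PRIMITIVES[type_name]
    # bool is a subclass of int in Python, so reject it explicitly for "number"
    if not isinstance(value, accepted) or (type_name == "number" and isinstance(value, bool)):
        raise TypeError(f"{value!r} is not a valid {type_name}")
    return enc(value)

def decode(type_name, raw):
    """Convert a JSON value back into the primitive's Python type."""
    return PRIMITIVES[type_name][2](raw)
```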

For folks familiar with WS-*, none of this should come as a huge surprise. There
are basically two approaches to defining your web service data: define it in XML
schema and have tooling generate code for you, or define it in code and have
tooling generate schema for you. In both cases, the serialization code is
generated for you.
Neither approach has ever felt right to me. Defining documents in schema is not simple;
it's certainly harder than defining Java classes. And the code generated by tooling
to handle schema tends to be ... not pretty. On the other hand, when starting
with code, your documents will be ugly. Some folks don't care about that, but
I do. The document is your contract. Why would you want your contract to be ugly?

Model-driven serialization can be a nice alternative to these two approaches,
assuming you're building RESTy or POXy web services. It's relatively simple to
create a serializer that feels right for you, and you know your data better than
anyone, so make your data models as simple or as complex as you actually need.
If you're using Java and have complex needs, consider EMF; it can probably
do what you need, or at least provide a solid base for it.
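Because you own the serializer, you also own the shape of the document. As one hypothetical example of a serializer that "feels right", this sketch walks a small model (a dict of dataclasses to features, defined here from scratch) and emits compact XML: single primitive features become attributes, and nested modelled objects become child elements named after the feature. That layout is purely a design choice for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Address:
    street: str
    city: str

@dataclass
class Person:
    name: str
    age: int
    address: list  # zero or more Address objects

# Hypothetical runtime model: class -> list of (feature, type, is_array)
MODEL = {
    Address: [("street", "string", False), ("city", "string", False)],
    Person: [("name", "string", False), ("age", "number", False),
             ("address", Address, True)],
}

def to_xml(obj, tag=None):
    """Single primitives -> attributes; arrays of primitives and modelled
    objects -> child elements named after the feature."""
    elem = ET.Element(tag or type(obj).__name__.lower())
    for name, typ, many in MODEL[type(obj)]:
        value = getattr(obj, name)
        for item in (value if many else [value]):
            if typ in MODEL:
                elem.append(to_xml(item, name))
            elif many:
                ET.SubElement(elem, name).text = str(item)
            else:
                elem.set(name, str(item))
    return elem
```

Calling `ET.tostring(to_xml(p), encoding="unicode")` on a `Person` yields a `<person>` element with `name` and `age` attributes and one `<address>` child per address, rather than the deeply nested element soup a code-first tool might produce.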

Besides serialization, data modelling has other uses:

Generating human-readable documentation of your web service data. You were
planning on documenting it, right? And what, you were going to do it by hand?

Generating machine-readable documentation of your web service data; i.e., XML schema.
I know you weren't going to write that by hand.
Tell me you weren't going to write that by hand. Actually, admit it, you probably weren't
going to generate XML schema at all.

Generating editors, like EMF does. Only these editors would be generated in that 'junky' HTML/CSS/JavaScript
trinity, for your web browser client needs. Who wants to write that goop?

Writing wrappers for your web services for other languages. At least this helps with
the data marshalling. Again, JavaScript is an obvious target language here.

Generating database schema and access code, if you want to go hog wild.
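The first two uses, human-readable docs and XML schema, fall out of the same walk over the model. A rough sketch, assuming a hypothetical model that also carries a one-line description per feature. The generated XML Schema is bare-bones (one `xs:complexType` per class, no root element, only string, number and boolean primitives mapped), and the `number` to `xs:decimal` mapping is an assumption.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

# Hypothetical model with a description per feature:
# class name -> list of (feature, type, is_array, description)
MODEL = {
    "Address": [("street", "string", False, "Street line."),
                ("city", "string", False, "City name.")],
    "Person": [("name", "string", False, "Full name."),
               ("age", "number", False, "Age in years."),
               ("address", "Address", True, "Postal addresses.")],
}

# Rough mapping from model primitives to XML Schema built-in types
XSD_TYPES = {"string": "xs:string", "number": "xs:decimal", "boolean": "xs:boolean"}

def document(model):
    """Human-readable docs: one heading per class, one line per feature."""
    lines = []
    for cls, feats in model.items():
        lines.append(cls)
        for name, typ, many, desc in feats:
            suffix = ", array" if many else ""
            lines.append(f"  {name} ({typ}{suffix}): {desc}")
    return lines

def xsd(model):
    """Machine-readable docs: a bare-bones XML Schema, one complexType per class."""
    schema = ET.Element(f"{{{XS}}}schema")
    for cls, feats in model.items():
        ctype = ET.SubElement(schema, f"{{{XS}}}complexType", name=cls)
        seq = ET.SubElement(ctype, f"{{{XS}}}sequence")
        for name, typ, many, desc in feats:
            elem = ET.SubElement(seq, f"{{{XS}}}element", name=name,
                                 type=XSD_TYPES.get(typ, typ))
            if many:
                elem.set("minOccurs", "0")
                elem.set("maxOccurs", "unbounded")
            # Carry the description along as xs:annotation/xs:documentation
            ann = ET.SubElement(elem, f"{{{XS}}}annotation")
            ET.SubElement(ann, f"{{{XS}}}documentation").text = desc
    return ET.tostring(schema, encoding="unicode")
```

The same loop structure would drive the other uses in the list (editors, client wrappers, database schema); only the output changes.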

If it's not obvious, I'm sold on this modelling stuff. At least lightweight versions thereof.

So I happened to be having a discussion with a colleague the other day about using software modelling to
make life easier for people who need to serialize objects over the web. I don't think
I was able to get my message across as well as I wanted, and we didn't have much time
to chat anyway, so I thought I'd whip up a little sample. Code talks.

This sample is called ToyModSer, short for Toy Modelled Serializer. It's a toy
because it only does a small amount of what you'd really want it to do, but
there's enough there to see the value in the concept, and how lightweight you
can go. I hope.