Wednesday, June 06, 2007

Why use XML for serialization?

In my recent post about WADL, I mention some of the problems with using XML as a data serialization format. In the comments, Eric asks why use XML for serialization at all? Browser-based apps are rapidly moving to JSON. Ruby developers prefer YAML. If XML really is such a mismatch for modern programming languages, why bother to use it?

I could argue about the availability of tools (XSLT, etc.) to process XML. At AgileDelta, we often use E4X to help customers filter or reformulate their XML data. At Microsoft I often used XSLT, or MSXML + JavaScript, to do quick-n-dirty processing of XML data. Really, though, JSON and YAML have a bit of an advantage here, because you can use the full power of your scripting language.

JSON is actually a hugely better fit than XML for many data serialization scenarios. It is more compact, faster to parse, and less ambiguous about how to map to your data-structures.

Where XML really has an advantage is in formats which mix markup and data. Atom/RSS is a perfect example, as is XSLT. You could do Atom in JSON, but then half your data is in one encoding (JSON) and the other half is in another (embedded HTML). With XML you get both in a single encoding, which is a huge boon when trying to process the data.
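To make the "two encodings" point concrete, here is a minimal Python sketch. The entry contents, the example.org URL, and the function names are made up for illustration; only the Atom and XHTML namespace URIs are real. It pulls the links out of an entry in both forms.

```python
import json
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical Atom entry: the data (title, updated) and the XHTML markup
# live in one tree, so one parser sees everything.
ATOM_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Why use XML for serialization?</title>
  <updated>2007-06-06T00:00:00Z</updated>
  <content type="xhtml">
    <div xmlns="http://www.w3.org/1999/xhtml">
      <p>See the <a href="http://example.org/wadl">WADL post</a>.</p>
    </div>
  </content>
</entry>"""

# Hypothetical JSON rendering of the same entry: the markup is now an
# opaque string inside the JSON document.
JSON_ENTRY = json.dumps({
    "title": "Why use XML for serialization?",
    "updated": "2007-06-06T00:00:00Z",
    "content": '<div><p>See the <a href="http://example.org/wadl">WADL post</a>.</p></div>',
})


def links_from_xml(doc):
    """One parse: walk the whole entry and collect every XHTML link."""
    root = ET.fromstring(doc)
    return [a.get("href") for a in root.iter("{%s}a" % XHTML_NS)]


def links_from_json(doc):
    """Two parses: JSON first, then a second parser for the embedded markup."""
    entry = json.loads(doc)
    # This only works because the sample fragment happens to be well-formed
    # XML; real-world embedded HTML would need an HTML parser instead.
    fragment = ET.fromstring(entry["content"])
    return [a.get("href") for a in fragment.iter("a")]
```

Both functions return the same list of links, but the JSON version needs a second parsing step, and a second set of failure modes, for the embedded half of the data.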

The other place that XML starts to have an advantage is when you have to support cross-version compatibility. Eventually, you are going to have to write a loader that translates v1 data to your v2 data-structures. Browser-based apps can mostly avoid that. Enterprise apps cannot. The closer you get to the database in a multi-tier app, the less that strategy works. The real benefit is that XML already gave you a serialization abstraction, so implementing data transformation becomes trivial. I'm not saying that this data transformation is not possible with JSON/etc, just more difficult.
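As a rough sketch of what such a loader can look like, the Python below upgrades a v1 document to the v2 shape before loading it. The customer format, the version attribute, and the function names are all assumptions invented for this example, not taken from any real system.

```python
import xml.etree.ElementTree as ET


def upgrade_v1_to_v2(old):
    """Translate a hypothetical v1 <customer name="First Last"/> element
    into the hypothetical v2 shape with separate <first> and <last> children."""
    first, _, last = old.get("name", "").partition(" ")
    new = ET.Element("customer", {"version": "2"})
    ET.SubElement(new, "first").text = first
    ET.SubElement(new, "last").text = last
    return new


def load_customer(doc):
    """Load either version into the v2 in-memory structure."""
    root = ET.fromstring(doc)
    if root.get("version") != "2":
        # Anything not marked as v2 is assumed to be v1 in this sketch.
        root = upgrade_v1_to_v2(root)
    return {"first": root.findtext("first"), "last": root.findtext("last")}
```

The loader itself only ever deals with the v2 tree; all knowledge of v1 is confined to the one transformation function, which could just as easily be an XSLT stylesheet.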

Playing devil's advocate for a second here... given how few apps live long enough to actually need real back-compat layers, maybe the benefits of using JSON for V1 are worth it. Who cares if V2 is a day's more work if JSON shaved a week off your V1 schedule?

Back on point... XML helps by forcing a slightly higher level of abstraction. That abstraction also pays off in better facilities for validating message contracts. As much as I dislike WADL's focus on XSD to the benefit of building RPC-like tools, XSD for message validation can be extremely useful. (Or use RELAX NG, or whatever other XML schema language you happen to like.) It provides a clear specification for what a message must look like. One of the down-sides of JSON/etc is that the formats tend to go unspecified. It takes a lot more work to ensure that you are not writing out properties that were not intended, for example. XSDs are not particularly useful in production, but they can be invaluable in development and testing.
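To illustrate the "lot more work" on the JSON side, here is a minimal Python sketch of the kind of hand-rolled contract check that ends up standing in for a schema. The contract, the property names, and the function name are hypothetical.

```python
import json

# Hypothetical message contract: property name -> expected Python type.
ALLOWED = {"id": int, "name": str}


def contract_violations(message_text):
    """Return a list of human-readable problems; an empty list means the
    message matches the hypothetical contract above."""
    message = json.loads(message_text)
    problems = []
    # Properties that were written out but never intended to be.
    for key in sorted(set(message) - set(ALLOWED)):
        problems.append("unexpected property: " + key)
    # Properties that are required but missing or of the wrong type.
    for key, expected in ALLOWED.items():
        if key not in message:
            problems.append("missing property: " + key)
        elif not isinstance(message[key], expected):
            problems.append("wrong type for property: " + key)
    return problems
```

Run in a test suite against every message the app emits, a check like this catches a stray internal field before it ships. With XML, that job falls to the schema and an off-the-shelf validator instead of custom code.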

So why use XML for serialization?

1) your data is a mix of marked-up text and structure
2) abstraction to simplify contract validation and cross-version compatibility

You could theoretically get most of (2) with JSON/YAML... someday. XML provides that out-of-the-box today.