All the cool kids are using JSON instead of XML

My colleague Kurt Nordstrom mentioned a few days ago that there was this period of time when XML-based everything was the future. It was going to solve all our problems. Let’s use XML for everything!

Now, of course, we’ve all seen past the crazily naive idea that anything as mundane as the XML meta-format could make any real difference to anything, since it’s just a solution to the easy part of every task (syntax) and leaves the hard part to be done (semantics). No, we’re much more sophisticated than that now. Now we realise that JSON is the metaformat that will make everything suddenly work.

“At least JSON seems to know that it is only a data format”, says Kurt. “XML didn’t know that. Look at a schema like METS.”

But it looks that way because now we’re not just talking about XML, but about specific schemas: applications of XML. Which of course are what you need if XML is going to actually do anything. And in the same way, it’s only applications of JSON that are actually useful.

The big difference between them is this:

In the XML world, there is a fairly mature set of existing schemas, and an expectation that if you use XML you’re going to pick one of those and use it, or adapt it slightly. Whereas that ecosystem doesn’t (yet) exist for JSON. Which gives people the glorious feeling of freedom that they can use whatever the heck format they want, unconstrained by the tedious business of being compatible with anything else.

That’s where we are now. The next inevitable step is that people discover that JSON applications that should be compatible in theory, are actually not — that there’s a need for profiling, and for long, boring documents specifying the semantics of individual fields within a format.

Sure enough, in a few years, there will be JSON equivalents of METS, MODS, OWL and all the rest.

And everyone will be going “Oh, JSON is so heavyweight” …

And we’ll get away from all that, and start using a new, slimline metaformat that doesn’t have “all that baggage”.

Vamp till fade.

This of course is exactly how we got from SGML to XML. XML started out as a simplified, stripped down version of SGML that got rid of “all that complexity that no-one really needs”. Except it turns out that you do need it, so XML started accumulating all sorts of accessory standards like XML Namespaces, and XML Schema, and XPath, and XPattern, and XPointer. And before you knew it, it was way more complex and heavyweight than SGML had ever been. Which is where we came in.

Henry Spencer said it best, back in the 1980s or maybe even the 70s: “Those who do not understand Unix are condemned to reinvent it, poorly.”

34 responses to “All the cool kids are using JSON instead of XML”

I love this, especially given the criticism that Unix was Multics reimplemented, poorly. :D A lot of this stuff is slightly over my head as I’m still really just a back-end Unix hacker who doesn’t really sully herself with anything that isn’t C (and had to be dragged kicking and screaming into using ANSI C; for that matter, it took over five years for me to finally accept that vi was a more productive editor than ed) but, yeah, I’m suddenly reminded of the 20-year-old me. Of course I knew it all back then, I could implement stuff that was much more efficient than the provided libraries and procedures and protocols. Now I just kind of grimace when I remember that point of my career. D:

I prefer to look at it from the perspective of feedback loops: things which are easily and regularly validated tend to interoperate, no matter the format.

From my perspective the major problems with XML usually came down to validation being optional and most of the tools providing an experience which was unpleasant enough that many programmers only worried about the specific other implementation(s) they needed to support. Working on tools unfortunately tends to be seen as less prestigious than writing standards but I can’t help thinking that the world would have been better off had more time been spent building high quality validators, working examples, and working implementations of new versions (e.g. XPath 2.0 effectively never happened for anyone using tools based on libxml2).

I’ve been working a bit with IIIF lately and was happy to see that they had validators before the implementations started to proliferate, which seems to me to be more significant than whether it uses JSON or XML.

I disagree: JSON is not overtaking XML because it lacks the baggage of XML, it is overtaking XML because it fits the problems more naturally. Fundamentally, JSON is a data language, while XML is a markup language. One can be used for the other, but doing so introduces an impedance mismatch; how does one represent an array in XML? How about a map? In XML there are multiple sensible approaches to these problems. In JSON there is usually only one. This means that you can’t just use XML, you have to use some further specification that answers these kinds of questions. What’s more, because XML doesn’t naturally support these data structures, such documents tend to be more verbose than the equivalent JSON. Because of this, JSON is usually preferable as a serialized data format, and is overtaking XML for this kind of purpose.

That said, there are some tasks for which XML is well suited. Primarily, markup of long text documents. Writing a book in XML makes sense. JSON, EDN, or YAML would not function as effectively or naturally.

The key question is how you represent an array or a map in XML. The most basic XML specifications don’t provide much help here. In order to do this, you have to use something more specific than XML (XML-RPC, XML-DATA, something else entirely). There is a huge amount of flexibility in possible approaches using XML. In JSON (YAML, EDN) on the other hand, there is one clearly defined approach that works for most cases. This is a decision that doesn’t have to be made. As such, the resulting document is simpler, and more broadly compatible.
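To make that concrete, here is a small Python sketch (sample data invented for illustration) contrasting two equally defensible XML encodings of one list with JSON’s single obvious one:

```python
import json
import xml.etree.ElementTree as ET

# Two equally defensible XML encodings of the same list of values
# (invented examples; neither is blessed by XML itself).
xml_a = "<colours><colour>ff</colour><colour>ff</colour><colour>00</colour></colours>"
xml_b = '<colours><item value="ff"/><item value="ff"/><item value="00"/></colours>'

list_a = [e.text for e in ET.fromstring(xml_a)]          # values as element text
list_b = [e.get("value") for e in ET.fromstring(xml_b)]  # values as attributes

# JSON has exactly one obvious encoding for the same list.
list_c = json.loads('["ff", "ff", "00"]')

print(list_a == list_b == list_c)  # True
```

Both XML variants are perfectly reasonable, which is exactly the problem: two programs picking different ones are incompatible despite both “using XML”.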

When you round-trip a JSON document to XML and back, you aren’t really round-tripping to XML, but instead to some subset of XML which answers these questions.

Only if everyone thought that “id” was the correct attribute for map keys. The guy you just met from Australia who thought that “key” made more sense is what causes the problem. Much better to use a standard that doesn’t have the problem.

Of course it isn’t that simple for arrays either. This is demonstrated simply by looking at XML-RPC, where it is

```xml
<array>
  <data>
    <value>ff</value>
    <value>ff</value>
    <value>00</value>
  </data>
</array>
```

or some such.

Also, in your example, colour looks way more like a map than an array. The JSON for that would be something like `{ "red": "ff", "green": "ff", "blue": "00" }`.

The overall summary: this stuff is somewhat arbitrary, and when tasked with arbitrary decisions both choices will be used, and that will create incompatibilities. When such decisions can be removed, it is beneficial. The distance between a language and the ideas being expressed matters. Of course this problem is what leads to techniques like XSLT (or JSONTransform, which is hardly more than an idea in my head right now).

I think the biggest hassle with XML (and HTML for that matter) is that data has so many ways to be stored. An element can have attributes, children or a value. So to access and manipulate all of it you end up with very complicated APIs.

In contrast JSON maps very cleanly to programming constructs used by most languages: nested arrays and hashes. It’s just a lot easier to actually use.
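As a quick illustration of that mapping (the record here is invented):

```python
import json
import xml.etree.ElementTree as ET

# The same record serialized both ways (sample data is made up).
xml_doc = '<book id="42"><title>Ostrich Dinosaurs</title></book>'
json_doc = '{"id": 42, "title": "Ostrich Dinosaurs"}'

# XML: the id lives in an attribute and the title in a child element's
# text node, so each piece of data needs its own accessor.
root = ET.fromstring(xml_doc)
book_from_xml = {"id": int(root.get("id")), "title": root.find("title").text}

# JSON: one call, straight into native dicts, lists, strings and numbers.
book_from_json = json.loads(json_doc)

print(book_from_xml == book_from_json)  # True
```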

I think this post would have made sense 5 years ago. At this point JSON is a pretty boring technology and hasn’t accumulated all the cruft that XML did. Sure there’s the http://json-schema.org people but they’re harmless.

My biggest complaint about XML was simply data bloat: sure, over-the-wire transfers can mitigate that reasonably with on-the-fly gzip compression and decompression; but the fact remains that the larger blob of text still has to be parsed at some point and there comes a point at which the data bloat isn’t worth it — whether that’s one 33 MB message or 33,000 1 KB messages (as an arbitrary example).

And then you also have the aforementioned “confusion” between attributes and nodes — when passing arbitrary data around, you’d have to know whether to look at attributes or text nodes for data parts — JSON has no such confusion, though I’m sure someone out there laments that (:

JSON fits the web better imo because it’s faster to parse and generate in the browser for Javascript code — so it fits sending / getting little messages to keep your richer client pages working. Personally, I find JSON easier to read too, but I find YAML even easier to read, so there’s that. It’s far easier to get from a Javascript object to JSON (kind of by definition, as it were), so again, this fits the web well, most especially with the ubiquity of Javascript — it’s not just for web pages any more (:

I don’t think we’ll see the bloat that came to XML visiting JSON, partly because I (optimistically) think we may have learned our lesson from XML and partly because JSON isn’t trying to be anything else. XML was adopted as the basis for a bunch of higher-level markup languages and it seems like much of the bloat that came in was to get the XML engine to do the work of transforming to other formats — XSLT was neat, but boy, not fast, depending on implementation. I had a transform that would literally take most of a day on IE at some point — naively, I’d just output XML from SQL and hoped to get the browser to display it neatly. Which it didn’t. Of course, Chrome and FF did it much faster (only a minute or two of browser lockup), but still, it was an all-round fail.

As with all things, use what works, when it works. Move on when there’s a better option for your current requirements. JSON has superseded XML for all requirements I have when I’m generating it. I still have to deal with it for .NET configuration, so I’ll carry on using it there and anywhere else where it’s convenient to do so (:

> [Mike, sarcastically] Now, of course, we’ve all seen past the crazily
> naive idea that anything
> as mundane as the XML meta-format could make any real difference
> to anything, since it’s just a solution to the easy part of every task
> (syntax) and leaves the hard part to be done (semantics).

At the low level, XML solves the wrong weird hard problem (markup) badly, where JSON solves the right mundane easy problem (syntax for nested data structures) well, and that *does* matter a lot.

Mike, you say that mature standard schemas are the big difference between JSON and XML. Are you saying that the syntax and API differences are small, and/or that syntax and ease of use really *don’t* matter?

“Schema” in XML doesn’t just mean the static typing of a file format, but a standard for expressing that in XML itself. XML schemas can be used for verifiers (and more). For widely-used standards, a standard schema language seems good to have.

But for JSON, 95% of semantics don’t need standard schemas. Programs that validate their inputs, and human-readable documentation, are things people do, and readable documentation must be superior to unreadable documentation. For XML, the statistics are probably different, partly because XML has had more time to grow standards, but also because XML discourages informal use.

In any case, static typing ought to be designed separately from the syntax for data representation, and JSON will probably continue that way.

Appropriateness to purpose matters. Most uses of both XML and JSON are simple. “Simple things should be simple and complex things possible.”

Steve, I am afraid that I have to defend XML for a second; XML doesn’t solve the wrong weird problem badly, but in fact solves the wrong weird problem (markup) quite well. It is trying to solve the right problem with the solution to the wrong problem that leads to the bad results.

The real problem with having the static typing stuff completely separate from the data storage is that it tends to mean it is in some programming language (Java, C++, etc). This doesn’t lead to a cross-language solution. The nice thing about XML’s schema concept is that you can verify it in whatever language you happen to be using. The same is true for other schema languages such as RX or JsonSchema. It is always good to keep your schemas in a common serialized data format.
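A toy sketch of the idea in Python; the schema language below is made up for illustration, and nothing like as capable as XML Schema or JsonSchema, but like them it lives in the common data format rather than in any one programming language:

```python
import json

# An invented toy schema language, kept in JSON itself: each field name
# maps to an expected type. Just enough to show the schema travelling as
# plain data that any language with a JSON parser can act on.
schema = json.loads('{"author": "string", "year": "number"}')
document = json.loads('{"author": "Mike Taylor", "year": 2016}')

JSON_TYPES = {"string": str, "number": (int, float)}

def validate(doc, schema):
    """Check that every schema field is present with the declared type."""
    return all(
        field in doc and isinstance(doc[field], JSON_TYPES[expected])
        for field, expected in schema.items()
    )

print(validate(document, schema))        # True
print(validate({"author": 42}, schema))  # False
```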

That is why XML is an interchange format, not a _data format_, imho. It can be self-documenting (for some definitions of documentation), and self-verifying with the right tools. It has come to the point where that has to outweigh its natural lack of readability … a simple enough format doesn’t always need a self-check, because you can just eyeball it. XML is generally very verbose and complicated, even split across multiple files (like XSDs), so it’s essentially a machine-only format.

So, if you’re expecting the files to be shared among applications or organizations, it may be worth that cost.

But for config files, datafiles for a single purpose built app, etc, strikes me that XML is extremely unfit for purpose.

JSON is, at least, generally readable. It is assumed a parser can handle the free-form nature better, but many JSON parsers are annoyingly strict. Still, fine. The size is still large: a 32-bit integer can be represented as a 4-byte binary value, or as a roughly 30-character string with JSON wrappings, or as a 200-character XML file with schema references … but going to JSON makes it readable, generally cross-application, and to the point.
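A quick back-of-the-envelope check of that size gap, using Python’s standard library (the numbers are merely illustrative):

```python
import json
import struct

# One 32-bit integer, two representations (sizes merely illustrative).
n = 1234567890
as_binary = struct.pack("<i", n)    # raw little-endian bytes
as_json = json.dumps({"value": n})  # readable text, with wrapping

print(len(as_binary))  # 4
print(len(as_json))    # 21
```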

As in all things, it depends on the purpose, but for me, I would ‘go to’ JSON by default. XML, only when cornered, after a knife fight. I don’t want to have to dig out heavy tools just to inspect a file with any sensibility.

Marty: so sorry that the code-posting facilities in WordPress are so eccentric. I think this code was written before Markdown became a thing; in 2016, that is pretty clearly the best language for writing comments in. I wish I had an option to switch this blog over to it.

> Mike, you say that mature standard schemas are the big difference between JSON and XML. Are you saying that the syntax and API differences are small, and/or that syntax and ease of use really *don’t* matter?

Well, I am not really saying either of those things. The syntax differences between XML and JSON are big, obviously; but in practice they’re not particularly significant because we always use parsers (and indeed generators) for those formats, so we don’t need to deal with these by hand.

But that’s not really the point, either. The point is that you might say <author>Mike Taylor</author>, or you might say { “author” : “Mike Taylor” }. Either way, decoding that text stream to an in-memory object is the easy part. The hard parts are things like:

* What do we mean by author? Does it only mean sole author? If it pertains to multi-authored works, does it indicate only the first author? Only the corresponding author? Or any author? Does it include illustrators? Does it include editors? And so on.

* Is “Mike Taylor” in first-name last-name format, or is “Mike” the surname? Is it the same name as “Taylor, Mike”? Is that the same as “Taylor, Michael” or “Taylor, Michael P.”?

* Do we have a way of knowing whether we mean Mike Taylor the dinosaur palaeontologist, or the other Mike Taylor who is a Mesozoic marine reptile palaeontologist?

In other words, the semantic questions. If 20 years working with information standards has taught me anything, it’s that these are always ten times harder than the mechanical problems.
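The mechanical half really is that trivial; a Python sketch of the two one-liners, using the example above:

```python
import json
import xml.etree.ElementTree as ET

# Both serializations decode to the same in-memory value in a line each.
from_xml = {"author": ET.fromstring("<author>Mike Taylor</author>").text}
from_json = json.loads('{ "author" : "Mike Taylor" }')

print(from_xml == from_json)  # True
# Neither parse tells you what "author" *means*: sole author? first
# author? which Mike Taylor? That semantic work remains to be done.
```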

Meanwhile …

> “Schema” in XML doesn’t just mean the static typing of a file format, but a standard for expressing that in XML itself.

Yeah, that’s just ridiculous. It’s a symptom of a mental illness that seemed to grip the world around the turn of the millennium, where people thought expressing everything in XML was the way to go. The same delusion that gave us ant instead of make. If you want a schema language for XML, Relax-NG Compact is ten times more expressive.

Mike: No problem, I just don’t use WordPress enough to know how to feed it code.

Steve, as for JSON replacements for XSLT, there aren’t really any. XSLT exists to allow one to transform between XML formats using templates in XML and XPath. The tools you mentioned are pretty cool for ad hoc jobs, but not really good for in-place use.

Mike I disagree with your assumption that “we don’t need to deal with these by hand.” I have frequently had to deal with serialized data by hand (for one off or investigative tasks), and have always been glad that we used JSON rather than XML for serialization. One of the really nice things about JSON rest services is that it is pretty easy to figure out how to interact with them by hand.

And Jeff, just to be clear — I would hate you, or anyone, to think I am advocating for the use of XML ahead of JSON. I just think neither of them gets you anywhere near as much as their advocates imply.

When compared to the serialization formats of old, they really do. You haven’t lived until you have had to parse a poorly documented proprietary binary or field-length text format. They are *really* fun.

“And Jeff, just to be clear — I would hate you, or anyone, to think I am advocating for the use of XML ahead of JSON. I just think neither of them gets you anywhere near as much as their advocates imply” .. this is true in so many things, and I think the basis of all forms of negotiation; empathy for other perspectives :) And yes, there are so many die-hard adherents to XML (and others), preaching a gospel and agenda that really does not necessarily suit.

But I think the tide turned some time ago on everyone blindly jumping to XML and Java; the processing/parsing overheads and the relative annoyance of needing a tool to inspect a file (I’m thinking of things like XML descriptions of MVC in Java apps here) have just turned off an entire generation of developers.

Mike, I think you’re focusing on the hardest use case (the combination problem of tough semantics plus interoperability) and thinking that a data format has to be armored for that case.

“Simple things should be simple and complex things possible.” I think we’ll continue to use JSON simply for simple things, and to keep the complexity out of the data format itself for tough-semantics interop. XML never allowed the simple use, the 95% use, to be simple. The creation of tools and standards around JSON needn’t make JSON itself become complex-seeming.