There is an interesting discussion under way about data transfer in web applications, centered on the differences between JSON and XML in JavaScript-heavy sites. It started with Norm Walsh commenting on Twitter and Foursquare removing XML support from their APIs. The basic idea of his post was that if you are using JavaScript and only passing around atomic values, or lists and hashes of atomic values, then JSON makes complete sense. He then talks about the difficulty of using JSON when you need more context or have mixed content. Overall, it was a very sensible post. The discussion gained steam because of Norm’s “Meh” reaction, and because talking about “which is the better technology” tends to get people all riled up.

A few days after Norm’s post, two other posts appeared refuting his stance, even though many commenters consider it a non-debate. First, Manu Sporny described the move to JSON as more of a paradigm shift toward simpler markup. The complaints about SOAP and XML Schemas are obvious, but he complicates the argument by bringing JSON-LD into the conversation. JSON-LD adds syntax to JSON for denoting Linked Data, including notation very similar to XML Namespaces, to which Norm replies “Wow. By the time you start doing that, you’re sure you wouldn’t be better with a richer markup vocabulary?” Lastly, James Clark throws his opinion into the mix. His commentary is more about the fact that XML is losing web developers, which could be a bad thing.

Now that you have the background story, I want to point out that people are missing Norm Walsh’s original point. This problem is about context, and it is not really being treated that way. People are using JSON for web development because there is almost zero learning curve. Its use is driven by the growing number of JavaScript-heavy sites that rely on it for interactivity and for mashup creativity. Because Twitter’s API is readily available, people have created widgets to display their tweets on a web page. For many web developers that means grabbing a JSON representation of some tweets and converting it to HTML. This process barely takes longer than trying to find the correct API documentation.
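To make that concrete, here is a minimal sketch of the JSON-to-HTML step. It is written in Python rather than browser JavaScript purely for illustration, and the payload and its field names (`user`, `text`) are made up for the example; they are not Twitter's actual response format.

```python
import json
from html import escape

# Hypothetical payload: field names are illustrative, not Twitter's real schema.
SAMPLE = (
    '[{"user": "alice", "text": "JSON & XML both have a place"},'
    ' {"user": "bob", "text": "Context matters"}]'
)


def tweets_to_html(payload: str) -> str:
    """Parse a JSON array of tweets and render it as an HTML list."""
    tweets = json.loads(payload)  # one call yields plain lists and dicts
    items = [
        # Escape user-supplied text so it cannot inject markup.
        f"<li><b>{escape(t['user'])}</b>: {escape(t['text'])}</li>"
        for t in tweets
    ]
    return "<ul>" + "".join(items) + "</ul>"


if __name__ == "__main__":
    print(tweets_to_html(SAMPLE))
```

The parse is a single call, and everything after it is ordinary list and dictionary handling, which is most of why the learning curve is so low.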

In this same context, if the APIs are XML-based, you need something to parse the XML into an appropriate JavaScript object. You can already tell that this is getting more complicated than simple JSON. To make matters worse, past browsers handled XML differently and sometimes very poorly. Because web developers were depending on the XML support in the browser, the problems of cross-browser support arose again. Obviously, developers do not want to go down that path, and JSON is easier anyway.
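The extra step is visible even outside the browser. The sketch below uses Python's standard `xml.etree.ElementTree` module to turn an XML document into the lists and dictionaries that a JSON parser would hand you directly. The element names (`tweets`, `tweet`, `user`, `text`) are assumptions for the example, not a real API format.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names are illustrative only.
SAMPLE_XML = """<tweets>
  <tweet><user>alice</user><text>JSON &amp; XML both have a place</text></tweet>
  <tweet><user>bob</user><text>Context matters</text></tweet>
</tweets>"""


def parse_tweets(xml_text: str) -> list[dict[str, str]]:
    """Walk the XML tree and build plain dictionaries by hand."""
    root = ET.fromstring(xml_text)
    return [
        {
            # Each field has to be looked up and mapped explicitly.
            "user": tweet.findtext("user", default=""),
            "text": tweet.findtext("text", default=""),
        }
        for tweet in root.findall("tweet")
    ]


if __name__ == "__main__":
    for tweet in parse_tweets(SAMPLE_XML):
        print(tweet["user"], "->", tweet["text"])
```

None of this is hard, but the mapping from elements to objects is code you have to write and maintain yourself.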

However, what if you are using PHP or Java on the server? PHP has plenty of XML handling libraries, with SimplePie being a hugely popular RSS feed processing library. If you can make the Twitter API call from your Java server code, there are plenty of libraries for handling XML there as well. So, in that context XML may be a better option.

The type of problem can also change the choice of data format. In the Twitter API example, if you have a simple widget that just displays tweets, then a solution based on JavaScript and JSON makes a lot of sense. What if that widget needed to be more dynamic? For example, let’s say that someone registered on your site will see those tweets displayed differently, with additional links in each tweet for replying or retweeting. You could write some JavaScript code to check for registered users and generate different HTML, but this gets unwieldy if the number of different displays grows beyond two or three. If the data is in XML, you can write a different XSLT script for each display, and each script remains separate from the main widget code. You just need to select the appropriate XSLT based on the user interacting with the site. At this point people are likely going to complain about the use of SOAP for web services and its complexity. Let’s ignore that option: REST has won, and SOAP is overly complex. Can we agree on that and move on?
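The selection step itself can be tiny. Here is a rough sketch, assuming three made-up user roles and made-up stylesheet file names. It only picks the stylesheet; actually applying it would need an XSLT processor, which Python's standard library does not include.

```python
from pathlib import Path

# Hypothetical roles and stylesheet names, one XSLT file per display.
STYLESHEETS = {
    "anonymous": "tweets-readonly.xsl",
    "registered": "tweets-with-actions.xsl",
    "moderator": "tweets-moderation.xsl",
}


def select_stylesheet(user_role: str, base_dir: str = "xslt") -> Path:
    """Return the stylesheet path for a role, falling back to the read-only view."""
    name = STYLESHEETS.get(user_role, STYLESHEETS["anonymous"])
    return Path(base_dir) / name


if __name__ == "__main__":
    print(select_stylesheet("registered"))
```

Adding a fourth display means adding one stylesheet and one dictionary entry, while the widget code stays as it is.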

As with any programming problem, different requirements and different contexts may call for different technologies. If you get stuck on saying that JSON is better than XML (or the other way around), you lose another tool in your toolbox.

The other context that people are missing is why Twitter and Foursquare chose to support JSON only. This is likely a question of application complexity and analytics. Like any good API provider, Twitter is probably tracking all calls to the API and this includes the data format requested. It is very possible that the demand for XML was fairly low and it did not warrant separate support. In addition to this, there are plenty of JSON processing libraries available for mainstream languages like Java, so there was little risk in dropping support for XML. If there is no support for XML, then their API becomes simpler to support. That means less code to maintain, simpler maintenance of code because there are not multiple representations of one set of data, and fewer questions about the different formats.
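To see what "multiple representations of one set of data" costs, here is a small sketch that serializes the same record both ways with Python's standard library. The record and the `status` root element are invented for the example and say nothing about how Twitter's code is actually organized.

```python
import json
import xml.etree.ElementTree as ET


def to_json(record: dict) -> str:
    """Serialize a flat record as compact JSON."""
    return json.dumps(record, separators=(",", ":"))


def to_xml(record: dict, root_tag: str = "status") -> str:
    """Serialize the same flat record as XML, one child element per key."""
    root = ET.Element(root_tag)
    for key, value in record.items():
        # XML text has no types, so every value is written as a string.
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    record = {"id": 1, "user": "alice", "text": "Context matters"}  # hypothetical data
    for label, body in (("JSON", to_json(record)), ("XML", to_xml(record))):
        print(f"{label}: {len(body)} characters: {body}")
```

Even in this toy case there are two code paths to test and document, and the XML path has already dropped the fact that `id` was a number. Supporting only one format removes that whole class of questions.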

So, quit whining about whose data format is better. Each one is better in a different context; otherwise, it is highly unlikely that both would have become so popular. The important thing is to learn both formats, and other popular ones that appear, so that you can make an educated decision about which format to use in your situation.

4 responses to "The Problem Is Not JSON Or XML, It Is About Data Context"

The clear and stated reason why Twitter switched to JSON only is that it is lighter weight than XML. This has been explained many times on the Twitter Developer Google Group. The Twitter API is experiencing massive demand growth, and there are many apps that insist on collecting all data on everyone using the service. Let’s take a simple example. When Twitter started, their docs said that 2,000 followers was more than enough. They even put in a roadblock, which still exists, that makes it harder to get over 2,000 by using programmed following techniques. The API was set up to give you full account info on all the followers of any user. Fast forward to the time of Kutcher, Kardashian and Oprah, and now accounts have millions of followers. The API still has to keep delivering full data on every one of those accounts. Would you want to be the sysadmin asked to keep up with that bandwidth demand? What would you suggest? It’s obvious: cut out the XML delivery, because it sends more data per entity than JSON.

I understand Twitter’s reasoning given the scale of their API traffic. I was just trying to put some balance into the argument. There are some cases where using XML is appropriate; in other cases, JSON is more appropriate. Just because Twitter chose JSON does not mean that it is the one true solution for all people. It just means that it was the right decision for their situation.