Native JSON Support is Required

There’s a JavaScript feature that I feel needs much more attention: Native JSON support in browsers.

It’s something that should be a JavaScript language feature and yet no browser, or standards body, has defined, specified, or implemented what exactly it is, or should be. On top of this, it is one of the most (implicitly) requested language additions.

A recent ticket was opened for the Gecko engine, requesting a form of native JavaScript (de-)serialization whose API is based upon an implementation by Douglas Crockford. In effect, you’d be getting two additions to the JavaScript language: A .toJSONString() method on all objects (which will serialize that object to a string representation of itself) and a .parseJSON() method on all strings (which is responsible for deserializing a string representation of a JSON object).
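Roughly, usage under that proposal would look like the following. Since no native support exists yet, this sketch stubs the two methods with a deliberately naive, eval-based version (unsafe, and subject to the Object.prototype problems discussed below; json.js does validation and handles nesting):

```javascript
// Sketch of the proposed API: every object gains toJSONString(),
// every string gains parseJSON(). These stubs are illustrative only.
Object.prototype.toJSONString = function () {
  var parts = [];
  for (var key in this) {
    if (this.hasOwnProperty(key)) {
      var v = this[key];
      parts.push('"' + key + '":' + (typeof v === "string" ? '"' + v + '"' : v));
    }
  }
  return "{" + parts.join(",") + "}";
};

String.prototype.parseJSON = function () {
  // Naive: a real implementation must validate before evaluating.
  return eval("(" + this + ")");
};

var record = { name: "John", visits: 4 };
var serialized = record.toJSONString(); // '{"name":"John","visits":4}'
var restored = serialized.parseJSON();  // a new object with the same properties
```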

I’d like to try and make the case for why I feel that native JSON support should be standardized and made part of the JavaScript language.

1) The recommended implementation is considered harmful.

There are, currently, two popular means of transferring and deserializing JSON data. Each has its own advantages and disadvantages. The two methods are:

Plain JSON, which is transferred via an XMLHttpRequest and then deserialized on the client (typically using Crockford’s json.js).

And JSONP, which adds JavaScript markup around simple JSON data and is transferred (and deserialized) by injecting a <script> element into a document.
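For reference, a minimal sketch of the JSONP mechanism. In a real page the response arrives via an injected <script> element pointing at something like a hypothetical "/data.js?callback=handleData" endpoint; here the browser executing that response is simulated with eval():

```javascript
// The server wraps the JSON payload in a call to a callback the page names:
//   handleData({"records": [1, 2, 3]})
var received = null;
function handleData(data) {
  // data arrives as a live object; the JS engine itself did the parsing
  received = data;
}

// Simulate the browser executing the injected <script> response:
eval('handleData({"records": [1, 2, 3]})');
// received now holds the deserialized data
```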

Currently, Crockford’s json.js is the recommended means of deserializing JSON data. This has been discussed extensively elsewhere. json.js is better at keeping malicious code from being executed on the client (JSONP has no such protection), and thus, is recommended for most use cases.

However, json.js has taken two serious blows lately, which have moved it from being “recommended” to “harmful”:

Joe Walker recently exposed a vulnerability which allows malicious users to covertly extract information from JSON strings that are deserialized using JavaScript’s eval() statement. json.js currently makes use of eval(), making it vulnerable to this particular attack. This vulnerability has been discussed elsewhere too.
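The core problem is that eval() runs arbitrary JavaScript, not just data literals. A minimal illustration (the payload and the stealData function are hypothetical, standing in for an attacker’s code):

```javascript
var sideEffects = [];
function stealData() {
  // stands in for attacker-controlled code
  sideEffects.push("attacker code ran");
  return {};
}

// This string looks superficially like a JSON payload, but it is a
// function call, and eval() will happily execute it:
var payload = "(stealData())";
var result = eval(payload);
// sideEffects is now ["attacker code ran"]
```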

In order to fix this, json.js would need to use an alternative means of parsing and serializing the JSON-formatted string – a means that would be considerably slower than the extremely fast eval() statement.
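One such alternative is a validate-then-eval approach: run a regular-expression check that rejects anything other than JSON punctuation and literals before handing the string to eval(). The sketch below follows Crockford’s published validation pattern; treat it as illustrative rather than authoritative:

```javascript
function safeParseJSON(text) {
  // Neutralize escape sequences, then string/number/keyword literals,
  // then array brackets; anything left over must be JSON punctuation.
  var stripped = text
    .replace(/\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/g, "@")
    .replace(/"[^"\\\n\r]*"|true|false|null|-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/g, "]")
    .replace(/(?:^|:|,)(?:\s*\[)+/g, "");

  if (/^[\],:{}\s]*$/.test(stripped)) {
    return eval("(" + text + ")");
  }
  throw new SyntaxError("Invalid JSON: " + text);
}

safeParseJSON('{"a": 1, "list": [true, null]}'); // parses fine
// safeParseJSON("stealData()")                  // throws SyntaxError
```

The extra regular-expression passes over the whole string are exactly the overhead a native parser would avoid.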

It breaks the browser’s native for..in method of iterating over object properties.

At this point, it’s pretty safe to say that extending Object.prototype is considered harmful – and many users agree. Extending an Object’s prototype is generally considered reasonable for personal situations, but a publicly available, and highly recommended, script like json.js demands that it behave in a user-conscious manner.
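The breakage is easy to demonstrate. A minimal sketch, assuming a json.js-style assignment onto Object.prototype:

```javascript
// json.js-style extension: a method assigned onto Object.prototype
Object.prototype.toJSONString = function () { return "{}"; };

var user = { name: "Ada" };
var keys = [];
for (var key in user) {
  keys.push(key);
}
// keys is ["name", "toJSONString"]: every for..in loop on the page now
// sees the library's method, unless the author remembers to filter
// with hasOwnProperty().
```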

Some attempts were recently made at cleaning up how json.js behaved, but thus far, no considerable effort has been made to provide an alternative means of deserializing JSON strings, that doesn’t break a JavaScript engine’s native behavior.

Summary: By adding support for JSON within a browser, both of these issues will be completely circumvented (malicious users won’t be able to extract data from a JSON structure, nor will the parseJSON and toJSONString methods be able to break for..in loops).

2) The recommended method of deserialization doesn’t scale.

Let’s start by looking at the two most popular methods of transferring JSON data (JSONP and an XMLHttpRequest transferring plain JSON), along with using an XMLHttpRequest to transfer some XML data (just for fun).

I’ve set up a series of tests that we can use to analyze the speed and efficiency of traditional JSON (using json.js), JSONP, and XML. I made 3 sets of files each containing a set of data records. As a base, I used some XML data from W3Schools. In the end, I came up with a total of 24 test files, each with a different number of records, in each specified format.

All of these get referenced from my mini test suites (one for each data type): JSON, JSONP, XML

These suites default to requesting the 50-record file 100 times, and taking an average reading. To get a reading on a different recordset, visit the file with the number of records in the URL, e.g.: json.html?400. Please don’t run this on my server, it’ll make it cry.

Looking at the numbers for the recommended means of transferring JSON data, we don’t get a full picture. Where are the scalability issues coming from? Maybe XMLHttpRequests don’t scale so well? (Which would also help to explain the numbers for the XML transfers.)

To resolve this, let’s break down the numbers for JSON into time spent processing and time spent transferring the data.
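A rough sketch of how that split can be measured: time only the deserialization step, averaged over many runs, so that transfer time can be computed as total time minus processing time. (timeParse and the sample payload are illustrative, not the actual test-suite code.)

```javascript
// Average the per-parse cost over many iterations to smooth out noise.
function timeParse(jsonText, parseFn, iterations) {
  var start = new Date().getTime();
  for (var i = 0; i < iterations; i++) {
    parseFn(jsonText);
  }
  return (new Date().getTime() - start) / iterations;
}

var avgMs = timeParse('{"records": [1, 2, 3]}', function (text) {
  return eval("(" + text + ")");
}, 100);
// avgMs is the average per-parse cost in milliseconds
```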

Data Records by Time (in seconds)

Here is where we see the major numbers come out. We can see that the processing time increases at a pace slightly less than O(n) time. This might be fine for most cases, however we can clearly see (from the previous chart) that JSONP is fully capable of faster parsing times.

Additionally, the processing time that json.js takes is completely blocking – no other operation is able to take place when the deserialization is taking place. When running the test suites you’ll find that when the high (200-1600) record sets are processed your browser will stall (and if you’re on a mac, you’ll get the spinner of death). By passing this complete operation off to the browser you’ll avoid all of these complications.

Summary: By adding native JSON support within a browser you would have the blazing speed and scalability of JSONP, without adding any significant overhead.

Side Discussion:

In addition to studying the scalability of transferring JSON data, I’ve also looked at the overhead costs of pushing JSON data into an HTML structure (especially when compared to XML-formatted data).

Currently, some browsers have a native means of processing and converting XML data structures on the fly, using XSLT. XSLT is an incredibly powerful templating language and is more than capable of transforming any XML structure into an appropriate XHTML data structure.

However, for JSON data, no killer-templating system exists. I’ve used JSONT extensively but, in reality, it doesn’t hold a candle to XSLT. (Both in terms of features and speed.)
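To give a flavor of what such templating involves, here is a hand-rolled stand-in for a JSONT-style transform (this is not the real JSONT API): substitute {key} placeholders in a markup template with values from a JSON record. The sample data echoes the W3Schools CD catalog used in the tests above.

```javascript
// Replace each {key} placeholder with the matching property from data;
// unknown keys are left untouched.
function fillTemplate(template, data) {
  return template.replace(/\{(\w+)\}/g, function (match, key) {
    return key in data ? String(data[key]) : match;
  });
}

var row = fillTemplate(
  "<li>{title} by {artist}</li>",
  { title: "Empire Burlesque", artist: "Bob Dylan" }
);
// row is "<li>Empire Burlesque by Bob Dylan</li>"
```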

You can see that, even with the extra speed advantages of JSONP, all of that lead is blown away by the incredibly slow nature of JSONT. Unfortunately, the situation isn’t as clear-cut here as it was comparing JSONP and json.js, considering that JSONT is hardly a worthy replacement to XSLT.

Summary: In order for JSON to be, at all, competitive with XML/XSLT, there has to be a native means of transforming JSON data structures into strings (or DOM fragments), that can then be injected into an HTML document.

3) Upcoming standards imply its existence.

There are two upcoming standards that require some form of JavaScript object serialization (either explicitly, or implicitly).

The first, JSONRequest, (unsurprisingly, also from Douglas Crockford) is a means of securely transferring JSON data cross-domain. However, in order to implement this feature, some form of native JSON deserialization will need to be implemented within the browser.

I know that Mozilla is considering implementing this feature, as is Microsoft, in Internet Explorer. Hopefully, this will mean that the two biggest browsers will have an implementation out quite soon.

The second is the new DOM Storage standard introduced by the WHATWG. While this feature does not require a form of JavaScript object serialization, it does only allow data to be stored as strings (and in order to make the most efficient use of that storage area, a form of object serialization will be needed).
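To see why, consider a sketch of the round trip a string-only store forces on authors today. The store here is a stub standing in for a DOM Storage area, and the serializer is deliberately naive (strings and numbers only):

```javascript
var store = {}; // stub standing in for a string-only DOM Storage area
function setItem(key, value) { store[key] = String(value); }
function getItem(key) { return store[key]; }

// Without native serialization, every author hand-rolls something like:
function serialize(obj) {
  var parts = [];
  for (var key in obj) {
    if (obj.hasOwnProperty(key)) {
      var v = obj[key];
      parts.push('"' + key + '":' + (typeof v === "string" ? '"' + v + '"' : v));
    }
  }
  return "{" + parts.join(",") + "}";
}

var prefs = { theme: "dark", fontSize: 14 };
setItem("prefs", serialize(prefs));

// ...and hand-rolls the deserialization on the way back out:
var restored = eval("(" + getItem("prefs") + ")");
// restored.theme === "dark", restored.fontSize === 14
```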

However, the fact that two upcoming pseudo-standards need some form of object serialization requires us to re-examine the proposed API. It is imperative that it be rock-solid before being implemented in multiple platforms.

However, there’s still a lot of ambiguity in this particular API – especially when it comes to aspects of the language that aren’t perfectly serializable.

Function and RegExp

Note that the parseJSON method (rightfully) balks at extracting a Function or a RegExp:

"function(){}".parseJSON();
ERROR: parseJSON: undefined

"/foo/".parseJSON();
ERROR: parseJSON: undefined

While it happily serializes either using toJSONString() (with varying results):

function test(){}
test.toJSONString()
>> "{"prototype":{}}"

/foo/.toJSONString()
>> "{}"

null

Also, note that while parsing a serialized ‘null’ gives you the correct value back:

"null".parseJSON()
>> null

it is unable to convert a null into its serialized form:

var foo = null;
foo.toJSONString()
ERROR: foo has no properties

A very important point needs to be made here: Some form of definition and specification should be made regarding this language addition. And soon. Even if it’s nothing other than defining that the above behavior is correct – that should be specified somewhere for all browser developers to follow.

Summary: The current, recommended, implementation of JSON parsing and serialization is harmful and slow. Additionally, upcoming standards imply that a native JSON (de-)serializer already exists. Therefore, browsers should be seriously looking at defining a standard for native JSON support, and upon completion implement it quickly and broadly.

To get the ball rolling, I recommend that you vote up the Mozilla ticket on the implementation, to try and get some critical eyes looking at this feature, making sure that it’s completely examined and thought through; and included in a browser as soon as possible.

Update: I located the specification that defines support for toJSONString/parseJSON in ECMAScript 4. This is great news. Now that the definition for this feature is happily moving along, it’s just time to get some implementations going!

First of all, I agree that JSON serialization should be put into the browsers, but then we will have what M$ thinks how it should be done vs. what everyone else does.
Keeping JSON “safe” seems to me to be a security issue with JavaScript. It involves overloading core script like the Array or Object constructor. Correct me if I am wrong, but doesn’t this involve a Cross-Site Scripting (XSS) or Man-in-the-Middle attack? This has nothing to do with JSON, except when JSONRequest is used to grab data from multiple domains (which Ajax with XMLHttpRequest cannot do). It simply is untrusted code. To deal with this untrusted code, why not follow the prototype pattern of loading JS files using the library itself? That is, the first JS in a page is a loader object that can grab new JavaScript files, but then verify the new files do not muck around with things like changing core functions.
Using Douglas Crockford’s method for extracting the data is considered bad because he changes the behavior of Object.prototype. I agree this is usually bad, but I am sure we can tweak the code to make it behave.
John then describes how JSON is speedier than XML. Sort of. The big reason why people use JSON is because it IS JavaScript and is several orders of magnitude faster than raw XML. But if we want to _transform_ the JSON to do data tricks on it, then it is much slower than XSLT. This is because XSL is a standard that has been around for 10 years, while transforming JSON has not even been around 10 months. But JSONT will mature, and there are other, faster ways to transform JSON. Also, there are tricks you can do with JSON that you simply cannot do with XSLT as it is currently supported in all browsers.

I don’t see why the behavior with regards to functions and regexps is a problem. The two data types aren’t supported by JSON (see json.org), so it is no wonder. One feature of JSON is that it is language-independent, so serialized functions wouldn’t make much sense.

That you can’t call methods on a null object seems to be just a tiny implementation detail. Use a wrapper like function t(v){var s=[v].toJSONString(); return s.substring(1,s.length-1);} if you really have the need for that (or use a functional interface instead of an OO interface).

It seems that you would rather like access to SpiderMonkey’s “uneval” function.

I think you are confusing things when you talk about security issues with JSON (and so does Kid666 in the blog post you are linking to – this is really about information disclosure, not XSS). The problem is that you can include a script tag on your web page that will load JSON from a different web page (e.g. GMail). If you redefine object/array prototypes before doing that you will be able to actually read the data. Of course if the script already contains an eval() call it makes your job even easier. So JSON encoded data generally has to be considered unsafe – regardless of how it is parsed. Either the parser is executed in its own domain and then you need an XSS vulnerability to infiltrate it – but if you found one you can read the data directly. Or the parser is executed in your domain and then you have to get the data first that you will feed it.

I heard of two solutions to this problem. Either you don’t serve JSON with sensitive data under a fixed URL (e.g. require the session ID to be present in the URL) or you insert an endless loop before the JSON data. You can remove the loop before passing data to the parser; a web page loading the data with a script tag won’t be able to. A different parser on the other hand doesn’t solve anything.

crud is correct. Sorry I was in a hurry earlier and couldn’t fully explain this.

Rather than reusing an existing OSI-approved license, Douglas Crockford has decided to create his own. His “JSON license”, which explicitly covers json.js, comes encumbered with a rather alarming exception: “The Software shall be used for Good, not Evil.”

Funny, right? Ha ha. I love jokes in licenses. They ring about as hollow as bomb jokes in airports. We can all have a good laugh and get on with the business of writing code.

Except for one thing: he’s not kidding. He has publicly confirmed that it’s not a joke, saying “If someone is unable to assert that they are not using it for Evil, then I don’t want them using my software.” ( source: http://simonwillison.net/2005/Dec/16/json/#c32497 )

That Crockford, he’s a funny guy. Invite him to parties. Don’t use his code.

In point 1, you say that adding methods to Object.prototype is considered harmful. But in point 3 you seem to be happy with the new Object.prototype.toJSONString method. Does making that method native circumvent the problems considered in point 1 ?

@zimbatm: That’s correct (I should’ve specified that in the post). For example, all objects have a .toString() method. You can’t iterate over it, even though it exists (which is exactly how a .toJSONString() method should work, too).
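The difference is easy to check in any console (a minimal sketch):

```javascript
var obj = { name: "Ada" };
var seen = [];
for (var k in obj) {
  seen.push(k);
}
// seen is ["name"] only: obj.toString exists, but because the native
// method is non-enumerable it never shows up in for..in. A native
// toJSONString() would behave the same way.
```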

@Anders: “I don’t see why the behavior with regards to functions and regexps is a problem.” It’s not, necessarily, a problem – but it’s just something to consider. Obviously, people may treat Functions or RegExps like objects (extending them with their own properties), so it’s probably a good idea to make an honest attempt at serializing them. However, why is the extra prototype property provided for the Function serialization, and not the RegExp one? Although, it seems as if some work is already being done (in the ES4 spec) to solve this.

@Wladimir: “A different parser on the other hand doesn’t solve anything.” A different JavaScript-based parser, no. But a parser that’s written in C and embedded in the browser is another issue entirely. The browser will be able to circumvent most of the trickery that a malicious user would be able to implement. However, JavaScript Arrays, in general, may still be vulnerable to this exploit – but that’s another issue at play here.

@Robert: I didn’t intend for it to sound like that – only that “The recommended method of deserialization doesn’t scale [as well as JSONP].” Presumably since JSONP’s JavaScript parser/serializer is located within the browser core itself, rather than implemented purely in JavaScript. Simply: By removing the need to have a pure-JavaScript JSON parser, you lose a lot of overhead and computation time.

Mark, take a look at json.js. It’s public domain. At one point it had a restrictive license, but the latest public domain version is timestamped the day before your first comment. Also, a modified version of Crockford’s code is used in Firefox under the GPL to parse search suggestions.

John,
Date literals would be an incredible addition for the sake of JSON!!! This is something that should have been there from the beginning. Your suggested implementation is not bad! Have you suggested it to standards bodies?

All,
I disagree with further prototyping, which can be problematic on multiple platforms. I think a very simple JSONencode() and JSONdecode() would better serve the purpose, like what is found in PHP’s PECL implementation for JSON (which still needs work, BTW; it doesn’t read basic JSON from del.icio.us!!!). While JavaScript’s great strength is its power to prototype, it’s also a great weakness. By keeping all the JSON code in fewer functions we are able to keep the code base small and eschew obfuscating the implementation in multiple prototypes.