This is a good point, but I feel that discouraging this type of approach is not the way to go.

I apologise in advance for ranting... I hope this is not too off-topic, but instead a "zoom out" on the issue.

This touches on something deep and wrong about how we use computers these days. Computers are really good at being computers, and the amplification of intellectual capabilities they afford is tremendous, but this is reserved for a limited few that were persistent enough and learned enough to rediscover the raw computer buried underneath, and what it can do.

For example, I dream of a world where everything communicates through s-expressions, all code is data and all data is code. Everything understandable all the way down. Imagine what people from all fields could create with this level of pluggability and interoperability. We had a whiff of that with the web so far, but it could be so much more powerful, so much simpler, so much more elegant. All the computer science is there, it's just a social problem.

I understand the security issues, but surely limiting the potential of computers is not the solution. There has to be a better way.

Lack of Turing-completeness can be a feature. Take PDF vs PostScript. The latter is Turing-complete and therefore you cannot jump to an arbitrary page or even know how many pages the document has without running the entire thing first.

By limiting expressiveness you also gain static analysis and predictability. It's not about limiting the potential of computers, it's about designing systems that strike the right balance between the power given to the payload and the guarantees offered to the container/receiver.

For example, it is only because JSON is flat data and not executable that web pages can reasonably call JSON APIs from third parties. There really is no "better way" -- if JSON was executable then calling such an API would literally be giving it full control of your app and of the user's computer.

If you have a nice data format like s-exprs, it's a fairly simple matter to just aggressively reject any code/data that can't be proven harmless. For example, if you're loading saved game data, just verify that the table contains only tables with primitive data; if there's anything else, throw an error. Then you can safely execute it in a Turing-complete environment and be sure it won't cause problems.
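To make the "only tables with primitive data" check concrete, here is a rough Python sketch. The names `is_plain_data` and `load_save` and the depth limit are made up for illustration, and Python dicts and lists stand in for tables.

```python
# Hypothetical helper: accept a value only if it is built entirely from
# primitives, lists, and string-keyed dicts. Anything else is rejected.
PRIMITIVES = (str, int, float, bool, type(None))

def is_plain_data(value, max_depth=32):
    """Return True if value holds only primitive data, with bounded nesting."""
    if isinstance(value, PRIMITIVES):
        return True
    if max_depth <= 0:
        return False  # too deeply nested: refuse rather than recurse further
    if isinstance(value, list):
        return all(is_plain_data(v, max_depth - 1) for v in value)
    if isinstance(value, dict):
        return all(
            isinstance(k, str) and is_plain_data(v, max_depth - 1)
            for k, v in value.items()
        )
    return False  # functions, objects, anything unexpected

def load_save(data):
    """Throw an error unless the save data is provably plain."""
    if not is_plain_data(data):
        raise ValueError("save data contains non-primitive values")
    return data
```

Something like `load_save({"hp": 10, "items": ["sword"]})` goes through, while `load_save({"hook": print})` raises.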

Speaking for myself, in my ideal world this sort of schema-checking and executing is ubiquitous and easy. Obviously that's not the world today. While there are tools for checking JSON schemata there doesn't seem to be a standard format. I wonder how hard it would be to implement a Lua schema-checker.
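As a rough gauge of how hard a schema-checker is, here is a minimal sketch in Python rather than Lua. The schema notation (a dict of field schemas, a one-element list for arrays, a Python type for leaves) is invented for this example and is not any standard format.

```python
import json

def check(value, schema, path="$"):
    """Raise TypeError at the first place where value does not match schema."""
    if isinstance(schema, dict):          # object with exactly these keys
        if not isinstance(value, dict):
            raise TypeError(f"{path}: expected object")
        if set(value) != set(schema):
            raise TypeError(f"{path}: expected keys {sorted(schema)}")
        for key, sub in schema.items():
            check(value[key], sub, f"{path}.{key}")
    elif isinstance(schema, list):        # homogeneous array: [item_schema]
        if not isinstance(value, list):
            raise TypeError(f"{path}: expected array")
        for i, item in enumerate(value):
            check(item, schema[0], f"{path}[{i}]")
    elif type(value) is not schema:       # leaf: an exact Python type
        raise TypeError(f"{path}: expected {schema.__name__}")

# Hypothetical schema for a saved game.
SAVE_SCHEMA = {"name": str, "hp": int, "items": [str]}

# Passes silently; a wrong type or an unexpected key raises TypeError.
check(json.loads('{"name": "Ann", "hp": 3, "items": ["key"]}'), SAVE_SCHEMA)
```

The checker itself is about twenty lines; the hard part is agreeing on the schema notation, not implementing it.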

While you can register custom handlers for specific tags, properly implemented readers can read unknown types without requiring custom extensions.
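Roughly, the fallback for unknown tags could look like the Python sketch below. This is not the API of any actual EDN library; `Tagged`, `resolve`, and the handler table are hypothetical names, and the parsing step itself is left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Dict

@dataclass(frozen=True)
class Tagged:
    """Generic stand-in for a tagged element nobody registered a handler for."""
    tag: str
    value: Any

def resolve(tag: str, value: Any, handlers: Dict[str, Callable[[Any], Any]]) -> Any:
    """Use the registered handler if there is one, otherwise keep tag and value."""
    handler = handlers.get(tag)
    return handler(value) if handler else Tagged(tag, value)

# One custom handler; every other tag falls through to Tagged.
handlers = {"inst": datetime.fromisoformat}

known = resolve("inst", "1985-04-12T23:20:50+00:00", handlers)
unknown = resolve("myapp/Person", {"first": "Fred"}, handlers)
```

Here `known` is a `datetime`, while `unknown` is a `Tagged` value that still carries both the tag and the data, so it can be passed along or written back out unchanged.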

The motivating use case behind EDN was enabling the exchange of native data structures between Clojure and ClojureScript, but it's not Clojure specific -- implementations are starting to pop up in a growing number of languages (https://github.com/edn-format/edn/wiki/Implementations).

I've looked at EDN a bit, even started a sad little C# parser. I don't see what it has to do with my previous comment, which is all about how schemas are potentially useful. I'm trying to say that after you check the schema, you don't just read the data, you execute it, and that has the effect of applying the configuration or just constructing the object.

>There really is no "better way" -- if JSON was executable then calling such an API would literally be giving it full control of your app and of the user's computer.

Of course there's a "better way": running the code in a sandbox. You could do so using js.js[1], for example. (Of course, replacing a JSON API with sandboxed JS code is likely to be a bad idea. But it is possible.)

You're right inasmuch as I shouldn't have implied that unsandboxed interpretation is the only option.

But my larger point still stands; the fundamental tradeoff is still "power of the payload" vs "guarantees to the container." Even in the case of sandboxed execution, the container loses two important guarantees compared with non-executable data formats like JSON:

1. I can know a priori roughly how much CPU I will spend evaluating this payload.

2. I can know that the payload halts.

This is why, for example, the D language in DTrace is intentionally not Turing-complete.
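A toy illustration of those two guarantees, in Python. This is not DTrace's D, just a made-up expression language with no loops or recursion, where the size of the payload bounds the work before anything is evaluated.

```python
import operator

# Hypothetical mini-language: an expression is a number or (op, left, right).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def cost(expr):
    """Node count of the expression tree: an a priori bound on evaluation steps."""
    if isinstance(expr, (int, float)):
        return 1
    _op, left, right = expr
    return 1 + cost(left) + cost(right)

def evaluate(expr):
    """Visits each node exactly once, so it always halts."""
    if isinstance(expr, (int, float)):
        return expr
    op, left, right = expr
    return OPS[op](evaluate(left), evaluate(right))

expr = ("+", 1, ("*", 2, 3))
budget_needed = cost(expr)   # 5 nodes
result = evaluate(expr)      # 7
```

The receiver can reject a payload whose `cost` is too high without running it. Add loops or function definitions to the language and that check stops being possible in general.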

I agree 100% with you, but #1 isn't completely true. The counterexample is the ZIP bomb (http://en.wikipedia.org/wiki/Zip_bomb). Whenever you unzip anything you got from outside, you should limit the time spent and the amount of memory written.
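For the memory half of that advice, a minimal Python sketch using the standard `zlib` module. The function name `bounded_decompress` and the limits are assumptions for illustration, and it handles a raw zlib stream rather than a full ZIP archive.

```python
import zlib

def bounded_decompress(data: bytes, max_out: int) -> bytes:
    """Decompress a zlib stream, refusing to produce more than max_out bytes."""
    d = zlib.decompressobj()
    # Ask for one byte more than allowed so that overflow is detectable.
    out = d.decompress(data, max_out + 1)
    if len(out) > max_out:
        raise ValueError("decompressed size exceeds limit")
    if not d.eof:
        raise ValueError("truncated or malformed stream")
    return out

# About 10 MB of zeros compresses to a few kilobytes.
bomb = zlib.compress(b"\0" * 10_000_000)
```

Calling `bounded_decompress(bomb, 1_000_000)` raises instead of allocating the full 10 MB. Limiting the time spent needs a separate mechanism.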

2. If those limits are hit, you can't tell whether the code just ran too long or whether it was in an infinite loop.

So now if we fully evaluate the options, the choice is between:

1. A purely data language like JSON: simple to implement, fast to parse, decoder can skip over parts it doesn't want, etc.

2. A Turing-complete data format: have to implement sandboxing and CPU limits (both far trickier security attack surfaces), have to configure CPU limits, when CPU limits are exceeded the user doesn't know whether the code was in an infinite loop or not, maybe have to re-configure CPU limits.

Sure, sometimes all the work involved in (2) is worth it, that's why we have JavaScript in web browsers after all. But a Turing-complete version of JSON would never have taken off like JSON did for APIs, because it would be far more difficult and perilous to implement.
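The CPU-limit problem from option 2 can be sketched in a few lines of Python. This is a cooperative toy, not a real sandbox: the "programs" are generators that yield once per step, and `run_with_budget` and `BudgetExceeded` are made-up names.

```python
class BudgetExceeded(Exception):
    """Raised when a program uses up its step budget."""

def run_with_budget(program, max_steps):
    """Drive a generator-based program for at most max_steps steps."""
    gen = program()
    for _ in range(max_steps):
        try:
            next(gen)
        except StopIteration as done:
            return done.value  # program finished within budget
    raise BudgetExceeded(f"gave up after {max_steps} steps")

def slow_but_finite():
    total = 0
    for i in range(1000):
        total += i
        yield  # one step
    return total

def infinite():
    while True:
        yield
```

With a budget of 10,000 steps `slow_but_finite` returns 499500. With a budget of 100, both it and `infinite` fail with the same `BudgetExceeded`, and the caller has no way to tell which of the two would have finished with a bigger budget.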

I have to agree here. General Turing-completeness was known from the beginning to imply undecidable questions -- about its structure, running time, memory and so on. I don't think this has a place as the 'data'.

Abstractions exist for a reason -- this is analogous to source/channel coding separation or internet layers. They don't have to be that way, but are there for a reason.

Someone could change my opinion, though. Provide me a data format which proves certain things about its behavior and that would be a nice counterexample.

Pronouns are fine. Substituting 'the first' and 'the second' would be an improvement. It's specifically 'former' and 'latter' that should be deprecated. I'd be interested in seeing a study comparing readers' comprehension of the various phrasings. What cost in clarity would you be willing to pay?

Back on topic: The reason for PDF's existence is to be a non-Turing-complete subset of PostScript. Features like direct indexing to a page are why Linux printing has switched to PDF as the primary interchange format.

When it comes right down to it, you can't fully protect people from themselves. Even in 'meat space', which the general population is presumably experienced with, people talk others into doing things that they should not all the time. Anything from social engineering to bog-standard scam artists masquerading as door-to-door salesmen.

But in 'meat space', it is way harder and more expensive to do evil against large numbers of people. For example, phishing as done electronically (throw a very, very wide net, and hope) wouldn't make economic sense if one had to do it manually.

Also, if, for example, cars and airplanes and banks and nuclear submarines accepted executable code as input, some people would do damage on a gargantuan scale.

Clearly, being liberal in what you accept must end somewhere. I argue that it should end very, very soon. Even innocuous things such as "let's allow everybody to read the subject of everyone's mail messages", if available at scale and cheaply, would entice criminal behavior, for example by those mining them for information that you are away from home.

I'm not saying that scams in "cyberspace" don't present a greater threat than scams in "meatspace". I'm just pointing out that if you cannot protect people from themselves in "meatspace", then doing it in "cyberspace" is futile. You can fight it and cut back on it, but you will never actually win that fight. Technological solutions to what is ultimately a sociological problem only go so far, and we should be careful to not obsess over them to a fault.

And yet, in "meatspace", chainsaws come with more safety mechanisms than butter knives because the damage they can do is so much larger. Yes, we can't win that fight; people will die from chainsaw accidents, but I disagree that we shouldn't be more vigilant about chainsaws than about butter knives.

This seems like exactly the problem the parent post is complaining about, though. The people being tricked by self-XSS aren't the limited group the parent talks about; they're the people who don't have the technical knowledge to understand what the developer console does and why pasting in random JS might be a bad idea. So the phenomenon of self-XSS reinforces the point.

Perhaps modern operating systems (or hardware?) need two modes - "Safe mode", where everything is sanitised, checked, limited and Secure Boot-style verified, and "Open mode" where it's not; where experts and enthusiasts can work without limit and without DRM.

On the other hand, taking the easy way out when this kind of security problem comes up leads to having a machine that's just an appliance and not a computer. If you've closed every local code execution vulnerability, you've probably rendered your system completely non-programmable and erected a monumental barrier to learning how to hack.