Starving the Turing Beast…

The components that make up distributed systems fundamentally need to talk to each other in order to achieve coordinated behaviour. This introduces the need for components to share a common set of expectations about behaviour, including the ability to recognise the difference between valid and invalid messages. Fairly obviously, this has safety and security implications. Enter the study of linguistic security, which addresses the vulnerabilities introduced by the hitherto unrecognised expressive power of the languages we communicate with.

The problem is that up to now security has been framed in the context of code, but this framing fails to recognise that recognition and context are essentially language problems. Looking at the problem from a language perspective brings us first to the work of Chomsky on languages and then to Turing on computation. As it turns out, above a certain level of expressive power in the Chomsky hierarchy, deciding whether an input is valid runs into Turing's halting problem. For such expressively powerful languages the question 'is it valid?' is simply undecidable, no matter how hard you try. This is an important point: it's not just hard, or even really, really hard, but actually undecidable, so… just don't go there.
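To make the contrast concrete, here's a toy sketch (mine, not from the post): for a language low on the Chomsky hierarchy, a regular language, the question 'is it valid?' is fully decidable, because a finite automaton answers it in one linear pass and always terminates. The language and state names below are invented for illustration.

```python
# DFA recogniser for the regular language (ab)* over the alphabet {a, b}.
# Membership in a regular language is decidable: this loop always halts.
TRANSITIONS = {
    ("start", "a"): "seen_a",
    ("seen_a", "b"): "start",
}
ACCEPTING = {"start"}

def is_valid(msg: str) -> bool:
    """Decide membership in (ab)* -- guaranteed to terminate."""
    state = "start"
    for ch in msg:
        state = TRANSITIONS.get((state, ch))
        if state is None:        # no transition defined: reject immediately
            return False
    return state in ACCEPTING

print(is_valid("abab"))   # True
print(is_valid("aba"))    # False
```

No equivalent total procedure exists once the input language is expressive enough to be Turing complete; that's the undecidability the post is pointing at.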

Unfortunately, and I seem to say that a lot on this blog, language security has only recently been recognised as the first essential step in securing heterogeneous distributed systems (like the internet). Instead, in the evolution of the internet, to take a large-scale example, we've subscribed with greater or lesser degrees of naivety to Postel's law, whose overly liberal acceptance of input introduces vulnerabilities.

As the law of unintended consequences dictates, we've thus ended up with a running sore of hard-to-parse protocols requiring complex and bug-prone parsers, which in turn can become weird machines for carefully crafted input exploits. What parsing is done is mostly ad hoc and distributed throughout the code, making the problem of formally verifying its security almost intractable. The net result is that we've ended up in the second great crisis of computing, that of security. Yep, Collingridge's dilemma, revisited again.
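A minimal sketch of what such ad hoc, liberal parsing looks like in practice. The message format here is invented: a one-byte length field followed by a payload. The naive version trusts the length field and silently accepts malformed input; the checked version fully recognises the message and rejects any mismatch.

```python
# Hypothetical "shotgun parser": trusts the length field without
# validating it against the actual message body.
def parse_naive(packet: bytes) -> bytes:
    length = packet[0]
    return packet[1:1 + length]    # silently truncates on malformed input

# Full recognition before use: length field must agree with the body.
def parse_checked(packet: bytes) -> bytes:
    length = packet[0]
    payload = packet[1:]
    if len(payload) != length:
        raise ValueError("invalid message: length field disagrees with body")
    return payload

print(parse_naive(b"\x05hi"))    # b'hi' -- malformed input accepted anyway
print(parse_checked(b"\x02hi"))  # b'hi' -- only well-formed input passes
```

The naive parser's quiet acceptance of the inconsistent message is exactly the raw material weird machines are built from.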

Your data format is a language, treat it as such. … Design has been about code patterns; it should be also about data patterns or rather languages

S. Bratus, M. Patterson, Brucon 2012

Then there’s the equivalence problem. For communication to work securely, communicating nodes must share a common context, otherwise an exploitable ambiguity is introduced, and parsing must therefore be computationally equivalent at each end. If you don’t receive the protocol with an equivalently powerful recogniser then some inputs will result in unexpected states and state transitions, and most likely a resultant weird machine. But, for non-deterministic protocols, the task of validating parser equivalence between nodes is again mathematically undecidable.
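A small illustration (mine, not from the post) of why parser equivalence matters. Two endpoints parse the same invented key-value message, but disagree on duplicate keys: one keeps the first occurrence, the other the last. Same bytes, two meanings, and an attacker can exploit the gap.

```python
# Endpoint A: first occurrence of a key wins.
def parse_first_wins(query: str) -> dict:
    out = {}
    for pair in query.split("&"):
        k, _, v = pair.partition("=")
        out.setdefault(k, v)       # ignore later duplicates
    return out

# Endpoint B: last occurrence of a key wins.
def parse_last_wins(query: str) -> dict:
    out = {}
    for pair in query.split("&"):
        k, _, v = pair.partition("=")
        out[k] = v                 # later duplicates overwrite
    return out

msg = "user=alice&user=admin"
print(parse_first_wins(msg))   # {'user': 'alice'}
print(parse_last_wins(msg))    # {'user': 'admin'}
```

If endpoint A authorises the message as coming from 'alice' while endpoint B acts on it as 'admin', the non-equivalent recognisers have manufactured the exploitable ambiguity the post describes.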

To fix the predicament we’re in we clearly need to amend Postel’s principle, and in that spirit Sassaman et al. (2011) have proposed the following set of countermeasures:

apply full recognition to inputs before processing,

treat input as a language: accept it with matching computational power and generate its recogniser from its grammar,

treat input-handling computational power as privilege, and reduce it whenever possible,

make your protocol context-free or regular to minimise the complexity and size of your parser,

to ensure computational equivalence of endpoints use only regular, context-free protocols.
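Several of these countermeasures can be sketched together in a few lines. Here the (invented) message format is defined as a regular grammar, the recogniser is derived mechanically from that grammar via the standard-library regex engine, and full recognition happens before any processing: malformed input is rejected outright rather than liberally repaired.

```python
import re

# Grammar for the (hypothetical) message format, as a regular expression:
#   MSG -> "TEMP:" "-"? DIGITS ("." DIGITS)?
MSG_GRAMMAR = re.compile(r"TEMP:-?[0-9]+(\.[0-9]+)?")

def handle(raw: str) -> float:
    # Countermeasure 1: full recognition *before* processing.
    if MSG_GRAMMAR.fullmatch(raw) is None:
        raise ValueError(f"rejected invalid input: {raw!r}")
    # Only a fully recognised message ever reaches the processing step.
    return float(raw[len("TEMP:"):])

print(handle("TEMP:-3.5"))   # -3.5
# handle("TEMP:3.5e9")       # raises ValueError -- no liberal acceptance
```

Because the format is regular, the recogniser needs only finite-state power (countermeasures 3 and 4), and any endpoint built from the same grammar is computationally equivalent (countermeasure 5).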

There are some interesting insights to be gained from this work. For example, separating recognition from use can be seen as an instance of the economy of mechanism principle, as can avoiding the complex, non-deterministic undecidability of Turing-complete languages. Likewise, the stricture to apply only just enough computational power to handle an input can be seen as a logical extension of Saltzer and Schroeder’s least privilege principle. What also comes out of this work on linguistic security is that language expressiveness, robustness (e.g. tolerating ambiguity in inputs) and extensibility are all characteristics that trade off directly against security.

Art lives from constraints and dies from freedom

Leonardo da Vinci

For the system safety community this work throws up a number of fresh perspectives. The first is the concept of a latent software fault as the potential for a weird machine that will receive and process some input in an arbitrary, and unsafe, fashion. As Stuart Wray (2014) points out, we might therefore be wise to constrain our programming to Turing-incomplete domain-specific languages. Likewise we should carefully consider what doors we’re opening when we introduce robust behaviour into a system, and what might take advantage of that open door. Finally, we might take a leaf out of the security book and ask whether accidents should be considered formal proofs by construction that your null hypothesis of safety is false.


With a Bachelor’s in Mechanical Engineering and a Master’s in Systems Engineering, Matthew Squair is a principal consultant with Jacobs Australia. His professional practice is the assurance of safety, software and cyber-security, and he writes, teaches and consults on these subjects. He can be contacted at mattsquair@gmail.com