Sunday, 5 March 2017

Turing Completeness Is Where You Least Expect It

The condition of Turing completeness is almost always explained in terms of the Turing machine. Naturally. A Turing machine is simply a hypothetical black box that performs a small set of mathematical and logical operations on a strip of paper tape of potentially unlimited length. The box zips back and forth along the tape, reading and writing data in discrete cells, and eventually returns the answer to some question. This is what it means for a problem to be computable at all: some such machine is guaranteed to stop with an answer rather than run forever. Any true computer is just a simulation of this primitive abstraction.
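The box-and-tape description is easier to hold onto once you see how little machinery it takes. What follows is a minimal sketch in Python, not a canonical definition: the function name, the rule format, the "start" and "halt" state names, and the step limit are all assumptions made for illustration. The example rule table adds one to a binary number written on the tape.

```python
def run_turing_machine(rules, tape, state="start", blank="_", max_steps=10_000):
    """Run a one-tape Turing machine until it reaches the 'halt' state.

    rules maps (state, symbol) -> (symbol_to_write, move, next_state),
    where move is -1 (one cell left) or +1 (one cell right).
    """
    cells = dict(enumerate(tape))  # sparse tape: only visited cells are stored
    head = 0
    steps = 0
    while state != "halt":
        # A real Turing machine has no step limit; this guard only keeps
        # a buggy rule table from looping forever in the sketch.
        if steps >= max_steps:
            raise RuntimeError("step limit reached before halting")
        symbol = cells.get(head, blank)
        write, move, state = rules[(state, symbol)]
        cells[head] = write
        head += move
        steps += 1
    if not cells:
        return ""
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)


# Hypothetical rule table: walk right to the end of the number,
# then carry a 1 back toward the left.
INCREMENT = {
    ("start", "0"): ("0", +1, "start"),
    ("start", "1"): ("1", +1, "start"),
    ("start", "_"): ("_", -1, "carry"),
    ("carry", "1"): ("0", -1, "carry"),
    ("carry", "0"): ("1", -1, "halt"),
    ("carry", "_"): ("1", -1, "halt"),
}

print(run_turing_machine(INCREMENT, "1011"))  # prints 1100
```

The whole "machine" is a lookup table plus a loop. Everything interesting lives in the rules.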

I've always had a hard time with the idea of the Turing machine. Part of the problem is that it is usually described in terms of what it is (a box that performs mathematical operations on the cells of a strip of paper tape) and not what it means. And what the Turing machine means is that a system can solve any computable problem, given enough time and memory.

For something to be Turing complete, it must be able to solve every problem solvable by any other computer that exists or that could be imagined to exist (such as a Turing machine). It may be easier to see this from the other side: a Turing incomplete computer is unable to solve some known class of problems solvable by another computer. Usually, when we talk about Turing completeness, we're talking about computers-as-programming languages, which are systems that describe how to solve different problems.

Most programming languages you've heard of are Turing complete. There are no computational problems that can be solved by C and not by JavaScript, or by Java and not by Python, or by Python and not by Java. And so on. An interesting exception is SQL, the language used to query relational databases, which in its most basic form is generally understood to be Turing incomplete. This makes sense given that SQL isn't designed for general-purpose computation and doesn't need features like loops and if-then conditional branching (which allows sections of code to be skipped under certain circumstances).

Likewise, HTML isn't Turing complete, nor should we expect it to be. Its usage is descriptive and not computational.

An interesting thing about Turing completeness is that it's not all that hard to achieve. In fact, it turns up all over the place purely by accident. Bitcoin and darknet researcher Gwern Branwen has published a small catalog of "surprisingly Turing-complete" constructs that lead to some curious and worrisome security implications.

"One might think that such universality as a system being smart enough to be able to run any program might be difficult or hard to achieve, but it turns out to be the opposite and it is difficult to write a useful system which does not immediately tip over into TC," Branwen writes. "It turns out that given even a little control over input into something which transforms input to output, one can typically leverage that control into full-blown TC. This can be amusing, useful (although usually not), harmful, or extremely insecure [and] a cracker's delight."

Some examples include Magic: The Gathering, CSS, and common musical notation. In the case of Magic, this is only true assuming an endless sequence of cards that force player moves, thus eliminating player choice. The mechanics get pretty deep. CSS (cascading style sheets, which add information about the appearance of webpages), meanwhile, becomes Turing complete when we include user clicks in the system and, thus, the ability to change the system's state. Otherwise, CSS, as a descriptive language, mostly just sits there, like HTML. With some slight tweaks, common musical notation can be converted into the esoteric, Turing-complete programming language Choon.

A 2013 paper published by University of Cambridge computer scientist Stephen Dolan, and cited by Branwen, offers a contender for shortest computer science paper title ever: "mov Is Turing Complete." He's referring to an assembly language instruction—a hardware-level command, basically—that moves a unit of data from one memory address to another. It's one of a very long list of such assembly instructions, but what Dolan showed is that every other instruction can be reduced to this one action. It's wild.
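To get a feel for how moving data around can stand in for other operations, here is a rough Python sketch of a store-and-load trick of the kind Dolan's paper relies on, as I understand it. A dictionary and a list stand in for memory, the function names are hypothetical, and the comments show roughly which mov each line imitates. Note that neither function uses a comparison or an if statement.

```python
def mov_equal(x, y):
    """Return 1 if x == y, else 0, using only stores and loads."""
    memory = {}
    memory[x] = 0      # mov [x], 0
    memory[y] = 1      # mov [y], 1  (overwrites the 0 only when y is the same address as x)
    return memory[x]   # mov result, [x]


def mov_select(flag, if_false, if_true):
    """Pick one of two values with an indexed load instead of a branch."""
    table = [if_false, if_true]  # two adjacent memory cells
    return table[flag]           # mov result, [table + flag]


print(mov_equal(3, 3))  # prints 1
print(mov_equal(3, 4))  # prints 0
print(mov_select(mov_equal(7, 7), "different", "same"))  # prints same
```

Once a comparison and a conditional choice can both be expressed as plain reads and writes, the rest of an instruction set starts to look negotiable.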

The security implication is maybe not obvious. A large part of computer security, generally, is evading and defending against malicious code that might gain entry into your system and do unwanted things. Doing so requires being able to identify such code as code, and not, say, Pokémon or human heart cells. Computation could be anywhere.