Say we have an encoding of the set of all Turing machines/Turing programs -- WLOG, let's say the encoding takes values in the binary numerals. Call this set of binary numerals that represent Turing Machines TM.

TM will be decidable as a subset of all the binary numerals, B, and the ordering of B gets us an effective enumeration of TM. My question is about whether something stronger can be said about TM as a subset of B, not just that it's decidable. Can TM be a context-sensitive language (i.e. recognizable by a linear-bounded Turing machine) -- and if so, does it depend on the encoding?

Apologies in advance if the question is naive, or ill-posed: I'm clearly not an expert in computability theory. If you can help me make the question precise if it isn't, I'd appreciate it.

There's a chance you might get a better answer at cs.stackexchange. But if you decide to cross-post the question there, then you should leave a comment on both sites mentioning the cross-posting and linking to the other version.
– Tara B, May 9 '12 at 22:06

@symplectomorphic: Let me know if something is unclear in my response. I wrote it from the perspective of strings and languages, though we can port this to binary numbers and sets of binary numbers if you want.
– Lucas Cook, May 10 '12 at 2:20

@Lucas -- no, your response was very clear & helpful; I upvoted it but didn't select it as the answer merely in the hopes that I might get further commentary. The more intuition I can develop, the better. But thanks for being so explicit that it really just depends on the encoding: that's what my course failed to emphasize, because we're working with a rather non-standard encoding onto a proper subset of a proper regular subset of binary numerals.
– symplectomorphic, May 10 '12 at 4:26

2 Answers

Neglecting the binary encoding for the moment, a Turing machine encoding can be quite simple: just the set of tuples of the transition function, i.e. a subset of $Q \times \Gamma \times \Gamma \times \{L,R,N\} \times Q$, where $Q$ is the set of states, $\Gamma$ the tape alphabet, and $L, R$ are the move-left and move-right opcodes, while $N$ does nothing, which allows state changes with nothing else happening. To keep it really simple, say that the first element of $Q$ is the start state and the second element the (only) final state.

We can say that any set of such tuples is a legal TM. Of course, most of them won't do much, but in fact, most significant questions about what such a set will do as a TM are undecidable.

But the bottom line is that your set TM could be the power set of $Q \times \Gamma \times \Gamma \times \{L,R,N\} \times Q$, with each subset suitably encoded as a string so that the resulting language is regular, and further encoded in binary while remaining regular. All an automaton needs to check is the basic syntax of the tuples, with $Q$ and $\Gamma$ finite; this can be done in one pass over the input with no auxiliary storage, that is, by a finite-state automaton.
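To see why a finite-state check suffices, here is a sketch with a hypothetical concrete string encoding (the tuple syntax is my own choice): since $Q$ and $\Gamma$ are finite, the legal encodings form a regular language, so a single regular expression, equivalently a DFA, recognizes them.

```python
import re

# Hypothetical concrete encoding: each tuple written "state,sym,sym,move,state;"
# with Q = {q0, q1} and tape alphabet {a, b, _}. Because both sets are
# finite, well-formed tuple lists form a regular language.
STATE = r"q[01]"
SYM = r"[ab_]"
MOVE = r"[LRN]"
TUPLE = rf"{STATE},{SYM},{SYM},{MOVE},{STATE};"
ENCODING = re.compile(rf"(?:{TUPLE})*\Z")  # zero or more well-formed tuples

print(bool(ENCODING.match("q0,a,b,R,q1;q1,_,_,N,q1;")))  # True
print(bool(ENCODING.match("q2,a,b,R,q1;")))              # False: q2 is not a state
```

The empty set of tuples is accepted too, matching the answer's point that most legal TMs "won't do much."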

Or, as Lucas Cook points out, the encoding can actually be to all strings over a given alphabet, even $\{0,1\}^*$, which is certainly regular.

On the other hand, we could make a fancy encoding in some procedural language, with declared variables and other bells and whistles, and make TM almost arbitrarily complex, though we certainly want to keep it recursive. Real programming languages are typically context-sensitive, or close to it, as are most genuinely useful languages.

All this shows that the syntax of Turing machine encodings is not very interesting from the standpoint of theoretical hierarchies of formal languages. The fun begins with semantics -- what TMs do when run -- or when you try to find the shortest program for each partial recursive function in a given encoding, which leads to the wonderful world of Kolmogorov/Chaitin complexity theory.

We can enumerate the set of TMs over a fixed input alphabet, so we can actually provide a computable bijection to the set of all binary strings (which is regular). For example, if the input is the $i^{th}$ string in lexicographic order, we generate the $i^{th}$ TM.
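The index-to-string correspondence behind "the $i^{th}$ string in lexicographic order" can be sketched as follows (shortlex order: shorter strings first, then lexicographic). Composing this with any enumeration of TMs gives the computable bijection mentioned above; the function names are my own.

```python
# Bijection between natural numbers and binary strings in shortlex order:
# write i+1 in binary and drop the leading 1.

def nth_binary_string(i):
    """i-th binary string in shortlex order: "", "0", "1", "00", "01", ..."""
    return bin(i + 1)[3:]  # bin(i+1) looks like "0b1xyz"; drop the "0b1"

def index_of(s):
    """Inverse: recover the index i from the string."""
    return int("1" + s, 2) - 1

print([nth_binary_string(i) for i in range(7)])  # ['', '0', '1', '00', '01', '10', '11']
print(index_of("10"))  # 5
```

Both directions are clearly computable, which is all the bijection claim needs.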

If you want something more "readable", you can think of encoding a TM as a graph with extra information (e.g. as an adjacency matrix). This seems context sensitive to me, since the validity can be checked with only the input and a few counters to check the dimensions.
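As an illustration of the "few counters" point, here is a sketch of a validity check for a hypothetical adjacency-matrix encoding (rows of 0/1 digits separated by `;` -- my own choice of syntax): the only thing to verify is that the matrix is square, which needs nothing beyond the input and counters bounded by its length, well within what a linear-bounded automaton can do.

```python
# Validity check for a hypothetical adjacency-matrix encoding:
# rows of 0/1 digits separated by ";". The encoding is valid iff the
# matrix is square, i.e. every row's length equals the number of rows.

def is_square_matrix(s):
    rows = s.split(";")
    n = len(rows)  # a counter bounded by the input length
    return all(len(r) == n and set(r) <= {"0", "1"} for r in rows)

print(is_square_matrix("010;001;100"))  # True
print(is_square_matrix("01;001;100"))   # False: ragged row
```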

Even in the readable case, TM encodings will sometimes make the assumption that every string encodes a valid machine, where malformed inputs get mapped to some default case. This can be useful when considering languages (and their complements) that represent sets of TMs, like $\{\langle M \rangle \mid L(M) = \varnothing \}$ where $\langle M \rangle$ is the encoding of $M$.