Several people forwarded me the thread on this mailing list about our 2,3
Turing machine prize.
I'm glad to see that it has stimulated discussion. Perhaps I can make a few
general remarks.
* What do we learn from simple universal Turing machines?
John McCarthy wrote:
> In the 1950s I thought that the smallest possible (symbol-state
> product) universal Turing machine would tell something about the
> nature of computation. Unfortunately, it didn't.
I suspect that what was imagined at that time was that by finding the
smallest universal machines one would discover some "spark of
computation"---some critical ingredient in the rules necessary to make
universal computation possible. (At the time, it probably also still seemed
that there might be a "spark of life" or a "spark of intelligence" that
could be found in systems.)
I remember that when I first heard about universality in the Game of Life in
the early 1970s, I didn't think it particularly significant; it seemed like
just a clever hack.
But a few decades later, after building up the whole framework in A New
Kind of Science (http://www.wolframscience.com), I have a quite different view.
I now think that it's extremely significant just how widespread---or
not---computational ability is.
Are typical systems that we encounter in nature universal? Or are they
computationally much simpler?
Can we expect to find non-trivial computational behavior just by searching a
space of possible systems? Or can we only reach such behavior by
specifically setting it up, say using traditional engineering methods?
I think these kinds of questions are very important for the future of both
science and technology. In science, for example, they tell us to what extent
we can expect to find "computationally reducible" models for systems we study.
And in technology, they tell us to what extent we can expect to find
interesting practical algorithms and other things just by searching the
"computational universe".
I don't think that the details of which particular Turing machine is or is
not universal are all that significant. What is significant, though, is how
widespread universality is in general. And currently the only way we have
of establishing that is to do the hard work of proving the universality or
non-universality of particular Turing machines.
Our standard intuition tends to be that systems with simple rules must
behave in simple ways. But what we find empirically by studying the
computational universe is that that just isn't true.
So now the problem is to get a scientific understanding of this, and to make
precise statements about it.
One thing that's come out of my efforts in this direction is what I call the
Principle of Computational Equivalence. It's a fairly bold hypothesis about
which there's much to say (e.g.
http://www.wolframscience.com/nksonline/chapter-12 ).
But one of its predictions is that universality should be possible even in
systems with very simple rules. I've been keen to test that prediction.
And looking for small universal Turing machines is a good way to do it.
There are other reasons to be interested in small universal Turing machines,
too.
Perhaps one can even use them as raw material for creating useful computers
out of components like molecules. Alex Smith's encoding for the 2,3 Turing
machine clearly isn't going to be practical (and wasn't intended to be).
But one of the lessons of modern technology is that once one knows something
is in principle possible, it tends to be just a matter of time before it can
be reduced to practice.
There are also theoretical computer science reasons to be interested in
small universal machines. An example is understanding the tradeoffs between
algorithmic complexity and computational complexity.
(For example, I made some progress in empirical studies of traditional
computational complexity questions by looking at small Turing machines in
http://www.wolframscience.com/nksonline/section-12.8 .)
But most of all, what I think is important about finding small universal
Turing machines is that it helps build our intuition about what kinds of
systems are universal---so that when we encounter systems in computer
science, mathematics, natural science, or whatever, we have a better chance
of knowing whether or not they're likely to be universal (and thus for
example to show undecidability).
* How should one set up the definition of universality?
As for any concept, one wants a definition that's useful and robust, and
that one can build on.
Obviously the initial conditions can't need a universal computer to set them
up. But just what can they contain?
I think the "obvious answer" has changed since the 1950s. In the 1950s
(which were firmly before my time), my impression is that one imagined that
a computer memory---or a Turing machine tape---would somehow always start
with 0s in every one of its cells. (Actually, even then, I suspect that
when one powered up a computer, there could be random data in its memory,
which would need to be explicitly cleared.)
Now, particularly when we look at the correspondence with systems in nature,
it doesn't seem especially natural to have an infinite sequence
specifically of 0s. Why not have 1s? Or 01 blocks? (Like one has digit
sequences of integers vs. rationals, etc.)
I must say that I would consider it most elegant to have universality
established with an initial condition that is basically just repetitive.
But what is so special about repetitiveness? There are lots of other
patterns one can imagine. And it seems as if the only robust distinction
one can make is whether the pattern itself requires universal computation to
set up.
There is no doubt some nice theoretical computer science that can be done in
tightening up all these notions.
Perhaps there is really a whole hierarchy of "levels of universality"---with
systems requiring different levels of computation in the preparation of
their inputs to achieve universality. (One might imagine block-universal
vs. finite-automaton-universal vs. context-free-universal, etc.)
My own intuition is that while there may be boundary cases where
universality is possible with some forms of input but not others, this will
be rare---and that most of the time a system will either be "fully
universal" or not universal at all.
There may well be cases where special forms of input prevent universality.
It might even be that infinite sequences of 0s prevent universality in the
"prize" 2,3 Turing machine (which has rather special behavior in a field of
0s). But I'm guessing that if one considers more robust classes of
encodings, it will usually matter very little which class one is using.
I'm guessing that the situation is similar to the one for intermediate
degrees: there are in principle systems that show undecidability but not
universality, but such systems are extremely rare.
It would, of course, be quite wonderful to have a simple Turing machine or
other system that fundamentally shows undecidability but not universality.
Though I wonder slightly whether this is in any robust sense possible.
After all, if one looks at the existing examples of intermediate degrees
(e.g. http://www.wolframscience.com/nksonline/page-1130c-text ), they always
seem to have "universality inside".
OK, so what about the 2,3 "prize" Turing machine? Alex Smith's proof is
about universality with initial conditions created by fairly complicated
(but non-universal) computations. Can it be extended to simpler initial
conditions? I certainly expect the initial conditions can be considerably
simpler.
Can they be a finite region embedded in an infinite sea of 0s? Obviously I
don't know. Though it would be interesting to find out.
I might mention, by the way, that there are other 2,3 (and 3,2) Turing
machines that I suspect are universal too. And some of them definitely have
quite different behaviors with respect to infinite sequences of 0s.
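To make the role of the tape background concrete, here is a minimal sketch of a 2-state, 3-color Turing machine simulator in which untouched cells are filled from a repeating pattern (all 0s, all 1s, 01 blocks, and so on). The rule table below is an arbitrary example chosen just to exercise the simulator; it is not the prize machine, whose actual rules are given on the prize site.

```python
# Sketch: a 2-state, 3-color Turing machine whose untouched tape cells come
# from a repeating background pattern (all 0s, all 1s, 01 blocks, ...).
# The rule table is an arbitrary illustration, NOT the prize machine.

def run_tm(rules, pattern, steps):
    """rules: {(state, symbol): (new_state, new_symbol, move)}, move in {-1, +1}.
    pattern: the repeating background that the infinite tape starts with."""
    tape = {}                                     # only written cells are stored
    bg = lambda i: pattern[i % len(pattern)]      # background for untouched cells
    state, pos = 1, 0
    for _ in range(steps):
        sym = tape.get(pos, bg(pos))
        state, write, move = rules[(state, sym)]
        tape[pos] = write
        pos += move
    return state, pos, tape

# An arbitrary 2-state (1, 2), 3-color (0, 1, 2) rule table:
rules = {
    (1, 0): (2, 1, -1), (1, 1): (1, 2, -1), (1, 2): (1, 1, +1),
    (2, 0): (1, 2, +1), (2, 1): (1, 0, +1), (2, 2): (2, 1, -1),
}

# Blank tape vs. other repetitive backgrounds:
for pattern in ([0], [1], [0, 1]):
    state, pos, tape = run_tm(rules, pattern, 50)
    print(pattern, state, pos, len(tape))
```

Storing only the cells that have actually been written keeps the representation finite even though the background is conceptually infinite, and swapping in a different repeating pattern requires no other changes.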
* Will the encoding always be more complex if the machine is simpler?
Our standard intuition tends to be that the more complex a thing we
want to get out, the more complex a thing we have to put in.
But one of the big points of A New Kind of Science (encapsulated, for
example, in the Principle of Computational Equivalence) is that this isn't
generally true.
Alex Smith's encoding for the 2,3 Turing machine is complicated and
inefficient. But is this inevitable when one has a simple underlying system?
I don't think so. But so far I don't really have strong evidence.
Here's one slightly related observation, though.
One of the things I did in A New Kind of Science (and thought was rather nice)
was finding the very simplest possible (equational) axiom system for Boolean
algebra. It turns out to be just one axiom: ((b.c).a).(b.((b.a).b))==a.
(See http://www.wolframscience.com/nksonline/section-12.9 )
Now the question is: will proofs that are based on this tiny axiom system
inevitably be systematically longer than ones based on bigger ("engineered")
axiom systems? (Do they for example always first have to prove the axioms
of the bigger axiom systems as theorems, and then go on from there?)
Well, what I found empirically is that bigger axiom systems aren't necessarily more
efficient. (This may not be a robust result; it could depend on details of
theorem-proving algorithms.)
In fact, as http://www.wolframscience.com/nksonline/page-1175d-text shows,
even the single-axiom system isn't terribly inefficient---at least
after it proves commutativity. And the minimal axiom system I found that
explicitly includes commutativity---{(a.b).(a.(b.c))==a, a.b==b.a}---seems
to be pretty much as efficient as anything.
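Proving that these equations axiomatize Boolean algebra is of course the hard part; but the weaker claim that both systems are at least sound for NAND is easy to check by brute force over all eight truth assignments. A quick sketch in Python, reading "." as NAND:

```python
from itertools import product

def nand(x, y):
    # NAND on bits: 0 only when both inputs are 1
    return 0 if (x and y) else 1

# The single axiom, with "." read as NAND:
#   ((b.c).a).(b.((b.a).b)) == a
single_axiom_holds = all(
    nand(nand(nand(b, c), a), nand(b, nand(nand(b, a), b))) == a
    for a, b, c in product((0, 1), repeat=3)
)

# The two-axiom system {(a.b).(a.(b.c)) == a, a.b == b.a}:
two_axiom_holds = all(
    nand(nand(a, b), nand(a, nand(b, c))) == a and nand(a, b) == nand(b, a)
    for a, b, c in product((0, 1), repeat=3)
)

print(single_axiom_holds, two_axiom_holds)   # both print True
```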
This isn't directly relevant to encodings for universality, but perhaps it
gives some indications.
Another way I tried to get evidence about encodings was to see how far I
could get in covering a space of possible finite cellular automaton
computations by using encodings based on blocks of different lengths (
http://www.wolframscience.com/nksonline/section-11.11 ).
It was encouraging that cellular automata I didn't think were universal
didn't manage to cover much, but ones I did think were universal did.
I looked a little at whether more complicated rules would allow smaller sets
of blocks ("smaller encodings") to cover a given region of cellular
automaton computation space. And I didn't find any evidence of it.
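One simplified version of that experiment is easy to state: encode each cell of the emulated rule as a length-k block of cells of the emulating rule, run the emulating rule for k steps, decode, and compare against one step of the emulated rule on small circular tapes. The Python sketch below uses my own simplified formulation of this scheme, not the exact setup in the book; the sanity check at the end just confirms that a rule emulates itself under the trivial length-1 encoding.

```python
from itertools import product

def eca_step(rule, cells):
    """One step of an elementary cellular automaton (circular boundary)."""
    n = len(cells)
    return [(rule >> (4 * cells[i - 1] + 2 * cells[i] + cells[(i + 1) % n])) & 1
            for i in range(n)]

def emulates(rule_a, rule_b, enc, k, width=6):
    """Does rule_a, run k steps on block-encoded tapes, reproduce one step
    of rule_b for every circular tape of the given width?"""
    dec = {tuple(v): s for s, v in enc.items()}
    for tape in product((0, 1), repeat=width):
        big = [c for cell in tape for c in enc[cell]]            # encode
        for _ in range(k):
            big = eca_step(rule_a, big)                          # k steps of A
        blocks = [tuple(big[i * k:(i + 1) * k]) for i in range(width)]
        if any(b not in dec for b in blocks):                    # undecodable
            return False
        if [dec[b] for b in blocks] != eca_step(rule_b, list(tape)):
            return False
    return True

def search_encodings(rule_a, rule_b, k):
    """Brute-force all pairs of distinct length-k blocks encoding 0 and 1."""
    found = []
    for b0, b1 in product(product((0, 1), repeat=k), repeat=2):
        if b0 != b1 and emulates(rule_a, rule_b, {0: list(b0), 1: list(b1)}, k):
            found.append((b0, b1))
    return found

# Sanity check: any rule emulates itself with the trivial length-1 encoding.
print(((0,), (1,)) in search_encodings(110, 110, 1))   # prints True
```

The interesting question is then how the number of workable encodings grows (or doesn't) with block length k, and how that differs between rules one suspects are universal and rules one suspects are not.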
I'm not quite sure in general how to formulate the problem of complexity of
encodings.
After all, there are inevitably computations that are arbitrarily difficult
to encode for a given system.
But somehow the question is whether "reasonable" or "interesting"
computations can be encoded in a short way in a given system.
It's a little like asking what it takes for a programming language to let
you write the programs you want in a short way.
For a language designer like me, this is an important practical question.
And I'd love to have a better theoretical formulation of it.
(Our "open code" Demonstrations Project at http://demonstrations.wolfram.com
has some pretty interesting examples of what can and occasionally can't be
done with short Mathematica programs...)
* What happens after the prize, etc.?
I've obviously spent a huge amount of effort pursuing my interests in NKS
and writing up what I've done. As well as being fascinating and fun for me,
I think it's important stuff---and I've been keen to get other people
involved.
We've had a lot of success with things like our Summer School
(http://www.wolframscience.com/summerschool/2007/). But with the prize we
wanted to try stimulating the field in a different way.
And so far, the experiment of the prize seems very successful.
Is the prize the end of the road for this particular 2,3 Turing machine?
Definitely not. There is lots more to be studied and established about it.
Perhaps it'll be proved that it can't be universal in certain senses.
Probably there'll be much simpler proofs of universality found. Perhaps
someone will "reduce to number theory" or some such the detailed behavior
from a blank tape. Even though it has such simple rules, it's obviously a
rich system to study.
And having seen how this prize has gone, I'm now motivated to think about
setting up some other prizes...