I read the NKS book about a year ago, and my immediate feeling was that the concepts outlined in it will probably revolutionize how science is pursued. The language of science has previously been equations, but it is becoming increasingly clear that the language of science in the future will be computations.

I believe firmly in the concept of computational irreducibility: that many systems in nature have properties such that no mathematical shortcut exists for predicting their behavior.

But there is one concept in the NKS book which I have not been able to come to terms with. That is the principle of computational equivalence. Wolfram says that most class 3 and 4 automata are universal computers and that all universal computers can emulate one another. I expect this will eventually be proven true as well.

But to draw the conclusion that all universal computers are equivalent is, according to my intuition, wrong. If the principle were true, all personal computers would be equivalent in complexity, since they are universal computers and can emulate one another. A Commodore 64 from the '80s would then be equivalent to a modern Pentium 4. But no one in his right mind would pay as much money at the computer store for a Commodore as for a Pentium 4.

A Commodore 64 could in principle be programmed to emulate a Pentium 4, and a Pentium 4 could be programmed to emulate a Commodore 64. The difference between the two computers is that the time it would take to emulate a given program on the Commodore would be vastly longer than the time it would take for the Pentium 4 to execute the program itself. Conversely, it is quite likely that the time it would take to emulate a given program on the Pentium 4 would be shorter than the time it would take for the Commodore to execute the program itself.

Hence there is something that is missing from the principle of computational equivalence.

Most of the big ideas in the NKS book depend on simple programs underlying much of what we see in nature, and on computational irreducibility - that certain of these rules typically do things we cannot short-cut or compress, but must instead watch, simulate, and experiment with.

PCE (the principle of computational equivalence) goes beyond that, clearly. The NKS book makes a pretty strong case for class 4s eventually proving universal. The threshold is driven low enough, and the general scheme of cross emulations is powerful enough, that it is implausible only a few class 4s are universal. It should not, after all, be too hard to get class 4s with more internal resources to emulate an elementary CA we already know is universal.

Class 3s are a tougher case. The book suggests they will prove universal as well. And it gives some reasons. Clearly there is enough going on in a class 3. But whether it can be arranged in arbitrary ways remains an open question. To prove any class 3 is universal is one of Wolfram's "open problems". And it would be a significant contribution to NKS if someone managed to do so - perhaps for a 3 or 4 color rule, to give oneself some additional wiggle room initially.

Why is PCE important, within the whole structure of argument in the NKS book? Because it establishes a kind of "bound". It tells us one way in which there isn't something beyond class 4, in complexity terms. That we aren't missing something essential by using only simple programs. Also because it has immediate consequences for the predictability and reducibility of a simple program.

From sufficiently involved initial conditions, any universal rule can implement any finite algorithm. That means statements of the form, "but it can't do this", are almost always going to be demonstrably false. It also means statements of the form, "I can tell what it will do beforehand by this shortcut", will also almost always be false. If you can't rule things out, you can't predict. Certainly if you know the whole initial condition too, you can calculate what the rule will do. But you can't limit it without something like that much information.

With simple enough rules, you can know what they will do from the rule alone. But with universal ones, you must also know the initials. If rules of the latter kind are common in nature, this has large consequences for the predictability of natural systems, and particularly for attempts to predict them using the method of analysis to smallest parts and discovery of their micro-behaviors. With a universal system, the initials also matter. How the bits of the system are assembled, matters.

Within the argument of the NKS book, the role of universality is to connect what has been seen in simple programs to crucial results in the foundations of math, and to unify the main phenomena already seen. To wit, that the same forms of behavior are seen in system after system, even with quite different underlying rules. It is a strong way of showing "no, it isn't just about CAs". The classes of behavior Wolfram first saw in the output of elementary CAs from random initial conditions, are far more general than that.

It does not mean and was never meant to mean that you'd use an elementary CA instead of a chip from Intel. Although it certainly does suggest it may be possible to build even a practical computer out of unlikely components, and on vastly simpler scales - which might matter for biology of nervous systems, for example, or for nanotechnology.

Yes universal systems differ in terms of processing speed and memory. They also may have certain problems they are particularly suited for, in the same way some relationships look much simpler in a certain base or other representation, than they do in another. If one is looking at a natural system and trying to understand it, its behavior will depend on details of its rule and of the conditions that rule is acting upon.

But the single most important thing you need to know about that system is its class of behavior. Because your method will change, and your expectations will change, if you suspect that it is a universal system. You will not expect to find a single formula that describes all its possible trajectories. Breaking it down into its smallest parts and isolating their micro behavior may still yield some insights, but it will not tell you what the system is going to do. You can know its "physics" and still not know its "chemistry", let alone every form that might be assembled out of them.

These vague, intuitive points about intractability become something you expect. You reach immediately for the experimentalist's toolkit and to formal experiment in particular. You do not underestimate what you are dealing with. That the underlying components are simple and they interact in only a modest number of ways does not change this. You are prepared to see systems of which that is also true, do practically anything, depending on how they are started off.

You might very well still be able to characterize what the system will do from some class of initial conditions you have reason to think of as "typical". But you will also be prepared to see exceptions. And not to think of them as "statistically impossible", merely as intricate cases. Or the reverse. Remember that a class 4 can sometimes do simple things. That follows from its ability to emulate other arbitrary systems, some of which do simple things.

And all of these methodological consequences will follow, if it is true that many natural systems are behaving as simple programs, a portion of which are across the threshold of universality. What all universal systems are going to have in common is the method we must use to attack them. And that is why it is a foundational principle for a new kind of science.

It is a worthwhile exercise to read through chapter 12 again, and try to track which of the conclusions depend on full PCE and which rely only on widespread computational irreducibility. Many of them need only the latter. Some may need only partial versions of PCE - if all class 4s are universal, say. Some - especially claims with "every" or "almost all" in them - need the full principle.

It is still my feeling that something is missing from the principle of computational equivalence. Let me clarify what I find is missing.

One paragraph in your reply which I found particularly interesting started like this:

“Yes universal systems differ in terms of processing speed and memory. They also may have certain problems they are particularly suited for…”

This highlights two points which I find are missing from the principle of computational equivalence. First: to be able to compare two different universal computers, you have to specify which processing task should be the basis for comparison. Second: you have to let the two universal computers run this processing task and see how long each takes to finish. The computer that finishes first is better than the other computer. Not equivalent.

To say that two automata are equivalent to one another just because they can emulate one another is not enough. This does not take into account that time is scarce. Every system in the universe has a limited span of existence. A human being, for example, has a life span of at most about 100 years. This is why processing time matters. You do not want to wait until you die for the processing task to finish.

A third point which is missing from the principle of computational equivalence is that material resources are scarce. What I mean by this is that to compare two universal computers you also have to take into account the “cost” of the two different systems. Suppose that two universal computers finish a given task in the same amount of time. Then the computer that consumes the fewest resources is better than the other computer. Not equivalent.

It is rather intuitive, really. When you choose a processor to solve a problem, you have to choose the processor that is best suited for the task. You also have to choose the processor which is “cheaper” if two processors are otherwise equal.

To say that all systems that achieve universality are equivalent is therefore not correct according to my intuition.

Of course systems that are computationally equivalent in PCE terms can have other differences. Equivalence as a concept always refers to some aspects. When we say that one Pentium chip is like another, we do not mean they occupy the same space coordinates. Computational equivalence of two systems does not mean they have the same economic value. It means the space of possible computations each can be programmed to eventually reach is the same.

Not all systems are computationally equivalent in this sense. Rule 250 can't be programmed to do anything one wants. There isn't enough going on. You can fit a section of an arbitrary signal with a Fourier expansion, but not with one sine wave, or a single constant. It is meaningful to say universality is a threshold because there are all sorts of things it is possible to do, and others it is impossible to do, below it. That changes when one crosses that threshold.
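The Fourier analogy can be made concrete with a small sketch (illustrative only; the choice of signal, the term counts, and the least-squares fitting approach are my own, not anything from the book): a section of an arbitrary signal - here a square wave - is captured well by an expansion with many terms, while a single sine wave cannot get close.

```python
import numpy as np

# Illustrative signal: a square wave section (3 cycles over [0, 1)).
t = np.linspace(0, 1, 500, endpoint=False)
signal = np.sign(np.sin(2 * np.pi * 3 * t))

def fourier_fit(y, t, n_terms):
    """Least-squares fit of y using sin/cos terms up to n_terms harmonics."""
    cols = [np.ones_like(t)]
    for k in range(1, n_terms + 1):
        cols.append(np.sin(2 * np.pi * k * t))
        cols.append(np.cos(2 * np.pi * k * t))
    A = np.column_stack(cols)
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coeffs

# One sine wave versus a 25-term expansion: mean squared error of each fit.
err_one = np.mean((signal - fourier_fit(signal, t, 1)) ** 2)
err_many = np.mean((signal - fourier_fit(signal, t, 25)) ** 2)
print(err_one, err_many)  # the 25-term fit is far closer
```

The single sine wave has essentially no overlap with the square wave here, while the richer expansion captures its main harmonics - the "enough going on" versus "not enough going on" distinction in miniature.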

Perhaps the difficulty is intuition drawn from engineering and computer science rather than from theoretical science and modeling. Suppose I set you the task: predict the behavior of a universal system following an unknown rule. Is your task made trivial by the fact that the number of elements in the system is "only" 64,000? (Anybody who thinks so, write a master-level Go program and collect prizes. Large possibility spaces are inherently intractable to span.)

But the number of elements in the system can be 6.022x10^23, and the result predictable enough for all practical purposes - if the behavior those elements follow is simple enough in form. The practical question is, what internal computational variety and interconnection does the system contain? If there is more than a smidgen - across the universality threshold - then any significant size will make the system computationally intractable. (Not unattackable, but requiring a definite approach - one in which moderately higher speed won't help much at all.) If there is practically none, so the resulting behavior is simple, then size won't matter, reduction will be possible, and even a slide rule will give decent results. It is not the size of the system or the processing power thrown at it that matters for modeling. It is whether the system behaves in an inherently intricate way, or not.

For the software engineering types, you have to see the "inverse problem". Imagine I have an old computer with 64k memory. It will start in some unknown state. It runs for 1 minute, ending in some other state. Your mission is to reproduce that exact configuration history at any time in any portion of your memory, no matter how many other things you do that are wrong elsewhere or in the meantime. You don't get a Pentium and one minute, or five. We are sporting about it. You get a teraflop supercomputer and, for running time, the lifetime of the universe. But it is a universal computer, with an unknown rule, and you know absolutely nothing about its initial state. You therefore face the entire possibility space of a dinky old computer for a single CPU minute. And you will lose. You can barely begin to enumerate possible states, because they explode exponentially.

Now, second task: my wimpy old computer has to calculate the center cell of the 64 trillionth step of a 1D CA from a simple initial condition - a rule which, as the smiling fates would have it, happens to be a simple one, say rule 250 (checkerboard). No problem at all. The step number is even; that is all I need to know.
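The rule 250 shortcut can be written down in a few lines (a sketch, assuming the standard single-black-cell initial condition; the function names are mine): the rule is just "left OR right", which grows a checkerboard, so the center cell at step t depends only on whether t is even.

```python
def step_rule250(cells):
    """One step of elementary CA rule 250: new cell = left OR right."""
    n = len(cells)
    return [cells[i - 1] | cells[(i + 1) % n] for i in range(n)]

def center_rule250_shortcut(t):
    """Predict the center cell at step t without simulating anything."""
    return 1 if t % 2 == 0 else 0

# Verify the shortcut against brute-force simulation on a small grid
# (201 cells is wide enough that the pattern never wraps in 80 steps).
width = 201
cells = [0] * width
cells[width // 2] = 1
for t in range(80):
    assert cells[width // 2] == center_rule250_shortcut(t)
    cells = step_rule250(cells)

# The 64-trillionth step is answered instantly: the step number is even.
print(center_rule250_shortcut(64_000_000_000_000))  # 1
```

That is what reducibility means in practice: the answer for an astronomically distant step costs one parity check, not 64 trillion updates.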

Computational equivalence is not relative to a task, it is a statement about a space of tasks. It is saying the task is what matters, and saying it in a particular way. (It is also about how much "software" can do, without requiring special components).

Any system following definite rules can be modeled or reconstructed with enough detailed information. About initials as well as the form of the rule, or the behavior of sub-elements, "atomic" interactions. Simple enough behaviors have the additional property that knowledge of the initials is not really required - or not in detail, or only as a parameter to which the behavior responds in a regular and reducible way.

But a system capable of universal computation is not going to have this property. Make a prediction about its behavior, without knowing the details of its initial conditions. Now I set those details, programming the system to behave contrary to your prediction. Think I can't play that game if I only have an old desktop computer? Sure I can. To limit possible behaviors, you need not only the rule but at a bare minimum constraints on the initials (if not exhaustive knowledge of them).

Is any of this only about CAs? It is not. Are the classes of behavior seen in CAs limited to that special type of system? Not at all.

The principle is about the method of modeling natural systems with simple programs, and accounting for the variety and complexity we see in natural systems with those models. It is about seeing natural systems as software, not just as special components that supposedly must do x or y.

What do you do with a natural system you want to model that you see is universal? You model it with a universal system, not a simple one. The model can behave in all sorts of complicated ways. Good - so can the real system. But the real system does some definite thing. Fine, get very empirical with it (to constrain initials, to learn the form of the rule, etc), and also try lots of guesses as to its rule and initials. Expect details to be off, and only the class of behavior seen to be right, at least at first.

Then we have the objection that "both computations are ones and zeros being manipulated by a computer using electricity through electrical circuitry and software". This is supposed to prove what? It shows that simple behaviors and complicated behaviors are both behaviors. What isn't? Where the first objector saw the distinction being made but pretends it can't be meaningful because he can imagine another distinction, this one pretends not to see the distinction simply by not making it.

If a system is behaving according to rule 250 I can predict the millionth step in one move. If it is behaving according to rule 30 I can't. No quibble can erase that distinction.
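A sketch of the contrast (the rule numbers are from the book; the setup - grid size, step counts, names - is my own illustration): rule 250's center cell falls out of the step's parity in one move, while for rule 30 - "left XOR (center OR right)" - no such shortcut is known, and the center column must be obtained by simulating every step.

```python
def step_rule30(cells):
    """One step of elementary CA rule 30: new = left XOR (center OR right)."""
    n = len(cells)
    return [cells[i - 1] ^ (cells[i] | cells[(i + 1) % n])
            for i in range(n)]

# Rule 250's millionth center cell: one move, just the parity of the step.
millionth_rule250 = 1 if 1_000_000 % 2 == 0 else 0

# Rule 30's center column: nothing for it but step-by-step simulation.
width = 201
cells = [0] * width
cells[width // 2] = 1
center_column = []
for _ in range(20):
    center_column.append(cells[width // 2])
    cells = step_rule30(cells)

print(millionth_rule250)  # 1
print(center_column)      # starts 1, 1, 0, 1, ... with no evident period
```

Getting rule 30's center cell at step one million this way means running one million updates; for rule 250 the same question is a single even/odd test.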

The difference between them is not that one is a computation and the other isn't; it isn't that one is represented by 1s and 0s and the other by red, green, and blue; it isn't that they are manipulated by computer rather than by rocks laid out on the ground; it isn't that electricity is used or mechanics; it isn't how fast a Commodore 64 determines that one million is even; it isn't that one is an internal mental process and the other an external physical one; it isn't that one is discrete and the other continuous; or one deterministic and the other an objectively possible quantum process. None of the pet important distinctions posited by others, from early philosophy to recent science to previous posts in this thread, have anything to do with why and how one of them is completely different from the other with respect to knowledge.

One of them is simple and reducible. And the other isn't. Without differing in any of the "usual suspect" ways. Complexity is real, a formal phenomenon, exists on the level of software not components, and has general laws. Deliberately missing the critical distinction or pretending subsequent ones have anything like the same importance for whether we can know something (and if so, how, what methods to try) does not change this.

PCE delimits the class in the second category, the complex rather than simple, and states a reason why all complex systems are functionally and practically alike in their intractability - to our perception and to the most elaborate analysis. It is a diagnosis of what underlies and unifies complex systems - as such, and as distinguished from all that is simple on the other hand - and makes them hard to decipher: computational universality. It is the thesis that the cause of formal intractability and apparent, subjective, phenomenal complexity is real, underlying, hard computational universality. Things seem complicated to us when and only when they could be programmed to do anything.

That is the thesis. It might be wrong - which is, to my mind, something said in its favor, in the sense that it is sticking its neck out and making a real claim about the world. But if it is wrong, it won't be for these missed-distinction sorts of reasons. Maybe class 3s aren't universal, and universality is a proper subset of real complexity - perhaps even a relatively special, rare subset. Maybe Wolfram is right about irreducibility but wrong about universality. Maybe we won't be able to tell, because it will be too hard to establish or disprove universality for some classes of complex-looking systems. But it is a perfectly reasonable hypothesis with a lot to be said for it. And we will learn a lot about complexity proving it wrong, or finding it hard to prove or disprove, or finding it right - whichever.

Clearly if we restrict our definition of "sophistication" to exactly which problems can ever be solved, then any universal machines are equivalent. But if, as Wolfram asserts, the threshold for universality is so low, then this doesn't really seem like a very useful definition of sophistication; it would lead us to call any system either "sophisticated" or "not sophisticated", according to whether it is universal or not. To me, a useful notion of "sophistication" requires a gradient or hierarchy thereof.

I think this raises a question that has bugged me for a while: why does Wolfram say "computations of equivalent sophistication" instead of "universal computations"?

I presume that he chose his words carefully in order to be faithful to some subtle difference. But I don't think he explained what that difference is.

It isn't up to you or to Wolfram whether the onset of complexity is gradual or a threshold effect. It is a formal and empirical question, not something to be settled by definitions.

Wolfram is claiming it is a threshold, but that might be investigated and found not to be the case. E.g., reducibility might extend far beyond apparent simplicity. Irreducibility without universality might extend for another broad band. It is a real claim about the world, not something that necessarily follows from a choice of terms and their signification. Wolfram is claiming that where apparent simplicity gives out, there are a small number of additional hard reductions here and there, then one uniform block of systems already capable of supporting universality. That isn't a choice of terms; it is a prediction or conjecture about as yet undiscovered formal results.

As for why Wolfram carefully avoids "universal" in the PCE context, it is obviously because the original notion of universality is mathematically well defined for infinite cases, not for strictly finite ones. A Pentium chip is constructed to be a universal computer, but no actual computer using one is strictly universal, because all of them have finite memory and have run for finite times. This does not prevent us from treating them as universal computers for all practical purposes. But we typically don't set them problems like "enumerate all the primes", because we know they will never finish.

PCE says that in all but a tiny fraction of cases where we see apparent complexity, the underlying cause is that the system we are looking at is sophisticated enough to support universal computation, if given proper initial conditions, size, run long enough, etc. It is in every way analogous to the computational flexibility we see in practical computers. They are all over the flexibility threshold. None of them are over the cardinality threshold (truly infinite memory etc).

Thanks for the last two posts Jason. They were illuminating. Especially the last paragraph summed it up elegantly for me.

"PCE says that in all but a tiny fraction of cases where we see apparent complexity, the underlying cause is that the system we are looking at is sophisticated enough to support universal computation, if given proper initial conditions, size, run long enough, etc. It is in every way analogous to the computational flexibility we see in practical computers. They are all over the flexibility threshold. None of them are over the cardinality threshold (truly infinite memory etc)."

Then my understanding of PCE is that all systems that are complex are equal in the regard that they are “universal” computers. Since nature is full of complex systems, PCE states that “universal” computers are commonplace. If that is all the PCE hypothesis states, then I have no more objections. It is a nice hypothesis. Let us just wait and see whether “universal” computers are shown to be commonplace or not.

I found the chapter about PCE the most difficult to understand in Wolfram's book. Now at last I think I understand what the message was.

Even if a Pentium 4 computer is faster than a Commodore 64, they share the same basic blueprint. This is then what makes them equal according to PCE. It is like comparing a small propeller aeroplane with a jumbo jet. They are equal in the respect that they both share the same basic design that the Wright brothers developed. The jumbo jet might be bigger and faster than the small plane, but they are still both aeroplanes.

Yes, that is basically it. It deserves the quotation marks because any given complex example might need to be given extended resources - the analog of large memory or system size - for mathematically rigorous universality to make sense. But the same is true of practical computers.

Also, the "almost all" bit is meant to allow a small number of cases I call "cracks" (as in a cracked code), in which something looks complicated to us but detailed analysis can find a formula that unlocks the structure in it and predicts all the behavior. PCE includes the claim that this is only going to be possible for limited pockets, basically just on the near side of the threshold of universality, while most of possibility space (CAs with an arbitrary number of colors and range of interaction, etc.) will be populated almost entirely by systems capable of supporting universality.

But other than those quibbles - which are the reasons for the qualified language one sees in parts of chapter 12 - yes you have the basic idea. Things that look complex are all "airplanes" - programmable. That is what they have in common, and why they all properly fall under the same concept.

“Then we have the objection that ‘both computations are ones and zeros being manipulated by a computer using electricity through electrical circuitry and software’. This is supposed to prove what?”

It is supposed to _disprove_ the modified PCE.

The mPCE implies a fundamental distinction between the computations corresponding to behavior that is "obviously simple" and the computations corresponding to behavior that is not "obviously simple."

But there is no such fundamental distinction (at least in the case of cellular automata as presented by Dr. Wolfram in _A NKS_).

Both computations [i.e., those which produce Class 4 CA and those which produce Class 1 CA] are ones and zeros being manipulated by a computer using electricity through electrical circuitry and software.

Furthermore, I will defend disproving the mPCE as opposed to the PCE.

The PCE says that any process that is not obviously simple "can be viewed as" a computation of equivalent sophistication.

Well, 2 + 2 "can be viewed as" 22; no one can debate that: the words "can be viewed as" exempt the PCE from logical verification.

Which is why I modified the PCE to read, "Any process that is not obviously simple _is_ a computation of equivalent sophistication." That we can argue.

Thank you, Jason, for your reply. It is well-received and interesting.