In 1937 Turing described the Turing machine. Since then, many models of computation have been proposed in an attempt to find one that is close to a real computer yet still simple enough for designing and analysing algorithms.

As a result, we have dozens of algorithms for, e.g., the sorting problem, in different models of computation. Unfortunately, we cannot even be sure that an implementation of an algorithm with running time $O(n)$ in a word RAM with bit-vector operations allowed will run faster than an implementation of an algorithm with running time $O(n \cdot \log{n})$ in a word RAM (I am talking about "good" implementations only, of course).
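For concreteness, here is how such a "linear-time" word-RAM sort might look (a sketch of LSD radix sort, written by me just for illustration; whether it actually beats a good $O(n \cdot \log{n})$ comparison sort is exactly what the asymptotics alone cannot tell us):

```python
def radix_sort(xs, word_bits=32, radix_bits=8):
    """LSD radix sort for non-negative word-sized integers.

    Runs in O(n * word_bits / radix_bits) time on a word RAM, i.e.
    O(n) for a fixed word size -- yet whether it outruns an
    O(n log n) comparison sort in practice depends entirely on
    constants, caches, and the data.
    """
    mask = (1 << radix_bits) - 1
    for shift in range(0, word_bits, radix_bits):
        # Distribute by the current digit; buckets preserve order,
        # which is what makes LSD radix sort stable and correct.
        buckets = [[] for _ in range(1 << radix_bits)]
        for x in xs:
            buckets[(x >> shift) & mask].append(x)
        xs = [x for bucket in buckets for x in bucket]
    return xs
```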

So I want to understand which of the existing models is "the best" for designing algorithms, and I am looking for an up-to-date, detailed survey of models of computation that gives the pros and cons of each model and its closeness to reality.

Well, I don't think the cache-oblivious model is the best one. Even for simple problems, describing an algorithm in this model can be quite difficult. Thanks for the link to cstheory, maybe I'll try it later.
–
Tatiana StarikovskayaNov 2 '10 at 15:36

5 Answers

While it is the case that many models of computation agree on which functions $\mathbb{N} \to \mathbb{N}$ are computable, I would like to point out that this is not the case when we think of higher-order functions. (I am making this remark not to answer the question but to supplement the existing answers.)

For example, in Gödel's T (simply typed $\lambda$-calculus with booleans, natural numbers and primitive recursion) there is no universal quantifier $all : (\mathbb{N} \to 2) \to 2$, i.e., a map such that $$all(f) = \begin{cases}1 & \text{if $\forall n . f(n) = 1$} \\\\ 0 & \text{otherwise.}\end{cases}$$
But we can write such a quantifier in PCF (simply typed $\lambda$-calculus with booleans, natural numbers and general recursion). Once we have a candidate program $all$, we still have to worry whether it works. The answer again depends on the model of computation.
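To see what general recursion buys, here is a naive sketch (in Python rather than PCF, and only one possible candidate, written by me for illustration) of the unbounded counterexample search that T's bounded primitive recursion cannot express:

```python
def all_candidate(f):
    """A naive PCF-style candidate for `all`, transcribed into Python.

    General recursion lets us search for a counterexample with no
    bound: the loop returns 0 as soon as some f(n) = 0, and diverges
    when f(n) = 1 for every n.  How such a partial program is treated
    is precisely what the choice of semantic model settles -- and the
    unbounded loop itself is inexpressible in Godel's T, where every
    recursion comes with a bound.
    """
    n = 0
    while f(n) == 1:
        n += 1
    return 0
```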

If we use as the underlying model Kleene's number realizability, i.e., Turing machines which accept and output finite strings of bits, then $all$ does not work because of the Kleene tree. If we use as the underlying model Kleene's function realizability, i.e., Turing machines which accept as input and output infinite strings of bits, including non-computable ones, then $all$ works.

As a second example, let me mention (exact) real number computation. There are two ways to model the reals:

Intensionally as a datatype $R_I$ of Cauchy sequences in which each real is represented by (fast) Cauchy sequences of rationals converging to it. In particular, a program may inspect the representation of a real.

Extensionally as an abstract datatype $R_E$ of real numbers where we cannot inspect the representation of the reals. An example of such a language is RealPCF.

It has been known that $R_I$ and $R_E$ represent the same reals, that $R_I^{R_I}$ and $R_E^{R_E}$ represent the same maps, and that $R_I^{R_I^{R_I}}$ and $R_E^{R_E^{R_E}}$ represent the same rank 2 functionals. But recently Matthias Schröder proved that at the next level we have a disagreement between $R_I^{R_I^{R_I^{R_I}}}$ and $R_E^{R_E^{R_E^{R_E}}}$!
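A toy sketch (my own Python encoding, purely illustrative) makes the intensional/extensional distinction concrete: an operation that is perfectly computable on representations can still fail to be a function of the represented real.

```python
from fractions import Fraction

# A real in R_I, encoded as a fast Cauchy sequence: a function
# x : n -> Fraction with |x(n) - (the real)| <= 2**-n.

def const(q):
    """The rational q as a constant fast Cauchy sequence."""
    return lambda n: Fraction(q)

def add(x, y):
    """Addition on representations: querying each summand at
    precision n+1 keeps the error within 2**-(n+1) + 2**-(n+1) = 2**-n."""
    return lambda n: x(n + 1) + y(n + 1)

def peek(x):
    """An *intensional* program: inspect the 0th approximant.  Two
    representations of the same real may answer differently, so peek
    is a fine program on R_I but no map out of R_E."""
    return x(0)

# Two different representations of the real 1/2:
half_a = const(Fraction(1, 2))
half_b = lambda n: Fraction(1, 2) + Fraction(1, 2 ** (n + 1))
```

Here `peek(half_a)` is 1/2 while `peek(half_b)` is 1, even though both sequences converge to the same real; a language like RealPCF rules out such programs by hiding the representation.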

Higher-type computation can be quite intriguing, and there it's definitely not the case that "all models of computation are equivalent".

Good point. I was not stating that "all models of computation are equivalent" in my answer to this question. I was stating that the specific models of computation which I'd listed were proven to be equivalent: that a Turing machine with two tapes or $n$ tapes is equivalent to (has the same computational power as, can be simulated by) a single-tape Turing machine; that the concept of a regular language is equivalent to a probabilistic finite state machine which accepts that regular language (stops on a "halting state"). I explicitly state that hardware must be considered.
–
sleepless in beantownNov 3 '10 at 6:52

Yes, yes, of course, I think we are in perfect understanding here. I am just pointing out that things get interesting at higher types.
–
Andrej BauerNov 3 '10 at 10:04

Also, the various notions of infinitary computability, such as infinite time Turing machines, BSS machines, and higher recursion theory, are known not to be equivalent.
–
Joel David HamkinsNov 3 '10 at 10:52

@Joel: quite right, however the ones you mentioned are beyond what is effectively computable, whereas I am talking about models that do not exceed the power of Turing machines. They just differ in how infinite and higher-order data are presented to Turing machines.
–
Andrej BauerNov 3 '10 at 16:41

Since you've asked the question here at MathOverflow rather than a CS theory site, let me try to give the perspective from computability theory rather than computational complexity theory. Thus, I give in a sense a math answer rather than a CS answer, although I realize that this is not the answer you seek.

From the perspective of computability theory, the most
important fact about all the dozens or hundreds of
varieties of computational models is precisely the fact
that they are all equivalent. There is no "best" model.

It really is quite remarkable that all the models of
computation that have been proposed give rise to exactly
the same class of computable functions and decidable sets.

The fact that all the proposed models of computability are equivalent in this way indicates that this concept of computability is a highly robust mathematical idea. Indeed, the equivalence of the models is usually taken as strong or even decisive evidence for the Church-Turing thesis, the philosophical claim that any of these definitions of computability captures the notion of what is computable-in-principle.

It is easy to imagine, after all, that things might have
turned out differently, and that there would be a hierarchy
of computability, where having a stronger machine model
would allow you to decide more sets and to compute a larger
class of functions. But instead, we have a low-level
threshold phenomenon, where once you attain a certain very
primitive power of computability, then all the models can
simulate all the other models.

Thus, from this computability theory point of view, there
is no "best" model, and it doesn't matter at all which
model you use. The purpose of the models in computability
theory is not to design computers or to design algorithms,
but to help us understand the power of computability and
especially its limitations. Most computability theorists do not rely on a single model of computability, preferring to fall back on abstract definability characterizations, which center on the idea of unbounded search, the essence of computability.
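That idea of unbounded search is Kleene's μ-operator, which one might sketch (illustratively, in Python) as:

```python
def mu(p):
    """Kleene's mu-operator: the least n for which p(n) holds.

    This single ingredient lifts primitive recursion to full
    computability -- and it is the source of partiality, since the
    search diverges when no witness exists.
    """
    n = 0
    while not p(n):
        n += 1
    return n
```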

I am reminded of conversations I've often had with
students, who upon seeing the Turing machine model want to
extend it by adding extra power to the machines, allowing
the machine to do in one step what used to take several or
augmenting the machine with registers and so on, in order
to make a "better" Turing machine. Such efforts are
completely pointless, because the purpose of the Turing
machine model is not to program with it, but rather to have
a theoretical model that is simple, yet fully powerful. We
want a weak-seeming model, because we want to use the model
to show that things are not computable, rather than that
they are.

But I realize that this is probably not your perspective.
It is sometimes said that the difference
between computability theory and computational complexity
theory is that the computability theorist is fundamentally
interested in studying the non-computable, the hierarchy
of Turing degrees, while the complexity theorist studies
what is computable.

The equivalence between the models extends deeply down into complexity theory, in the sense that, to my knowledge, all of the standard models of computation admit polynomial-time simulations of one another. That is, any model can simulate any other model with at most polynomial-time overhead.

Thus, the differences between the models arise only when
one cares about the particular polynomial, as you indicate
you do in your question. And this is a concern that takes
one out of computability theory and into computational
complexity.

Thank you for your answer. Of course, I know about the Church-Turing thesis and the equivalence of reasonable models of computation. But for real-life problems it makes a huge difference whether the problem is solvable in O(n) or in O(n^2) time, and sometimes even a constant means a lot.
–
Tatiana StarikovskayaNov 3 '10 at 10:49

Oh, I agree completely with that, and I hope you don't find my answer polemical. Of course, the real-life quickness of algorithms on actual machines is what has led to the amazing actual power of computers in our lives. So I am supportive of your work!
–
Joel David HamkinsNov 3 '10 at 11:00


I vote this answer down because, as the answerer acknowledges, it is not an answer to the question that was asked. I think the question should not have been answered here at all, but rather on the TCS or Stack Overflow forum.
–
Boris BukhNov 3 '10 at 12:47


I take myself to have answered the question that was asked (particularly the title question), but not to have given the answer the OP desired.
–
Joel David HamkinsNov 3 '10 at 13:53

I agree with Joel David Hamkins's answer and with Mark Sapir's answer. The point of computability models, such as Turing machines, regular languages, push-down automata with stacks, etc., is to show the equivalence of these models. The reason for the big-O notation order of complexity is to show that, within a small additive constant, complexity can be defined in terms of a linear, polynomial, exponential (or other) relation to a particular characteristic of the input (usually the size of the input). The only thing that changes for that computation on different systems is a multiplicative factor, or the additive constant.

Kolmogorov talks about quantifying this type of complexity based on abstract state machines, about coming up with a minimum-description-length style explanation of complexity, and about prefix complexity.

Complexity can refer to

space-complexity, how much memory (RAM, lengths and numbers of tapes of Turing machines, how much space on a 2-d grid for a 2-d turing machine, how much of the stacks for push-down stack automata) is required to perform the calculation, as a function of the size or characteristic of the input

time-complexity, how much time (number of computation steps, number of movements of the read head plus number of movements of the write head, number of ticks of the clock, i.e. of the crystal oscillator driving the CPU's synchronous logic circuitry) is required to perform the calculation

logic complexity (~? algorithmic complexity ?) - how many gates (AND, NOT, OR, XOR, NOR, flip-flops, latches, multiplexers, demultiplexers, line traces) are required to build the circuit (e.g. how you can implement a shift-bit adder in silicon logic using only NPN transistors, or only TTL logic), how many transistors, how many relays and wires, how much "surface area" for a 2-d instantiation of a circuit on an LSI/VLSI (large-scale / very-large-scale integration) chip, how much "volume" for a 3-d multi-layer VLSI chip, how many traces of an FPGA (field-programmable gate array) are required to implement the algorithm in hardware - i.e., how large is the program needed to describe/implement the algorithm

However, the computer science courses that talk about computability theory are talking about these concepts abstractly, even when they talk about it for a particular algorithm, or even for a particular circuit (like a bit-adder with carry-over shifting) in electrical engineering circuit design classes.
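For instance, the bit-adder just mentioned can be sketched gate by gate (an illustration in Python standing in for the actual silicon; the gate counts are in the comments):

```python
def full_adder(a, b, cin):
    """One-bit full adder built from basic gates:
    2 XOR, 2 AND, 1 OR -- five gates per bit."""
    s = a ^ b ^ cin
    cout = (a & b) | ((a ^ b) & cin)
    return s, cout

def ripple_add(x, y, bits=8):
    """A bits-wide ripple-carry adder: chaining `bits` full adders
    gives a circuit of about 5 * bits gates, so this measure of
    "logic complexity" grows linearly in the word size.  The result
    wraps modulo 2**bits, just as in hardware."""
    carry, out = 0, 0
    for i in range(bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        out |= s << i
    return out
```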

The best model depends upon the hardware being considered.

A model for data-flow in the Connection Machine with a 10-dimensional hyper-cube network architecture for its 1024 processors and single-bit processing is not going to be an adequate model for data-flow in a TCP/IP 10-gigabit ethernet network connected Beowulf cluster with 1024 nodes consisting of hex-core 64-bit processors, or for a Tilera cluster of 16 chips with 64 cores per chip with multiple orders of communication latency difference between intra-chip vs. inter-chip communication time delays.

Or a model for SIMD (single instruction multiple data) type of parallelization is not going to be reasonable for a model of GPU (graphical processing unit) computation using the GPU as a co-processor.

It is not reasonable to speak of a "best model" without specifying the hardware, or at least the architecture, and the limits of the amount of RAM available (at the different levels of use and speed, level I cache, level II cache, ..., RAM dedicated to one CPU, RAM shared between multiple CPUs, RAM that will be swapped out to disk/hard drive, external NAS=network attached storage), and the algorithms and processes which are to be optimized and analyzed for complexity.

+1. This is absolutely the right answer to the question as asked.
–
Timothy ChowNov 3 '10 at 13:58

+1. I agree with this answer. But one quibble: you say that the models agree within a multiplicative factor, but I don't think that's true. Simulating a multi-tape Turing machine with separate heads on a single tape machine can involve a lot of back-and-forth (to simulate the action of separate heads), and this imposes a polynomial time cost, rather than merely a multiplicative one.
–
Joel David HamkinsNov 3 '10 at 15:08

@Joel-David-Hamkins, I actually said that big-O order of complexity notation gives us a linear, polynomial, or exponential relation, as the multiplicative factor effectively becomes nil relative to the rate of increase of the polynomial or exponential relation. The implementation of the algorithm plays a mighty role in the order of complexity; so an algorithm to sort data on a multi-tape Turing machine can be simpler than the algorithm to sort data in place on a single-tape Turing machine, but the overall complexity remains the same. It's only if you look at space, time, or algorithmic (cont.)
–
sleepless in beantownNov 4 '10 at 0:27

(cont.2) look at space, time, or algorithmic independent of each other, then it's easy to miss the inter-relationship of these three different complexity components. However, the polynomial order of the time-steps and of the head-movements in the single-tape Turing machine is obviated in the fewer time-steps and head movement of the multi-tape approach, while space complexity can go up. But the increase in time for single-tape simulating the multi-tape includes the complexity of the simulation. Kolmogorov's prefix complexity takes care of that by effectively including the (->cont3)
–
sleepless in beantownNov 4 '10 at 0:50

(cont.3) description of the machine and/or algorithm encoding as a prefix to the algorithmic code. It's been more than 2 decades since my last full reading of Papadimitriou's and Lewis' Elements of the Theory of Computation, so I may have stumbled on a few details, and for that I apologize. I'll type in a correction if needed after I look at some of the original sources. My point in this comment is that the complexity of simulation also incurs the complexity of the "higher level" abstract device such as multiple tapes and/or multiple heads.
–
sleepless in beantownNov 4 '10 at 1:01

Kolmogorov, apparently, asked the same question. Here is a paper where he tried to answer it (with Uspenskii): Kolmogorov, A. N.; Uspenskiĭ, V. A., On the definition of an algorithm (Russian), Uspehi Mat. Nauk 13 (1958), no. 4(82), 3–28. I do not know if there were follow-up articles, but the idea of Yu. Gurevich's "abstract state machines" is somewhat similar. Both Kolmogorov and Gurevich tried to "simulate" actual real-life algorithms. Here is one of the (many) papers on abstract state machines: Blass, Andreas; Gurevich, Yuri, Abstract state machines capture parallel algorithms: correction and extension, ACM Trans. Comput. Log. 9 (2008), no. 3, Art. 19, 32 pp. This is not the foundational paper on the subject, but it gives a definition and has references to more foundational papers. Also look at this Wiki article.

Thank you, I never knew about that paper by Kolmogorov and Uspenskii, shame on me. I attended Yuri Gurevich's talk for the Moscow Mathematical Society last year, but the model seemed quite complicated to me.
–
Tatiana StarikovskayaNov 3 '10 at 10:52


@Tatiana: ASMs are in fact very easy. Yuri explained them to me once. Of course, it might be easy for me because I am an algebraist (as is Gurevich, who even graduated from the same university). The model seems to be very useful in practice. As Yuri told me, for example, the Vienna metro scheduler was built very quickly using a compiler from ASM. I have used the metro in Vienna. It works fine.
–
Mark SapirNov 3 '10 at 14:00

Sounds pretty convincing. Thank you for the links, I find them quite useful. In fact, I've also been thinking about Yuri Gurevich's model.
–
Tatiana StarikovskayaNov 3 '10 at 19:52

I suggest that "best" be changed to "projected". For development and prototyping, many systems are designed and simulated on machines with a small number of processors. With cloud computing here and swarm computing not so far away (and possibly quantum computing being realized once the physicists solve the hard problems), your best model of today and of yesteryear may soon be superseded by an adequate model of tomorrow. Even just understanding massively parallel processing well is a worthy goal, and one likely to be applicable to the upcoming shifts in technology.

I think you are right; I must have used the wrong word in the heading. Actually, I was asking about the RAM model and different generalizations of it: what is used today, why bit-vector operations are regarded as a trick by some scientists and not by others, and so on - just a survey, if one exists.
–
Tatiana StarikovskayaNov 6 '10 at 17:16

There may be a survey of the kind you desire. It is not clear to me how idealized you want your models to be. If you are looking for practical applications, you might ask (in an industry-oriented forum) what is being built or used, and what merits one has over the other. If you are actually looking ahead, you might consider the notion of "extensible templates", which at heart is high-level application design which defers implementation issues (e.g. serial vs. parallel vs. massive parallel) even more than traditional design. ...
–
Gerhard PasemanNov 6 '10 at 17:39

Such a survey would aid you in dividing the work to the model best suited for doing that part of the work, or in making a hybrid model best suited for the application. Gerhard "Ask Me About System Design" Paseman, 2010.11.06
–
Gerhard PasemanNov 6 '10 at 17:39