Loops and infinity

Learn the model of NAND++ programs that add loops and arrays to
handle inputs of all lengths.

See some basic syntactic sugar and equivalence of variants of NAND++
programs.

See equivalence between NAND++ programs and Turing Machines.

“We thus see that when \(n=1\), nine operation-cards are used; that
when \(n=2\), fourteen Operation-cards are used; and that when \(n>2\),
twenty-five operation-cards are used; but that no more are needed,
however great \(n\) may be; and not only this, but that these same
twenty-five cards suffice for the successive computation of all the
numbers”, Ada Augusta, countess of Lovelace, 1843 (translation of “Sketch of
the Analytical Engine” by L. F. Menabrea, Note G).

“It is found in practice that (Turing machines) can do anything that
could be described as “rule of thumb” or “purely mechanical”…
(Indeed,) it is now agreed amongst logicians that “calculable by means
of (a Turing Machine)” is the correct accurate rendering of such
phrases.”, Alan Turing, 1948

The NAND programming language (or equivalently, the Boolean circuits
model) has one very significant drawback: a finite NAND program \(P\) can
only compute a finite function \(F\), and in particular the number of
inputs of \(F\) is always smaller than (twice) the number of lines of
\(P\). (This conceptual point holds for any straightline programming
language, and is independent of the particular syntactical choices
we made for NAND. The particular ratio of “twice” is true for NAND
because input variables cannot be written to, and hence a NAND
program of \(s\) lines includes at most \(2s\) input variables. Coupled
with the fact that a NAND program can’t include X[ \(i\) ] if it
doesn’t include X[ \(j\) ] for \(j<i\), this implies that the length
of the input is at most \(2s\). Similarly, a Boolean circuit whose
gates correspond to two-input functions cannot have more inputs than
twice the number of gates.)

This does not capture our intuitive notion of an algorithm as a single
recipe to compute a potentially infinite function. For example, the
standard elementary school multiplication algorithm is a single
algorithm that multiplies numbers of all lengths, yet we cannot
express this algorithm as a single NAND program; instead we need a
different NAND program for every input length (see
Figure 1).

Figure 1: Once you know how to multiply multi-digit numbers, you can do so for
every number \(n\) of digits, but if you had to describe multiplication
using NAND programs or Boolean circuits, you would need a different
program/circuit for every length \(n\) of the input.

Let us consider the case of the simple parity or XOR function
\(XOR:\{0,1\}^* \rightarrow \{0,1\}\), where \(XOR(x)\) equals \(1\) iff the
number of \(1\)’s in \(x\) is odd. As simple as it is, the \(XOR\) function
cannot be computed by a NAND program. Rather, for every \(n\), we can
compute \(XOR_n\) (the restriction of \(XOR\) to \(\{0,1\}^n\)) using a
different NAND program. For example, here is the NAND program to compute
\(XOR_5\) (see also Figure 2):

Figure 2: The circuit for computing the XOR of \(5\) bits. Note how it merely
repeats four times the circuit to compute the XOR of \(2\) bits.
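To make the repetition concrete, the following Python sketch prints and evaluates a straight-line NAND program for \(XOR_n\). Python is used only for illustration; the helper names xor_n_program and run_nand, and the temporary variable names u_j, v_j, w_j and acc_j, are our own and not part of NAND. We assume the usual construction of a two-bit XOR out of four NAND lines. The point to notice is that every input length needs its own, longer program.

```python
def xor_n_program(n):
    """Return the lines of a straight-line NAND program for the XOR of n >= 2 bits."""
    lines = []
    acc = "X[0]"  # name of the variable holding the parity of the bits seen so far
    for j in range(1, n):
        out = "Y[0]" if j == n - 1 else f"acc_{j}"
        # XOR(a, b) built from four NANDs (an assumed, standard construction)
        lines.append(f"u_{j} = NAND({acc},X[{j}])")
        lines.append(f"v_{j} = NAND({acc},u_{j})")
        lines.append(f"w_{j} = NAND(X[{j}],u_{j})")
        lines.append(f"{out} = NAND(v_{j},w_{j})")
        acc = out
    return lines


def run_nand(lines, x):
    """Evaluate a straight-line NAND program on the bit list x and return Y[0]."""
    env = {f"X[{j}]": bit for j, bit in enumerate(x)}
    for line in lines:
        target, expr = line.split(" = ")
        a, b = expr[len("NAND("):-1].split(",")
        env[target] = 1 - env[a] * env[b]
    return env["Y[0]"]


program = xor_n_program(5)
print(len(program))                        # 16 lines: four per two-bit XOR
print(run_nand(program, [1, 0, 1, 1, 0]))  # 1, since three of the bits are 1
```

Calling xor_n_program(6) yields a different program with four more lines, which is exactly the drawback discussed here.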

This is rather repetitive, and more importantly, does not capture the
fact that there is a single algorithm to compute the parity on all
inputs. Typical programming languages use the notion of loops to
express such an algorithm, and so we might have wanted to use code such
as:
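One way such loop-based code might look is sketched below in ordinary Python. This is only an illustration of a single recipe that works for every input length; it is not NAND code, and the function name parity is our own.

```python
def parity(x):
    """Return 1 if the bit list x contains an odd number of ones, and 0 otherwise."""
    result = 0
    for bit in x:          # a single loop handles inputs of every length
        result = (result + bit) % 2
    return result


print(parity([1, 0, 1, 1, 0]))  # 1
```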

In this chapter we will show how we can extend the NAND programming
language so that it can capture these kinds of constructs. We will see
two ways to do so:

The NAND++ Programming language extends NAND with the notion of
loops and arrays to allow a finite program that can compute a
function with arbitrarily long inputs.

Turing machines are the classical way to give a finite description
of an algorithm for arbitrarily long inputs.

It turns out that these two models are equivalent, and in fact they
are equivalent to a great many other computational models including
programming languages you may be familiar with such as C, Java, Python,
Javascript, OCaml, and so on and so forth. This notion, known as Turing
equivalence or Turing completeness, will be discussed in
chapequivalentmodels. We start off by presenting NAND++ and
then show Turing machines, though it is also possible to present them in
the opposite order.

The NAND++ Programming language

The NAND++ programming language aims to capture the notion of a single
uniform algorithm that can compute a function that takes inputs of
arbitrary lengths. To do so, we need to extend the NAND programming
language with two constructs:

Loops: NAND is a straightline programming language: a NAND
program of \(s\) lines takes exactly \(s\) steps of computation and
hence in particular cannot even touch more than \(3s\) variables.
Loops allow us to capture in a short program the instructions for
a computation that can take an arbitrary amount of time.

Arrays: A NAND program of \(s\) lines touches at most \(3s\)
variables. While we allow in NAND variables such as Foo[17] or
Bar[22], they are not true arrays, since the number inside the
brackets is a constant that is “hardwired” into the program. In
particular a NAND program of \(s\) lines cannot read an input X[ \(i\)
] for \(i>2s\).

Thus a good way to remember NAND++ is using the following informal
equation:

\[
\text{NAND++} \;=\; \text{NAND} \;+\; \text{loops} \;+\; \text{arrays} \label{eqnandloops}
\]

It turns out that adding loops and arrays is enough to not only enable
computing XOR, but in fact capture the full power of all programming
languages! Hence we could replace “NAND++” with any of Python, C,
Javascript, OCaml, etc… in the lefthand side of
\eqref{eqnandloops}. But we’re getting ahead of ourselves: this
issue will be discussed in chapequivalentmodels.

Enhanced NAND++ programs

We now turn to describing the syntax of NAND++ programs. We’ll start by
describing what we call the “enhanced NAND++ programming language”.
Enhanced NAND++ has some extra features on top of NAND++ that make it
easier to describe. However, we will see in
enhancednandequivalence that these extra features can be
implemented as “syntactic sugar” on top of standard or “vanilla” NAND++,
and hence these two programming languages are equivalent in power.

Enhanced NAND++ programs add the following features on top of NAND:

We add a special Boolean variable loop. If loop is equal to \(1\)
at the end of the execution then execution loops back to the first
line of the program.

We add a special integer valued variable i. We add the commands
i += foo and i -= bar that can add to or subtract from i either
zero or one, where foo and bar are standard (Boolean valued)
variables. (The variable i will actually always be a non-negative integer,
and hence i -= bar will have no effect if i \(=0\). This choice is
made for notational convenience, and the language would have had the
same power if we allowed i to take negative values.)

We add arrays to the language by allowing variable identifiers to
have the form Foo[i]. Foo is an array of Boolean values, and
Foo[i] refers to the value of this array at location equal to the
current value of the variable i.

The input and output X and Y are now considered arrays with
values of zeroes and ones. Since both input and output could have
arbitrary length, we also add two new arrays Xvalid and Yvalid
to mark their length. We define Xvalid[ \(i\) ] \(=1\) if and only
if \(i\) is smaller than the length of the input, and similarly we
will set Yvalid[ \(j\) ] to equal \(1\) if and only if \(j\) is
smaller than the length of the output. (Xvalid and Yvalid are used to mark the end of the input and
output. This does not mean that the program will “blow up” if it
tries to access for example X[\(j\)] for a value \(j\) for which
Xvalid[\(j\)]\(=0\). All it means is that this value (which will
default to \(0\)) does not correspond to an actual input bit, and we
can use Xvalid to determine that this is the case. Perhaps more
descriptive (though also more cumbersome) names would have been
Xlongerthan and Ylongerthan.)

The following is an enhanced NAND++ program to compute the XOR function
on inputs of arbitrary length. That is
\(XOR:\{0,1\}^* \rightarrow \{0,1\}\) such that
\(XOR(x) = \sum_{i=0}^{|x|-1} x_i \mod 2\) for every \(x\in \{0,1\}^*\).
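The behaviour of such a program can be mimicked in Python. The sketch below is not the enhanced NAND++ program itself. It assumes a loop body that sets Yvalid[0], XORs X[i] into Y[0], advances i by one, and keeps looping while Xvalid[i] is one; the function name xor_enhanced_sketch is hypothetical.

```python
from collections import defaultdict


def xor_enhanced_sketch(x):
    """Mimic an enhanced NAND++-style loop that XORs the input bits into Y[0]."""
    X = defaultdict(int, enumerate(x))                         # unset cells default to 0
    Xvalid = defaultdict(int, {j: 1 for j in range(len(x))})   # 1 exactly on input positions
    Y = defaultdict(int)
    Yvalid = defaultdict(int)
    i = 0
    while True:
        # one pass over the (assumed) program body
        Yvalid[0] = 1
        Y[0] = X[i] ^ Y[0]     # Y[0] = XOR(X[i], Y[0])
        i += 1                 # i += one
        loop = Xvalid[i]       # continue only while i still points at an input bit
        if loop == 0:
            break
    return Y[0]


print(xor_enhanced_sketch([1, 0, 1, 1, 0]))  # 1
```

Note that reading X[i] past the end of the input is harmless here: the cell simply holds the default value \(0\).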

We now present an enhanced NAND++ program to compute the increment
function. That is, \(INC:\{0,1\}^* \rightarrow \{0,1\}^*\) such that for
every \(x\in \{0,1\}^n\), \(INC(x)\) is the \(n+1\) bit long string \(y\) such
that if \(X = \sum_{i=0}^{n-1}x_i \cdot 2^i\) is the number represented by
\(x\), then \(y\) is the binary representation of the number \(X+1\).

We start by showing the program using “syntactic sugar”: shorthand for
NAND programs we have seen before that compute simple functions such as
IF, XOR and AND (as well as the constant one function and the function
COPY that just maps a bit to itself).
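As a rough guide to how such an increment program behaves, here is a Python sketch written in the same style, with one Python statement per assumed line of sugared code. The body shown (a carry bit that starts at one, then XOR and AND per position) is our assumption about a natural way to write \(INC\), not a transcription of the program, and the name inc_enhanced_sketch is hypothetical.

```python
from collections import defaultdict


def inc_enhanced_sketch(x):
    """Increment the number whose bits (least significant first) are listed in x."""
    X = defaultdict(int, enumerate(x))
    Xvalid = defaultdict(int, {j: 1 for j in range(len(x))})
    Y = defaultdict(int)
    Yvalid = defaultdict(int)
    i, carry, started = 0, 0, 0
    while True:
        carry = carry if started else 1   # carry = IF(started, carry, one)
        started = 1                       # started = one
        Y[i] = X[i] ^ carry               # Y[i] = XOR(X[i], carry)
        carry = X[i] & carry              # carry = AND(X[i], carry)
        Yvalid[i] = 1                     # Yvalid[i] = one
        loop = Xvalid[i]                  # loop = COPY(Xvalid[i])
        i += loop                         # i += loop
        if loop == 0:
            break
    # The output is Y[0..m-1], where m is the first index with Yvalid[m] == 0.
    m = 0
    while Yvalid[m]:
        m += 1
    return [Y[j] for j in range(m)]


print(inc_enhanced_sketch([1, 1]))  # [0, 0, 1], i.e. 3 + 1 = 4 written with n + 1 = 3 bits
```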

Working out the above two examples can go a long way towards
understanding NAND++. See the appendix for a full specification of the
language.

Variables as arrays and well-formed programs

In NAND we allowed variables to have names such as foo_17 or even
Bar[23] but the numerical part of the identifier played essentially
the same role as the alphabetical part. In particular, NAND would be just as
powerful if we didn’t allow any numbers in the variable identifiers.
With the introduction of the special index variable i, in NAND++
things are different, and we do have actual arrays.

To make sure there is no confusion, we will use the convention that
plain variables (which we will also refer to as scalar variables) are
written with all lower case, and array variables begin with an upper
case letter. Moreover, it turns out that we can ensure without loss of
generality that arrays are always indexed by the variable i. (That is,
if Foo is an array, then whenever Foo is referred to in the program,
it is always in the form Foo[i] and never as Foo[17], Foo[159] or
any other constant numerical index.) Hence all the variable identifiers
in “well formed” NAND++ programs will either have the form foo_123 (a
sequence of lower case letters, underscores, and numbers, with no
brackets or upper case letters) or the form Bar[i] (an identifier
starting with an upper case letter, and ending with [i]). See
wellformedlem for a more formal treatment of the notion of
“well formed programs”.

“Oblivious” / “Vanilla” NAND++

Since our goal in theoretical computer science is not as much to
construct programs as to analyze them, we want to use as simple as
possible computational models. Hence our actual “plain vanilla” NAND++
programming language will be even more “bare bones” than enhanced
NAND++. (We will often use the adjective “vanilla” when we want to
emphasize the difference between standard NAND++ and its enhanced
variant.) In particular, standard NAND++ does not contain the
commands i += foo and i -= bar to control the integer-valued
variable i. If we don’t have these commands, how would we ever be able
to access arbitrary elements of our arrays? The idea is that standard
NAND++ prescribes a pre-fixed schedule that i progresses in,
regardless of the code of the program or the particular input. Just as
a bus always takes the same route, and you need to wait until it reaches
your station, if you want to access, for example, location 132 in the
array Foo, you can wait until the iteration in which i will equal
132, at which point Foo[i] will refer to the 132-th bit of the array
Foo.

So what is this schedule that i progresses in? There are many choices
for such a schedule that would have worked, but we fix a particular
choice for simplicity. Initially when we run a NAND++ program, the
variable i equals \(0\). When we finish executing all the lines of code
for the first time, if loop equals \(0\) we halt. Otherwise we continue
to the second iteration, but this time the variable i will equal \(1\).
At the end of this iteration once again we halt if loop equals \(0\),
and otherwise we proceed to the third iteration where i gets the value
of \(0\) again. We continue in this way with the fourth iteration having
i\(=1\) and in the fifth iteration i is equal to \(2\), after which it
decreases step by step to \(0\) again, and so on and so forth. Generally, in
the \(k\)-th iteration the value of i equals \(I(k)\) where
\(I=(I(0),I(1),I(2),\ldots)\) is the following sequence (see
Figure 3):

\[
0,1,0,1,2,1,0,1,2,3,2,1,0,1,\ldots
\]

The above is a perfectly fine description of the sequence
\(I(0),I(1),I(2),\ldots\) but it is also possible to find an explicit
mathematical formula for \(I(k)\). Specifically, it is an annoying but not
hard exercise to show that \(I(k)\) is equal to the minimum of
\(|k-r(r+1)|\) where this minimum is taken over all integers \(r\) in
\(\{0,\ldots,k\}\). It can also be shown that the value of \(r\) that
achieves this minimum is between \(\floor{\sqrt{k}-1}\) and
\(\ceil{\sqrt{k}}\).
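Both descriptions of the sequence are easy to check against each other numerically. The Python sketch below generates \(I(0),I(1),\ldots\) by walking up and down step by step, and separately evaluates the minimum formula; the function names schedule and index_by_minimum are our own.

```python
def schedule(num_iterations):
    """Return [I(0), ..., I(num_iterations - 1)] by walking 0,1,0,1,2,1,0,..."""
    values, i, step, peak = [], 0, 1, 1
    for _ in range(num_iterations):
        values.append(i)
        if i == peak and step == 1:       # reached a new maximum: turn around
            step = -1
        elif i == 0 and step == -1:       # back at the start: the next round goes one further
            step, peak = 1, peak + 1
        i += step
    return values


def index_by_minimum(k):
    """I(k) computed as the minimum of |k - r(r+1)| over r in {0, ..., k}."""
    return min(abs(k - r * (r + 1)) for r in range(k + 1))


print(schedule(14))  # [0, 1, 0, 1, 2, 1, 0, 1, 2, 3, 2, 1, 0, 1]
print(all(index_by_minimum(k) == v for k, v in enumerate(schedule(500))))
```

The agreement reflects the fact that i returns to \(0\) exactly at the iterations \(k=r(r+1)\), so \(I(k)\) is the distance from \(k\) to the nearest such iteration.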

Figure 3: The value of i is a function of the current iteration. The variable
i progresses according to the sequence
\(0,1,0,1,2,1,0,1,2,3,2,1,0,\ldots\). Via some cumbersome but routine
calculation, it can be shown that at the \(k\)-th iteration the value of
i equals \(k-r(r+1)\) if \(k \leq (r+1)^2\) and \((r+1)(r+2)-k\) if
\(k>(r+1)^2\), where
\(r= \floor{\sqrt{k+1/4}-1/2}\).

Here is the XOR function in NAND++ (using our standard syntactic sugar
to make it more readable):

Note that we use the array Visited to “mark” the positions of the
input that we have already visited. The line
IF(Visited[i],Y[0],XOR(X[i],Y[0])) ensures that the output value
Y[0] is XOR’ed with the \(i\)-th bit of the input only at the first time
we see it.
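The role of Visited can be seen by mimicking such a program in Python. In the sketch below the index follows the fixed schedule \(0,1,0,1,2,1,0,\ldots\) and the program has no control over it. The loop body is our assumption of a program built around the IF line quoted above, and the names oblivious_indices and xor_vanilla_sketch are hypothetical.

```python
from collections import defaultdict


def oblivious_indices():
    """Yield the index schedule 0,1,0,1,2,1,0,... of standard NAND++ forever."""
    peak = 1
    while True:
        yield from range(0, peak)        # ascend 0, ..., peak - 1
        yield from range(peak, 0, -1)    # descend peak, ..., 1
        peak += 1


def xor_vanilla_sketch(x):
    """Mimic a standard NAND++-style XOR program that marks visited positions."""
    X = defaultdict(int, enumerate(x))
    Xvalid = defaultdict(int, {j: 1 for j in range(len(x))})
    Visited = defaultdict(int)
    Y = defaultdict(int)
    Yvalid = defaultdict(int)
    for i in oblivious_indices():        # the program cannot choose i
        Yvalid[0] = 1
        # Y[0] = IF(Visited[i], Y[0], XOR(X[i], Y[0]))
        Y[0] = Y[0] if Visited[i] else (X[i] ^ Y[0])
        Visited[i] = 1
        loop = Xvalid[i]
        if loop == 0:                    # first time i steps past the end of the input
            break
    return Y[0]


print(xor_vanilla_sketch([1, 0, 1, 1, 0]))  # 1
```

Without the Visited check, every bit would be XOR'ed in again each time the index passes over it, and the result would in general be wrong.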

It would be very instructive for you to compare the enhanced NAND++
program for XOR of XORENANDPP with the standard NAND++ program
of XORNANDPP.

Prove that at the \(k\)-th iteration of the loop, the value of the
variable i is equal to \(index(k)\) where \(index:\N \rightarrow \N\) is
defined as follows: \[
index(k) = \begin{cases} k- r(r+1) & k \leq (r+1)^2 \\ (r+1)(r+2)-k & \text{otherwise} \end{cases} \label{eqindex}
\] where \(r= \floor{\sqrt{k+1/4}-1/2}\).

We say that a NAND++ program completed its \(r\)-th round when the index
variable i reaches \(0\) for the \((r+1)\)-st time and hence completes
the sequence:

\[
0,1,0,1,2,1,0,1,2,3,2,1,0,\ldots,0,1,\ldots,r,r-1,\ldots,0
\]

This happens when the program completed

\[
1+2+4+6+\cdots+2r =r^2 +r + 1
\]

iterations of its main loop. (The last equality is obtained by applying
the formula for the sum of an arithmetic progression.) This means that
if we keep a “loop counter” \(k\) that is initially set to \(0\) and
increases by one at the end of any iteration, then the “round” \(r\) is
the largest integer such that \(r(r+1) \leq k\). One can verify that this
means that \(r=\floor{\sqrt{k+1/4}-1/2}\). When \(k\) is between \(r(r+1)\)
and \((r+1)^2\) then the index i is ascending, and hence the value of
\(index(k)\) will be \(k-r(r+1)\). When \(k\) is between \((r+1)^2\) and
\((r+1)(r+2)\) then the index i is descending from its peak value \(r+1\), and hence the value of
\(index(k)\) will be \((r+1)-(k-(r+1)^2)= (r+1)(r+2)-k\).
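The closed form can also be checked numerically. The following Python sketch computes \(r\) with integer arithmetic, using the fact that \(r(r+1)\leq k\) is equivalent to \((2r+1)^2 \leq 4k+1\), and compares the result with a step-by-step walk. The helper name walk is our own, and math.isqrt requires Python 3.8 or later.

```python
import math


def index(k):
    """Closed-form value of i at iteration k (iterations are counted from 0)."""
    r = (math.isqrt(4 * k + 1) - 1) // 2   # largest r with r(r+1) <= k
    if k <= (r + 1) ** 2:
        return k - r * (r + 1)             # ascending part
    return (r + 1) * (r + 2) - k           # descending part


def walk(num_iterations):
    """Reference values obtained by walking 0,1,0,1,2,1,0,... step by step."""
    values, peak = [], 1
    while len(values) < num_iterations:
        values += list(range(0, peak)) + list(range(peak, 0, -1))
        peak += 1
    return values[:num_iterations]


print([index(k) for k in range(13)])  # [0, 1, 0, 1, 2, 1, 0, 1, 2, 3, 2, 1, 0]
print(all(index(k) == v for k, v in enumerate(walk(10_000))))
```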

Computable functions

We now turn to making one of the most important definitions in this
book, that of computable functions. This definition is deceptively
simple, but will be the starting point of many deep results and
questions. We start by formalizing the notion of a NAND++ computation:

Let \(P\) be a NAND++ program. For every input \(x\in \{0,1\}^*\), we define
the output of \(P\) on input \(x\) (denoted as \(P(x)\)) to be the result of
the following process:

Initialize the variables X[\(i\)]\(=x_i\) and Xvalid[\(i\)]\(=1\)
for all \(i\in [n]\) (where \(n=|x|\)). All other variables (including
i and loop) default to \(0\).

Run the program line by line. At the end of the program, if
loop\(=1\) then increment/decrement i according to the schedule
\(0,1,0,1,2,1,0,1,\ldots\) and go back to the first line.

If loop\(=0\) at the end of the program, then we halt and output
Y[\(0\)] , \(\ldots\), Y[\(m-1\)] where \(m\) is the smallest
integer such that Yvalid[\(m\)]\(=0\).

If the program does not halt on input \(x\), then we say it has no output,
and we denote this as \(P(x) = \bot\).

nandppcomputation can be easily adapted for enhanced NAND++
programs. The only modification is the natural one: instead of i
travelling according to the sequence \(0,1,0,1,2,1,0,1,\ldots\), i is
increased/decreased based on the i += foo and i -= bar operations.

We can now define what it means for a function to be computable:

Let \(F:\{0,1\}^* \rightarrow \{0,1\}^*\) be a (total) function and let
\(P\) be a NAND++ program. We say that \(P\) computes \(F\) if for every
\(x\in \{0,1\}^*\), \(P(x)=F(x)\).

We say that a function \(F\) is NAND++ computable if there is a NAND++
program that computes it.

We will often drop the “NAND++” qualifier and simply call a function
computable if it is NAND++ computable. This may seem “reckless” but,
as we’ll see in chapequivalentmodels, it turns out that being
NAND++-computable is equivalent to being computable in essentially any
reasonable model of computation.

computablefuncdef is, as we mentioned above, one of the most
important definitions in this book. Please re-read it (and
nandppcomputation) and make sure you understand it. Try to
think how you would define the notion of a NAND++ program \(P\)
computing a function, and make sure that you arrive at the same
definition.

This is a good point to remind the reader of the distinction between
functions and programs:

\[ \text{Functions} \;\neq\; \text{Programs} \]

A program \(P\) can compute some function \(F\), but it is not the same as
\(F\). In particular there can be more than one program to compute the
same function. Being “NAND++ computable” is a property of functions,
not of programs.

Many other texts use the term decidable languages (also known as
recursive languages) instead of computable functions. This
terminology has its roots in formal language theory as was pursued by
linguists such as Noam Chomsky. A formal language is simply a subset
\(L \subseteq \{0,1\}^*\) (or more generally \(L \subseteq \Sigma^*\) for
some finite alphabet \(\Sigma\)). The membership or decision problem
for a language \(L\), is the task of determining, given \(x\in \{0,1\}^*\),
whether or not \(x\in L\). One can see that this task is equivalent to
computing the Boolean function \(F:\{0,1\}^* \rightarrow \{0,1\}\) which
is defined as \(F(x)=1\) iff \(x\in L\). Thus saying that the function \(F\)
is computable is equivalent to saying that the corresponding language
\(L\) is decidable. The corresponding concept to a partial function is
known as a promise problem.

Infinite loops and partial functions

One crucial difference between NAND and NAND++ programs is the
following. Looking at a NAND program \(P\), we can always tell how many
inputs and how many outputs it has (by simply looking at the X and Y
variables). Furthermore, we are guaranteed that if we invoke \(P\) on any
input then some output will be produced.

In contrast, given any particular NAND++ program \(P'\), we cannot
determine a priori the length of the output. In fact, we don’t even know
if an output would be produced at all! For example, the following NAND++
program would go into an infinite loop if the first bit of the input is
zero:

loop = NAND(X[0],X[0])

If a program \(P\) fails to stop and produce an output on some input
\(x\), then it cannot compute any total function \(F\), since clearly on
input \(x\), \(P\) will fail to output \(F(x)\). However, \(P\) can still
compute a partial function. (A partial function \(F\) from a set \(A\) to a set \(B\) is a function
that is only defined on a subset of \(A\); see
functionsec. We can also think of such a function as
mapping \(A\) to \(B \cup \{ \bot \}\) where \(\bot\) is a special
“failure” symbol such that \(F(a)=\bot\) indicates the function \(F\) is
not defined on \(a\).)

For example, consider the partial function \(DIV\) that on input a pair
\((a,b)\) of natural numbers, outputs \(\ceil{a/b}\) if \(b > 0\), and is
undefined otherwise. We can define a program \(P\) that computes \(DIV\) on
input \(a,b\) by outputting the first \(c=0,1,2,\ldots\) such that
\(cb \geq a\). If \(a>0\) and \(b=0\) then the program \(P\) will never halt,
but this is OK, since \(DIV\) is undefined on such inputs. If \(a=0\) and
\(b=0\), the program \(P\) will output \(0\), which is also OK, since we don’t
care about what the program outputs on inputs on which \(DIV\) is
undefined. Formally, we define computability of partial functions as
follows:

Let \(F\) be either a total or partial function mapping \(\{0,1\}^*\) to
\(\{0,1\}^*\) and let \(P\) be a NAND++ program. We say that \(P\) computes
\(F\) if for every \(x\in \{0,1\}^*\) on which \(F\) is defined,
\(P(x)=F(x)\). (Note that if \(F\) is a total function, then it is defined on every
\(x\in \{0,1\}^*\) and hence in this case, this definition is
identical to computablefuncdef.)

We say that a (partial or total) function \(F\) is NAND++ computable if
there is a NAND++ program that computes it.
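Returning to the \(DIV\) example, the search procedure described above can be sketched in Python as follows. The max_steps parameter is an illustrative safety bound of our own that the program in the text does not have; it lets us observe the non-halting case without actually looping forever, with None standing in for \(\bot\).

```python
def div_search(a, b, max_steps=None):
    """Return the first c = 0, 1, 2, ... with c * b >= a.

    max_steps is a hypothetical bound added for demonstration only: when it is
    exceeded we return None to stand for "the program did not halt".
    """
    c = 0
    while c * b < a:
        c += 1
        if max_steps is not None and c > max_steps:
            return None
    return c


print(div_search(7, 2))                  # 4, which is the ceiling of 7/2
print(div_search(0, 0))                  # 0: an output on an input where DIV is undefined
print(div_search(3, 0, max_steps=1000))  # None: without the bound this call would never return
```

The second and third calls are both consistent with computing the partial function \(DIV\), since the definition places no requirement on inputs where \(DIV\) is undefined.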

Equivalence of “vanilla” and “enhanced” NAND++

We have defined so far not one but two programming languages to handle
functions with unbounded input lengths: “enhanced” NAND++ which contains
the i += foo and i -= bar operations, and the standard or “vanilla”
NAND++, which does not contain these operations, but rather where the
index i travels obliviously according to the schedule
\(0,1,0,1,2,1,0,1,\ldots\).

We now show these two versions are equivalent in power:

Let \(F:\{0,1\}^* \rightarrow \{0,1\}^*\). Then \(F\) is computable by a
NAND++ program if and only if \(F\) is computable by an enhanced NAND++
program.

To prove the theorem we need to show (1) that for every NAND++
program \(P\) there is an enhanced NAND++ program \(Q\) that computes the
same function as \(P\), and (2) that for every enhanced NAND++ program
\(Q\), there is a NAND++ program \(P\) that computes the same function as
\(Q\).

Showing (1) is quite straightforward: all we need to do is to show
that we can ensure that i follows the sequence
\(0,1,0,1,2,1,0,1,\ldots\) using the i += foo and i -= bar operations.
The idea is that we use a Visited array to keep track of the places
we have visited, as well as a special Atstart array for which we ensure
that Atstart[\(0\)]\(=1\) but Atstart[\(i\)]\(=0\) for every \(i>0\). We
can use these arrays to check in each iteration whether i is equal to
\(0\) (in which case we want to execute i += 1 at the end of the
iteration), whether i is at a point which we haven’t seen before (in
which case we want to execute i -= 1 at the end of the iteration), or
whether it’s at neither of those extremes (in which case we should add
or subtract to i the same value as the last iteration).

Showing (2) is a little more involved. Our main observation is that
we can simulate a conditional GOTO command in NAND++. That is, we can
come up with some “syntactic sugar” that will have the effect of jumping
to a different line in the program if a certain variable is equal to
\(1\). Once we have this, we can implement looping commands such as
while. This allows us to simulate a command such as i += foo when
i is currently in the “decreasing phase” of its cycle by simply
waiting until i reaches the same point in the “increasing phase”. The
intuition is that the difference between standard and enhanced NAND++ is
like the difference between a bus and a taxi. Enhanced NAND++ is like a
taxi: you tell i where to go. Standard NAND++ is like a bus: you
wait until i arrives at the point you want it to be in. A bus might be
a little slower, but will eventually get you to the same place.

We split the full proof of enhancednandequivalence into two
parts. In vanillatoenhancedsec we show the easier direction of
simulating standard NAND++ programs by enhanced ones. In
nhanvedtovanillasec we show the harder direction of simulating
enhanced NAND++ programs by standard ones. Along the way we will show
how we can simulate the GOTO operation in NAND++ programs.

Simulating NAND++ programs by enhanced NAND++ programs.

Let \(P\) be a standard NAND++ program. To create an enhanced NAND++
program that computes the same function, we will add a variable
indexincreasing and code to ensure that at the end of the iteration,
if indexincreasing equals \(1\) then i needs to increase by \(1\) and
otherwise i needs to decrease by \(1\). Once we ensure that, we can
emulate \(P\) by simply adding the following lines to the end of the
program

i += indexincreasing
i -= NOT(indexincreasing)

where one and zero are variables which are always set to be one and
zero respectively, NOT(a) is shorthand for NAND(a,a), and IF is
shorthand for the NAND implementation of our usual \(IF\)
function (i.e., \(IF(a,b,c)\) equals \(b\) if \(a=1\) and \(c\) otherwise).

To compute indexincreasing we use the fact that the sequence
\(0,1,0,1,2,1,0,1,\ldots\) that i travels through in a standard NAND++ program
is obtained from the following rules:

At the beginning i is increasing.

If i reaches a point which it hasn’t seen before, then it starts
decreasing.

If i reaches the initial point \(0\), then it starts increasing.

To know which points we have seen before, we can borrow Hansel and
Gretel’s technique of leaving “breadcrumbs”. That is, we will create
an array Visited and add code Visited[i] = one at the end of every
iteration. This means that if Visited[i]\(=0\) then we know we have not
visited this point before. Similarly we create an array Atstart
and add code Atstart[0] = one (while all other locations remain at the
default value of zero). Now we can use Visited and Atstart to
compute the value of indexincreasing. Specifically, we will add the
following pieces of code
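The three rules and the breadcrumb idea can be tried out in Python. The sketch below is our own rendering rather than the NAND++ lines themselves: it assumes that in each iteration indexincreasing is first reset to zero at unvisited positions, then set to one at position \(0\), and only afterwards is the breadcrumb dropped. The function name simulate_index is hypothetical.

```python
from collections import defaultdict


def simulate_index(num_iterations):
    """Drive i with i += / i -= using the Visited and Atstart breadcrumbs."""
    Visited = defaultdict(int)
    Atstart = defaultdict(int)
    i, indexincreasing = 0, 0
    trace = []
    for _ in range(num_iterations):
        trace.append(i)
        Atstart[0] = 1                                             # Atstart[0] = one
        # A position we have not seen before means we must start decreasing.
        indexincreasing = indexincreasing if Visited[i] else 0
        # Position 0 means we must start increasing.
        indexincreasing = 1 if Atstart[i] else indexincreasing
        Visited[i] = 1                                             # drop a breadcrumb
        i += indexincreasing                                       # i += indexincreasing
        i -= 1 - indexincreasing                                   # i -= NOT(indexincreasing)
    return trace


print(simulate_index(13))  # [0, 1, 0, 1, 2, 1, 0, 1, 2, 3, 2, 1, 0]
```

The order matters: Visited[i] has to be read before it is set, since otherwise every position would look as if it had already been visited.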

Figure 4: We can know if the index variable i should increase or decrease by
keeping an array atstart (Atstart in the text) letting us know when i reaches \(0\), and
hence i starts increasing, and an array breadcrumb (Visited in the text) letting us know when we
reach a point we haven’t seen before, and hence i starts decreasing.

Given any standard NAND++ program \(P\), we can add the above lines of
code to it to obtain an enhanced NAND++ program \(Q\) that will behave in
exactly the same way as \(P\) and hence will compute the same function.
This completes the proof of the first part of
enhancednandequivalence.

Simulating enhanced NAND++ programs by NAND++ programs.

To simulate enhanced NAND++ programs by vanilla ones, we will do as
follows. We introduce an array Markposition which normally would be
all zeroes. We then replace the line i += foo with code that achieves
the following:

We first check if foo=0. If so, then we do nothing.

Otherwise we set Markposition[i]=one.

We then want to add code that will do nothing until we get to the
position i+1. We can check this condition by verifying that both
Markposition[i]\(=1\) and indexincreasing\(=1\) at the end of the
iteration.

We will start by describing how we can achieve this under the assumption
that we have access to GOTO and LABEL operations. LABEL(l) simply
marks a line of code with the string l. GOTO(l,cond) jumps in
execution to the position labeled l if cond is equal to \(1\). (Since this is a NAND++ program, we assume that if the label l is
before the GOTO then jumping in execution means that another
iteration of the program is finished, and the index variable i is
increased or decreased as usual.)

pre-code... #pre-increment code
# replacement for i += foo
waiting = foo # if foo=1 then we need to wait
Markposition[i] = foo # we mark the position we were at
GOTO("end",waiting) # If waiting then jump till end.
LABEL("postcode")
waiting = zero
timeforpostcode = zero
post-code...
LABEL("end")
maintenance-code... # maintain value of indexincreasing variable as before
condition = AND(Markposition[i],indexincreasing) # when to stop waiting.
Markposition[i] = IF(condition,zero,Markposition[i]) # zero out Markposition if we are done waiting
GOTO("postcode",AND(condition,waiting)) # If condition is one and we were waiting then go to instruction after increment
GOTO("end",waiting) # Otherwise, if we are still in waiting then go back to "end" skipping all the rest of the code
# (since this is another iteration of the program i keeps travelling as usual.)

Please make sure you understand the above construct. Also note that the
above only works when there is a single line of the form i += foo or
i -= bar in the program. When there are multiple lines then we need to
add more labels and variables to take care of each one of them
separately. Stopping here and working out how to handle more labels is
an excellent way to get a better understanding of this construction.
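To get a feeling for the waiting involved, the following Python sketch computes how long a simulated i += one has to wait. It models only the schedule, not the GOTO mechanics: we assume that indexincreasing equals one on the ascending part of each round and zero from the peak downwards, and we look for the first iteration at or after the current one where i is at the marked position and indexincreasing is one. The names oblivious_schedule, wait_for_increment and the horizon parameter are our own.

```python
def oblivious_schedule(num_iterations):
    """Return a list of (i, indexincreasing) pairs, one per iteration."""
    pairs, peak = [], 1
    while len(pairs) < num_iterations:
        pairs += [(i, 1) for i in range(0, peak)]       # ascending part
        pairs += [(i, 0) for i in range(peak, 0, -1)]   # peak and descending part
        peak += 1
    return pairs[:num_iterations]


def wait_for_increment(k, horizon=10_000):
    """Iteration at which a simulated `i += one` issued in iteration k takes effect."""
    pairs = oblivious_schedule(horizon)
    marked = pairs[k][0]                        # Markposition[i] = one
    for later in range(k, horizon - 1):
        i, increasing = pairs[later]
        if i == marked and increasing == 1:     # AND(Markposition[i], indexincreasing)
            return later + 1                    # in the next iteration i equals marked + 1
    raise ValueError("horizon too small for this k")


print(wait_for_increment(5))  # 8: at iteration 5 we have i = 1 going down, and I(8) = 2
```

If the increment is issued while i is already ascending, no waiting is needed and the function returns \(k+1\).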

Implementing GOTO: the importance of doing nothing. The above
reduced the task of completing the proof of
enhancednandequivalence to implementing the GOTO function,
but we have not yet shown how to do so. We now describe how we can
implement GOTO in NAND++. The idea is simple: to simulate
GOTO(l,cond), we modify all the lines between the GOTO and LABEL
commands to do nothing if the condition is true. That is, we modify code
of the form:

where GUARDED(between-code,donothing_l) refers to transforming every
line in between-code from the form foo = NAND(bar,blah) to the form
foo = IF(donothing_l,foo,NAND(bar,blah)). That is, the “guarded”
version of the code keeps the value of every variable the same if
donothing_l equals \(1\). We leave to you to verify that the above
approach extends to multiple GOTO statements. This completes the proof
of the second and final part of enhancednandequivalence.
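The GUARDED transformation is purely syntactic, as the following Python sketch on program text illustrates. It assumes every line has the shape target = NAND(...) with a single assignment, and the function name guarded is our own.

```python
def guarded(between_code, flag):
    """Rewrite each `foo = NAND(bar,blah)` line so that it does nothing when flag is 1."""
    result = []
    for line in between_code:
        target, expr = (part.strip() for part in line.split("=", 1))
        # Keep the old value of the target when the flag is set.
        result.append(f"{target} = IF({flag},{target},{expr})")
    return result


print(guarded(["foo = NAND(bar,blah)"], "donothing_l"))
# ['foo = IF(donothing_l,foo,NAND(bar,blah))']
```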

It is important to go over this proof and verify you understand it. One
good way to do so is to understand how the proof handles multiple
GOTO statements. You can do so by eliminating one GOTO statement at
a time. For every distinct label l, we will have a different variable
donothing_l.

The GOTO statement was a staple of most early programming languages,
but has largely fallen out of favor and is not included in many modern
languages such as Python, Java, and JavaScript. In 1968, Edsger
Dijkstra wrote a famous letter titled “Go to statement considered
harmful” (see also Figure 5). The
main trouble with GOTO is that it makes analysis of programs more
difficult by making it harder to argue about invariants of the
program.

When a program contains a loop of the form:

for j in range(100):
    do something
do blah

you know that the line of code do blah can only be reached if the loop
ended, in which case you know that j is equal to \(100\), and might also
be able to argue other properties of the state of the program. In
contrast, if the program might jump to do blah from any other point in
the code, then it’s very hard for you as the programmer to know what you
can rely upon in this code. As Dijkstra said, such invariants are
important because “our intellectual powers are rather geared to master
static relations and .. our powers to visualize processes evolving in
time are relatively poorly developed” and so “we should … do …our
utmost best to shorten the conceptual gap between the static program and
the dynamic process.”

That said, GOTO is still a major part of lower level languages where
it is used to implement higher level looping constructs such as while
and for loops. For example, even though Java doesn’t have a GOTO
statement, the Java Bytecode (which is a lower level representation of
Java) does have such a statement. Similarly, Python bytecode has
instructions such as POP_JUMP_IF_TRUE that implement the GOTO
functionality, and similar instructions are included in many assembly
languages. The way we use GOTO to implement a higher level
functionality in NAND++ is reminiscent of the way these various jump
instructions are used to implement higher level looping constructs.
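On CPython one can see these jump instructions directly with the standard dis module. The sketch below disassembles a small while loop and lists the opcodes whose names contain JUMP. The exact opcode names differ between Python versions, so we deliberately do not show an expected output; the function count_down is just an example of our own.

```python
import dis


def count_down(n):
    """A small loop whose bytecode we want to inspect."""
    while n > 0:
        n -= 1
    return n


# Collect the names of all jump-like instructions in the compiled loop.
jump_ops = [ins.opname for ins in dis.get_instructions(count_down) if "JUMP" in ins.opname]
print(jump_ops)  # the exact names depend on the Python version
```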

Figure 5: XKCD’s take on the GOTO statement.

Well formed programs: The NAND++ style manual

The notion of passing between different variants of programs can be
extremely useful, as often, given a program \(P\) that we want to analyze,
it would be simpler for us to first modify it to an equivalent program
\(P'\) that has some convenient properties. You can think of this as the
NAND++ equivalent of enforcing “coding conventions” that are often used
for programming languages. For example, while this is not part of the
Python language, Google’s Python style
guide
stipulates that variables that are initialized to a value and never
changed (i.e., constants) are typed with all capital letters. (Similar
requirements are used in other style
guides.)
Of course this does not really restrict the power of Google-conforming
Python programs, since every Python program can be transformed to an
equivalent one that satisfies this requirement. In fact, many
programming languages have automatic programs known as
linters that can detect and sometimes modify
the program to fit certain standards.

The following solved exercise is an example of that. We will define the
notion of a well-formed program and show that every NAND++ program can
be transformed into an equivalent one that is well formed.

We say that an (enhanced or vanilla) NAND++ program \(P\) is well formed
if it satisfies the following properties:

Every reference to a variable in \(P\) either has the form foo or
foo_123 (a scalar variable: alphanumerical string starting with
a lowercase letter and no brackets) or the form Bar[i] or
Bar_12[i] (an array variable: alphanumerical string starting with
a capital letter and ending with [i]).

\(P\) contains the scalar variables zero, one and
indexincreasing such that zero and one are always the
constants \(0\) and \(1\) respectively, and the program contains code
that ensures that at the end of each iteration, indexincreasing is
equal to \(1\) if in the next iteration i will increase by one above
its current value, and is equal to \(0\) if in the next iteration i
will decrease by one.

\(P\) contains the array variables Visited and Atstart and code to
ensure that Atstart[ \(i\) ] equals \(1\) if and only if \(i=0\), and
Visited[\(i\)] equals \(1\) for all the positions \(i\) such that the
program finished an iteration with the index variable i equalling
\(i\).

\(P\) contains code to set loop to \(1\) at the beginning of the first
iteration, and to ensure that if loop is ever set to \(0\) then it
stays at \(0\), and moreover that if loop equals \(0\) then the values
of Y and Yvalid cannot change.
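The first property above is purely syntactic, so it can be checked mechanically. The following is a minimal sketch of such a check, in the spirit of a linter; the function name classify and the two regular expressions are assumptions of this example, and it assumes that underscores are allowed inside identifiers (as in foo_123 and Bar_12).

```python
import re

# Scalar: starts with a lowercase letter, no brackets.
SCALAR = re.compile(r"[a-z][A-Za-z0-9_]*")
# Array: starts with a capital letter and ends with the literal index [i].
ARRAY = re.compile(r"[A-Z][A-Za-z0-9_]*\[i\]")

def classify(reference):
    """Return "scalar", "array", or None if the reference is not well formed."""
    if SCALAR.fullmatch(reference):
        return "scalar"
    if ARRAY.fullmatch(reference):
        return "array"
    return None

if __name__ == "__main__":
    for ref in ["foo", "foo_123", "Bar[i]", "Bar_12[i]", "Arr[17]", "bar[i]"]:
        print(ref, "->", classify(ref))
```

A reference such as Arr[17] is rejected because its index is a number rather than i; removing such references is the harder part of the transformation.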

The following exercise shows that we can transform every NAND++ program
\(P\) into a well-formed program \(P'\) that is equivalent to it. Hence if
we are given a NAND++ program \(P\), we can (and will) often assume
without loss of generality that it is well-formed.

For every (enhanced or vanilla) NAND++ program \(P\), there exists an
(enhanced or vanilla, respectively) NAND++ program \(P'\) equivalent to
\(P\) that is well formed as per wellformeddef. That is, for
every input \(x\in \{0,1\}^*\), either both \(P\) and \(P'\) do not halt on
\(x\), or both \(P\) and \(P'\) halt on \(x\) and produce the same output
\(y\in \{0,1\}^*\).

As usual, I would recommend you try to solve this exercise yourself
before looking up the solution.

Since variable identifiers on their own have no meaning in (enhanced)
NAND++ (other than the special ones X, Xvalid, Y, Yvalid and
loop, that already have the desired properties), we can easily achieve
the property that scalar variables start with lowercase and arrays with
uppercase using “search and replace”. We just have to take care that we
don’t make two distinct identifiers become the same. For example, we can
do so by changing all scalar variable identifiers to lower case, and
adding to them the prefix scalar_, and adding the prefix Array_ to
all array variable identifiers.

The property that an array variable is never referenced with a numerical
index is more challenging. We need to remove all references to an array
variable with an actual numerical index rather than i. One thought
might be to simply convert a reference of the form Arr[17] to the
scalar variable arr_17. However, this will not necessarily preserve
the functionality of the program. The reason is that we want to ensure
that when i\(=17\) then Arr[i] would give us the same value as
arr_17.

Nevertheless, we can use the approach above with a slight twist. We will
demonstrate the solution in a concrete case. (Needless to say, if you
needed to solve this question in a problem set or an exam, such a
demonstration of a special case would not be sufficient; but this
example should be good enough for you to extrapolate a full solution.)
Suppose that there are only three references to array variables with
numerical indices in the program: Foo[5], Bar[12] and Blah[22]. We
will include three scalar variables foo_5, bar_12 and blah_22
which will serve as a cache for the values of these arrays. We will
change all references to Foo[5] to foo_5, Bar[12] to bar_12 and
so on and so forth. But in addition to that, whenever in the code we
refer to Foo[i] we will check if i\(=5\) and if so use the value
foo_5 instead, and similarly with Bar[i] or Blah[i].

Specifically, we will change our program as follows. We will create an
array Is_5 such that Is_5[i]\(=1\) if and only if i\(=5\), and similarly
create arrays Is_12, Is_22.

We can then change code of the following form

Foo[i] = something

to

temp = something
foo_5 = IF(Is_5[i],temp,foo_5)
Foo[i] = temp

and similarly code of the form

blah = NAND(Bar[i],baz)

to

temp = IF(Is_12[i],bar_12,Bar[i])
blah = NAND(temp,baz)

To create the arrays we can add code of the following form in the
beginning of the program (here we’re using enhanced NAND++ syntax,
GOTO, and the constant one but this syntactic sugar can of course be
avoided):

# initialization of arrays
GOTO("program body",init_done)
i += one
i += one
i += one
i += one
i += one
Is_5[i] = one
i += one
... # repeat i += one 6 more times
Is_12[i] = one
i += one
... # repeat i += one 9 more times
Is_22[i] = one
i -= one
... # repeat i -= one 21 more times
init_done = one
LABEL("program body")
original code of program..
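The effect of the two rewrites can be checked on a small model. The sketch below is plain Python rather than NAND++, and the class and method names (Original, Rewritten, write_at_i, and so on) are hypothetical; it keeps one array Foo and one absolute position 5, and compares a program that uses Foo[5] directly against the rewritten version that uses the cache foo_5 together with the marker array Is_5.

```python
def IF(cond, a, b):
    """Multiplexer: a if cond equals 1, otherwise b."""
    return a if cond else b

def marker(position, length):
    """Array that is 1 exactly at `position`, playing the role of Is_5."""
    return [1 if j == position else 0 for j in range(length)]

class Original:
    """Program that is allowed to touch Foo[5] by its absolute index."""
    def __init__(self, n=8):
        self.Foo = [0] * n
    def write_at_i(self, i, value):
        self.Foo[i] = value
    def write_abs(self, value):
        self.Foo[5] = value
    def read_at_i(self, i):
        return self.Foo[i]
    def read_abs(self):
        return self.Foo[5]

class Rewritten:
    """Same program after replacing Foo[5] by the scalar cache foo_5."""
    def __init__(self, n=8):
        self.Foo, self.Is_5, self.foo_5 = [0] * n, marker(5, n), 0
    def write_at_i(self, i, value):
        temp = value
        self.foo_5 = IF(self.Is_5[i], temp, self.foo_5)  # keep the cache in sync
        self.Foo[i] = temp
    def write_abs(self, value):
        self.foo_5 = value                               # Foo[5] = ... becomes foo_5 = ...
    def read_at_i(self, i):
        return IF(self.Is_5[i], self.foo_5, self.Foo[i]) # prefer the cache when i = 5
    def read_abs(self):
        return self.foo_5
```

Dropping either rewrite breaks the agreement: without the update in write_at_i the cache goes stale, and without the IF in read_at_i a write through the absolute index is invisible when i reaches 5.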

Using IF statements (which are themselves syntactic sugar that can
easily be eliminated) we can handle the conditions that loop, Y, and Yvalid are
not written to once loop is set to \(0\). We leave completing all the
details as an exercise to the reader (see
standardnoabsoluteindexex).

Turing Machines

“Computing is normally done by writing certain symbols on paper. We
may suppose that this paper is divided into squares like a child’s
arithmetic book.. The behavior of the [human] computer at any moment
is determined by the symbols which he is observing, and of his “state
of mind” at that moment… We may suppose that in a simple operation
not more than one symbol is altered.”

“We compare a man in the process of computing … to a machine which
is only capable of a finite number of configurations… The machine is
supplied with a “tape” (the analogue of paper) … divided into
sections (called “squares”) each capable of bearing a “symbol””, Alan
Turing, 1936

“What is the difference between a Turing machine and the modern
computer? It’s the same as that between Hillary’s ascent of Everest
and the establishment of a Hilton hotel on its peak.” , Alan Perlis,
1982.

Figure 6:Aside from his many other achievements, Alan Turing was an excellent
long distance runner who just fell shy of making England’s Olympic team.
A fellow runner once asked him why he punished himself so much in
training. Alan said “I have such a stressful job that the only way I can
get it out of my mind is by running hard; it’s the only way I can get
some release.”

The “granddaddy” of all models of computation is the Turing Machine,
which is the standard model of computation in most textbooks. This definitional choice does not make much difference since, as
we show here, NAND++ programs are equivalent to Turing machines in
their computing power. Turing
machines were defined in 1936 by Alan Turing in an attempt to formally
capture all the functions that can be computed by human “computers” (see
humancomputersfig) that follow a well-defined set of rules,
such as the standard algorithms for addition or multiplication. Alan Turing was one of the intellectual giants of the 20th
century. He was not only the first person to define the notion of
computation, but also intimately involved in the use of
computational devices as part of the effort to break the Enigma
cipher during World War II, saving millions of
lives. Tragically, Turing committed suicide
in 1954, following his conviction in 1952 for homosexual acts and a
court-mandated hormonal treatment. In 2009, British prime minister
Gordon Brown made an official public apology to Turing, and in 2013
Queen Elizabeth II granted Turing a posthumous pardon. Turing’s life
is the subject of a great book and a
mediocre movie.

Figure 7:Until the advent of electronic computers, the word “computer” was used
to describe a person that performed calculations. These human computers
were absolutely essential to many achievements including mapping the
stars, breaking the Enigma cipher, and the NASA space mission. Two
recent books about these human computers (which were more often than not
women) and their important contributions are The Glass
Universe
(from which this photo is taken) and Hidden
Figures.

Turing thought of such a person as having access to as much “scratch
paper” as they need. For simplicity we can think of this scratch paper
as a one dimensional piece of graph paper (or tape, as it is commonly
referred to), which is divided into “cells”, where each “cell” can hold a
single symbol (e.g., one digit or letter, and more generally some
element of a finite alphabet). At any point in time, the person can
read from and write to a single cell of the paper, and based on the
contents can update their finite mental state, and/or move to the cell
immediately to the left or right of the current one.

Thus, Turing modeled such a computation by a “machine” that maintains
one of \(k\) states, and at each point can read and write a single symbol
from some alphabet \(\Sigma\) (containing \(\{0,1\}\)) from its “work tape”.
To perform computation using this machine, we write the input
\(x\in \{0,1\}^n\) on the tape, and the goal of the machine is to ensure
that at the end of the computation, the value \(F(x)\) will be written on
the tape. Specifically, a computation of a Turing Machine \(M\) with \(k\)
states and alphabet \(\Sigma\) on input \(x\in \{0,1\}^*\) proceeds as
follows:

Initially the machine is at state \(0\) (known as the “starting
state”) and the tape is initialized to
\(\triangleright,x_0,\ldots,x_{n-1},\varnothing,\varnothing,\ldots\). We use the symbol \(\triangleright\) to denote the beginning of the
tape, and the symbol \(\varnothing\) to denote an empty cell. Hence we
will assume that \(\Sigma\) contains these symbols, along with \(0\) and
\(1\).

The location \(i\) to which the machine points is set to \(0\).

At each step, the machine reads the symbol \(\sigma = T[i]\) that is
in the \(i^{th}\) location of the tape, and based on this symbol and
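its state \(s\) decides on the symbol \(\sigma'\) to write at location
\(i\), whether to move left or right, and the new state \(s'\).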

When the machine reaches the state \(s=k-1\) (known as the “halting
state”) then it halts. The output of the machine is obtained by
reading off the tape from location \(1\) onwards, stopping at the
first point where the symbol is not \(0\) or \(1\).

Figure 9:A Turing machine has access to a tape of unbounded length. At each
point in the execution, the machine can read/write a single symbol of
the tape, and based on that decide whether to move left, right or
halt.


Let \(PAL\) (for palindromes) be the function that on input
\(x\in \{0,1\}^*\), outputs \(1\) if and only if \(x\) is an (even length)
palindrome, in the sense that
\(x = w_0 \cdots w_{n-1}w_{n-1}w_{n-2}\cdots w_0\) for some \(n\in \N\) and
\(w\in \{0,1\}^n\).

We now show a Turing Machine \(M\) that computes \(PAL\). To specify \(M\) we
need to specify (i) \(M\)’s tape alphabet \(\Sigma\) which should
contain at least the symbols \(0\),\(1\), \(\triangleright\) and
\(\varnothing\), and (ii) \(M\)’s transition function which determines
what action \(M\) takes when it reads a given symbol while it is in a
particular state.

\(M\) starts in state START and will go right, looking for the first
symbol that is \(0\) or \(1\). If we find \(\varnothing\) before we hit
such a symbol then we will move to the OUTPUT_1 state that we
describe below.

Once \(M\) finds such a symbol \(b \in \{0,1\}\), it deletes \(b\) from
the tape by writing the \(\times\) symbol, enters either the
RIGHT_0 or RIGHT_1 state according to the value of \(b\), and starts
moving rightwards until it hits the first \(\varnothing\) or \(\times\)
symbol.

Once we find this symbol, we move into the state LOOK_FOR_0 or
LOOK_FOR_1 depending on whether we were in the state RIGHT_0 or
RIGHT_1 and make one left move.

In the state LOOK_FOR_\(b\), we check whether the value on the tape
is \(b\). If it is, then we delete it by changing its value to
\(\times\), and move to the state RETURN. Otherwise, we change to
the OUTPUT_0 state.

The RETURN state means we go back to the beginning. Specifically,
we move leftward until we hit the first symbol that is not \(0\) or
\(1\), in which case we change our state to START.

The OUTPUT_\(b\) states mean that we are going to output the value
\(b\). In both these states we go left until we hit \(\triangleright\).
Once we do so, we make a right step, and change to the 0_AND_BLANK
or 1_AND_BLANK state respectively. In the latter states, we write
the corresponding value, and then move right and change to the
BLANK_AND_STOP state, in which we write \(\varnothing\) to the tape
and move to the final STOP state.

The above description can be turned into a table describing for each one
of the \(14\cdot 5\) combinations of state and symbol, what the Turing
machine will do when it is in that state and it reads that symbol. This
table is known as the transition function of the Turing machine.

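A (one tape) Turing machine with \(k\) states and alphabet
\(\Sigma \supseteq \{0,1,\triangleright,\varnothing\}\) is represented by
its transition function
\(M:[k]\times \Sigma \rightarrow [k] \times \Sigma \times \{\mathbb{L},\mathbb{R} \}\).
For every \(x\in \{0,1\}^*\), the output of \(M\) on input \(x\), denoted by
\(M(x)\), is the result of the following process: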

We initialize \(T\) to be the sequence
\(\triangleright,x_0,x_1,\ldots,x_{n-1},\varnothing,\varnothing,\ldots\),
where \(n=|x|\). (That is, \(T[0]=\triangleright\), \(T[i+1]=x_{i}\) for
\(i\in [n]\), and \(T[i]=\varnothing\) for \(i>n\).)

We also initialize \(i=0\) and \(s=0\).

We then repeat the following process as long as \(s \neq k-1\):

Let \((s',\sigma',D) = M(s,T[i])\)

Set \(s \leftarrow s'\), \(T[i] \leftarrow \sigma'\).

If \(D=\mathbb{R}\) then set \(i \leftarrow i+1\), if
\(D=\mathbb{L}\) then set \(i \leftarrow \max\{i-1,0\}\).

The result of the process is the string \(T[1],\ldots,T[m]\) where
\(m>0\) is the smallest integer such that \(T[m+1] \not\in \{0,1\}\). If
the process never ends then we denote the result by \(\bot\).

We say that the Turing machine \(M\) computes a (partial) function
\(F:\{0,1\}^* \rightarrow \{0,1\}^*\) if for every \(x\in\{0,1\}^*\) on
which \(F\) is defined, \(M(x)=F(x)\).
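The process in this definition translates almost line by line into code. The following is a minimal sketch, not part of the formal definition: the name run_tm, the step budget max_steps, and the ASCII stand-ins ">" for \(\triangleright\) and "_" for \(\varnothing\) are assumptions of this example. The transition function is given as a dictionary from (state, symbol) to (state, symbol, direction), and None plays the role of \(\bot\) when the budget runs out.

```python
START_MARK, EMPTY = ">", "_"

def run_tm(delta, k, x, max_steps=10_000):
    """Run the machine with transition table `delta` and k states on bit string x."""
    tape = [START_MARK] + list(x)   # T[0] is the start marker, T[j+1] = x_j
    i, s, steps = 0, 0, 0           # head position, state, step counter
    while s != k - 1:               # state k-1 is the halting state
        if steps == max_steps:
            return None             # gave up: treat as "does not halt"
        if i == len(tape):
            tape.append(EMPTY)      # cells beyond the input are empty
        s, tape[i], direction = delta[(s, tape[i])]
        i = i + 1 if direction == "R" else max(i - 1, 0)
        steps += 1
    output = []
    for symbol in tape[1:]:         # read from location 1 until a non-bit symbol
        if symbol not in ("0", "1"):
            break
        output.append(symbol)
    return "".join(output)

# A hypothetical 3-state machine that flips every input bit.
FLIP = {
    (0, ">"): (1, ">", "R"),
    (1, "0"): (1, "1", "R"),
    (1, "1"): (1, "0", "R"),
    (1, "_"): (2, "_", "L"),
}

if __name__ == "__main__":
    print(run_tm(FLIP, 3, "0110"))
```

The table FLIP has only four entries because the remaining (state, symbol) pairs are never reached from a valid starting configuration; a full transition function would assign them arbitrary values.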

You should make sure you see why this formal definition corresponds to
our informal description of a Turing Machine. To get more intuition on
Turing Machines, you can play with some of the simulators available
online, such as Martin
Ugarte’s, Anthony
Morphett’s, or Paul
Rendell’s.

One should not confuse the transition function of a Turing machine \(M\)
with the function that the machine computes. The transition function is
a finite function, with \(k|\Sigma|\) inputs and \(2k|\Sigma|\) outputs.
(Can you see why?) The machine can compute an infinite function \(F\)
that takes as input a string \(x\in \{0,1\}^*\) of arbitrary length and
might also produce an arbitrary length string as output.

In our formal definition, we identified the machine with its transition
function since the transition function tells us everything we need to
know about the Turing machine, and hence serves as a good mathematical
representation of it. This choice of representation is somewhat
arbitrary, and is based on our convention that the state space is always
the numbers \(\{0,\ldots,k-1\}\), where we use \(0\) as our starting state
and \(k-1\) as our halting state. Other texts use different conventions
and so their mathematical definition of a Turing machine might look
superficially different, although ultimately it describes the same
computational process and has the same computational powers.

For example, Sipser’s text allows a more
general set of states \(Q\) and allows designating arbitrary elements of
\(Q\) as starting and halting states, though by simple relabeling of the
states one can see that this has no effect on the computational power of
the model. Sipser also restricts attention to Turing machines that
output only a single bit. In such cases, it is convenient to have two
halting states: one of them is designated as the “\(0\) halting state”
(often known as the rejecting state) and the other as the “\(1\) halting
state” (often known as the accepting state). Thus instead of writing
\(0\) or \(1\), the machine will enter into one of these states and halt.
This again makes no difference to the computational power, though we
prefer to consider the more general model with multi-bit outputs.
Finally, Sipser considers also functions with input in \(\Sigma^*\) for an
arbitrary alphabet \(\Sigma\) (and hence distinguishes between the input
alphabet which he denotes as \(\Sigma\) and the tape alphabet which he
denotes as \(\Gamma\)), while we restrict attention to functions with
binary strings as input. The bottom line is that Sipser defines Turing
machines as a seven-tuple consisting of the state space, input
alphabet, tape alphabet, transition function, starting state, accepting
state, and rejecting state. Yet, this is simply a different
representation of the same concept, just as a graph can be represented
in either adjacency list or adjacency matrix form.

Turing machines as programming languages

The name “Turing machine”, with its “tape” and “head” evokes a physical
object, while a program is ultimately, a piece of text. But we can think
of a Turing machine as a program as well. For example, consider the
Turing Machine \(M\) of turingmachinepalindrome that computes
the function \(PAL\) such that \(PAL(x)=1\) iff \(x\) is a palindrome. We can
also describe this machine as a program using the Python-like
pseudocode of the form below
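One way such a program might look is sketched here. It follows the state names of the informal description of \(M\) given earlier; the function name pal_machine, the variables tape, head and state, and the ASCII stand-ins ">" for \(\triangleright\), "_" for \(\varnothing\) and "x" for \(\times\) are assumptions of this sketch rather than the text's own listing.

```python
def pal_machine(x):
    """Run the palindrome machine on bit string x; returns "1" or "0"."""
    tape = [">"] + list(x)
    head, state = 0, "START"
    while state != "STOP":
        if head == len(tape):
            tape.append("_")               # grow the tape lazily with empty cells
        sym, move = tape[head], +1         # default move is to the right
        if state == "START":
            if sym in ("0", "1"):          # found the leftmost remaining bit
                tape[head] = "x"
                state = "RIGHT_" + sym
            elif sym == "_":               # nothing left to match
                state, move = "OUTPUT_1", -1
        elif state in ("RIGHT_0", "RIGHT_1"):
            if sym in ("_", "x"):          # passed the rightmost remaining bit
                state, move = "LOOK_FOR_" + state[-1], -1
        elif state in ("LOOK_FOR_0", "LOOK_FOR_1"):
            if sym == state[-1]:           # matching bit: delete it and go back
                tape[head] = "x"
                state, move = "RETURN", -1
            else:
                state, move = "OUTPUT_0", -1
        elif state == "RETURN":
            move = -1
            if sym not in ("0", "1"):
                state, move = "START", +1
        elif state in ("OUTPUT_0", "OUTPUT_1"):
            move = -1
            if sym == ">":                 # reached the beginning of the tape
                state, move = state[-1] + "_AND_BLANK", +1
        elif state in ("0_AND_BLANK", "1_AND_BLANK"):
            tape[head] = state[0]          # write the output bit
            state = "BLANK_AND_STOP"
        elif state == "BLANK_AND_STOP":
            tape[head] = "_"               # terminate the output
            state = "STOP"
        head = max(head + move, 0)
    output = []
    for sym in tape[1:]:
        if sym not in ("0", "1"):
            break
        output.append(sym)
    return "".join(output)
```

Each branch of the if/elif chain corresponds to one row group of the transition table: the branch is selected by the state, and the inner test by the symbol under the head.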

The particular details of this program are not important. What is
important is that we can describe Turing machines as programs.
Moreover, note that when translating a Turing machine into a program,
the Tape becomes a list or array that can hold values from the
finite set \(\Sigma\). Most programming languages use arrays of fixed size, while a
Turing machine’s tape is unbounded, though of course there is no
need to store an infinite number of \(\varnothing\) symbols. If you
want, you can think of the tape as a list that starts off at some
length that is just long enough to store the input, but is
dynamically grown in size as the Turing machine’s head explores new
positions. The head position can be thought of as an
integer valued variable that can hold integers of unbounded size. In
contrast, the current state can only hold one of a fixed number of
values. In particular, if the number of states is \(k\), then we can
represent the state of the Turing machine using \(\ceil{\log k}\) bits.
Equivalently, if our programming language had only Boolean (i.e.,
\(0\)/\(1\)-valued) variables, then we could replace the variable state
with \(\ceil{\log k}\) such variables. Similarly, we can represent each
element of the alphabet \(\Sigma\) using \(\ceil{\log |\Sigma|}\) bits.
Hence if our programming language had only Boolean valued arrays, we
could replace the array Tape with \(\ceil{\log |\Sigma|}\) such arrays.
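The bit counts in this paragraph are easy to make concrete. The sketch below uses hypothetical helper names (bits_needed, encode, decode, split_tape) and an arbitrary ordering of a five-symbol alphabet; it computes \(\lceil \log k \rceil\) with integer arithmetic and splits one \(\Sigma\)-valued tape into Boolean arrays.

```python
def bits_needed(n):
    """ceil(log2(n)) for n >= 1, computed with exact integer arithmetic."""
    return (n - 1).bit_length()

def encode(value, width):
    """Little-endian list of `width` bits representing `value`."""
    return [(value >> j) & 1 for j in range(width)]

def decode(bits):
    """Inverse of encode."""
    return sum(b << j for j, b in enumerate(bits))

def split_tape(tape, alphabet):
    """Replace one Sigma-valued array by bits_needed(|Sigma|) Boolean arrays."""
    width = bits_needed(len(alphabet))
    codes = [encode(alphabet.index(sym), width) for sym in tape]
    # Array number j holds bit j of every cell.
    return [[code[j] for code in codes] for j in range(width)]

if __name__ == "__main__":
    alphabet = ["_", "0", "1", ">", "x"]   # assumed ordering; "_" gets code 0
    print(bits_needed(14), bits_needed(len(alphabet)))
    print(split_tape([">", "0", "1", "_"], alphabet))
```

Giving the empty symbol the all-zero code is a convenient choice, since an array cell that has never been written then already encodes an empty tape cell.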

Turing machines and NAND++ programs

Given the above discussion, it might not be surprising that Turing
machines turn out to be equivalent to NAND++ programs. Nevertheless,
this is an important result, and the first of many other such
equivalence results we will see in this book.

For every \(F:\{0,1\}^* \rightarrow \{0,1\}^*\), \(F\) is computable by a
NAND++ program if and only if there is a Turing Machine \(M\) that
computes \(F\).

Once again, to prove such an equivalence theorem, we need to show two
directions. We need to be able to (1) transform a Turing machine \(M\)
to a NAND++ program \(P\) that computes the same function as \(M\) and
(2) transform a NAND++ program \(P\) into a Turing machine \(M\) that
computes the same function as \(P\).

The idea of the proof is illustrated in tmvsnandppfig. To show
(1), given a Turing machine \(M\), we will create a NAND++ program \(P\)
that will have an array Tape for the tape of \(M\) and scalar (i.e., non
array) variable(s) state for the state of \(M\). Specifically, since the
state of a Turing machine is not in \(\{0,1\}\) but rather in a larger set
\([k]\), we will use \(\ceil{\log k}\) variables state_\(0\) , \(\ldots\),
state_\(\ceil{\log k}-1\) to store the representation of the
state. Similarly, to encode the larger alphabet \(\Sigma\) of the tape, we
will use \(\ceil{\log |\Sigma|}\) arrays Tape_\(0\) , \(\ldots\),
Tape_\(\ceil{\log |\Sigma|}-1\), such that the \(i^{th}\) location of
these arrays encodes the \(i^{th}\) symbol in the tape for every \(i\).
Using the fact that every finite function can be computed by a NAND program,
we will be able to compute the transition function of \(M\), replacing
moving left and right by decrementing and incrementing i respectively.

We show (2) using very similar ideas. Given a program \(P\) that uses
\(a\) array variables and \(b\) scalar variables, we will create a Turing
machine with about \(2^b\) states to encode the values of scalar
variables, and an alphabet of about \(2^a\) so we can encode the arrays
using our tape. (The reason the sizes are only “about” \(2^a\) and \(2^b\)
is that we will need to add some symbols and steps for bookkeeping
purposes.) The Turing Machine \(M\) will simulate each iteration of the
program \(P\) by updating its state and tape accordingly.

Figure 10:Comparing a Turing Machine to a NAND++ program. Both have an unbounded
memory component (the tape for a Turing machine, and the arrays for
a NAND++ program), as well as a constant local memory (state for a
Turing machine, and scalar variables for a NAND++ program). Both can
only access at each step one location of the unbounded memory: this is
the “head” location for a Turing machine, and the value of the index
variable i for a NAND++
program.

We now prove the “if” direction of TM-equiv-thm, namely we
show that given a Turing machine \(M\), we can find a NAND++ program \(P_M\)
such that for every input \(x\), if \(M\) halts on input \(x\) with output \(y\)
then \(P_M(x)=y\). Because by enhancednandequivalence enhanced
and plain NAND++ are equivalent in power, it is sufficient to construct
an enhanced NAND++ program that has this property. Moreover, since our
goal is just to show such a program \(P_M\) exists, we don’t need to
write out the full code of \(P_M\) line by line, and can take advantage of
our various “syntactic sugar” in describing it.

The key observation is that by NAND-univ-thm we can compute
every finite function using a NAND program. In particular, consider
the function
\(M:[k]\times \Sigma \rightarrow [k] \times \Sigma \times \{\mathbb{L},\mathbb{R} \}\)
corresponding to our Turing Machine. We can encode \([k]\) using
\(\{0,1\}^\ell\), \(\Sigma\) using \(\{0,1\}^{\ell'}\), and
\(\{\mathbb{L},\mathbb{R} \}\) using \(\{0,1\}\), where
\(\ell = \ceil{\log k}\) and \(\ell' = \ceil{\log |\Sigma|}\). Hence we can
identify \(M\) with a function
\(\overline{M}:\{0,1\}^\ell \times \{0,1\}^{\ell'} \rightarrow \{0,1\}^\ell \times \{0,1\}^{\ell'} \times \{0,1\}\),
and by NAND-univ-thm there exists a finite length NAND program
ComputeM that computes this function \(\overline{M}\). The enhanced
NAND++ program to simulate \(M\) will be the following:

where we use state as shorthand for the tuple of variables
state_\(0\), \(\ldots\), state_\(\ell-1\) and Tape[i] as shorthand for
Tape_\(0\)[i] ,\(\ldots\), Tape_\(\ell'-1\)[i] where
\(\ell = \ceil{\log k}\) and \(\ell' = \ceil{\log |\Sigma|}\).

In the description above we also take advantage of our GOTO syntactic
sugar as well as having access to the NOTEQUAL function to compare two
strings of length \(\ell\). Copying X[\(0\)], \(\ldots\), X[\(n-1\)]
(where \(n\) is the smallest integer such that Xvalid[\(n\)]\(=0\)) to
locations Tape[\(1\)] , \(\ldots\), Tape[\(n\)] can be done by a
simple loop, and we can use a similar loop at the end to copy the tape
into the Y array (marking where to stop using Yvalid). Every
step of the main loop of the above program perfectly mimics the
computation of the Turing Machine \(M\), since ComputeM computes the
transition function of \(M\). Hence the program carries out exactly
the definition of computation by a Turing Machine as per
TM-def.
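The structure of this simulation can be mimicked in Python, restricting ourselves to Boolean scalars and Boolean arrays. This is only a sketch under stated assumptions: the names simulate, compute_m, to_bits and from_bits are hypothetical, compute_m is a table lookup standing in for the finite NAND program ComputeM, and the alphabet is listed with the empty symbol first so that its code is all zeros (the content of an untouched array cell).

```python
def to_bits(value, width):
    return tuple((value >> j) & 1 for j in range(width))

def from_bits(bits):
    return sum(b << j for j, b in enumerate(bits))

def simulate(delta, k, sigma, x, max_iters=10_000):
    """Simulate the machine `delta` on x using only bit-valued variables."""
    l = max(1, (k - 1).bit_length())             # number of state_j scalars
    lp = max(1, (len(sigma) - 1).bit_length())   # number of Tape_j arrays
    code = {sym: n for n, sym in enumerate(sigma)}

    def compute_m(state, cell):
        # Stand-in for ComputeM: bits in, bits out, plus one direction bit.
        s2, sym2, d = delta[(from_bits(state), sigma[from_bits(cell)])]
        return to_bits(s2, l), to_bits(code[sym2], lp), 1 if d == "R" else 0

    # Tape[0] holds the start marker and Tape[1..n] a copy of the input X.
    cells = [to_bits(code[c], lp) for c in [">"] + list(x)]
    tape = [[cell[j] for cell in cells] for j in range(lp)]
    state, halt, i, steps = to_bits(0, l), to_bits(k - 1, l), 0, 0
    while state != halt:                         # the NOTEQUAL(state, k-1) test
        if steps == max_iters:
            return None
        if i == len(tape[0]):
            for arr in tape:
                arr.append(0)                    # fresh cells default to zero
        cell = tuple(arr[i] for arr in tape)
        state, new_cell, direction = compute_m(state, cell)
        for arr, bit in zip(tape, new_cell):
            arr[i] = bit
        i = i + 1 if direction else max(i - 1, 0)
        steps += 1
    y = []                                       # copy Tape[1], Tape[2], ... to Y
    for pos in range(1, len(tape[0])):
        sym = sigma[from_bits(tuple(arr[pos] for arr in tape))]
        if sym not in ("0", "1"):
            break
        y.append(sym)
    return "".join(y)
```

For example, with the alphabet ["_", ">", "0", "1"] and a three-state table that flips each bit while moving right, simulate returns the complemented input.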

For the other direction, suppose that \(P\) is a (standard) NAND++ program
with \(s\) lines, \(\ell\) scalar variables, and \(\ell'\) array variables. We
will show that there exists a Turing machine \(M_P\) with \(2^\ell+C\)
states and alphabet \(\Sigma\) of size \(C' + 2^{\ell'}\) that computes the
same function as \(P\) (where \(C\), \(C'\) are some constants to be
determined later). Specifically, consider the function
\(\overline{P}:\{0,1\}^\ell \times \{0,1\}^{\ell'} \rightarrow \{0,1\}^\ell \times \{0,1\}^{\ell'}\)
that on input the contents of \(P\)’s scalar variables and the contents of
the array variables at location i in the beginning of an iteration,
outputs all the new values of these variables at the end of the
iteration. We can assume without loss of generality that \(P\) contains
the variables indexincreasing, Atstart and Visited as we’ve seen
before, and so we can compute whether i will increase or decrease
based on the state of these variables. Also note that loop is one of
the scalar variables of \(P\). Hence the Turing machine can simulate one
iteration of \(P\) by applying a finite function to its current state and
the symbol under its head. The overall operation of the Turing machine will be as
follows:

The machine \(M_P\) encodes the contents of the array variables of \(P\)
in its tape, and the contents of the scalar variables in (part of)
its state.

Initially, the machine \(M_P\) will scan the input and copy the result
to the parts of the tape corresponding to the X and Xvalid
variables of \(P\). (We use some extra states and alphabet symbols to
achieve this.)

The machine \(M_P\) then simulates each iteration of \(P\) by
applying the finite function \(\overline{P}\) to update the state, the
symbol under the head, and the location of the head, as long as the
loop variable of \(P\) equals \(1\).

When the loop variable equals \(0\), the machine \(M_P\) will scan the
output arrays and copy them to the beginning of the tape. (Again we
can add some states and alphabet symbols to achieve this.)

At the end of this scan the machine \(M_P\) will enter its halting
state.

The above is not a full formal description of a Turing Machine, but our
goal is just to show that such a machine exists. One can see that \(M_P\)
simulates every step of \(P\), and hence computes the same function as
\(P\).
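The central idea, that the scalar variables become the machine's state and the array contents at one location become a tape symbol, can be illustrated on a toy example. The sketch below is a simplification under loud assumptions: the names tabulate, parity_step and run are made up, the head always moves right (an actual NAND++ index moves back and forth), and the phases that copy the input and output are omitted.

```python
from itertools import product

def tabulate(step, num_scalars, num_arrays):
    """Turn a one-iteration function into a finite table indexed by (state, symbol)."""
    table = {}
    for scalars in product((0, 1), repeat=num_scalars):
        for column in product((0, 1), repeat=num_arrays):
            table[(scalars, column)] = step(scalars, column)
    return table

def parity_step(scalars, column):
    """One iteration of a toy program: scalars = (acc, loop), column = (X[i], Xvalid[i])."""
    acc, loop = scalars
    x, xvalid = column
    if xvalid:
        return (acc ^ x, 1), column   # fold the current bit into acc, keep looping
    return (acc, 0), column           # past the end of the input: clear loop

def run(table, x):
    """Drive the table like a machine whose state is the tuple of scalar values."""
    tape = [(int(b), 1) for b in x] + [(0, 0)]   # each cell holds (X[i], Xvalid[i])
    state, head = (0, 1), 0
    while state[1] == 1:                          # continue while loop equals 1
        state, tape[head] = table[(state, tape[head])]
        head += 1
    return state[0]

if __name__ == "__main__":
    table = tabulate(parity_step, num_scalars=2, num_arrays=2)
    print(len(table), run(table, "1101"))
```

With two scalars and two arrays the table has \(2^2 \cdot 2^2 = 16\) entries, matching the \(2^\ell\) states and \(2^{\ell'}\) symbols of the construction before the bookkeeping constants are added.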

Once you understand the definitions of both NAND++ programs and Turing
Machines, TM-equiv-thm is fairly straightforward. Indeed,
NAND++ programs are not as much a different model from Turing Machines
as a reformulation of the same model in programming language notation.
Specifically, NAND++ programs correspond to a type of Turing Machines
known as single tape oblivious Turing machines.

If we examine the proof of TM-equiv-thm then we can see that
the equivalence between NAND++ programs and Turing machines is up to
polynomial overhead in the number of steps required to compute the
function.

Specifically, in the transformation of a NAND++ program to a Turing
machine we used one step of the machine to compute one iteration of the
NAND++ program, and so if the NAND++ program \(P\) took \(T\) iterations to
compute the function \(F\) on some input \(x\in \{0,1\}^n\) and \(|F(x)|=m\),
then the number of steps that the Turing machine \(M_P\) takes is
\(O(T+n+m)\) (where the extra \(O(n+m)\) is to copy the input and output).
In the other direction, our program to simulate a machine \(M\) took one
iteration to simulate a step of \(M\), but we used some syntactic sugar,
and in particular allowed ourselves to use an enhanced NAND++ program. A
careful examination of the proof of enhancednandequivalence
shows that our transformation of an enhanced to a standard NAND++ program (using
the “breadcrumbs” and “wait for the bus” strategies) would in the worst
case expand \(T\) iterations into \(O(T^2)\) iterations. This turns out to be the
most expensive of all the syntactic sugar transformations we used. Hence if
the Turing machine \(M\) takes \(T\) steps to compute \(F(x)\) (where \(|x|=n\)
and \(|F(x)|=m\)) then the (standard) NAND++ program \(P_M\) will take
\(O(T^2+n+m)\) steps to compute \(F(x)\). We will come back to this question
of measuring number of computation steps later in this course. For now
the main take away point is that NAND++ programs and Turing Machines are
roughly equivalent in power even when taking running time into account.

Uniformity, and NAND vs NAND++ (discussion)

While NAND++ adds an extra operation over NAND, it is not exactly
accurate to say that NAND++ programs are “more powerful” than NAND
programs. NAND programs, having no loops, are simply not applicable for
computing functions with more inputs than they have lines. The key
difference between NAND and NAND++ is that NAND++ allows us to express
the fact that the algorithm for computing parities of length-\(100\)
strings is really the same one as the algorithm for computing parities
of length-\(5\) strings (or similarly the fact that the algorithm for
adding \(n\)-bit numbers is the same for every \(n\), etc.). That is, one
can think of the NAND++ program for general parity as the “seed” out of
which we can grow NAND programs for length \(10\), length \(100\), or length
\(1000\) parities as needed. This notion of a single algorithm that can
compute functions of all input lengths is known as uniformity of
computation and hence we think of NAND++ as a uniform model of
computation, as opposed to NAND which is a nonuniform model, where we
have to specify a different program for every input length.
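The “seed” picture can be made concrete by a generator that, given \(n\), prints a straight-line NAND program for the parity of \(n\) bits. This is a sketch, not the book's own tooling: the names parity_program and evaluate and the temporary variable names u_j, v_j, w_j, s_j are assumptions of this example, and it requires \(n \geq 2\). It relies on the standard construction of XOR from four NAND gates.

```python
def parity_program(n):
    """Emit the lines of a NAND program computing X[0] XOR ... XOR X[n-1] (n >= 2)."""
    lines, acc = [], "X[0]"
    for j in range(1, n):
        u, v, w, s = (f"{name}_{j}" for name in ("u", "v", "w", "s"))
        target = "Y[0]" if j == n - 1 else s   # the last XOR writes the output
        lines += [
            f"{u} = NAND({acc},X[{j}])",
            f"{v} = NAND({acc},{u})",
            f"{w} = NAND(X[{j}],{u})",
            f"{target} = NAND({v},{w})",       # target = acc XOR X[j]
        ]
        acc = target
    return lines

def evaluate(lines, x):
    """Tiny interpreter for the emitted lines, used to check the generator."""
    env = {f"X[{j}]": int(b) for j, b in enumerate(x)}
    for line in lines:
        target, call = line.split(" = ")
        a, b = call[len("NAND("):-1].split(",")
        env[target] = 1 - env[a] * env[b]
    return env["Y[0]"]

if __name__ == "__main__":
    print("\n".join(parity_program(3)))
```

The generator itself has constant size, while the programs it emits grow with \(n\) (four lines per additional input bit); this is the contrast between one uniform description and a family of nonuniform ones.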

Looking ahead, we will see that this uniformity leads to another crucial
difference between NAND++ and NAND programs. NAND++ programs can have
inputs and outputs that are longer than the description of the program
and in particular we can have a NAND++ program that “self replicates” in
the sense that it can print its own code. This notion of “self
replication”, and the related notion of “self reference” is crucial to
many aspects of computation, as well as, of course, to life itself, whether
in the form of digital or biological programs.

For now, what you ought to remember is the following differences between
uniform and non uniform computational models:

Non uniform computational models: Examples are NAND programs
and Boolean circuits. These are models where each individual
program/circuit can compute a finite function
\(F:\{0,1\}^n \rightarrow \{0,1\}^m\). We have seen that every
finite function can be computed by some program/circuit. To
discuss computation of an infinite function
\(F:\{0,1\}^* \rightarrow \{0,1\}^*\) we need to allow a sequence
\(\{ P_n \}_{n\in \N}\) of programs/circuits (one for every input
length), but this does not capture the notion of a single
algorithm to compute the function \(F\).

Uniform computational models: Examples are (standard or
enhanced) NAND++ programs and Turing Machines. These are models
where a single program/machine can take inputs of arbitrary length
and hence compute an infinite function
\(F:\{0,1\}^* \rightarrow \{0,1\}^*\). The number of steps that a
program/machine takes on some input is not bounded in advance, and
in particular it may enter an infinite loop on some inputs. Unlike
the nonuniform case, we have not shown
that every infinite function can be computed by some NAND++
program/Turing Machine. We will come back to this point in
chapcomputable.

NAND++ programs introduce the notion of loops, and allow us to
capture a single algorithm that can evaluate functions of any input
length.

Enhanced NAND++ programs, which allow control on the index variable
i, are equivalent in power to standard NAND++ programs.

NAND++ programs are also equivalent in power to Turing machines.

Running a NAND++ program for any fixed finite number of steps corresponds
to a NAND program. However, the key feature of NAND++ is that the
number of iterations can depend on the input, rather than being
bounded by a fixed number in advance.

Exercises

Most of the exercises were written in the summer of 2018 and
haven’t yet been fully debugged. While I would prefer people do not post
online solutions to the exercises, I would greatly appreciate it if you let
me know of any bugs. You can do so by posting a GitHub issue about the
exercise, optionally complementing this with an email to me with more
details about the attempted solution.

Complete noabsoluteindexex in the vanilla NAND++ case to give
a full proof that for every standard (i.e., non-enhanced) NAND++
program \(P\) there exists a standard NAND++ program \(P'\) such that \(P'\)
is well formed and \(P'\) is equivalent to \(P\).

Prove that for every \(F:\{0,1\}^* \rightarrow \{0,1\}^*\), the function
\(F\) is computable if and only if the following function
\(G:\{0,1\}^* \rightarrow \{0,1\}\) is computable, where \(G\) is defined as
follows:
\(G(x,i,\sigma) = \begin{cases} F(x)_i & i < |F(x)|, \sigma =0 \\ 1 & i < |F(x)|, \sigma = 1 \\ 0 & i \geq |F(x)| \end{cases}\)

Bibliographical notes

Salil Vadhan proposed the following sequence for NAND++, which is
easier to describe analytically:
\(INDEX(\ell) = \min\{\ell - \floor{\sqrt{\ell}}^2,\ceil{\sqrt{\ell}}^2-\ell\}\)
which has the form
\(0,0,1,1,0,1,2,2,1,0,1,2,3,3,2,1,0,1,2,3,4,4,3,2,1,0,\ldots\).
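
As a quick sanity check, the formula can be evaluated directly. The following Python sketch (the function name `index` is ours, chosen for illustration) uses exact integer square roots to avoid floating-point rounding and reproduces the first 26 terms listed above.

```python
from math import isqrt


def index(l):
    """INDEX(l) = min(l - floor(sqrt(l))^2, ceil(sqrt(l))^2 - l)."""
    f = isqrt(l)                       # floor of the square root
    c = f if f * f == l else f + 1     # ceiling of the square root
    return min(l - f * f, c * c - l)


print([index(l) for l in range(26)])
# [0, 0, 1, 1, 0, 1, 2, 2, 1, 0, 1, 2, 3, 3, 2, 1, 0, 1, 2, 3, 4, 4, 3, 2, 1, 0]
```

Between consecutive perfect squares \(k^2\) and \((k+1)^2\) the value climbs from \(0\) to \(k\) and back down to \(0\), so the index sweeps over ever longer ranges.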

Further explorations

Some topics related to this chapter that might be accessible to advanced
students include: (to be completed)