What is a quine? What is this page?

A “quine” (or “selfrep”)
is a computer program which prints its own listing. This may sound
either impossible, or trivial, or completely uninteresting, depending
on your temper and your knowledge of computer science. Actually, it
is possible, and there are some interesting ideas involved (in
particular, writing a quine is not a hack that only works
because the programming language has certain nice properties —
it is a consequence of the general so-called “fixed-point” theorem, itself an
instance of Cantor's ubiquitous diagonal argument).

Quines are so named after the American mathematician and logician
Willard van Orman Quine (1908/06/25–2000/12/25) who introduced
the concept. This page is dedicated to his memory.

I also dedicate this page to Douglas R. Hofstadter, who coined the
name (in his justly famous book Gödel, Escher, Bach) and
who so clearly explained quines' importance and their relation with
Gödel's incompleteness theorem.

Introduction

A quine is a program which prints its own listing. This means that
when the program is run, it must print out precisely those
instructions which the programmer wrote as part of the program
(including, of course, the instructions that do the printing, and the
data used in the printing).

The easiest way to do that, of course, is to seek the source file on
the disk, open it, and print its contents. That may be done, but it
is considered cheating; besides, the program might not know where the
source file is, it may have access to only the compiled data, or the
programming language may simply forbid that sort of operation.

The interesting thing is that writing a quine does not depend on any
kind of hack such as being able to read a source file, or even being
able to represent quotes in several different ways. Any
programming language which is Turing complete, and which is able to
output any string (by a computable function of the string as program
— this is a technical condition that is satisfied in every
programming language in existence) has a quine program (and, in fact,
infinitely many quine programs, and many similar curiosities) as
follows by the fixed-point theorem. Moreover,
the fixed-point theorem is constructive, so the construction of the
quine is merely a matter of patience, not guesswork (or intelligence
as some prefer to call it ;-). This is not to imply, of course, that
actually writing a short or interesting quine may
not demand a lot of cleverness. Still, it says that there is nothing
“magical” behind quines; and also nothing says that they
have to be obfuscated, difficult to read, or devoid of comments, as
they often are.

A first attempt and example

We try writing a quine in C. We choose C because it is widely known,
and also because the printf() function has features which
will make writing a quine considerably easier (this is a mixed
blessing: it is a gain because it makes the quine smaller, but it also
makes it noticeably more obscure and “hackish”).

We will want the quine to be correct C code, so it will probably have
to begin something like this:

#include <stdio.h>
int
main (void)
{

The first thing we want to do is print everything that precedes. Naïvely, we
could write:

And so on. It should be obvious that this is not going to work
(except if we intend to produce a quine of infinite length, which we
do not).
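
A small sketch makes the regress concrete. The helper below (wrap_in_printf
is a hypothetical name, not taken from the original listing) builds the C
statement that would print a given line. Because the statement is always
longer than the line it prints, printing the statement that prints the
statement never catches up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write into dst the C statement that would print
 * `line` followed by a newline, i.e.  printf("<escaped line>\n");
 * Backslashes and double quotes in `line` are escaped.
 * Returns the length written, or 0 if dst (of size cap) is too small. */
size_t wrap_in_printf(const char *line, char *dst, size_t cap)
{
    static const char head[] = "printf(\"";
    static const char tail[] = "\\n\");";
    size_t n;

    assert(line != NULL && dst != NULL);
    if (cap < sizeof head - 1 + sizeof tail)   /* head + tail + NUL */
        return 0;
    memcpy(dst, head, sizeof head - 1);
    n = sizeof head - 1;
    for (; *line != '\0'; line++) {
        /* worst case: two characters now, then the tail and its NUL */
        if (n + 2 + sizeof tail > cap)
            return 0;
        if (*line == '\\' || *line == '"')
            dst[n++] = '\\';
        dst[n++] = *line;
    }
    memcpy(dst + n, tail, sizeof tail);        /* copies the NUL too */
    return n + sizeof tail - 1;
}
```

Applied to the line int, this yields printf("int\n"); and applied to that
result it yields printf("printf(\"int\\n\");\n"); and so on, each level
longer than the last.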

This is the sort of reasoning which makes some people believe that
quines don't exist. The problem is that we need to print something,
so we use a character string (say s) to print it, and then
we need to print s itself, so we use another character
string, and so on…

But wait! If we intend to print s, we don't need
another string: we can use s itself. So let's give it
another try:

Well, it still doesn't work. But we have introduced one of the central ideas in quine-writing lore:
whereas it is probably necessary to use some data to represent
the code to be printed, on the other hand it is possible to reuse
these data to print the data themselves. Here we're still a
bit naïve: we're using s “as it stands”, but that won't
work because it contains some backslashes; these would need to be
further backslashified. So we have two paths before us: the King's
way is to proceed with backslashification, which will work because
this is a computable process. However, since we are writing in C, we
choose a shortcut which uses the nice properties of the printf
function:

This is a partial quine: it prints the beginning of its own listing
(something in no way remarkable, since any program which doesn't print
anything is a “partial quine”). Here we have passed the
“catching up point”; by this I mean that the program data
printed includes the data representation itself. It is then generally
trivial to complete the quine (here, things are still a bit tricky
because we've been doing things in a more or less ad hoc
manner, and some of the data are actually hidden in the
printf() statements). Nevertheless, it is not very
difficult to finish:

Here we have a real quine (if you find it obscure, do not worry, much
clearer examples will be given further below). Note the use of the
s2 string to print several lines modeled on the same
pattern. Also note how the backslash required no special treatment.
And note the sx string which goes to show that the
classical belief that everything in a quine must be doubled, is false
(the meaning of the term “intron”, which comes from molecular
biology, will be made clearer below).

This quine is intermediate in elegance: on the one hand it does not
assume that the computer is using an ASCII character set (you see a
lot of C quines which use the fact that double quotes have ASCII code
34 and that line feed has code 10), it is valid ANSI C (with a
warning, however, owing to the fact that I should have written “const
char *” rather than just “char *”; this is much
better than many quines which omit the return 0 at the
end or similar things), and the longest lines are just 80 characters
(often quines have terribly long lines). On the other hand, the
formatting is inelegant: don't conclude from the above example that
quines need be so badly presented. Also, nothing says you can't have
comments within quines. We will give much more
elegant examples later.

Principles for writing a quine

The basic idea is this:

It is impossible (in most programming languages) for a program to
manipulate itself (i.e. its textual representation — or a
representation from which its textual representation can be easily
derived) directly.

So to make this possible anyway, we build the program from
two parts, one which we call the code and one which we call the
data. The data represents (the textual form of) the code,
and it is derived in an algorithmic way from it (mostly, by putting
quotation marks around it, but sometimes in a slightly more
complicated way). The code uses the data to print the code (which is
easy because the data represents the code); then it uses the data to
print the data (which is possible because the data is obtained by an
algorithmic transformation from the code).

This idea is summarized by the sentence “quine
‘quine’”. Here, the verb to quine
(invented by Douglas R. Hofstadter) means “to write (a sentence
fragment) a first time, and then to write it a second time, but with
quotation marks around it” (for example, if we quine
“say”, we get “say ‘say’”). Thus,
if we quine “quine”, we get “quine
‘quine’”, so that the sentence “quine
‘quine’” is a quine… In this linguistic
analogy, the verb “to quine”, plays the role of the code,
and “quine” in quotation marks plays the role of the data.
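
The verb can be sketched as a tiny C function. This is only an illustration
of the linguistic analogy: quine_phrase is a hypothetical name, and plain
apostrophes stand in for the typographic quotation marks used above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: "quine" a sentence fragment, i.e. write it once
 * bare and once in quotation marks.
 * Returns the length of the result, or -1 if dst is too small. */
int quine_phrase(const char *fragment, char *dst, size_t cap)
{
    int n;

    assert(fragment != NULL && dst != NULL);
    /* The bare copy plays the role of the code, the quoted copy that of
     * the data; both are produced from the same single string. */
    n = snprintf(dst, cap, "%s '%s'", fragment, fragment);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

Calling quine_phrase("say", ...) produces say 'say', and calling it on
"quine" produces quine 'quine'. The point to notice is that one string is
used twice, once as it stands and once quoted.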

We will henceforth use the words “code” and “data” a lot, to
designate the code and data parts of the quine as just explained.

If we are to take an analogy with cellular
biology (thanks to Douglas Hofstadter again), what I have called
the “code” would be the cell, and the “data” would be the cell's
DNA: the cell is able to create a new cell using the DNA, and this
involves, among other things, replicating the DNA itself. So the DNA
(the data) contains all the necessary information for the replication,
but without the cell (the code), or at least some other code to make
the data live, it is a useless, inert, piece of data.

Note how the data may contain (depending on
how it's interpreted) bits that aren't used to write the code, but are
still copied when the data is written on the output. Such bits are
called introns, in analogy with the parts of the genome
which aren't used to produce proteins. The example we gave above had
an intron (the string sx), clearly marked as such. Quite
obviously an intron can be modified with great ease; it is a kind of
subliminal information that is reproduced with the quine, although it
is not necessary to the quine. The possible existence of introns will
be the key feature making multi-quines (something we will talk
about later) possible.

One word of warning: this code/data distinction in quines is pleasant
and often helpful. It is not, however, completely valid in all
circumstances. Sometimes the code and the data are not well
distinguished, sometimes part of the code plays a data role, or vice
versa. Some quines are far beyond my own modest understanding —
and beyond my feeble attempts at classification and order. As in all
things, caveat emptor. See this
remark later in the text, however.

A second example: added clarity

We now use the principles outlined above to
construct another quine, one which will be more elegant in its
formatting (but a bit less portable because we will assume an ASCII
coding of characters).

This time, we gather all the data in one place, one array containing
the ASCII values of the characters making up the code, and we place
this array at the beginning of the program. The code will use the
array to first print the array (by printing it as a list of
hexadecimal integers with a proper formatting) and then print the code
(by converting the ASCII values to characters).

This is completely straightforward, and while this quine is far from
the shortest, I think it is the clearest I have ever seen:

This should make it obvious that there is nothing difficult at all in
writing quines. In fact this is the sort of quine we obtain by
directly applying the fixed-point theorem. As
mentioned, the code contains two parts: that which copies the data
(the nine lines following the blank one in the main()
function) and that which uses the data to copy the code (the next two
lines).
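
The two parts just described might be sketched as follows. This is not the
original listing: the function names, the eight-values-per-line layout and
the array name passed as a parameter are assumptions made for illustration.
A real quine would call both routines from main(), with data[] holding the
character codes of the program's own code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Print `data` as a C array definition named `name`, eight hexadecimal
 * values per line.  This is the part that uses the data to copy the data. */
void print_data_array(FILE *out, const char *name,
                      const unsigned char *data, size_t len)
{
    size_t i;

    assert(out != NULL && name != NULL && data != NULL);
    fprintf(out, "const unsigned char %s[] = {", name);
    for (i = 0; i < len; i++) {
        fputs(i % 8 == 0 ? "\n  " : " ", out);
        fprintf(out, "0x%02x,", (unsigned)data[i]);
    }
    fputs("\n};\n", out);
}

/* Print the characters whose codes are stored in `data`.  This is the
 * part that uses the data to copy the code. */
void print_code(FILE *out, const unsigned char *data, size_t len)
{
    size_t i;

    assert(out != NULL && data != NULL);
    for (i = 0; i < len; i++)
        putc(data[i], out);
}
```

Because print_data_array takes the array name as an argument, the same
routine could also be reused unchanged to print a second array.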

Naturally, the coding of the data might be much more complex than a
straightforward ASCII encoding. We will return
to that subject. Also note that here there are no introns, because ASCII does not permit this
(there are no comments or any such things). However, we could
trivially add an intron: create a new array, const unsigned char
intron[], say, put whatever data we want in it, and use the
same printing routines for intron[] as we did for
data[] (of course, we need to modify the code, hence the
data also, to do this, but once it is done, we can put anything in the
intron without modifying anything).

Another point is to be noted: in what precedes I have omitted a great
many lines from the data. Had I not given a pointer to the original
file, could you have reconstructed the data? Evidently, yes, and
without much difficulty: just take the code, take the ASCII value of
each character, and tabulate them. This violates the so-called
Central Dogma, stating that the data must be used (by the
code) to deduce (i.e. to print) the code, but not the converse. In
practice, though, there is nothing wrong with violating the Central
Dogma, in fact, you can guess that I wrote the program by first
writing the code and then calculating the data; however,
introns cannot be reconstructed in that way (since the very point
about introns is that all possible data will work).
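
The reconstruction in the forbidden direction (from code to data) is short
enough to sketch. The name tabulate_code is hypothetical, and the values it
stores are ASCII codes only on a system that uses ASCII.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: tabulate the character codes of `code` into `data`.
 * Returns the number of values stored, or 0 if `data` (capacity cap)
 * cannot hold them all (an empty `code` also yields 0). */
size_t tabulate_code(const char *code, unsigned char *data, size_t cap)
{
    size_t i, len;

    assert(code != NULL && data != NULL);
    len = strlen(code);
    if (len > cap)
        return 0;
    for (i = 0; i < len; i++)
        data[i] = (unsigned char)code[i];   /* ASCII value on ASCII systems */
    return len;
}
```

Nothing comparable exists for an intron: its contents are not determined by
the code, so there is no text to tabulate it from.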

What if a part of the code had been
missing? Then things are much better off. For example, if the
comments had been gobbled, running the program itself would have
restored them (from their value encoded in the data). Even without
any code at all, you would probably have guessed that the data was the
ASCII representation of something and been able to restore the
something in question. But see the section on bootstrapping for more about this.

The fixed-point theorem

I have mentioned the fixed-point theorem and stated that it is at the
heart of the existence of quines. I will now explain what this
theorem states.

(Note that this is just one of very many fixed-point theorems
abounding in mathematics. This has nothing to do, for example, with
Brouwer's fixed-point theorem. I don't know that any specific name is
attached to this one, but I suspect it would be something like
Kleene's fixed-point theorem.)

I assume no familiarity with the theory of computability. However, it
will help: if you are not familiar with it, what I am going to say may
sound a bit vague (but read it anyway, because you probably will grasp
the idea even if the details are obscure).

Before I can state (and prove) the fixed-point theorem, I will recall
some basic notions:

A computable (or “general recursive”) function (of
several integer variables, and with integer values) is one which is
calculated by some program (i.e. some Turing machine operating on the
variables as input, i.e. some algorithm, by whatever definition you
want to take of the word “algorithm” since by the Church-Turing
thesis they're all equivalent). By a partial function we
mean one which is not necessarily defined on all possible values of
the input variables. By a total function we mean one which
is.

We need a standard numbering of programs. This also depends on
the model of computability chosen, but imagine, say, associating with
a program the value obtained by considering the program as the binary
representation of some integer. We write φ_n(…)
for the result of the n-th program when fed the input
represented by the ellipsis. In particular, any computable function
is equal to φ_n for some n. We will
not distinguish a program from its associated
number.

The universality theorem states that the
(partial) function φ_n, considered as a
function of n plus its other arguments, is itself computable.
In other words, there is a u (a universal Turing
machine if you want, or, in simpler terms, an
interpreter) such that
φ_u(n, …) = φ_n(…).
So, effectively, this means that you can construct a program
u that will take a program n and some arguments,
and return the value of n applied to the arguments in
question. This means that u is an interpreter, which takes
a program and interprets it, so the universality theorem merely states
the existence of an interpreter (of the programming language
considered, written in the programming language considered). The
universality theorem is a consequence of the Church-Turing thesis,
i.e. our belief that we have grasped all notions of computability.

The s-m-n theorem is essentially the converse of
the universality theorem. It states that if
g(…) = φ_n(…) is computable then for any
x the function
g(x, …) = φ_n(x, …) (obtained by
fixing the value of one of the input parameters and letting the
remaining vary) is computable and is obtained in a computable way
from n. That is, there exists a (computable, total,
and in fact primitive recursive) function s such that
φ_{s(n,x)}(…) = φ_n(x, …).
This is really a triviality if the numbering is done right (in fact,
it is almost a definition of the fact that the numbering is done
right): it states that if you have a program n taking some
input, you can (for every x) construct a program
s(n,x) that will act as n except that
it takes x as input; moreover, this other program is
derived algorithmically from the first. So, in effect, you can
substitute a value for an input in a program.

Using the s-m-n theorem and the universality theorem we can prove the
fixed-point theorem. This states that for any computable total
function h there exists an index (a program) n
such that
φ_n(…) = φ_{h(n)}(…).

In plain English, this means that if you have any algorithmic
transformation h on programs then there exists a program
n such that the program n does the same thing as
the program h(n) resulting from the transformation. We will
explain this with further examples in a second, but first we prove the
statement.

For a given program t, we consider the program
s(t,t) (given by the s-m-n theorem). Essentially,
s(t,t) performs what t does when it
is fed itself as input. We further consider the program
h(s(t,t)) which results from the
transformation h applied to s(t,t).
Now by the universality theorem, there exists an index m
such that
φ_m(t, …) = φ_{h(s(t,t))}(…).
In other words, there is a program m which takes a program
t as input, and performs what the program
h(s(t,t)) does. Then I claim that
the program n=s(m,m) is the desired
fixed point. Indeed,
φ_n(…) = φ_{s(m,m)}(…).
But by definition of s, this is
φ_m(m, …), which in turn, by definition of
m, is
φ_{h(s(m,m))}(…) = φ_{h(n)}(…),
quod erat demonstrandum.

To summarize the proof, we have taken the program m which,
given a program t, interprets the program resulting from
applying the given transformation h to t acting
on itself, and we have applied that program to itself.

How does the fixed-point theorem prove the existence of quines? This
is very simple: for a given program t, consider the program
h(t) that prints the listing of t.
Obviously this h is computable. Now the fixed-point
theorem tells us that there is a program n such that
h(n) and n do the same thing,
i.e. printing the listing of n. So n prints the
listing of n.

In practice, how do we construct n? Well, the proof of the
fixed-point theorem answers this question as well. Since the proof
used the universality theorem, it may seem like we need to construct
an interpreter to apply the theorem. In fact, we need not: indeed, if
you look closely at the proof, you will see that we used the
universality theorem only for programs of the form h(…),
so that we need only construct an interpreter for those programs; for
our particular choice of h, this is trivial.

So consider a program t, taking an argument. We will
assume that this argument is given as a variable data to
be linked with the program. Then s(t,t) is the
program obtained by setting this variable data to the
textual value of the program (as a string, say, or as whatever coding
we have chosen). Our program m takes an argument
t (in the form of the data variable) and
performs what h(s(t,t)) does, i.e. it
prints the listing of s(t,t), which is none
other than the listing of t with a definition of the
variable data to be the text of t. And
finally for our program n we take
s(m,m), that is, we take this program
m and link the variable data to be the text of
the program. Quite evidently, this is precisely what we have been
doing.

The fixed-point theorem has other amusing applications. Essentially,
its intuitive (and effective) content is that a program may use its
own source as a variable, i.e. adding to a programming language the
ability for a program to manipulate itself (its source code) does
not add to its expressive power. So there exists a program
that compresses its own listing; there exists one which prints its own
MD5 checksum (this is much easier than finding a program
— indeed any file — that contains its MD5
checksum; still, someone I know thought it was impossible except by
brute force — how rude — so I wrote such a
program and won a bet like that); there exists a program that
prints a second, different, program, that prints the first one again
(here, h(t) would merely be a program that
prints a lot of print calls for the various lines of t's
listing); and so on.

(A passing note, which you may find a bit
difficult to understand if you're not used to computability theory.)
A different, perhaps more satisfactory, way of
stating the fixed-point theorem would be to eliminate the universality
theorem from it, and to say: for every computable function
k there exists an n such that
φ_n(…) = k(n, …). This
corresponds more precisely to the intuitive content we have described.
It is proved without the use of the universality theorem,
using only the s-m-n theorem (for the actual proof, take the proof we
have just given, and replace
φ_{h(x)}(…) by
k(x, …) everywhere). The advantage of
formulating things like this is we see that it also works for
primitive recursive functions (which satisfy s-m-n but not
universality), so in effect a primitive recursive function can also
make use of its own number. By applying the universality theorem (the
function φ_{h(x)}(…) is
computable, so we can call it k(x, …)) we
recover the fixed-point theorem as we have stated it. The examples we
have given of the fixed-point theorem actually use the more restrictive
(non-universal) form we have just stated. The following examples will use
universality (and don't work for primitive recursive functions, which
is clear because primitive recursive functions always
terminate).

There also exists a program that interprets its own listing:
we will return to this. Also, if we take
for h the function which to a program x
associates the program which calculates what x does, and,
at the end (provided x terminates, of course) adds 1, we
would have a program x which does the same thing as running
x and adding 1 to the result, and that is only possible if
x does not terminate, so that the fixed-point theorem also
proves the existence of an endless loop.
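
The endless-loop argument can be written out as a short derivation. This is
a sketch in notation that is not in the original text: the fixed point is
called n, as in the statement of the theorem, and \simeq is used to mean
that both sides are undefined or both are defined and equal.

```latex
\varphi_{h(x)}(\ldots) \simeq \varphi_x(\ldots) + 1
\quad\text{for all } x
\qquad\Longrightarrow\qquad
\varphi_n(\ldots) \simeq \varphi_{h(n)}(\ldots) \simeq \varphi_n(\ldots) + 1 .
```

No integer v satisfies v = v + 1, so the value of the fixed point cannot be
defined on any input: the program n never terminates.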

Exercise: Louis Reasoner believes that the
fixed-point theorem proves the existence of polyglot programs
(i.e. programs that are valid and do the same thing in several
different programming languages). His argument is this: for a given
program t (in a first programming language) consider a
translation of t in a second programming language, and
interpret this program literally in the first language,
giving h(t). By the fixed-point theorem, there
exists n such that h(n) and
n have the same effect, i.e. the text of the program
h(n) has the same effect in the first language
(that is h(n)) and in the second (that is
n). What do you think of this argument?

The fixed-point theorem gives a different point of view on quines from
the one we have given so far. The ideas we
have already expressed, notably the code/data dichotomy, are perhaps
not very clearly apparent. Still, they are present: we should
consider the s function from the s-m-n theorem as a means of adding
data to a program (which would otherwise receive this data as an
input), so the expression s(m,m) which we have
seen says, in effect, add to the program m (the code) a
representation of the program m itself (the data). Introns can exist because the function s is
free to add extra data to the data required of it, if it wants.

Multi-quines: making use of introns

We start by saying what a bi-quine (or more generally a multi-quine)
is. To begin, here is what it is not: a bi-quine is
not a program which prints a second program, which in turn
prints the first again (actually, it is that, but things are a bit
more subtle). This is too easy to do (we have proved the existence of
such using the fixed-point theorem): one program
is almost a quine, and the other is merely a sequence of calls to
print the code of the other one.

A multi-quine is also not a polyglot quine (a quine that can
be read, and is a quine, in several different languages). True,
polyglot quines actually are multi-quines if you think well about it
(the converse is not true), but polyglot quines don't exist for every
combination of programming languages (although it is true that some
people have been incredibly smart at constructing them) whereas
multi-quines do — polyglot quines are a hack whereas
multi-quines are a general phenomenon.

A bi-quine is a very interesting kind of program: when run
normally, it is a quine. But if it is called with a particular command
line argument, it will print a different program, its “brother”.
Its brother is also a quine, but in a different programming
language, so its brother prints its own listing when run normally.
But when run with a particular command line argument, the brother
prints the listing of the original program. So in effect, a bi-quine
is a set of two programs each of which is able to print either of the
two. More generally, a multi-quine is a set of r different
programs (in r different languages — without this
condition we could take them all equal to a single quine), each of
which is able to print any of the r programs (including
itself) according to the command line argument it is passed. (Note
that cheating is not allowed: the command line arguments must
not be too long — passing the full text of a program is
considered cheating ;-).

There are several ways to prove the existence of multi-quines using
fixed-point theorems. Here is one (we leave it to the reader to fill
in the missing details). We just consider the case of a bi-quine,
i.e. r=2. We consider, in language 1, a program of two
parameters that will normally print the first, but that will print the
second if a special argument is passed to it. By the fixed-point
theorem, we can assume that the first text is its own listing, so that
we get a program of one parameter that will print its own listing
except that it will print the parameter if called with a special
argument. Do the same for language 2. We now have two programs.
Substitute one in the other: there is a program, of one parameter, in
language 1, that will print its own listing, except when it is called
with a special argument, in which case it will print a program, in
language 2, which prints its own listing except when it is called with
a special argument, in which case it will print the initial parameter
(passed to the first program). Finally, apply the fixed-point theorem
to that. Voilà, we have the bi-quine.

So, to create multi-quines, we make use of introns (following,
essentially, the proof given just above). We have r
programs, so r code sets (one in each language); besides,
each of the r programs has, in addition to its
code set, r data sets, one representing each of
the r code sets (so r-1 of the data sets are
introns as far as the quine structure goes) in a given coding (in
principle it would be possible for each of the
r² data sets to use a different coding, but
there is no reason to use a different coding for various data sets in
the same program, and even between programs it is reasonable to use
more or less similar codings, at least insofar as the programming
languages allow this). When program i (running code set
i in language i) is asked to produce the listing
of program j, it will use its j-th data set to
produce the j-th code set, and then it will use
all of its r data sets to produce the r
data sets of program j (coded in the same or in a similar
way).
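
The dispatch described here might look like the following sketch. It is a
simplification under stated assumptions: the struct and function names are
hypothetical, the data sets are written first and the code set last, and
every data set is written in one neutral placeholder format (a name
followed by hexadecimal values). A real multi-quine would instead write
each data set in syntax that is valid in language j.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One data set: the character codes of one program's code set. */
struct data_set {
    const char *name;             /* label used in the output */
    const unsigned char *bytes;   /* encoded code set */
    size_t len;
};

/* Emit program j: first all r data sets (so that the emitted program can
 * in turn emit any of the r programs), then code set j decoded from data
 * set j.  Returns 0 on success, -1 if j is out of range. */
int emit_program(FILE *out, const struct data_set *sets, size_t r, size_t j)
{
    size_t i, k;

    assert(out != NULL && sets != NULL);
    if (j >= r)
        return -1;
    for (i = 0; i < r; i++) {
        /* placeholder data format, the same for every target language */
        fprintf(out, "%s =", sets[i].name);
        for (k = 0; k < sets[i].len; k++)
            fprintf(out, " %02x", (unsigned)sets[i].bytes[k]);
        putc('\n', out);
    }
    for (k = 0; k < sets[j].len; k++)
        putc(sets[j].bytes[k], out);
    return 0;
}
```

From the point of view of program i, only sets[i] is needed to be a quine;
the other r-1 entries are introns that matter only when j differs from i.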

In practice, we write a quine program similar, say, to the second example we have given on this page, to
which we add an intron. Using this intron, the quine is able, when
passed a particular parameter, to produce a representation (valid in
the second programming language) of the two data sets (the actual data
of the quine and the intron) followed by some data specified by the
intron. Then we do the same in the other programming language, with
the data representation we have elected to produce (and the second
program, when passed the special argument, must produce data
representation as we have used in the first program). Finally, we
synchronize the introns: we use the intron of the first program to
represent the code of the second program and the intron of the second
to represent the code of the first. (Remember, the nice thing about
introns is that we can change them after the quine has been
written, without removing its quinishness.)

If you would feel more comfortable with an example, I have written a
C/Perl bi-quine. (For
fun, I only give out the C version: if you want the Perl version you
will have to run the program with the magic
word as argument.) In the C version, c_data is the
main data set and perl_data is an intron; in the Perl
version, of course, things are reversed. (The coding is not quite the
same, also, although both are hexadecimal.)

Bootstrapping: recovering the code from the data

As we have already explained and illustrated,
a quine is basically a bunch of data, plus an active part, the code,
which reads the data twice: once to reproduce the data, and once to
reproduce the code; the data represents the code, and the code
interprets that representation and recovers the code. There are two
parts in the code: that which uses the data to copy the data and that
which uses the data to copy the code.

Now what if we are given only the data part of the quine? In
the analogy I have given with cellular
biology, this is the equivalent of having the DNA (the genome is what
I have been calling the “data”… ugh) and wanting to reconstruct a
cell.

Well, it is a matter of how difficult the coding (another word to
beware) is. If I give you the following quine fragment (the data
part):

you probably won't have much trouble recovering the complete quine. This is because
the representation chosen here is completely trivial. We can proceed
as follows: just run the tiny instruction printf ("%s",
data); on the above data and you get the code; put the code and
the data together, and you get a first program which is almost the
quine (it may differ in inessential factors, for example if you put
the data after the code rather than before); but this program will
produce the original quine when run. This process is called
bootstrapping, and it is similar to the process of
bootstrapping, say, a C compiler (you start with an initial C
compiler, which may be much simpler, much less featureful, or much
less efficient, than the C compiler you want to build, and you run it
on the sources of the desired C compiler, giving a first binary C
compiler, which you use a second time to recompile its own sources).

The possibility of bootstrapping means that to some extent quines are
self-healing: if the code is damaged but still able to use the data to
recover the original code, bootstrapping can be performed.

However, nothing says a quine must use a simple
coding like ASCII. I have written a quine that stores, in its
data, a compressed (gzipped) representation of the code. This means
that whereas the code that uses the data to produce the data is
trivial (it is the same as that used in our previous example), on the other hand the code
that uses the data to produce the code is much more involved, because
it must actually uncompress the data. (The gzip format is very
strange and very unpleasant to uncompress. I have written a set of
routines to decode it, which are included in the quine of course, and
which I put in the public domain if they can be useful to anyone.)
Here, the gzip program (plus a bit of interpreting the data as binary)
could serve to bootstrap.

you will have no trouble recovering the original program if you have
a little bit of geek culture, but you probably get my point anyway.

In fact, let us take an extreme example: I
have written a quine
that stores its code enciphered with the blowfish cryptographic
algorithm (by Bruce Schneier) in its data. Of course, the key is part
of the code (without the key, the data is useless). Moreover, I have
added an intron to the program, which is
encrypted with the same key. When the program is run with the magic
word as argument, it deciphers (and prints) the intron rather than
printing its own listing. This has an amusing consequence: if the key
is removed from the listing, then practically nothing is missing from
the code, and yet it is impossible to bootstrap; even though we have
most of the plain code, the complete ciphered data and secret, we
can't do much with it because all is locked by a key (and blowfish is
not known to be vulnerable to a known-plaintext attack). In fact, the
situation is even more ironic than that, since the key is present in
the encrypted data: we are, essentially, in the situation of someone
locked outside his home with the key inside.

(Note that in writing this quine I have implemented the blowfish
encryption and decryption algorithm — in fact, the quine
contains the full functions, far more than are necessary for what it
does. I put these functions in the public domain: you can find them
here
without the quine part. Be careful: although I am using
this just for fun, this is nevertheless strong crypto. So be careful
about your local crypto laws.)

A point might be made here about the
distinction between code and data: here I claim that the key is part
of the code and not the data. The difference is not
so much in how the key is used as in how it is stored. In fact, if
the key is in the code (as in my quine) the program's skeleton is
basically this:

and as explained, if the key is removed, it is “locked inside the
house”. However, if we had some magical way of deciphering blowfish,
we could recover the key (even if our magical method did not let us do
this a priori) because it is part of the code, so it
is stored among the encrypted data. On the other hand, if the key is
data, the program looks like this:

This may not appear very different, but it is. This time, there isn't
a copy of the key “inside the house”. The key is part of the data,
it is the only part of the data that is stored in clear. I think
there is something to this idea of distinguishing the “code” and
“data” parts of a quine not by what they are used for but
how they are printed.
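
The two arrangements can be sketched side by side in Python. This is an illustration under loud assumptions, not the blowfish quine itself: a repeating-key XOR stands in for the cipher (it is trivially breakable with known plaintext, so the "locked out" irony does not really hold for it), and the key b"sesame" and every helper name are hypothetical.

```python
import contextlib
import io

def run(source: str) -> str:
    """Execute Python source text and return everything it prints."""
    out = io.StringIO()
    with contextlib.redirect_stdout(out):
        exec(source, {})
    return out.getvalue()

def xor(text: bytes, key: bytes) -> bytes:
    """Toy cipher: XOR with a repeating key (its own inverse, NOT secure)."""
    return bytes(c ^ key[i % len(key)] for i, c in enumerate(text))

KEY = b"sesame"  # hypothetical key, for illustration only

# Shared deciphering step, written out as program text.
DECIPHER = (
    "code = bytes(c ^ key[i % len(key)]"
    " for i, c in enumerate(bytes.fromhex(data))).decode()\n"
)

# Variant 1: the key is a literal in the code, so a copy of it also
# sits inside the enciphered data ("inside the house").
CODE_1 = "key = b'sesame'\n" + DECIPHER + r'print("data = %r\n%s" % (data, code))'

# Variant 2: the key is data, stored in clear and printed like data;
# the enciphered part contains no copy of it.
CODE_2 = DECIPHER + r'print("key = %r\ndata = %r\n%s" % (key, data, code))'

def key_in_code() -> str:
    return "data = %r\n%s\n" % (xor(CODE_1.encode(), KEY).hex(), CODE_1)

def key_in_data() -> str:
    return "key = %r\ndata = %r\n%s\n" % (KEY, xor(CODE_2.encode(), KEY).hex(), CODE_2)

print(run(key_in_code()) == key_in_code())
print(run(key_in_data()) == key_in_data())
```

Both variants use the key in exactly the same way; they differ only in whether the key is printed by deciphering the data (variant 1) or by copying it out directly (variant 2).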

While it is true that some parts of the code can be recovered by a
bootstrapping process, the data can never
be recovered in that way. Any part of a quine which, if it is
modified, does not change the program output (meaning that the program
output is still the original quine), is not data; it
is code. (This applies, for example, to the comments inside
the data section of the program.) (Well, all right, I guess there
is room for discussion.)

However, the data contains parts of a different nature: when they are
modified, the output produced by the program is modified, but
it remains a quine. Those are the introns we have already discussed at length. In
a way, introns represent the exact opposite of the principle of
bootstrapping: in the case of bootstrapping, we hope that after a
certain number of iterations we will hit the original program again;
but if we modify an intron, the program remains a quine, so it will
not “heal” itself, it will just remain in its modified form.
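
The contrast can be checked on a tiny Python quine. This is only a sketch: the variable names intron and data, the helper functions, and the choice of a trailing comment as the "damage" are all assumptions made for the illustration.

```python
import contextlib
import io

def run(source: str) -> str:
    """Execute Python source text and return everything it prints."""
    out = io.StringIO()
    with contextlib.redirect_stdout(out):
        exec(source, {})
    return out.getvalue()

# The code section; the data section stores exactly this text.
CODE = r'print("intron = %r\ndata = %r\n%s" % (intron, data, data))'

def make_quine(intron: str) -> str:
    """Build a three-line quine: intron, data, then the code itself."""
    return "intron = %r\ndata = %r\n%s\n" % (intron, CODE, CODE)

original = make_quine("hello")

# Damage to the code (here, a stray comment) heals after one run,
# because the output is rebuilt from the data.
damaged = original.rstrip("\n") + "  # scribble\n"
print(run(damaged) == original)

# A modified intron does not heal: the result is a different quine.
modified = make_quine("goodbye")
print(run(modified) == modified, modified != original)
```

Running the damaged program once gives back the original listing, whereas the program with the changed intron reproduces itself, change included, however many times it is run.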

Recapitulation

I have been introducing a great many names and concepts. I will
summarize them here.

A multi-quine is a
collection of several quines, each one of which is able to print
either its own listing or any of the other ones.

The code section of a quine is
that which uses the data to print the program; it is printed by
interpreting the data section (which may involve unlocking it with a key
or some complicated operation of the sort).

The data section of a quine is
that which represents the code section. It is derived from the
textual form of the code, and the code's role is to perform this
operation backward; the data is printed by reading the data and
representing it in a more or less trivial fashion (for example,
tabulating it in hexadecimal).

An intron is a part of the
data section of a quine which can be modified in such a way that the
program remains a quine (in other words: it is modified and
the output produced by the program changes so as to follow the data
modification).

Irrelevant code is a part of
the quine's code section which can be modified (or removed) so that
the program still produces the same output (ergo the
original quine). In other words, bootstrapping the quine
will heal the irrelevant code.

Key code is a part of the
quine's code section which cannot be modified at all (if it is
modified, the program either is not correct, or does not function, or
produces gibberish; this is in contrast with an intron which if
modified does not make the quine any less quinish, or irrelevant code
which if modified still produces the same program).

Bootstrapping is the
operation of running one or more times a modified version of a quine
to recover the original quine. For example, a quine can be
bootstrapped from the knowledge of its data section and of some code
that will perform the function of the key code. This is a “healing”
process that will recover the irrelevant code.

There are analogies with compilers (or interpreters) of course. An
intron within a compiler would be something that cannot be
bootstrapped, essentially because the compiler (or interpreter) merely
copies the behavior of the underlying system (compiler) to itself.
This is what Ken Thompson explains (he gives the example of
\v in C) in his Turing Award speech
quoted in the links section below.
Irrelevant code differences are differences between two
compilers which perform the same task (i.e. output the same binaries)
but in a different way (their binaries are different), for
example the same compiler compiled with two different compilers; then
we can do a bootstrapping, i.e. recompile the compiler and
obtain the “fixed-point” version.

Self-interpretation: using data as code

In this section I must give my examples in Scheme rather than in C
because Scheme permits the manipulation of programs (meta-expressions)
as data (symbolic expressions).

Consider the two following elegant Scheme quine programs. First this
one:

The first one is easy enough to understand, and follows the usual
pattern well: the five lines ending with the second-to-last are the
“data” (as well as the two character strings, I suppose), and the
rest is the “code”. The code (the do function
essentially) uses the data (the l variable essentially)
to print the code (the first (d l)) and then print the
data (the second (d l)).

But the second example is a bit strange: evidently the x
variable (the lines from the second to the eighth) is data. The code,
essentially, is limited to the single instruction (map eval
x). If you are unfamiliar with Scheme, this means: “consider
x as a list of Scheme instructions and execute them”.
So what we are doing here is using the data, in effect, as code. This
is curious because the whole point of a quine, really, is to use
code as data, and here we are using data as
code. But in a way it makes sense, if you consider
x to be written in a programming language which is just
like Scheme except that the code can be accessed as data…
through the variable x! Then x's rôle is to
print x itself plus the “interpreter” ((map eval
x)).
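
The same data-as-code idea can be sketched in Python, with the built-in exec playing the part of (map eval x). This is an analogue written for illustration, not a translation of the Scheme program; the variable x and the helper run are assumptions of the sketch.

```python
import contextlib
import io

def run(source: str) -> str:
    """Execute Python source text and return everything it prints."""
    out = io.StringIO()
    with contextlib.redirect_stdout(out):
        exec(source, {})
    return out.getvalue()

# The data: a string holding one instruction, which prints the
# definition of x followed by the one-line "interpreter".
X = r'print("x = %r\nexec(x)" % x)'

# The whole program is the data plus the instruction "execute the data".
quine = "x = %r\nexec(x)\n" % X

print(run(quine) == quine)
```

Here the only real code is exec(x); everything the program does is written inside the data, which can refer to itself through the variable x.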

I have also written a quine
in Bourne shell along the same principles. It is rather subtle to
understand, but I think it is worth the trouble. If you prefer the
“dc” programming language, then compare this quine (along the lines
of the first Scheme program above, i.e. the “normal” lines) and that one (which also uses
the data-as-code principle and is shorter).

I'm not entirely sure whether this way of writing quines is actually
qualitatively different from the “normal” way.
(For example, do they correspond to a different proof of the fixed-point theorem, perhaps one that uses
the universality theorem one more time? I can manufacture such a proof,
but it is not really convincing.) It is true that if we compare the
two Scheme programs, or the two dc programs, given above, there seems
to be an important difference (namely, that there is much more
redundancy in the first than in the second). But maybe that is just a
naïve way of thinking. Still, I can't help but think there is some
relation with the two ways of writing the Curry Y (fixed-point)
combinator: as λf.((λx.(f(xx)))(λx.(f(xx)))) or
as λf.((λx.(xx))(λx.(f(xx)))). But maybe I've
gone totally off my rocker there.
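
For readers who want to experiment with the two forms, here is a small Python sketch. Python evaluates arguments eagerly, so the inner xx is wrapped as lambda v: x(x)(v) to keep the evaluation from looping forever; that wrapping, and the names Y1, Y2 and fact_step, are assumptions of this sketch and not part of the lambda terms above.

```python
# First form: the body f(xx) appears twice.
Y1 = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Second form: the first copy is replaced by plain self-application xx.
Y2 = lambda f: (lambda x: x(x))(lambda x: f(lambda v: x(x)(v)))

# One step of the factorial, with the recursive call abstracted out.
fact_step = lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1)

print(Y1(fact_step)(5))  # → 120
print(Y2(fact_step)(5))  # → 120
```

Both combinators give the same fixed point; the second simply writes the repeated part once and lets self-application supply the other copy, which is the reduction in redundancy the comparison with the two quines hints at.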

To conclude this section, I'd like to
mention one program I wrote
that I'm particularly fond of. It is not a quine, and it is in no way
as impressive; but in fact it was considerably more difficult to write
than a quine. It consists of a (rather minimal) Scheme interpreter,
written in Scheme. And that interpreter is applied to itself acting
upon itself. So it is a Scheme interpreter trying to interpret a
Scheme interpreter interpreting a Scheme interpreter
interpreting… well, you get the picture. As each interpreter
prints some debugging information about the program it is
interpreting, this leads to a lot of output data (with curious
properties; for example, search for the string “Now starting
evaluation…” without quotes around it, and see how it
becomes logarithmically rarer and rarer). If you have read the cryptic comment I have made a while back on the
use of the universality theorem in the fixed-point theorem, this is a
case where we need the universality theorem, and indeed, it is
the central part of our program (writing an interpreter). You should
also note the analogy with Gödel's theorem, because this
self-interpreting program is much closer to Gödel's theorem than
ordinary quines. Naturally, if we allow the use of the
eval function (but that's cheating), we can rewrite my
program in a much simpler way:

Conclusion

Well, I've written much more than I intended to. I wanted to make
this a small page on The Art Of Quine Programming, and it turned out
to be quine (oh, what a strange slip! I meant “quite” of course) a
monument.

I haven't given enormously many examples, but I hope the examples I've
given were clear enough so that, if you didn't know how to write
quines initially, now you do (even if you didn't understand all that's
on this page). If you want more examples, have a look at my personal quines collection (all
written by yours truly), which you can also access by FTP,
or download as a single tarball.
Also look at some of the links below, where a
great many more quines can be found.

Yow! I've just lost the SOURCE CODE for all my QUINE PROGRAMS! What
will I DO NOW with just the BINARIES?