Metaphysics, Metamathematics and Metabiology
A mathematical analysis of the scientific method, the axiomatic method,
and Darwin's theory of evolution

G. J. Chaitin, IBM Research[gjchaitin_at_gmail.com]

In Memoriam Jacob T. "Jack" Schwartz (1930-2009)

[Outline of a course given August 2009 in the Programa de História das Ciências e das
Técnicas e Epistemologia (HCTE) at the Universidade Federal do Rio de Janeiro (UFRJ),
to be published as a book in Spanish by Midas (Valparaíso).
This was a three-week course, with two two-hour lectures each week.
There are links to handouts and also additional reading given for each topic.]

Lecture 2 — Ontology: The Labyrinth of the Continuum

Is the Universe Discrete? Quantum Gravity, Thermodynamics of Black Holes, and the Holographic Principle (Hawking, Bekenstein, Susskind, t'Hooft) suggest that a physical system can
contain at most a finite number of bits of information, which in fact grows
as the surface area of the boundary of the system, not as the volume of the physical system.
[So if this is correct, then three-dimensional physical systems are in a sense actually
two-dimensional, whence the term "holographic" principle.]

AIT's discrete model of scientific theories works better if the universe is discrete.

In addition to physical arguments against continuity and in favor of discreteness,
there are also mathematical arguments, basically because real numbers contain
an infinite amount of information.
These mathematical arguments may be regarded as modern versions of Zeno's paradoxes
(the paradox that Achilles cannot catch the tortoise, and of the immobile arrow in flight).

[Hermann Weyl, who was the first to discover the importance of Leibniz's ideas on complexity in the Discours de métaphysique,
in the same 1932 book where he presents this, The Open World, gives a
modern Zeno argument. If you really believe that time is infinitely divisible, then
you can perform an infinite number of computations by performing the first step of the computation in 1/2
a second, the second step in 1/4 of a second, the third step in
1/8 of a second, and so forth and so on, so that an infinite number of steps are completed in exactly 1 second! And nobody thinks this is possible. I began studying Leibniz after reading Weyl's The Open World.]

Three [Four!] Proofs
of the Existence of Uncomputable Real Numbers:

Borel's 1927 know-it-all, oracle real:
You need to number all possible Yes/No questions and then you use the Nth digit of a real number
to answer the Nth question.

[Give Borel's simple original version in which each digit answers one question, and also
the more complicated rigorous version in which the Nth digit tells us about the Nth string of symbols in the English alphabet (as
given in my Conversations with a Mathematician).
In the rigorous version of Borel's number,
the Nth digit tells us if the Nth text is a syntactically valid English text, then if
it is a Yes/No question, then if it has an answer ("Is the answer to this question No?" has
no Yes/No answer), then whether the answer is Yes or No. This avoids the problem
with Borel's original formulation of having to be able to enumerate all possible Yes/No questions
(some of which don't even have an answer). It is much easier to enumerate all the
possible finite texts that can be written using a finite set of characters.]

Turing's 1936 proof:
(Cantor's diagonal method applied to the list of all computable reals)
By construction, the
Nth digit of Turing's uncomputable real is different from the Nth digit of the Nth computable real.

[Borel's 1927 oracle real is much more uncomputable than the uncomputable real
exhibited by Turing in 1936 using Cantor's diagonal method! But Borel does not
give a formal definition of what a computable real number is. That was first done by
Turing in 1936. Borel and Turing unfortunately seemed unaware of each other's work.]

Probabilistic Proof:(Proof in the spirit of non-denumerability of the continuum argument
in Courant and Robbins, What is Mathematics?)

Cover the Nth computable real with an interval of size ε/2N,
giving a covering of all computable reals with total size ≤ ε × (1/2 +
1/4 + 1/8 + ...) = ε,
which can be made as small as desired by taking ε sufficiently small.

Reals are uncomputable with probability one!

Most real numbers are unreal!

[Indeed, Borel, Les nombres inaccesibles, 1952, points out that with probability one,
a real cannot even be named uniquely (let alone computed).]

Fourth Proof:
[We will encounter one more uncomputable real number next week, namely the halting probability Ω. It is maximally uncomputable, maximally unknowable.]

Additional Reading:

Nadler, The Best of All Possible Worlds: A Story of Philosophers, God and Evil,
pp. 132-133

Smolin, Three Roads to Quantum Gravity

Chaitin, Thinking about Gödel and Turing: Essays on Complexity, 1970-2007
[Contains the first three handouts and much more besides.]

Second Week: Metamathematics

Lecture 3 — Limitations of Formal Axiomatic Theories

Hilbert on formal axiomatic theories and his belief in a TOE (Theory of Everything) for pure mathematics:
Mechanical procedure for checking proofs.
Absolute certainty! Truth is objective, not subjective.

Important Note:
Running through all possible proofs and checking which ones are correct gives you a mechanical procedure for obtaining all
the theorems in your theory!
So to determine the truth of any assertion A, just calculate until you find a proof of A or
a proof that not A!
This goes back to the medieval search for the perfect language giving access to absolute
truth (and magical powers!) recounted in Umberto Eco's book The Search for the Perfect Language, including Ramon Llull's mechanical combinations of
concepts and Leibniz's characteristica universalis in which an error of reasoning is
an error in calculation.

Let's measure the complexity of a formal axiomatic theory by the size in bits of the
smallest program for generating all the theorems. An N-bit theory is one that requires
an N-bit program to generate all its theorems.

You can't prove that a program is elegant!

More precisely,
you need an N-bit theory in order to be able to prove that an N-bit program is elegant.

Therefore a formal axiomatic theory can only prove that finitely many programs are
elegant.

[An elegant program is the best theory for its output. More precisely, a program is elegant if no smaller program
written in the same language produces the same output. For every possible output there is at least one
elegant program, therefore there are infinitely many elegant programs.]

Proof: Consider the paradoxical program P that produces the same output as the
first provably elegant program Q with the property that Q is larger in size, has more bits
than P itself does. Q cannot exist, for otherwise we get a contradiction.

First corollary: The unsolvability of the Halting Problem.

[Question of whether a self-contained, inputless program will calculate forever
or will eventually stop.]

Second corollary: The Busy Beaver function grows faster than any computable function.

[Also give direct proof of this. We will use the BB function next week.]

[The BB function of N is the largest positive integer you can name using a program having
N bits or less.]

Philosophical and practical consequences: No TOE (Theory of Everything) for pure
mathematics, No absolute truth, No certainty, A "quasi-empirical" view of math [Lakatos; heuristic proofs; math is not equal to physics, but perhaps it is closer to physics than most people think], Experimental mathematics.

[Another ironical fact: Although there is no "perfect language" for mathematical
reasoning (even though Zermelo-Fraenkel set theory is very popular), there are
universal languages for computing, for calculating. Instead of formal systems enabling us to prove all truths, we have found formal languages for expressing all possible algorithms. This is where Hilbert's formalist
dream has been most successful, not where he expected that it would be.
It became computer technology instead of providing mathematics with absolute
certainty. Indeed, I would argue that the best, most perfect programming languages
are the concise, optimal universal Turing machines used in AIT, because the
fact that U(πC p) = C(p) means that these languages are
maximally expressive.
Furthermore, viewed from the perspective of the Middle Ages, programming languages do
give us the God-like power to breathe life into (some) inanimate matter:
hardware ≈ body, software ≈ soul!]

Lecture 4 — The Halting Probability Ω

Go from Borel's 1927 know-it-all real, to a real whose Nth bit solves the halting problem
for the Nth computer program (which is an extremely redundant oracle for the halting
problem), to the halting probability Ω (which is an extremely concise oracle for
the halting problem).
Key idea: Given N programs, can tell which ones halt if know how many of them halt, and this
is about log2 N bits of information, not N bits of information.
As before, our universal Turing machine must have suitably concise, succinct programs:
U(πC p) = C(p).
Furthermore, now programs must be self-delimiting.
In other words, programs are halves, quarters, eighths, etc. of the unit interval, so the total
measure assigned to all programs is ≤ 1 and we can talk about the probability that
programs do something.

If you could know the first N bits of the numerical value of Ω, that
would enable you to solve the halting problem for all programs up to N bits in size.

Ω is not just uncomputable, it is maximally uncomputable, completely incompressible.

Ω shows that there is
infinite irreducible complexity in pure mathematics.

You need an N-bit theory in order to be able to determine N bits of Ω.

The bits of Ω are necessary truths, but they appear to be contingent, accidental truths,
they appear to be true for no reason, no reason simpler than themselves.
Thus they refute Leibniz's principle of sufficient reason, which states that anything that's true
is true for a reason.

Ω can be viewed pessimistically, as a limit to knowledge, but it can also be
viewed optimistically, as a proof that mathematics is not mechanical and requires
creativity. Indeed, in a sense
Ω is concentrated, distilled, compressed mathematical creativity. And we can measure the power of a formal axiomatic theory
by how many bits of Ω it can determine.

[A similar measure of the power of a formal
axiomatic theory: the size in bits of the smallest (self-contained, inputless) program for which it can neither be proved that it halts nor that it fails to halt.
A formal axiomatic theory can easily
verify that every program that halts does so. And it can also prove in infinitely many cases
that programs will never halt. But there will always be infinitely many programs that fail
to halt for which our formal axiomatic theory cannot enable us to prove this.]

Pure Mathematics versus Physics & Biology:
Math has infinite complexity and in this precise sense is therefore closer to biology than to physics, where there is still hope of a TOE, a simple set of equations for everything.

[Even though in the previous lecture we suggested that perhaps math should be
done a little bit more like we do theoretical physics.]

[This leads us right into biology and Week 3, the attempt to give a mathematical proof that
Darwin's theory of evolution works, that the complexity of organisms will increase.]

Additional Reading:

Tymoczko, New Directions in the Philosophy of Mathematics
[A defense of quasi-empiricism.]

Borwein and Devlin, The Computer as Crucible: An Introduction to Experimental Mathematics

Wolfram, A New Kind of Science
[A massive piece of experimental mathematics: the systematic exploration of
the computational universe.]

Chaitin, Meta Math! The Quest for Ω
[A general reference for the first two weeks of this course, with more on the history of Ω.]

Also in Portuguese:

Eco, The Search for the Perfect Language
[To be read together with Meta Math!:
The prehistory of the quest for absolute knowledge.]

Population of one organism, which is a program that calculates a positive integer

Try mutations at random, use oracle for halting problem to avoid ones that never halt,
pick organism that produces the largest positive integer (BB problem!)

I can prove that the functional complexity will increase indefinitely.
This is easy to do, too easy, in fact.
On the positive side, the useful complexity increases; this is not just software bloat.
We are not just getting the complexity to increase by adding junk to the program.
But I do not know
how to prove that ontogeny recapitulates phylogeny, as it were, even though I conjecture this is
usually the case. I think we need to do that in order for this approach to have any biological relevance or significance. In my proof that the functional complexity will increase,
evolution just starts over, it is not cumulative, which does not seem at all biological.

Behavior of model depends on choice of programming language for digital organisms,
whether mutations are at a high-level or machine language bit level, and on the
probability distribution of the possible mutations.

Some sort of model like this should work. There is much to be done!
I think this randomly mutating software approach does make Darwinian evolution
more understandable.

I envision metabiology as a field parallel to biology, dealing with the random
evolution of artificial software (computer programs) rather than
natural software (DNA), and simple enough that it is possible to prove rigorous
theorems or formulate heuristic arguments at the same high level of precision that is
common in theoretical physics.

Additional Reading:

Shubin, Your Inner Fish: A Journey into the 3.5-Billion-Year History of the Human Body
[How ontogeny recapitulates phylogeny.]

[Additional Things to Think About]

Can compare evolvability of two programming languages A, B with the same source
of mutations using largest positive integer fitness function:
Language A has greater evolvability than language B if with high probability
random mutations make positive integer that is calculated grow at faster rate
when using language A instead of language B.

A key point in evolutionary theory is the existence of paths leading from low
complexity to high complexity organisms passing through a series of small evolutionary
steps each of which is slightly advantageous. In other words, there must be
a chain of plausible intermediate forms.

Example of an interesting path in software space:
Organisms ON consist of a prefix followed by the first N bits of Ω (N = 1, 2, 3, ...) [reversed].
Prefix sees how long it takes the computable lower bounds on Ω to get
the first N bits right. This is the smallest K such that (the sum of 1/2 raised to the size in bits of
each program ≤ K bits in size that halts in ≤ K seconds) reproduces the N bits of Ω that we are given. These organisms ON are a very small mutation distance from
each other [md(ON, ON+1) is small], and the integer each one calculates grows faster than any computable
function of N. [Here we are using the programming language and mutation model in Handout #6.]

[[[Here is a better version of this. As before, organism ON consists of a prefix
followed by the first N bits of Ω (N = 1, 2, 3, ...) [reversed].
This time the prefix π uses these bits of Ω in the standard way to determine each
self-delimiting program |p| ≤ N that halts, and then the greatest positive integer K
with program-size complexity H(K) ≤ N,
which grows faster than any computable function of N.
If we can protect this prefix π from mutations, there is fixed-size mutation-distance neighborhood
{X: md(X, ON) ≤ C}
of each of these organisms in which the fittest organism will be another organism in this sequence
ON+C' that
calculates a bigger positive integer. Assuming we have an oracle for the halting problem,
this gives us a deterministic evolutionary algorithm instead of a random walk model as in Lecture 6 above.
In effect, we are calculating Ω bit by bit.

This alternative model exhibits historicity,
i.e., organisms contain their evolutionary history. But it is unsatisfactory from a biological
point of view because there are no random mutations and because our software organisms do not exhibit the deeply-nested
hierarchical structure that is such a conspicuous feature of biological organisms.
Whether organisms can shield part of their DNA from mutations is also a relevant issue,
although allowing this as an option is certainly possible when we are dealing with
artificial software organisms. If DNA can be shielded when we have random mutations,
then we cannot expect to prove that evolution occurs with probability one, only that it
occurs with positive probability. [The random walk model in Lecture 6 evolves with probability one.]

Perhaps this example shows that evolution is too easy if we have an oracle for the halting problem?
How about a version of my random-walk model for evolving programs which name large positive integers
that works without an oracle?
Maybe then we will get the desired deeply-nested hierarchical structure?
Is there any connection with primitive recursive functions and subrecursive hierarchies?
With work by Rózsa Péter?
Must think about this.]]]

Call an infinite sequence of software organisms OK (K = 1, 2, 3, ...)
an evolutionary history if the
fitness function (positive integer that is output) grows and the mutation distance
between successive organisms md(OK, OK+1) is bounded.
Note that the optimal universal Turing machines used in AIT have the property
that evolutionary histories are preserved when a fixed prefix πC is
added at the beginning of each organism OK in the evolutionary history,
because outputs are the same and mutation distances are still bounded.
More precisely, a universal Turing machine U has the most evolutionary histories,
because an evolutionary
history for any Turing machine C will also be an evolutionary history for U.
And any two universal Turing machines U and U' will have exactly the same set of evolutionary histories.
[Again, here we are using the programming language and mutation model in Handout #6.]