Instructions/Subroutines for
Making & Verifying Axioms & Theorems and for Initiating Online Self-Improvements

Proof techniques written in $\mathcal{L}$ are composed
of instructions that allow any part of $s$ to be read, such as inputs $x$, or
the code of $p$.
The instructions of the `higher-level' language $\mathcal{L}$
are actually implemented as subroutines running
on the Gödel machine hardware.
They may write on $s^p$, a part of $s$ reserved for
their temporary results.
The output of any tested proof technique is an
incrementally growing proof placed in the
string variable proof stored somewhere in $s$.
proof and $s^p$ are reset to the empty string at
the beginning of each new proof technique test.
Apart from standard arithmetic and function-defining instructions
[38,40] that modify $s^p$,
the programming language includes special instructions
for prolonging the current proof by correct theorems,
for deleting previously proven theorems from proof to free storage,
for setting switchprog,
and for checking whether a provably optimal Gödel machine-modifying
program was found and should be executed now.
Below we list all six special instructions for modifying
proof, switchbit, switchprog (there are no others).
Two of them form a non-traditional interface
between syntax and semantics
(Items 4 and 5),
relating the search in the
space of abstract symbol strings representing proofs
to the actual, continually changing Gödel machine state and goal.
The nature of the six proof-modifying instructions below
makes it impossible to insert an
incorrect theorem into proof,
thus trivializing proof verification:

get-axiom(n)
takes as argument an integer $n$ (computed
by a prefix of the currently tested proof technique
with the help of arithmetic instructions
such as those used in previous work
[38,40]).
Then it generates
the $n$-th axiom (if it exists, according to the axiom
scheme outlined below)
and appends the axiom as a theorem to the current theorem
sequence in proof. The initial axiom scheme encodes:

Hardware axioms:
A compact axiomatic description of the finite computer hardware
(or the unchangeable software in case the Gödel machine hardware
is emulated by software),
formally specifying how
certain components of $s$ (other than the environmental
inputs $x$) may
change from one cycle to the next.
For example, the following axiom could
describe how some 64-bit hardware's instruction pointer
stored in $s_{1:64}$ is continually increased by 64 as long as
there is no overflow and the value
of $s_{65}$ does not indicate that a jump to some other address
should take place:

$(\forall t \, \forall n :
[(T > t) \wedge (t > 1) \wedge (n > 0) \wedge (2^{64} - 64 > n)
\wedge (string2num(s_{1:64}(t)) = n) \wedge (s_{65}(t) = \text{`0'})]
\rightarrow (string2num(s_{1:64}(t+1)) = n + 64))$

Here the semantics of used symbols such
as `(' (open parenthesis) and `$>$' (greater than)
and `$\rightarrow$' (implies)
are the traditional and obvious ones, while `$string2num$'
symbolizes a function
translating bitstrings into numbers.

Reward axioms:
Axioms defining the computational costs of any
hardware instruction,
and physical costs of output
actions (if any, e.g., when the $y(t)$ are interpreted as robot control
signals).
Related axioms assign values to certain input events which
may encode rewards for desired behavior, or punishment (e.g.,
when a Gödel machine-controlled robot bumps into an obstacle).
Additional axioms define the total value of the Gödel machine's life as a
scalar-valued function of all
rewards (e.g., their sum) and
costs experienced between cycles $1$ and $T$, etc.
Example axiom (unexplained symbols carry the obvious meaning):

$(\forall t_1 \, \forall t_2 :
[(t_2 > t_1) \wedge (t_1 \geq 1) \wedge (T \geq t_2)]
\rightarrow [R(t_1, t_2) = r(t_1) + R(t_1 + 1, t_2)])$

where $r(t)$ is interpreted as the real-valued reward
at time $t$, and $R(t_1, t_2)$ as
the cumulative reward between times $t_1$ and $t_2$.
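The cumulative reward between two times described above can be sketched as ordinary code. The function below is an illustrative model of the recursive definition $R(t_1, t_2) = r(t_1) + R(t_1+1, t_2)$, not part of the Gödel machine formalism itself; the toy reward signal is purely hypothetical.

```python
# Illustrative sketch: cumulative reward R(t1, t2) between cycles t1 and t2
# (inclusive), following the recursion R(t1, t2) = r(t1) + R(t1 + 1, t2)
# with base case R(t2, t2) = r(t2).

def cumulative_reward(r, t1, t2):
    """Total reward between cycles t1 and t2, as the reward axioms define it."""
    if t1 == t2:
        return r(t1)
    return r(t1) + cumulative_reward(r, t1 + 1, t2)

# Toy reward signal (hypothetical): reward 1.0 at even cycles, 0.0 otherwise.
r = lambda t: 1.0 if t % 2 == 0 else 0.0
total = cumulative_reward(r, 1, 6)  # rewards arrive at t = 2, 4, 6
```

In a real axiomatization this recursion would be a theorem-level definition rather than executable code, but the correspondence is direct.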

Environment axioms:
Axioms embodying limited knowledge about the environment,
in particular, axioms restricting the way the environment will produce
new inputs $x$ in reaction to sequences of outputs $y$.
For example, it may be known
in advance that the environment is sampled from an unknown probability
distribution that is computable, given the previous history
[47,48,15],
or at least limit-computable [36,37].
Or, more restrictively, the environment
may be some unknown but deterministic computer
program [52,34]
sampled from the Speed Prior [39] which assigns
low probability to environments that are hard to compute by any method.
Or the interface to the environment is Markovian [30],
that is, the current
input always uniquely identifies the environmental state--a lot
of work has been done on this special case [28,2,50].
Even more restrictively, the environment may evolve in a completely
predictable fashion known in advance.
All such prior assumptions
are perfectly formalizable.

Uncertainty axioms: Standard axioms for arithmetic, calculus,
probability theory [19], statistics,
and string manipulation that (in conjunction
with the environment axioms) allow for constructing proofs
concerning (possibly uncertain) properties of future values of $s$ as well as
bounds on expected remaining lifetime /
costs / rewards,
given some time $\tau$ and certain
hypothetical values for components of $s(\tau)$, etc.
An example theorem about
expected properties of future inputs would bound a
conditional probability $P_\mu(\cdot \mid \cdot)$
with respect to an axiomatized prior distribution $\mu$ encoded
in the environment axioms (Item 1c).

Initial state axioms:
Information about how
to reconstruct the initial state $s(1)$ or parts thereof,
such that the proof searcher
can build proofs including
axioms of the type

$(s_{\mathbf{m:n}}(1) = \mathbf{z}).$

Here and in the remainder of the paper
we use bold font in formulas to indicate
syntactic place holders (such as $\mathbf{m}, \mathbf{n}, \mathbf{z}$)
for symbol strings representing
variables (such as $m, n, z$)
whose semantics are explained in
the text (in the present context $\mathbf{z}$ is the
bitstring $s_{m:n}(1)$).

Note: It is no problem to fully encode
both the hardware description and the initial
hardware-describing software within the software
itself. To see this, observe that some software may
include a program that can print the software.
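The observation that software may include a program printing the entire software is the classic quine construction. A minimal Python instance (illustrative, not part of the Gödel machine formalism):

```python
# A minimal self-printing program (quine): running it outputs its own
# source code, illustrating how software can encode a complete
# description of itself.
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Here `%r` substitutes the quoted representation of the string into itself, and `%%` escapes a literal percent sign, so the printed text reproduces both lines of the program exactly.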

Since many theorems about the Gödel machine's behavior
will typically be provable even in the absence
of precise initialization information, however,
we do not necessarily insist that the initial state
is fully axiomatized, just as we do not
insist that the rules governing the
environment's behavior are precisely known in advance.

Item 4 will describe an instruction
that permits the online creation of
theorems closely related to the initialization
axioms above, through subroutines that can
read parts of the current Gödel machine state and subsequently
transform the observations into theorems.

Utility axioms: An axiomatic description of the Gödel machine's overall goal
in the form of a utility function. A typical
`value to go' utility function
(to be maximized)
is of the form
$u(s, Env) : \mathcal{S} \times \mathcal{E} \rightarrow \mathbb{R}$, where
$\mathbb{R}$ is the set of real numbers:

$u(s, Env) = E_\mu \left[ \sum_{\tau = time}^{E_\mu(T \mid s, Env)}
r(\tau) \;\middle|\; s, Env \right]$   (1)

where
$E_\mu(\cdot \mid \cdot)$ denotes the conditional expectation operator
with respect to some axiomatized prior distribution $\mu$ encoded
in the environment axioms (Item 1c).
Note that $time = time(s)$ is a function of $s$ and uniquely identifies the
current cycle,
and that we take into account the possibility of extending
the expected lifespan $E_\mu(T \mid s, Env)$
through appropriate actions leading to `good' values of $T$.

Alternative utility functions would favor
improvement of worst-case instead of expected future performance,
or higher reward intake per time interval, etc. See also examples
in Section 3.2.
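The dispatch performed by get-axiom over an enumerated axiom scheme like the one above can be sketched as follows. All names and the two sample axiom strings are illustrative assumptions, not part of the original formulation:

```python
# Illustrative sketch of get-axiom(n): map the integer n to the n-th axiom
# of the initial scheme (hardware, reward, environment, uncertainty,
# initial-state, and utility axioms) and append it to proof as a theorem.

def get_axiom(n, proof, axioms):
    """Append the n-th axiom (if it exists) to proof; otherwise no effect."""
    if 0 <= n < len(axioms):
        proof.append(axioms[n])

# Hypothetical flat enumeration of the axiom scheme as strings.
axioms = [
    "(s_{1:64}(1) = 0^64)",      # an initial-state axiom (illustrative)
    "(forall t: r(t) >= 0)",     # a reward axiom (illustrative)
]
proof = []
get_axiom(1, proof, axioms)      # appends the second axiom
get_axiom(99, proof, axioms)     # index out of range: no effect
```

A real Gödel machine would enumerate an infinite axiom scheme lazily rather than store a finite list, but the interface is the same: an integer in, at most one new theorem appended.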

apply-rule(k, m, n)
takes as arguments the index $k$ (if it exists) of
an inference rule (e.g., [9])
such as modus ponens
(stored in a list of possible inference rules
encoded within $p$) and the indices $m, n$
of two previously proven theorems (numbered in order of
their creation) in the current proof.
If applicable, the corresponding inference rule is
applied to the addressed theorems
and the resulting theorem appended to proof. Otherwise
the currently tested proof technique is interrupted. This
ensures that proof is never fed with invalid proofs.
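This behavior can be sketched in code. The theorem encoding, rule list, and exception below are illustrative assumptions; the essential point is that an inapplicable rule interrupts the tested proof technique instead of appending anything:

```python
# Illustrative sketch of apply-rule(k, m, n): fetch inference rule k, apply
# it to the m-th and n-th theorems in proof, and append the result; if the
# rule is inapplicable, interrupt the currently tested proof technique.

class ProofTechniqueInterrupted(Exception):
    """Raised when a rule application fails; the tested technique stops."""

def modus_ponens(a, b):
    # Theorems are toy encodings: b must be an implication
    # ('implies', p, q) whose premise p equals theorem a.
    if isinstance(b, tuple) and b[0] == 'implies' and b[1] == a:
        return b[2]
    return None  # rule not applicable

RULES = [modus_ponens]  # list of possible inference rules encoded within p

def apply_rule(k, m, n, proof):
    if not (0 <= k < len(RULES) and 0 <= m < len(proof) and 0 <= n < len(proof)):
        raise ProofTechniqueInterrupted
    theorem = RULES[k](proof[m], proof[n])
    if theorem is None:  # proof is never fed an invalid step
        raise ProofTechniqueInterrupted
    proof.append(theorem)

proof = ['A', ('implies', 'A', 'B')]
apply_rule(0, 0, 1, proof)  # modus ponens: from A and A -> B conclude B
```

Because the only way to extend proof is through such guarded rule applications (and the other special instructions), an incorrect theorem can never enter proof, which is what makes verification trivial.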

set-switchprog(m, n)
replaces switchprog by $s^p_{m:n}$, provided
that $s^p_{m:n}$ is indeed a non-empty substring of
$s^p$, the storage writable by proof techniques.
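A minimal sketch of this guard, with $s^p$ modeled as a bitstring and the state dictionary a hypothetical stand-in for the machine's storage:

```python
# Illustrative sketch of set-switchprog(m, n): switchprog becomes the
# substring s^p[m..n], but only if that range is a valid, non-empty part
# of the storage writable by proof techniques.

def set_switchprog(m, n, sp, state):
    if 0 <= m <= n < len(sp):          # non-empty substring entirely inside s^p
        state['switchprog'] = sp[m:n + 1]
    # otherwise: no effect

state = {'switchprog': ''}
sp = "0110100111"                      # toy contents of s^p
set_switchprog(2, 5, sp, state)        # selects bits 2..5
set_switchprog(7, 3, sp, state)        # invalid range: no effect
```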

state2theorem(m, n)
translates semantics into syntax.
It takes two integer arguments $m, n$
and tries to transform the current contents of $s_{m:n}$
into a theorem of the form

$(s_{\mathbf{m:n}}(\mathbf{t_1}) = \mathbf{z}),$

where $t_1$ represents a time measured (by checking time)
shortly after state2theorem was invoked,
and $\mathbf{z}$ the bitstring $s_{m:n}(t_1)$ (recall the special
case $t_1 = 1$ of Item 1e).
That is, we are willing to accept the time-labeled current
observable contents of any part of $s$ as a theorem that does not have
to be proven in an alternative way from, say, the
initial state $s(1)$, because the computation so far
has already demonstrated that the theorem is true.
Thus we may exploit information conveyed
by environmental inputs, and the fact that sometimes
(but not always) the fastest way
to determine the output of a program is to run it.

This non-traditional online interface between syntax and semantics
requires special care though.
We must avoid inconsistent results caused by
parts of $s$ that change while being read.
For example, the present value of a quickly changing
instruction pointer IP (continually updated by the hardware)
may be essentially unreadable in the
sense that the execution of the reading subroutine
itself will already modify IP many times.
For convenience, the (typically limited) hardware could be set up
such that it stores the contents of
fast hardware variables every $c$ cycles in
a reserved part of $s$, such that
an appropriate variant of state2theorem() could at least
translate certain recent values of fast variables into theorems.
This, however, will not abolish all problems associated
with self-observations.
For example, the $s_{m:n}$ to be read might
also contain the reading procedure's
own, temporary, constantly changing string pointer variables,
etc. To address such problems on computers with limited
memory, state2theorem
first uses some fixed protocol
to check whether the current $s_{m:n}$ is readable
at all or whether it might change if it
were read by the remaining code of state2theorem.
If so, or if $m, n$ are not in the proper range,
then the instruction has no further effect. Otherwise it
appends an observed theorem of the form
$(s_{\mathbf{m:n}}(\mathbf{t_1}) = \mathbf{z})$ to proof.
For example, if the current time is 7770000, then the invocation
of state2theorem(6,9) might return the theorem
$(s_{6:9}(7775555) = 1001)$,
where $7775555 - 7770000 = 5555$
reflects the time needed by state2theorem
to perform the initial check
and to read leading bits off the continually increasing time
(reading time also costs time) such that
it can be sure that 7775555 is a recent proper time label following
the start of state2theorem.
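The read-with-stability-check protocol can be sketched as follows. The volatility predicate, clock, and theorem encoding are illustrative assumptions standing in for the fixed hardware protocol the text describes:

```python
# Illustrative sketch of state2theorem(m, n): read s_{m:n} only if a fixed
# protocol finds it stable (it would not change while being read), then
# append the time-labeled observation to proof as a theorem.

def state2theorem(m, n, s, clock, is_volatile, proof):
    """s: current state bitstring; clock(): hardware time counter;
    is_volatile(m, n): fixed protocol deciding whether s_{m:n} might
    change while being read by the remaining code of state2theorem."""
    if not (0 <= m <= n < len(s)) or is_volatile(m, n):
        return                        # improper range or unstable: no effect
    t1 = clock()                      # time label shortly after invocation
    z = s[m:n + 1]                    # observed contents of s_{m:n}
    proof.append(('s', m, n, t1, z))  # theorem of the form (s_{m:n}(t1) = z)

proof = []
s = "110010011"                       # toy state bitstring
state2theorem(2, 5, s, clock=lambda: 7775555,
              is_volatile=lambda m, n: False, proof=proof)
```

Note the theorem is accepted without symbolic proof: the computation itself has already demonstrated that it is true, exactly as the text argues.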

check()
translates syntax into semantics. It
verifies whether the goal of the proof search has been reached.
First it tests whether the last theorem (if any) in proof
has the form of a target theorem. A target theorem states
that given the current axiomatized utility
function $u$ (Item 1f), the utility of a
switch from $p$ to the current switchprog
would be higher than the utility of continuing the execution
of $p$ (which would keep searching for alternative switchprogs).
Target theorems are
bit-encoded symbol strings of the (decoded) form

$(u[s(\mathbf{t_1}) \oplus (switchbit(\mathbf{t_1}) = \text{`1'}),
Env(\mathbf{t_1})] >
u[s(\mathbf{t_1}) \oplus (switchbit(\mathbf{t_1}) = \text{`0'}),
Env(\mathbf{t_1})])$   (2)

where the variable $t_1$ (represented by syntactic place holder $\mathbf{t_1}$)
stands for a time step,
while all other symbols belong to the alphabet of
the theorem-proving calculus, that is,
the set of possible target theorems is parameterized only by $t_1$.
Here the calculus should permit the notation
$s(\mathbf{t_1}) \oplus (switchbit(\mathbf{t_1}) = \mathbf{b})$ as
a shortcut for the state obtained when we replace
the true value of
$switchbit(t_1)$,
which is part of $s(t_1)$, by
$\mathbf{b} \in \{\text{`0'}, \text{`1'}\}$.
This should facilitate the formulation of theorems
that compare values conditioned on various alternative
hypothetical properties of $s(t_1)$. (Note that $s(t_1)$ may
be only partially known by the current proof technique
even in deterministic environments where $s(t_1)$
and switchbit$(t_1)$ are
fully predetermined for all valid $t_1$.)

The purpose of introducing $t_1$ is to deal with hardware-specific
temporal delays that may be involved in checking and switching--it may
take a significant amount of
time to match abstract symbol strings found during proof search to the
Gödel machine's real current state.
If a target theorem has been found, check() uses a
simple prewired subroutine to check
whether there is enough time left
to set switchbit (originally 0) to 1 before
the continually increasing time will equal $t_1$.
If this subroutine
returns a negative result, check() exits.
Otherwise it sets switchbit $:= 1$
(there is no other way of changing switchbit).
Then it repeatedly tests time until time $> t_1$, to make sure
the condition of formula (2)
was fulfilled at $t_1$.
Then it transfers control to switchprog
(there is no other way of calling switchprog).
The switchprog
may subsequently rewrite all parts of $s$, excluding hardware-reserved
parts such as time and $x$, but including $p$.
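The control flow of check() can be sketched as follows. The theorem encoding, clock, delay bound, and machine dictionary are all illustrative assumptions; what matters is the ordering of the guard, the bit flip, the wait, and the transfer of control:

```python
# Illustrative sketch of check(): if the last theorem in proof is a target
# theorem for some time t1, verify there is enough time to set switchbit
# before the clock reaches t1, set it, wait until t1 has passed, and only
# then transfer control to switchprog.

def check(proof, clock, switch_delay, machine):
    if not proof:
        return
    last = proof[-1]
    if last[0] != 'target':            # ('target', t1): switching at t1 has
        return                         # provably higher utility than not
    t1 = last[1]
    if clock() + switch_delay >= t1:   # prewired test: not enough time left
        return
    machine['switchbit'] = 1           # the only way switchbit is ever set
    while clock() <= t1:               # ensure condition of (2) held at t1
        pass
    machine['run'] = 'switchprog'      # the only way switchprog is called

import itertools
ticker = itertools.count(100)          # toy hardware clock
machine = {'switchbit': 0, 'run': 'proofsearch'}
check([('target', 105)], clock=lambda: next(ticker),
      switch_delay=2, machine=machine)
```

If the prewired timing test fails, nothing happens and the proof search simply continues, which mirrors the text: check() exits on a negative result.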

With the typical utility function of formula (1),
the utility of switching to switchprog at a certain time
depends on the remaining expected lifetime.
So the set of possible target theorems (2)
and the expected utility of self-changes may vary over time as
more and more of the lifetime is consumed.
Given the possibly limited axiomatized knowledge about how the
environment will evolve,
proof techniques may reason about the code of check()
described above. They can prove a goal
theorem of the form (2) from
the current, unmodified axioms only if
the potential upcoming
transfer of control to the current switchprog provably yields
higher expected cumulative reward
within the resulting expected
remaining lifetime than ignoring switchprog and continuing
the proof search (thus eventually creating and evaluating many
alternative switchprogs). Of course,
this fully takes into account the time needed to
complete the switch to switchprog and to execute switchprog,
which will consume part of the remaining life.
One way a proof technique could start to
infer target theorem (2)
would be to first prove a prediction about parts of $s$ at some
time $t_1$ in the near future
(that is, later than the current time), such as

$(s_{\mathbf{m:n}}(\mathbf{t_1}) = \mathbf{z})$   (3)

without predicting the value of switchbit yet,
and a related theorem about the effects of alternative values
of switchbit at time $t_1$.
Clearly, the axiomatic system used by the machine must be
strong enough to permit proofs of target theorems.
In particular, the theory of uncertainty axioms
(Item 1d) must be sufficiently rich.

Note, however, that a proof technique does not necessarily have to compute
the true expected utilities of switching and not
switching--it just needs to determine which is higher.
For example, it may be easy to prove that
speeding up a subroutine of the proof searcher
by a factor of 2 will certainly be worth
the negligible (compared to lifetime $T$) time needed to
execute the subroutine-changing algorithm, no matter
what the precise utility of the switch is.

delete-theorem(m)
deletes the $m$-th theorem in the currently stored
proof, thus freeing storage so that proof-storing parts of $s$
can be reused and the maximal proof size is not necessarily limited by
storage constraints.
Theorems deleted from proof, however, can no longer be addressed by
apply-rule to produce further prolongations of proof.
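A sketch of this bookkeeping, assuming the same toy list-of-theorems representation as above; marking a slot rather than shifting the list preserves the creation-order numbering that apply-rule relies on:

```python
# Illustrative sketch of delete-theorem(m): free the storage of the m-th
# theorem while keeping the indices of later theorems intact, so that
# deleted entries can no longer be addressed by rule applications.

def delete_theorem(m, proof):
    if 0 <= m < len(proof) and proof[m] is not None:
        proof[m] = None   # slot freed; index m is dead for apply-rule

proof = ['A', ('implies', 'A', 'B'), 'B']
delete_theorem(0, proof)  # 'A' is gone, but 'B' keeps index 2
```

An implementation of apply-rule would then treat a `None` slot like an out-of-range index and interrupt the proof technique that addresses it.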