Our machine becomes a self-referential [11] Gödel machine by loading
it with a particular form of machine-dependent, self-modifying code p.
The initial code at time step 1
includes a (typically sub-optimal) problem solving subroutine for
interacting with the environment, such as any traditional reinforcement
learning algorithm [20], and a general proof searcher subroutine (Section 5)
that systematically generates pairs (switchprog, proof), both variable
substrings of the storage s, until it finds a proof of a target theorem
which essentially states: 'the immediate rewrite of p through the current
program switchprog on the given machine implies higher utility than
leaving p as is'. Then it executes
switchprog, which may completely rewrite p, including the proof searcher.
Section 3 will explain details of the necessary initial axiomatic system
encoded in p. Compare Fig. 1.
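
The control flow just described can be sketched in a few lines of Python. This is only a toy illustration under strong assumptions, not the machine's actual construction: the names `utility`, `candidate_pairs`, `proves_target_theorem` and `run` are hypothetical, the "proof" is replaced by a direct utility comparison in a fully known toy environment, and the comparison ignores the value of continued proof search that the real target theorem accounts for.

```python
from typing import Callable, Dict, Iterator, Tuple

Policy = Callable[[int], int]            # maps an observation to an action
Switchprog = Callable[[Dict], None]      # rewrites the machine's software in place


def utility(policy: Policy) -> int:
    """Toy utility: one unit of reward per observation x in 0..9 with policy(x) == x % 3."""
    return sum(1 for x in range(10) if policy(x) == x % 3)


def candidate_pairs() -> Iterator[Tuple[Switchprog, Policy]]:
    """Systematically enumerate (switchprog, proof) pairs.

    Assumption: the 'proof' is simply the proposed policy, which the checker
    re-evaluates. A real proof would be a derivation in a formal axiom system.
    """
    for k in range(1, 6):
        new_policy: Policy = lambda x, k=k: x % k

        def switchprog(machine: Dict, new_policy: Policy = new_policy) -> None:
            # A switchprog may overwrite any part of the machine's software;
            # here it replaces only the problem solving subroutine.
            machine["policy"] = new_policy

        yield switchprog, new_policy


def proves_target_theorem(machine: Dict, proof: Policy) -> bool:
    """Stand-in check for 'rewriting now implies higher utility than leaving p as is'."""
    return utility(proof) > utility(machine["policy"])


def run(machine: Dict) -> bool:
    """Search until a pair passes the check, then execute its switchprog."""
    for switchprog, proof in machine["searcher"]():
        if proves_target_theorem(machine, proof):
            switchprog(machine)
            return True
    return False


# Initial software: a sub-optimal policy plus the proof searcher.
machine = {"policy": lambda x: 0, "searcher": candidate_pairs}
improved = run(machine)
```

In this toy run the candidates for k = 1 and k = 2 do not beat the initial policy, so the search continues until k = 3, whose switchprog is then executed.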

Figure 1: Storage snapshot of a not yet self-improved example Gödel machine,
with the initial software still intact. See text for details.

The Global Optimality Theorem (Theorem 4.1, Section 4) shows that
this self-improvement strategy is not greedy. Since the utility of
'leaving p as is' implicitly evaluates all possible alternative
switchprogs which an unmodified p might find later, we obtain a
globally optimal self-change: the current switchprog represents the
best of all possible relevant self-changes, relative to the given
resource limitations and initial proof search strategy.
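
A small numeric sketch may help show why comparing against 'leaving p as is' is not greedy. It rests on simplifying assumptions that are not part of the theorem: rewards accrue at a constant rate per time step, the lifetime `horizon` is fixed and known, the unmodified p would find and execute exactly one later switchprog, and switching now discards the proof searcher. All function and parameter names are hypothetical.

```python
def utility_switch_now(rate_after_switch: float, horizon: int) -> float:
    """Total reward if the current switchprog is executed immediately."""
    return rate_after_switch * horizon


def utility_leave_as_is(current_rate: float, later_rate: float,
                        steps_until_found: int, horizon: int) -> float:
    """Total reward if p is left unmodified.

    Assumption: the unmodified p keeps earning current_rate while it
    searches, then switches to the alternative it finds later.
    """
    t = min(steps_until_found, horizon)
    return current_rate * t + later_rate * (horizon - t)


def target_theorem_holds(rate_after_switch: float, current_rate: float,
                         later_rate: float, steps_until_found: int,
                         horizon: int) -> bool:
    """True only if switching now beats everything the unmodified p would do."""
    return utility_switch_now(rate_after_switch, horizon) > utility_leave_as_is(
        current_rate, later_rate, steps_until_found, horizon)


# A better alternative would be found soon: the current switchprog is rejected.
soon = target_theorem_holds(2.0, 1.0, 5.0, steps_until_found=10, horizon=100)

# The better alternative would arrive too late: the current switchprog is accepted.
late = target_theorem_holds(2.0, 1.0, 5.0, steps_until_found=90, horizon=100)
```

In both cases the candidate improves on the current reward rate, yet it is accepted only when no later self-change that the unmodified p could still reach would do better within the remaining lifetime.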