Overview of main results

In a certain sense to be specified below,
the Optimal Ordered Problem Solver (OOPS)
solves one task after another, always optimally exploiting solutions
to earlier tasks when possible. It can be used for increasingly hard
problems of optimization or prediction. The initial bias is given in the
form of a probability distribution $P$ on programs for a universal
computer. An asymptotically fastest way of solving a single given task is
nonincremental Universal Search
(Levin, 1973, 1984; Levin also attributes similar ideas to Adleman):
in the $i$-th phase ($i = 1, 2, 3, \ldots$)
test all programs $p$
with runtime at most $2^i P(p)$ until the task is solved. Now suppose
there is an ordered sequence of tasks, e.g., the $n$-th task is to
find a path through a maze that is shorter than the best found so far.
To reduce the search time for new tasks, our previous incremental
extensions of Universal Search [65]
tried to modify $P$ through experience
with earlier tasks, but in a heuristic, non-general, and suboptimal
way prone to overfitting.
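To make the phase scheme concrete, here is a minimal sketch of nonincremental Universal Search. The names are illustrative assumptions, not constructs from the paper: `programs` enumerates (program, probability) pairs, and `run(p, steps, task)` stands for an interpreter that reports whether program `p` solves `task` within the given step budget.

```python
# Sketch of nonincremental Universal Search (Levin search).
# In phase i = 1, 2, 3, ..., each program p receives a step budget
# proportional to its prior probability: at most 2**i * P(p) steps.

def universal_search(programs, run, task, max_phase=20):
    """Return (program, phase) for the first program that solves the
    task within its phase budget, or (None, max_phase) on failure."""
    for i in range(1, max_phase + 1):
        for p, prob in programs:
            budget = int((2 ** i) * prob)   # runtime allowance in phase i
            if budget >= 1 and run(p, budget, task):
                return p, i
    return None, max_phase
```

Because budgets double each phase, the total effort spent before a solution of probability $P(p)$ and runtime $t$ is found is on the order of $t / P(p)$, which is the source of the method's asymptotic optimality.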

OOPS, however, does it right. It
searches in the space of self-delimiting programs
[31,7]
that are immediately executed while being generated: a program grows
by one instruction whenever the execution of the
program so far requests this.
To solve the $n$-th task we sacrifice half
the total search time for testing (through a variant of
Universal Search) programs that
have the most recent successful program as a prefix. The other half
remains for testing fresh programs with arbitrary beginnings. When we
are searching for a universal solver for all tasks in the sequence, we have
to time-share the second half (but not the first!) among all tasks $1, \ldots, n$.
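The time allocation just described can be sketched as a toy budget function. All names here are hypothetical illustrations of the split, not the paper's actual procedure:

```python
# Toy sketch of OOPS's search-time split for the n-th task.
# Half the time goes to programs extending the most recent successful
# program (prefix-restricted search); the other half goes to fresh
# programs and, in "universal solver" mode, is shared among tasks 1..n.

def allocate_budget(total_time, n_tasks, universal_mode):
    prefix_half = total_time / 2       # extensions of the last solver only
    fresh_half = total_time / 2        # programs with arbitrary beginnings
    if universal_mode:
        # The second half is time-shared among all n tasks;
        # the first half is not shared.
        fresh_per_task = fresh_half / n_tasks
    else:
        fresh_per_task = fresh_half
    return prefix_half, fresh_per_task
```

The asymmetry is the point: reuse of the latest solver is never diluted by the number of tasks, while the fallback search for fresh programs is.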

Storage for the first found program computing a solution to the current
task becomes non-writeable. Programs tested during search for solutions to
later tasks may copy non-writeable code into separate modifiable storage,
to edit it and execute the modified result. Prefixes may also recompute
the probability distribution on their suffixes in arbitrary computable
ways, thus rewriting the search procedure on the suffixes.
This allows for metalearning or metasearching, that is,
searching for faster search procedures.
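As a toy illustration of a prefix recomputing the distribution on its own suffixes, consider the following sketch. The helper names and the token distribution are purely hypothetical; the paper only requires that the rewrite be computable:

```python
# Sketch: a prefix may rewrite the probability distribution over the
# instruction tokens from which its suffix will be drawn, thereby
# reshaping the search over its own continuations (metasearching).

def renormalize(weights):
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}

def boost(dist, token, factor):
    """Computable rewrite of the suffix distribution: multiply one
    token's weight by `factor`, then renormalize (illustrative only)."""
    w = dict(dist)
    w[token] *= factor
    return renormalize(w)
```

A prefix that boosts tokens it found useful on earlier tasks makes suffixes built from those tokens cheaper to find, which is how a learned bias can speed up the subsequent search.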

For realistic limited computers we need efficient backtracking in program
space to reset storage contents modified by tested programs. We introduce
a recursive procedure for doing this in near-optimal fashion.
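One standard way to realize such resetting is a trail (undo stack) of overwritten cells; the class below is an illustrative sketch under that assumption, not the paper's actual data structure:

```python
# Sketch of backtracking in program space via a trail (undo stack).
# Each write to storage first logs the old value; resetting pops the
# trail to restore the state that existed before a program was tested.

class TapeWithTrail:
    def __init__(self, size):
        self.cells = [0] * size
        self.trail = []                 # (address, old value) pairs

    def write(self, addr, value):
        self.trail.append((addr, self.cells[addr]))
        self.cells[addr] = value

    def mark(self):
        return len(self.trail)          # checkpoint before testing a program

    def reset(self, mark):
        while len(self.trail) > mark:   # undo writes in reverse order
            addr, old = self.trail.pop()
            self.cells[addr] = old
```

The cost of a reset is proportional to the number of writes being undone, so restoring storage after a failed test never costs more than the test itself.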

The method
is essentially only 8 times slower than the theoretically optimal one
(which never assigns to any program a testing time exceeding the program's
probability times the total search time).

OOPS can solve tasks unsolvable by traditional reinforcement
learners and AI planners, such as Towers of Hanoi with 30 disks
(minimal solution size $2^{30} - 1 > 10^9$ moves). In our experiments OOPS
demonstrates incremental learning by reusing previous solutions to
discover a prefix that temporarily rewrites the distribution on its
suffixes, such that Universal Search is accelerated by a factor of
1000. This illustrates the benefits of self-improvement
and metasearching.

We mention several OOPS variants
as well as the self-improving
Gödel machine [59] which
is applicable to general reinforcement learning settings
and may use OOPS as a proof-searching subroutine.
Since OOPS will scale to larger problems in
a way that is near-optimal in a certain sense, we also examine
its physical limitations.

An informed reader familiar with concepts such as universal computers
[72], nonincremental Universal Search (Levin, 1973, 1984),
self-delimiting programs [31,7], and backtracking,
will probably understand the basic principles of OOPS
just by carefully reading the present overview. In what follows
we outline the remainder of this paper.