Optimal Ordered Problem Solver (OOPS)

Notation.
Unless stated otherwise or obvious,
to simplify notation,
throughout the paper newly introduced variables
are assumed to be integer-valued and to cover the range clear from the context.
Given some finite or infinite countable alphabet
,
let denote the set of finite sequences or strings over ,
where is the empty string. We use the alphabet name's lower case
variant to introduce (possibly variable) strings such as
;
denotes the number of symbols in string , where
;
is the -th symbol of ;
if and
otherwise
(where
).
is the concatenation of and (e.g.,
if and then
).

Consider countable alphabets and .
Strings
represent possible internal
states of a computer; strings
represent code or programs for manipulating states.
We focus on being the set of integers
and
representing a set of
instructions of some programming language (that is,
substrings within states may also encode programs).

is a set of currently unsolved tasks.
Let the variable denote the current state of task ,
with -th component on a computation tape
(think of a separate tape for each task).
For convenience we combine current state and current
code in a single
address space,
introducing negative and positive addresses ranging
from to ,
defining the content of address as
if
and
if
.
All dynamic task-specific data
will be represented at non-positive addresses.
In particular, the current instruction pointer
ip(r) of task
can be found at (possibly variable)
address
.
Furthermore, also encodes
a modifiable probability distribution
on . This variable distribution will be used to
select a new instruction in case points to
the current topmost address right after the end of the current code .

is a variable address that cannot decrease.
Once chosen, the code bias
will remain unchangeable forever -- it is a (possibly empty)
sequence of programs , some of them
prewired by the user, others frozen after previous successful
searches for solutions to previous tasks.
Given , the goal is to solve all tasks ,
by a program that appropriately uses or extends
the current code
.

We will do this in a bias-optimal fashion, that is,
no solution candidate will get much more search time
than it deserves, given some initial probabilistic bias
on program space :

Definition 2.1 (BIAS-OPTIMAL SEARCHERS)
Given is a problem class ,
a search space of solution candidates
(where any problem should have a solution in ),
a task-dependent bias in form of conditional probability
distributions on the candidates ,
and a predefined procedure that creates and tests any given
on any within time (typically unknown in advance).
A searcher is -bias-optimal () if
for any maximal total search time
it is guaranteed to solve any problem
if it has a solution
satisfying
.

Unlike reinforcement learners [4]
and heuristics such as
Genetic Programming [2],
OOPS (section 2.2)
will be -bias-optimal,
where is a small and acceptable number, such as 8.