Other Work on Incremental Learning

Since the early attempts of [37] at building a ``General Problem
Solver''--see also [41]--much work has been done to develop
mostly heuristic machine learning
algorithms that solve new problems based on experience with
previous problems, by incrementally shifting the
inductive bias in the sense of [74].
Many pointers to
learning by chunking,
learning by macros,
hierarchical learning,
learning by analogy,
etc. can be found in the book
by [35].
Relatively recent general attempts include
program evolvers such as
ADATE [39] and simpler heuristics
such as Genetic Programming (GP) [8,10,2].
Unlike logic-based program synthesizers [15,77,9],
program evolvers use biology-inspired concepts of
Evolutionary Computation [40,67]
or Genetic Algorithms [17]
to evolve better and better computer programs.
Most existing GP implementations, however, do not even allow for
programs with loops and recursion, thus ignoring a main motivation for
search in program space.
They either have very limited search spaces (where solution candidate runtime
is not even an issue), or are far from bias-optimal, or both.
Similarly, traditional reinforcement learners [24]
are neither general nor close to being bias-optimal.

A first step to make GP-like methods bias-optimal would be
to allocate runtime to tested programs in proportion
to the probabilities of the mutations or ``crossover operations''
that generated them. Even then there would still be
room for improvement, however, since GP has quite limited ways of making
new programs from previous ones--it does not learn better
program-making strategies.

This brings us to several previous publications on
learning to learn or metalearning [47],
where the goal is to learn better learning algorithms
through self-improvement without human intervention--compare
the human-assisted self-improver by [29].
We introduced the concept of
incremental search for improved,
probabilistically generated code that modifies
the probability distribution on the possible code continuations:
incremental self-improvers
[51,64,63,65]
use the success-story algorithm SSA
to undo those self-generated probability modifications
that in the long run do not contribute to increasing the learner's
cumulative reward per time interval.
An earlier meta-GP algorithm
[47]
was designed to learn better GP-like strategies;
[47]
also combined principles of reinforcement
learning economies [18]
with a ``self-referential'' metalearning approach.
A gradient-based metalearning technique
[50,49]
for continuous program spaces of differentiable
recurrent neural networks (RNNs) was also designed to favor
better learning algorithms;
compare the remarkable recent success of the related but
technically improved RNN-based metalearner by [16].

The algorithms above generally are not near-bias-optimal though.
The method discussed in this paper, however,
combines optimal search and incremental self-improvement,
and will be -bias-optimal, where is a small and practically
acceptable number.