The Partial Rule Approach in Context

The categorizability assumption is closely related to complexity-theory
principles such as the Minimum Description Length (MDL) principle, which has
been used by authors such as [Schmidhuber, 2002]
to bias learning algorithms. All these complexity-based criteria attempt to
formalize the well-known ``Occam's Razor'' principle, which prescribes
choosing the simplest model from a set
of otherwise equivalent models.

[Boutilier et al., 1999] presents a good review of representation methods
that reduce the computational complexity of planning algorithms
by exploiting the particular characteristics
of a given environment. The representation based on partial rules can be
seen as another such representation system. However, the partial rule
is just a representation formalism; without the bias introduced by the
categorizability assumption,
it would not be efficient enough for realistic applications.

The partial-rule formalism can be seen as a generalization of that of the
XCS classifier systems described by [Wilson, 1995].
This learning system
aims at determining a set of classifiers (combinations of features with an
associated action) together with their value and relevance predictions.
The main difference between this approach and ours
is that Wilson's work pursues a generic learner, whereas we bias the
learning process using the categorizability assumption.
This allows us to use an incremental rule-generation strategy that is likely
to be more efficient in robotic problems. Additionally,
the categorizability assumption modifies the way in which
the value of a given action is evaluated:
Wilson's approach uses a weighted sum of the predictions of the classifiers
advocating each action to determine the expected effect of that action,
while, to comply with the
categorizability assumption (i.e., to minimize the number of feature
detectors and elementary actions involved in a given evaluation), we propose
a winner-takes-all strategy. This is a critical point: the winner-takes-all
strategy takes full advantage of the categorizability assumption, and it
allows the partial-rule
system to asymptotically converge to a table-based reinforcement-learning
system, which is not the case when a weighted-sum strategy is used.
Furthermore, the XCS formalism provides no generalization in the action
space and, as already discussed, such generalization is a requirement in
robotic applications.
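The contrast between the two value-estimation strategies can be sketched as
follows. This is a minimal illustration, not the actual data structures or
update rules of either system; the rule fields `action`, `value`, and
`relevance` are hypothetical names for the predictions discussed above:

```python
def weighted_sum_value(rules, action):
    # XCS-style estimate: relevance-weighted average of the value
    # predictions of all rules advocating the given action.
    matching = [r for r in rules if r["action"] == action]
    total = sum(r["relevance"] for r in matching)
    return sum(r["value"] * r["relevance"] for r in matching) / total

def winner_takes_all_value(rules, action):
    # Partial-rule-style estimate: only the most relevant rule advocating
    # the action determines the value, so only the few feature detectors
    # and elementary actions of that rule take part in the evaluation.
    best = max((r for r in rules if r["action"] == action),
               key=lambda r: r["relevance"])
    return best["value"]
```

Under the categorizability assumption, a single highly relevant rule exists
for each situation, so the winner-takes-all estimate can coincide in the
limit with the entry of a table-based learner, while a weighted sum mixes in
the predictions of less relevant rules.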

In general, reinforcement learning pays little attention
to the need for generalization in the space of actions, although some
exceptions exist.
For instance,
the work of [Maes and Brooks, 1990] allows elementary actions to be executed
in parallel. However, this system does not include any mechanism for
detecting interactions between actions and, thus, the coordination of
actions relies entirely on sensory conditions.
For instance, this system has difficulty detecting that the joint execution
of two actions always results in positive (or negative) reward,
independently of which feature detectors are active.

The CASCADE algorithm by [Kaelbling, 1993] learns each bit of a
complex action separately. This algorithm has a clearly sequential
structure, in which the
learning of a given action bit depends on all the previously learned ones.
In our approach, there is no
predefined order in the learning of the outputs, resulting in a more
flexible learning scheme.
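The structural difference can be sketched as follows. This is a toy
illustration with hypothetical learner callables, not the actual interface
of CASCADE or of our algorithm: in a CASCADE-like scheme, the learner for
bit i conditions on the previously decided bits, which fixes the order,
whereas unordered outputs are decided independently of one another:

```python
def cascade_action(state, bit_learners):
    # CASCADE-like: bit i is chosen by a learner that sees the state
    # plus all previously decided bits, so the learning order is fixed.
    bits = []
    for learner in bit_learners:
        bits.append(learner(state, tuple(bits)))
    return bits

def unordered_action(state, bit_learners):
    # No predefined order: each output is decided independently of the
    # others, so the bits can be learned in any order.
    return [learner(state) for learner in bit_learners]
```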

In multiagent learning [Claus and Boutilier, 1998,Sen, 1994,Tan, 1997],
the objective is to learn an optimal behavior for a group of agents trying
to cooperatively solve a given task. Thus, in this field, as in our case,
multiple actions issued
in parallel have to be considered. However, one of the main issues in
multiagent learning, the coordination between the different learners, is
irrelevant in our case, since we have a single learner.

Finally, the way in which we define complex actions from elementary actions
has some points in common
with work in reinforcement learning where macro-actions are defined as the
learner confronts different tasks [Sutton et al., 1999,Drummond, 2002].
However, the useful combinations of elementary actions
detected by our algorithm
are only guaranteed to be relevant for the task at hand (although they are
likely to be relevant for related tasks as well).