Reinforcement learning (RL) problems constitute an
important class of learning and control problems faced by
artificial intelligence systems. In these problems, one is
faced with the task of providing control signals that
maximize some measure of performance, usually taken over
time, given feedback that is not expressed in terms of the
control signals themselves. This feedback is often called
"reward" or "punishment." However, these tasks bear a direct
relationship to engineering control, as well as to the more
cognitive, intelligence-related areas suggested by these
terms (Barto, 1990).
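To make this interaction concrete, the following minimal
sketch shows an agent learning from scalar reward alone.
The two-armed bandit environment and the epsilon-greedy
agent are illustrative assumptions, not constructs from
this paper.

import random

class TwoArmedBandit:
    """Illustrative environment: two actions with different
    expected rewards."""
    def step(self, action):
        # Feedback is a scalar reward, not a correction of
        # the control signal itself.
        p = 0.8 if action == 1 else 0.2
        return 1.0 if random.random() < p else 0.0

class EpsilonGreedyAgent:
    """Illustrative agent: estimates each action's value
    from reward alone."""
    def __init__(self, n_actions=2, epsilon=0.1):
        self.values = [0.0] * n_actions
        self.counts = [0] * n_actions
        self.epsilon = epsilon

    def act(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.values))
        return max(range(len(self.values)),
                   key=lambda a: self.values[a])

    def update(self, action, reward):
        # Incremental average of observed rewards per action.
        self.counts[action] += 1
        self.values[action] += (
            (reward - self.values[action]) / self.counts[action])

env, agent = TwoArmedBandit(), EpsilonGreedyAgent()
for _ in range(1000):
    a = agent.act()
    agent.update(a, env.step(a))
print(agent.values)  # learned estimate of each action's reward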
In recent years, many algorithms for RL have been
suggested and refined. Notable are those discussed by
Sutton, Barto, and Watkins (1989), Holland’s Bucket
Brigade (Holland et al., 1986), Watkins's Q-learning
algorithm (Watkins and Dayan, 1992), and others. Despite
these advances, there remain no standard analytical
methods or test suites for empirically evaluating
reinforcement learning systems. Most test problems in RL
take abstract, difficult-to-analyze forms (e.g., maze
problems, the inverted pendulum, etc.).
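As a point of reference, Watkins's Q-learning updates an
action-value estimate toward the observed reward plus the
discounted value of the best subsequent action:
Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
The sketch below gives the standard tabular update on a
small chain environment; the environment and parameter
values are assumptions for illustration, not part of any
standard test suite.

from collections import defaultdict
import random

def q_learning_update(Q, s, a, r, s_next, n_actions,
                      alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update (Watkins and
    Dayan, 1992)."""
    best_next = max(Q[(s_next, b)] for b in range(n_actions))
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Illustrative chain of 5 states: action 1 moves right,
# action 0 moves left; reward 1.0 only at the right end.
N_STATES, N_ACTIONS = 5, 2
Q = defaultdict(float)
for episode in range(500):
    s = 0
    while s != N_STATES - 1:
        a = random.randrange(N_ACTIONS)  # exploratory policy
        s_next = (min(s + 1, N_STATES - 1) if a == 1
                  else max(s - 1, 0))
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        q_learning_update(Q, s, a, r, s_next, N_ACTIONS)
        s = s_next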
This paper describes a method for designing an arbitrary
number of RL test problems with clear, parameterizable
characteristics. An additional contribution of this method is
a clear identification of parameters for RL environment
complexity. Although the method described is primarily
suggested for RL problem design and RL method
evaluation, its parameters could be used for analysis of
extant RL test problems.
The following sections outline basic RL nomenclature,
followed by the development of Walsh-transform-based
design techniques for a limited class of RL problems.
Straightforward extensions of these techniques to a broad
class of RL problems are also provided. Use of these
techniques and further extensions are discussed.
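As a preview of that machinery, the sketch below computes
the standard Walsh decomposition of a function over l-bit
strings, under which a function can equally be designed by
choosing its Walsh coefficients directly. The ones-count
example is an illustrative assumption; the paper's specific
design procedure is developed in the sections that follow.

def walsh_basis(j, x):
    """Walsh function psi_j(x) = (-1)^(popcount(j AND x))."""
    return -1 if bin(j & x).count("1") % 2 else 1

def walsh_coefficients(f, l):
    """w_j = 2^-l * sum_x f(x) * psi_j(x), for all 2^l
    indices j."""
    n = 2 ** l
    return [sum(f(x) * walsh_basis(j, x) for x in range(n)) / n
            for j in range(n)]

def from_coefficients(w, x):
    """Reconstruct f(x) = sum_j w_j * psi_j(x); designing a
    problem amounts to choosing the w_j directly."""
    return sum(wj * walsh_basis(j, x) for j, wj in enumerate(w))

# Example: ones-count function on 3-bit strings round-trips
# exactly through its Walsh coefficients.
w = walsh_coefficients(lambda x: bin(x).count("1"), 3)
assert all(abs(from_coefficients(w, x) - bin(x).count("1")) < 1e-9
           for x in range(8))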