Robots and other computational agents are increasingly becoming part of our
daily lives. They will need to learn to perform new tasks, adapt to novel
situations, and understand what their human users want, even though most of
those users will have no programming skills. To achieve these ends, agents must
learn from humans through methods of communication that are naturally accessible
to everyone.
This thesis presents and formalizes interactive shaping, one such teaching method,
where agents learn from real-valued reward signals that are generated by a human
trainer. In interactive shaping, a human trainer observes an agent behaving in a task
environment and delivers feedback signals. These signals are mapped to numeric
values, from which the agent infers what behavior the trainer intends. A solution to the
problem of interactive shaping maps human reward to some objective such that
maximizing that objective generally leads to the behavior that the trainer desires.
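A minimal, hypothetical sketch of such a solution (the class, names, and tabular setup here are illustrative assumptions, not code from the thesis): the agent fits a model of the trainer's reward for each state-action pair and acts myopically to maximize its immediate prediction.

```python
# Illustrative sketch of learning from human-generated reward (all names
# are hypothetical): the agent maintains a tabular model h_hat(s, a) of
# predicted human reward and greedily picks the action with the highest
# predicted reward in the current state.

class ShapingAgent:
    def __init__(self, actions, lr=0.1):
        self.actions = actions
        self.lr = lr
        self.h_hat = {}  # (state, action) -> predicted human reward

    def predict(self, state, action):
        return self.h_hat.get((state, action), 0.0)

    def act(self, state):
        # Myopic choice: maximize predicted immediate human reward.
        return max(self.actions, key=lambda a: self.predict(state, a))

    def update(self, state, action, human_reward):
        # Move the prediction toward the reward the trainer delivered.
        err = human_reward - self.predict(state, action)
        self.h_hat[(state, action)] = self.predict(state, action) + self.lr * err

agent = ShapingAgent(actions=["left", "right"])
# A simulated trainer rewards "right" in state 0 and punishes "left".
for _ in range(50):
    a = agent.act(0)
    agent.update(0, a, 1.0 if a == "right" else -1.0)
print(agent.act(0))  # the shaped agent now prefers "right"
```

The update rule is ordinary incremental regression toward the delivered reward; the key design choice this sketch illustrates is that the agent's objective is the trainer's predicted reward itself, not an environment-defined return.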
Interactive shaping addresses the aforementioned needs of real-world agents.
This teaching method allows human users to quickly teach agents the specific be-
haviors that they desire. Further, humans can shape agents without needing pro-
gramming skills or even detailed knowledge of how to perform the task themselves.
In contrast, algorithms that learn autonomously from only a pre-programmed eval-
uative signal often learn slowly, which is unacceptable for some real-world tasks
with real-world costs. These autonomous algorithms additionally have an inflexibly
defined set of optimal behaviors, changeable only through additional programming.
Through interactive shaping, human users can (1) specify and teach desired behavior
and (2) share task knowledge when correct behavior is already indirectly specified
by an objective function. Additionally, computational agents that can be taught in-
teractively by humans provide a unique opportunity to study how humans teach in
a highly controlled setting, in which the computer agent’s behavior is parametrized.
This thesis answers the following question: how and to what extent can
agents harness the information contained in human-generated signals of reward to
learn sequential decision-making tasks? The contributions of this thesis begin with
an operational definition of the problem of interactive shaping. Next, I introduce the
TAMER framework, one solution to the problem of interactive shaping, and describe
and analyze algorithmic implementations of the framework within multiple domains.
This thesis also proposes and empirically examines algorithms for learning from both
human reward and a pre-programmed reward function within an MDP, demonstrat-
ing two techniques that consistently outperform learning from either feedback signal
alone. Subsequently, the thesis shifts its focus from the agent to the trainer,
describing two psychological studies in which the trainer is manipulated either
by changing their perceived role or by having the agent intentionally misbehave
at specific times;
we examine the effect of these manipulations on trainer behavior and the agent’s
learned task performance. Lastly, I return to the problem of interactive shaping, for
which we examine a space of mappings from human reward to objective functions,
where mappings differ by how much the agent discounts reward it expects to receive
in the future. Through this investigation, a deep relationship is identified among
discounting, the positivity of human reward, and training success. Specific
constraints of human reward are identified (i.e., the “positive circuits” problem), as
are strategies for overcoming these constraints, pointing towards interactive shaping
methods that are more effective than the already successful TAMER framework.
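The intuition behind this discounting result can be shown with a small, hypothetical calculation (the reward values and horizon are invented for illustration, not taken from the thesis): when human reward is mostly positive, an agent that values long-term discounted reward can prefer circling through praised states forever over completing the task, whereas a myopic agent does not.

```python
# Hypothetical illustration of the "positive circuits" issue, using
# invented numbers: looping forever under mostly positive human reward
# versus completing the task once.

def discounted_return(rewards, gamma):
    # Discounted sum of a finite reward sequence.
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

H = 200
loop = [0.5] * H                  # circle through praised states forever
finish = [1.0] + [0.0] * (H - 1)  # complete the task, then no more reward

for gamma in (0.0, 0.99):
    prefers_loop = discounted_return(loop, gamma) > discounted_return(finish, gamma)
    print(f"gamma={gamma}: {'loops' if prefers_loop else 'finishes the task'}")
```

Under myopic discounting (gamma = 0) the single larger task-completion reward wins, but as gamma approaches 1 the accumulated positive reward on the loop dominates, so the agent never finishes.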