research

The Qualitative Learner of Action and Perception (QLAP)

This video about QLAP was accepted to the 2010
AAAI Video Competition. It won the award for Best Educational Video!

An agent, human or otherwise, receives a large sensory stream
from the continuous world that must be broken up into useful features. The agent
must also learn to use its low-level effectors to bring about desired changes in the
world.
Humans and other animals have adapted to their environment through a combination of evolution
and individual learning. We blur the distinction between individual and species learning
and define the problem abstractly: how can an agent, starting from low-level sensors and effectors,
learn high-level states and actions through autonomous experience with the environment?

Pierce and Kuipers [1997]
have shown that an agent can learn the structure of its sensory and motor apparatus.
Building on this work, Modayil and Kuipers [2004] have shown
how an agent can individuate and track objects in its sensory stream.
Our approach builds on this work to enable an agent to learn a discrete sensory description
and a hierarchical set of actions.
We call our approach the Qualitative Learner of Action and Perception (QLAP).

QLAP learns a discretization of the environment and predictive models of its dynamics,
as shown in Figure 1.
QLAP assumes that the sensory stream (Fig. 1-a) is converted
(Fig. 1-b) to a set of continuous variables. These variables give
the locations of objects and distances between them.
To build models of the environment, QLAP must learn the necessary discretization.
QLAP begins with a very simple discretization (Fig. 1-c) that
essentially gives only the direction of movement of objects and
whether the distance between objects is increasing or decreasing.
From this low-resolution representation, QLAP learns (Fig. 1-d) a set of
primitive models
to describe the dynamics of the environment. These initial models are simple and unreliable,
but they make predictions about changes in the environment
that can be used to generate a supervised learning signal.
This learning signal points QLAP towards new discretizations that
can make the models more reliable (Fig. 1-e). Through this synergy of
discretization and model building, QLAP builds an increasingly sophisticated representation of the
environment.

Figure 1: Perception in QLAP
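
To make this concrete, here is a minimal sketch of such a qualitative discretization, with Python chosen for illustration. The function names, the tolerance parameter, and the landmark representation are assumptions for exposition, not QLAP's actual implementation.

    from bisect import bisect_left

    def qualitative_value(x, landmarks, tol=1e-3):
        """Map a continuous value to a qualitative region relative to
        a sorted list of landmark values."""
        for i, lm in enumerate(landmarks):
            if abs(x - lm) <= tol:
                return ('at', i)          # on the i-th landmark
        i = bisect_left(landmarks, x)
        return ('between', i - 1, i)      # open interval; -1 and
                                          # len(landmarks) denote the
                                          # unbounded end regions

    def direction_of_change(x_prev, x_curr, tol=1e-3):
        """Reduce a derivative to the qualitative values {-1, 0, +1}."""
        dx = x_curr - x_prev
        if dx > tol:
            return +1   # increasing
        if dx < -tol:
            return -1   # decreasing
        return 0        # steady

    # With no landmarks yet learned, qualitative_value carries no
    # information, so the initial representation reduces to directions
    # of change; landmarks are added where they make models reliable.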

The process of learning actions is shown in Figure 2.
Since the models predict the dynamics of the environment, they can be
converted to plans to bring about the predicted effects (Fig. 2-a).
Each plan then serves as a different way to perform an action (Fig. 2-b).
These plans are then put together to form a hierarchy of actions (Fig. 2-c).

Figure 2: Actions in QLAP
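
As a rough illustration, the data structures below show one way models, plans, and actions could be related in such a hierarchy. The class and field names are assumptions for exposition rather than QLAP's actual implementation.

    from dataclasses import dataclass, field

    @dataclass
    class Model:
        """Predicts that an effect tends to follow an antecedent event."""
        antecedent: str           # e.g. "hand moving right"
        effect: str               # e.g. "hand reaches landmark x3"
        reliability: float = 0.0  # how often the prediction has held

    @dataclass
    class Plan:
        """A policy derived from a sufficiently reliable model."""
        model: Model
        subactions: list = field(default_factory=list)  # lower-level Actions it calls
        success_rate: float = 0.0

    @dataclass
    class Action:
        """Brings about one qualitative event; may have several plans."""
        goal: str
        plans: list = field(default_factory=list)

        def best_plan(self):
            # Plans that have led to success more often are preferred.
            return max(self.plans, key=lambda p: p.success_rate)

    # The hierarchy arises because a Plan calls lower-level Actions as
    # subactions, and each of those Actions has plans of its own.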

Throughout this process, the agent is learning autonomously. The agent initially motor babbles,
making random movements. After learning its first models and actions, it can choose
actions to "practice."
It chooses these actions using Intelligent Adaptive Curiosity, which causes
the agent to choose actions that it is getting better at performing. Once the agent has
mastered an action, it moves on to another action. Since the hierarchy of actions is
continually expanding, actions that were initially called for
practice are later called as subactions of higher-level actions. During this process, the agent
learns many models. Models that do not make sufficiently deterministic predictions
are discarded; those that do are converted to plans. Plans
that lead to successful completion of the action are more likely to be used by the agent. In this way,
the agent adapts to the environment in a developmental progression.
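
The sketch below shows one way such curiosity-driven selection could work, assuming learning progress is measured as the change in success rate between the older and newer halves of a sliding window of practice outcomes. The window size, the epsilon parameter, and the progress measure are illustrative assumptions, not the published algorithm's exact details.

    import random
    from collections import defaultdict, deque

    class CuriosityScheduler:
        def __init__(self, window=40):
            self.window = window
            self.outcomes = defaultdict(lambda: deque(maxlen=window))

        def record(self, action, succeeded):
            self.outcomes[action].append(1.0 if succeeded else 0.0)

        def progress(self, action):
            """Recent improvement in success rate for this action."""
            hist = self.outcomes[action]
            if len(hist) < self.window:
                return 0.0
            half = self.window // 2
            older = sum(list(hist)[:half]) / half
            newer = sum(list(hist)[half:]) / half
            return newer - older

        def choose(self, actions, epsilon=0.1):
            """Prefer the action the agent is getting better at."""
            if random.random() < epsilon:
                return random.choice(actions)   # keep exploring
            return max(actions, key=self.progress)

    # An action whose success rate has plateaued shows no progress, so
    # the scheduler naturally moves on to actions still being mastered.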

QLAP contributes to the fields of reinforcement learning (RL) and developmental
robotics. In RL, a constant challenge is how best to accommodate continuous states and actions.
QLAP provides a method for discretizing the state space so that the discretization corresponds
to the "natural joints" in the environment. Hierarchy construction is an active area of
research in RL, and QLAP creates a hierarchical set of actions from continuous motor
variables.
QLAP also autonomously creates reinforcement learning problems as part of its developmental
progression. This developmental progression is the main contribution of QLAP to developmental
robotics, and it provides a test bed for exploring ideas
in the field. For example, we have shown that giving extra emphasis to rare events
aids in the autonomous learning of the action to pick up a block using a
magnetic hand.

We work in simulation to isolate
the problem of development.
Here is a picture of the
simulated robot.

Initially, QLAP motor babbles in its environment, as shown
here (4 MB). The two floating objects can be sensed
by the robot and are added to make the environment more realistic.

As QLAP explores, its learning becomes more directed, as shown
here (4 MB) and here (4 MB).
These videos show the agent exploring its environment after 3.33 hours
of experience. It has autonomously learned actions to manipulate the block,
and it interacts with the block because this is what it finds interesting at
this point in its development.
In the first video the block is replaced when it is "picked up." In the second
video, the block causes friction with the table, making it hard
to move the hand.

After learning, QLAP can be given a task such as "grabbing" the block.
In this video (4 MB)
we see the agent attempting to perform that task after learning. The block
is replaced when it is grabbed.