It is prudent to train the ARL system in a constrained context so that
some form of learning convergence can be achieved from limited data and
limited modeling resources. Thus, the users involved in initially
training the system are given loose instructions on the nature of the
interactions they will be performing. The users were given the
instructions listed in Table 9.1.

Table 9.1: Interaction Instructions

Interaction   User   Corresponding Action
1             A      Scare B by moving towards camera
              B      Fearfully crouch down & bring hands in
2             A      Wave hello
              B      Wave back accordingly
3             A      Circle stomach & tap head
              B      Clap enthusiastically
4             A      Idle or small gestures
              B      Idle or small gestures

The two users (A and B) begin by playing the above game: A gestures
while B responds appropriately. The users are physically separated from
each other and see only graphical representations of each other on
their screens. The learning algorithm is given measurements of the head
and hand positions of both users, recorded from the players over
several minutes of interaction. These sequences generate many
input-output pairs. The data pairs are used to train the system, which
is then able to impersonate player B. Once training is complete, the B
gesturer leaves and A remains as the single user. The screen display
for A still shows the same graphical character, except that the actions
of the character are now synthesized by the ARL system rather than by
the other player.
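To make the data pipeline concrete, the following is a minimal sketch of how the measured motion stream could be turned into input-output training pairs: each input is an exponentially decayed short-term memory of the past T observations, flattened and projected onto its top eigenvectors, and each output is the current observation. The function names, the decay constant, and the use of an SVD for the eigen-projection are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

T = 120        # short-term memory length (about 6 seconds of samples, per the text)
DECAY = 0.97   # per-sample exponential decay weight (assumed value)

def short_term_memory(stream, t, T=T, decay=DECAY):
    """Exponentially decayed window of the T observations preceding time t."""
    window = stream[t - T:t]                     # (T, 30) past observation vectors
    weights = decay ** np.arange(T - 1, -1, -1)  # older samples weigh less
    return (window * weights[:, None]).ravel()   # flatten into one long vector

def build_pairs(stream, T=T):
    """x = decayed past of both users' motion, y = the current observation."""
    xs, ys = [], []
    for t in range(T, len(stream)):
        xs.append(short_term_memory(stream, t))
        ys.append(stream[t])
    return np.array(xs), np.array(ys)

def eigen_representation(X, n_components=22):
    """Project memory vectors onto their top principal directions (PCA via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T
```

In this sketch, a 5-minute session at the sampling rate implied by the text would yield the "slightly over 5000" pairs mentioned below, each reduced to a 22-dimensional x.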

More specifically, the training process involved between 5 and 10
instances of each of the above interactions and lasted roughly 5
minutes. This accounts for slightly over 5000 observations of the
30-dimensional vectors. Each of these forms an (x, y) pair, where the x
was the eigen-representation of the past short-term memory over T
exponentially decayed samples. The dimensionality of x was only 22 and
the short-term memory spanned 120 samples (T = 120, or over 6 seconds).
The system used 25 Gaussians for the pdf. The limitations on
dimensionality and number of Gaussians were mainly for speed
considerations. The learning (the CEM algorithm) took approximately 2
hours to converge on an SGI OCTANE for the 5-minute training sequence
of interactions. An annealing schedule of exponential decay with a
fixed rate per iteration was used. If no annealing is used, an inferior
(and less global) solution can be obtained in well under one hour. At
the end of the annealed training (roughly 400 iterations) the
conditional log-likelihood was approximately 25.7. Figure 9.1 shows the
convergence of the annealed CEM algorithm.
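One way to realize an exponentially decaying annealing schedule in mixture training is deterministic annealing: the E-step posteriors are tempered by a temperature that decays toward 1 each iteration, at which point the updates reduce to the unannealed algorithm. The sketch below illustrates that idea only; the starting temperature, decay rate, and the exact quantity being annealed in the thesis's CEM implementation are assumptions, not details taken from the text.

```python
import numpy as np

def annealed_responsibilities(log_lik, temperature):
    """Soften E-step posteriors by raising component likelihoods to 1/temperature."""
    scaled = log_lik / temperature                  # (N, K) tempered log-likelihoods
    scaled = scaled - scaled.max(axis=1, keepdims=True)  # stabilize the softmax
    r = np.exp(scaled)
    return r / r.sum(axis=1, keepdims=True)

def annealing_schedule(t0=10.0, rate=0.99, iters=400):
    """Temperature decaying exponentially per iteration, floored at 1 (plain EM)."""
    t = t0
    for _ in range(iters):
        yield t
        t = max(1.0, t * rate)
```

With a high temperature the responsibilities are nearly uniform, which smooths the likelihood surface early on; as the temperature decays, the posteriors sharpen, which is consistent with annealing finding a more global solution at the cost of the longer training time reported above.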