Special issues, workshops, and other collections

The Machine Learning journal (MLJ) published a special issue on robot
learning, edited by some people we know well; take a look at the TOC
online to see what's there. MLJ also has a special issue on
reinforcement learning, but that one is from 1992, so it isn't online.
(The library has a copy, though.)

There was a special issue of Autonomous Robots on multi-robot
interaction. It was vol 8, no 3; here's the TOC: http://www.wkap.nl/issuetoc.htm/0929-5593+8+3+2000.
Unfortunately I don't think CMU subscribes to the online version of the
journal, and the paper version isn't in the library. But maybe there
are electronic versions of single articles online somewhere...
(In fact, there's one below under the particle filters section.)

There was a workshop on hierarchy and memory in reinforcement learning at the last ICML. The workshop page is
http://www-anw.cs.umass.edu/~ajonsson/icml/.
That page has links to notes on what was presented as well as to
individual papers. (Unfortunately the invited talks, which were quite
interesting, don't seem to show up on the site.)

Readings for a course on multi-agent systems are here.
For models of teamwork, see:

The Belief-Desire-Intention Model of Agency

Teamwork

Commitments and Conventions: The Foundation of Coordination in Multi-Agent Systems

Towards Flexible Teamwork

Sebastian wrote a survey
article on probabilistic robotics. He also has a book on the
topic, and is giving a NIPS tutorial this year (need links).

Function approximation in MDPs

For part of my thesis I had to write a survey of methods for solving
MDPs and related models approximately. Here's a link to that
survey: http://www.cs.cmu.edu/~ggordon/related_work.ps.gz.
Here's a link with some more detail about the control theory
methods which are mentioned in the survey; it is missing a definition
of the Laplace
transform but it has a lot of worked examples.
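
Since those notes omit it, here is the standard definition of the
(one-sided) Laplace transform of a function f, in LaTeX:

    \mathcal{L}\{f\}(s) \;=\; F(s) \;=\; \int_0^\infty e^{-st}\, f(t)\, dt

Its key property for the control theory methods is that it turns
differentiation into multiplication: \mathcal{L}\{f'\}(s) = s F(s) - f(0).
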
Carlos Guestrin has done some interesting work on approximate value
iteration using Bayes-net-based representations of MDPs. His IJCAI
paper describes how to combine value iteration with max-norm
projection to get a convergent algorithm.

To understand Carlos' work better, it might help to read my paper on
how to include function approximation in reinforcement learning
algorithms, from ICML-95:
http://www.cs.cmu.edu/~ggordon/ml95-stable-dp.ps.Z.
It describes how to combine value iteration with approximators that
are max-norm nonexpansions (note that max-norm projection is not
necessarily a max-norm nonexpansion).
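
As a concrete illustration of the combination in the ICML-95 paper,
here is a sketch of fitted value iteration on a toy chain MDP, where
the fitting step is an "averager": linear interpolation from a fixed
set of anchor states, whose fixed convex weights make it a max-norm
nonexpansion. The MDP and all of its constants are invented for the
example.

    import numpy as np

    n_states = 50      # chain of states 0..49; state 0 is an absorbing goal
    gamma = 0.9

    def backup(V):
        # One exact step of value iteration for the toy chain MDP:
        # actions move left/right but slip the other way 20% of the
        # time, and the goal state is worth 1.
        Vnew = np.empty(n_states)
        for s in range(n_states):
            if s == 0:
                Vnew[s] = 1.0
                continue
            vals = []
            for move in (-1, +1):
                succ = min(max(s + move, 0), n_states - 1)
                slip = min(max(s - move, 0), n_states - 1)
                vals.append(gamma * (0.8 * V[succ] + 0.2 * V[slip]))
            Vnew[s] = max(vals)
        return Vnew

    # The averager: keep values only at a coarse grid of anchor states
    # and linearly interpolate everywhere else.  The interpolation
    # weights are fixed, nonnegative, and sum to one, so the fitting
    # step is a max-norm nonexpansion and fit(backup(.)) contracts.
    anchors = np.arange(0, n_states, 5)

    def fit(V):
        return np.interp(np.arange(n_states), anchors, V[anchors])

    V = np.zeros(n_states)
    for _ in range(200):
        V = fit(backup(V))     # converges to a fixed point near V*

    print(np.round(V[:10], 3))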

Hierarchy

David Andre has done some interesting work on hierarchical
representations for learning (including designing a variant of LISP
with built-in choice points for learning algorithms to resolve). Look
under hierarchical reinforcement learning at his publications
page, and in particular see his NIPS-01
paper.

Some of Sridhar Mahadevan's students have done work on hierarchy in
POMDPs. They've explored at least two approaches, one based on
Andrew McCallum's utile distinction memory and the other on a
Baum-Welch-style algorithm (need links).

The ALLIANCE architecture was designed in part as a response to (and
improvement on) the subsumption architecture, which was one of the
first proposals for combining simple reactive behaviors to produce
more complicated emergent behavior. A summary of the reasoning behind
the subsumption architecture that I found on the web is at
http://ai.eecs.umich.edu/cogarch0/subsump/.
A common criticism of pure subsumption architectures is that their
emergent behaviors may not be sufficiently goal-directed, since they
lack an explicit model of the state of the world and of the robot's
goals. Behavior-based architectures try to address this criticism
while still avoiding central planning.
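
To make the layering idea concrete, here is a tiny sketch of
subsumption-style arbitration; the behaviors and their priorities are
hypothetical, not taken from any of the papers above.

    # Behaviors are layered by priority; a higher layer "subsumes"
    # lower ones by overriding their output whenever it has an opinion.

    def avoid_obstacle(percept):
        # Highest priority: turn away if something is too close.
        if percept["min_range"] < 0.3:
            return "turn_left"
        return None                      # no opinion; defer downward

    def follow_wall(percept):
        if percept["wall_seen"]:
            return "forward"
        return None

    def wander(percept):
        return "forward_random"          # lowest layer always answers

    LAYERS = [avoid_obstacle, follow_wall, wander]   # high to low priority

    def arbitrate(percept):
        for behavior in LAYERS:
            command = behavior(percept)
            if command is not None:      # higher layer subsumes the rest
                return command

    print(arbitrate({"min_range": 0.2, "wall_seen": True}))  # -> turn_left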

Particle filters

One of the basic tasks in POMDP-based planning is tracking a robot's
belief state. The belief state is a probability distribution over
possible states of the world; it usually includes at a minimum the
robot's own pose, and may also include a map of stationary surroundings
as well as poses of other moving objects.

There's been a bunch of work recently on using particle filters to
track a robot's belief state. (Other approaches to tracking belief
state include Kalman filters and evidence
grids.)
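
For concreteness, here is a minimal sketch of a bootstrap particle
filter tracking a one-dimensional pose; the motion and sensor models
and all noise levels are invented for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    n_particles = 1000
    particles = rng.normal(0.0, 1.0, n_particles)   # initial belief

    def step(particles, control, observation):
        # 1. Predict: apply the (noisy) motion model to every particle.
        particles = particles + control + rng.normal(0.0, 0.1, len(particles))
        # 2. Weight: score each particle by the sensor likelihood
        #    p(observation | state), here a Gaussian around the pose.
        weights = np.exp(-0.5 * ((observation - particles) / 0.5) ** 2)
        weights /= weights.sum()
        # 3. Resample: draw a new equally weighted set, concentrating
        #    particles in high-likelihood regions.
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        return particles[idx]

    for t in range(10):
        particles = step(particles, control=1.0, observation=float(t + 1))

    print(particles.mean(), particles.std())   # posterior mean and spread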

Dieter and friends' paper
in the AR special issue mentioned above is about tracking the poses of
multiple robots.

Mike Montemerlo's recent paper (need link) covers people-tracking, i.e.,
representing the positions of multiple moving objects within a known
map. To handle dependencies between the uncertainty in a robot's
position and the positions of the tracked people, this paper proposes a
conditional version of particle filters: every particle in the sample
for the robot's position corresponds to a whole separate filter for
each person's location.
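
A sketch of the nested representation this implies (the models and
constants are invented, not Montemerlo's): each robot-pose particle
carries its own particle set for each person, and when robot particles
are resampled their attached person filters go with them.

    import numpy as np

    rng = np.random.default_rng(1)
    N_ROBOT, N_PERSON, N_PEOPLE = 100, 50, 2

    robot = rng.normal(0.0, 1.0, N_ROBOT)        # robot-pose particles
    # people[i][k]: particles for person k, conditioned on robot particle i
    people = [[rng.normal(0.0, 2.0, N_PERSON) for _ in range(N_PEOPLE)]
              for _ in range(N_ROBOT)]

    def update(robot, people, robot_obs, person_ranges):
        # Weight and resample the robot particles, carrying each
        # particle's attached person filters along with it (copied so
        # resampled duplicates evolve independently afterwards).
        w = np.exp(-0.5 * (robot_obs - robot) ** 2)
        w /= w.sum()
        idx = rng.choice(N_ROBOT, size=N_ROBOT, p=w)
        robot = robot[idx]
        people = [[people[i][k].copy() for k in range(N_PEOPLE)] for i in idx]
        # Update each person filter; the measurement model (a range
        # reading, person minus robot) depends on that robot pose.
        for i, pose in enumerate(robot):
            for k in range(N_PEOPLE):
                pts = people[i][k] + rng.normal(0.0, 0.2, N_PERSON)  # motion
                pw = np.exp(-0.5 * (person_ranges[k] - (pts - pose)) ** 2)
                pw /= pw.sum()
                people[i][k] = pts[rng.choice(N_PERSON, size=N_PERSON, p=pw)]
        return robot, people

    robot, people = update(robot, people, robot_obs=0.5,
                           person_ranges=[3.0, -2.0])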

Kevin Murphy has done some work on learning
whole maps rather than just a few object locations. (These
maps are smaller and simpler than the ones we need to use, but they
still have on the order of hundreds of unknown bits to estimate.)
To accomplish this, he uses Rao-Blackwellized
particle filters, which are a combination of particle filters
and exact inference.
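
Here is a toy sketch of the Rao-Blackwellization idea as it applies to
map learning (my own simplification, not Murphy's model): sample the
robot's pose with particles, but, conditioned on each particle's
trajectory, track each map cell's occupancy exactly with conjugate
Beta counts.

    import numpy as np

    rng = np.random.default_rng(2)
    N, N_CELLS = 200, 20

    poses = rng.integers(0, N_CELLS, N)    # one pose hypothesis per particle
    counts = np.ones((N, N_CELLS, 2))      # per-particle Beta(1,1) counts

    def step(poses, counts, move, hit):
        # Sampled part: noisy motion of each pose hypothesis.
        poses = (poses + move + rng.integers(-1, 2, N)) % N_CELLS
        rows = np.arange(N)
        # Weight each particle by the predictive probability of the
        # sensor reading under its own exact map posterior so far.
        hits, misses = counts[rows, poses, 0], counts[rows, poses, 1]
        p_occ = hits / (hits + misses)
        w = p_occ if hit else 1.0 - p_occ
        w = w / w.sum()
        # Exact part: conditioned on the trajectory, each cell is a
        # Bernoulli with a conjugate Beta posterior, so the update is
        # just a count bump.
        counts = counts.copy()
        counts[rows, poses, 0 if hit else 1] += 1
        # Resample particles together with their maps.
        idx = rng.choice(N, size=N, p=w)
        return poses[idx], counts[idx]

    for t in range(30):
        poses, counts = step(poses, counts, move=1, hit=(t % 3 == 0))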

Sebastian has written a paper
on how to plan in general POMDPs using a particle filter representation
of the belief state.

Dirk Ormoneit et al. have written a paper
on using quasi-Monte-Carlo methods (placing particles not completely
randomly, but dependent on each other in a space-filling way) to reduce
the variance of particle filters.
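
The flavor of the idea in one dimension (illustrative only, not the
paper's algorithm): points from a low-discrepancy sequence cover the
unit interval more evenly than i.i.d. draws, so sample averages
converge faster.

    import numpy as np

    def halton(n, base=2):
        # Van der Corput / Halton low-discrepancy sequence in 1-D.
        seq = np.zeros(n)
        for i in range(1, n + 1):
            f, x, k = 1.0, 0.0, i
            while k > 0:
                f /= base
                x += f * (k % base)
                k //= base
            seq[i - 1] = x
        return seq

    n = 256
    f = lambda x: np.sin(2 * np.pi * x) ** 2     # true integral = 0.5

    rng = np.random.default_rng(3)
    print("plain MC:", f(rng.uniform(size=n)).mean())
    print("quasi-MC:", f(halton(n)).mean())      # typically closer to 0.5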

See also Doucet, de Freitas, and Gordon, eds., Sequential Monte Carlo
Methods in Practice, Springer, 2001.

More on Kalman filters: the basic Kalman filter assumes all
distributions involved are Gaussian and all state updates are
linear. The extended Kalman filter (EKF) allows for nonlinearities by
linearizing the state updates around the current mean; it can handle
moderate departures from linearity. The unscented Kalman filter (UKF)
attacks the same problem as the EKF, but with a different (often
better) update rule: rather than linearizing the state update, it
applies the true update to a small, deterministically chosen set of
"sigma points" and uses the results to estimate the posterior mean and
covariance. The original paper on the UKF is by Julier and Uhlmann
(need link, but see next paragraph).
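
Here is a compact sketch of the unscented transform at the heart of
the UKF (a simplified version with a single tuning parameter kappa;
the polar-to-Cartesian example at the end is invented):

    import numpy as np

    def unscented_transform(mean, cov, f, kappa=1.0):
        n = len(mean)
        # Scaled matrix square root via Cholesky: L @ L.T = (n+kappa)*cov.
        L = np.linalg.cholesky((n + kappa) * cov)
        # 2n+1 sigma points: the mean, plus/minus each column of L.
        sigma = np.vstack([mean, mean + L.T, mean - L.T])
        weights = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
        weights[0] = kappa / (n + kappa)
        # Push the sigma points through the *true* nonlinear update ...
        y = np.array([f(s) for s in sigma])
        # ... and recover the transformed mean and covariance.
        new_mean = weights @ y
        diff = y - new_mean
        new_cov = (weights * diff.T) @ diff
        return new_mean, new_cov

    # Example: push a 2-D Gaussian through a polar-to-Cartesian map.
    f = lambda s: np.array([s[0] * np.cos(s[1]), s[0] * np.sin(s[1])])
    m, P = np.array([1.0, 0.0]), np.diag([0.01, 0.1])
    print(unscented_transform(m, P, f))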

For handling highly nonlinear or non-Gaussian problems, we need a
better state representation than a single Gaussian. An obvious
candidate is a mixture of Gaussians. There have been several
generalizations of Kalman filters to handle mixtures of Gaussians: the
most obvious is to update each component separately using the EKF or
UKF. Another is the mixture Kalman filter. Yet another is the
unscented particle filter (this paper describes UKFs in some detail as
well, and also has a list of common tricks to make particle filters
work better). For UKFs, see also here.
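
A sketch of the "obvious" approach for a scalar linear-Gaussian model
(illustrative; the model constants are made up): run one Kalman filter
per mixture component and reweight each component by the predictive
likelihood it assigned to the new observation.

    import numpy as np

    A, Q = 1.0, 0.1    # state transition: x' = A x + noise with variance Q
    H, R = 1.0, 0.5    # observation:      z  = H x + noise with variance R

    def mixture_kf_step(components, z):
        # components: list of (weight, mean, variance) triples.
        updated = []
        for w, m, P in components:
            # Per-component Kalman predict step.
            m_pred, P_pred = A * m, A * P * A + Q
            # Predictive likelihood of z under this component.
            S = H * P_pred * H + R                   # innovation variance
            lik = (np.exp(-0.5 * (z - H * m_pred) ** 2 / S)
                   / np.sqrt(2 * np.pi * S))
            # Per-component Kalman update step.
            K = P_pred * H / S                       # Kalman gain
            m_new = m_pred + K * (z - H * m_pred)
            P_new = (1 - K * H) * P_pred
            updated.append((w * lik, m_new, P_new))
        total = sum(w for w, _, _ in updated)        # renormalize weights
        return [(w / total, m, P) for w, m, P in updated]

    mixture = [(0.5, -2.0, 1.0), (0.5, 2.0, 1.0)]    # two hypotheses
    print(mixture_kf_step(mixture, z=1.5))   # component near +2 gains weight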

Planning

Nick Roy has a paper on coastal navigation, a method for incorporating
a rough estimate of uncertainty into a path plan in a way that lets us
decide separately how bad uncertainty is at each location in state
space.

Craig Boutilier and friends wrote a (very long) paper
on the relationship between POMDP-based planning and classical
planning. Here's the citation: Boutilier, Dean, and Hanks,
"Decision-Theoretic Planning: Structural Assumptions and Computational
Leverage," JAIR 11:1-94, 1999.

EM and variational methods

Neal and Hinton have a paper on EM as free energy minimization. It was
the first paper I know of to point out the relationship between EM and
free energy minimization (the latter being an example of a variational
technique) and to suggest the idea of a "partial E-step."

There's also David MacKay's book. Chapter 31 is on variational methods
in the draft I have, and chapter 45 applies variational techniques to
the problem of model complexity control in neural network inference.
Chapter 42 is on Boltzmann machines, which are a good example of
variational methods. (All of the other chapters are excellent too.)

Tommi Jaakkola has a paper on variational inference in the QMR-DT
database. This work is also covered in a tutorial by Jordan,
Ghahramani, Jaakkola, and Saul (which Daniel Nikovski presented at a
recent ML lunch). Jaakkola also has a tutorial of his own.