Technical Report

Objective

We are engaged in a 100% effort to develop laboratory prototype
sensor-based software for utility mobile robots (for industrial
transport, floor maintenance, security, and the like) that matches the
months-between-error reliability of existing industrial robots without
requiring their expensive worksite preparation or site-specific
programming. Our machines will navigate employing a dense 3D awareness
of their surroundings, be tolerant of route surprises, and be easily
placed by ordinary workers in entirely new routes or work areas. The
long-elusive combination of easy installation and reliability should
greatly expand cost-effective niches for mobile robots, and make
possible a growing market that can itself sustain further development.

Approach

Our system is being built around 3D grids of spatial occupancy
evidence, a technique we have been developing for two decades,
following a previous decade of robot navigation work using a different
approach (See Figure 1: Project
history). 2D versions of the approach found favor in many
successful research mobile robots, but seem short of commercial
reliability. 3D, with 1,000 times as much world data, was
computationally infeasible until 1992, when we combined increased
computer power with 100x speedup from representational, organizational
and coding innovations. In 1996 we wrote a preliminary stereoscopic
front end for our fast 3D code, and the gratifying results convinced
us of the practicability of the approach, given about 1,000 MIPS of
computer power. We are working to parlay that start into a universally
convincing demonstration, just as the requisite computing power
arrives.
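
To make the evidence-grid idea concrete, here is a minimal sketch of
3D occupancy evidence accumulation in Python. The grid size, cell
size and evidence weights are illustrative assumptions, not values
from our code, and a production system would use an exact voxel
traversal rather than the sampled ray stepping shown here.

    import numpy as np

    GRID = 64              # cells per axis (illustrative)
    CELL = 0.05            # cell size in meters (illustrative)
    EMPTY_EVIDENCE = -0.4  # log-odds added to cells a ray passes through
    HIT_EVIDENCE = 2.0     # log-odds added at the sensed surface

    def integrate_ray(grid, origin, direction, hit_range):
        """Accumulate occupancy evidence along one sensor ray.

        grid      : (GRID, GRID, GRID) array of log-odds evidence
        origin    : ray start in meters, shape (3,)
        direction : unit vector, shape (3,)
        hit_range : sensed distance to the surface, in meters
        """
        # Sample the ray at half-cell spacing up to the sensed range,
        # marking the traversed cells as more likely empty.
        n_steps = int(hit_range / (CELL * 0.5))
        for step in range(n_steps):
            point = origin + direction * (step * CELL * 0.5)
            idx = tuple((point / CELL).astype(int))
            if all(0 <= i < GRID for i in idx):
                grid[idx] += EMPTY_EVIDENCE
        # Mark the cell containing the sensed point as more likely occupied.
        hit = origin + direction * hit_range
        idx = tuple((hit / CELL).astype(int))
        if all(0 <= i < GRID for i in idx):
            grid[idx] += HIT_EVIDENCE

    grid = np.zeros((GRID, GRID, GRID))
    integrate_ray(grid, np.array([1.6, 1.6, 1.6]),
                  np.array([1.0, 0.0, 0.0]), 1.2)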

The work has three stages: completion and improvement of the basic
perception code; creation of an identification layer for navigation
and recognition of architectural features; finally, sample application
programs that orchestrate the other abilities into practical behaviors
like patrol, delivery and cleaning. We need both capability and
efficiency. The existing code allows one-second time resolution with
1,000 MIPS, but our 3D maps have millions of cells, and
straightforward implementations of path planning, localization, object
matching, etc. would be much slower. We will combine techniques like
coarse-to-fine matching, selective sampling and alternate
representations to get the desired results at sufficient speed.
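
As an illustration of the coarse-to-fine idea, the sketch below
matches an observed 2D map slice against a reference by searching
shifts first on downsampled maps, then refining at full resolution.
The correlation score and parameters are hypothetical stand-ins for
whatever matching criterion the real system will use.

    import numpy as np

    def downsample(grid, factor=4):
        """Block-average a 2D map by the given factor."""
        h, w = grid.shape
        g = grid[:h - h % factor, :w - w % factor]
        return g.reshape(h // factor, factor,
                         w // factor, factor).mean(axis=(1, 3))

    def best_shift(reference, observed, radius):
        """Exhaustively find the integer shift maximizing correlation."""
        best, best_score = (0, 0), -np.inf
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                shifted = np.roll(observed, (dy, dx), axis=(0, 1))
                score = (reference * shifted).sum()
                if score > best_score:
                    best, best_score = (dy, dx), score
        return best

    def coarse_to_fine(reference, observed, factor=4, radius=3):
        # Coarse pass: wide but cheap search on small maps.
        cy, cx = best_shift(downsample(reference, factor),
                            downsample(observed, factor), radius)
        # Fine pass: narrow search around the scaled-up coarse answer.
        observed = np.roll(observed, (cy * factor, cx * factor),
                           axis=(0, 1))
        fy, fx = best_shift(reference, observed, factor)
        return cy * factor + fy, cx * factor + fx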

Accomplishments for March 2001

We have continued developing our "learning through coloring"
program, whose purpose is to optimize sensor models and other
parameters that affect the quality of the gridmaps derived from sense
data. As we did so, the quality of the maps has palpably improved,
as has the objective color variance score. We increased the number
and scope of sensor model parameters adjusted by the learning, and
added an iterative image color adjustment process that compensates
for exposure differences among the robot's images of the scene by
using the colored grid (which averages the views) as a color
reference.
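
The scoring idea can be sketched as follows (the data layout here is
a hypothetical simplification): each occupied cell collects the
colors it displays in every image that sees it. If the geometry is
right, a cell shows nearly the same color from all views, so lower
total variance indicates a better map.

    import numpy as np

    def color_variance_score(cell_colors):
        """cell_colors: dict mapping cell index -> list of RGB samples
        from the views in which that cell was visible."""
        variances = []
        for samples in cell_colors.values():
            if len(samples) < 2:
                continue          # single-view cells contribute nothing
            rgb = np.asarray(samples, dtype=float)
            variances.append(rgb.var(axis=0).sum())  # summed over R, G, B
        return float(np.mean(variances)) if variances else 0.0

    # A consistently colored cell scores low; a misplaced one scores high.
    score = color_variance_score({
        (10, 4, 7): [(200, 180, 160), (202, 181, 158)],
        (11, 4, 7): [(200, 180, 160), (40, 60, 90)],
    })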

In our last report, we noted that the program had found its best
scores by setting the occupancy threshold that decides whether a cell
is imaged as opaque or transparent to a negative value, effectively
declaring blank maps as being fully occupied. Subsequent sensing then
carved out empty regions rather than building up the occupied
parts. The resulting grids contain features, such as horizontal
shelves, that were not actually detected by our legacy two-eyed
stereo, but appear through a process of elimination. The number
of occupied cells remained in the millions, however. The program
pruned them down to about 100,000 by eliminating those that were
visible from no sensing position and thus received no color.
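
The pruning step itself is simple; in sketch form (with hypothetical
structures), an occupied cell survives only if at least one sensing
position saw it and gave it a color:

    def prune_uncolored(occupied_cells, cell_colors):
        """occupied_cells: set of cell indices above the occupancy
        threshold. cell_colors: dict of cell index -> list of color
        samples; empty or absent for cells no sensing position saw."""
        return {cell for cell in occupied_cells
                if cell_colors.get(cell)}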

As we added parameters to the learning that allowed the evidence rays
to be reshaped, we were delighted to find the optimum solutions now
had zero rather than negative occupancy thresholds. Alas, the number
of occupied cells was still almost a half million before color
pruning. The program had substituted enormous extensions of the
occupied portion of evidence rays. Each point that was sensed filled
a meter or two of space behind it. That fill was then carved out by
other evidence.
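
One plausible way to parameterize such a reshapable evidence ray,
purely for illustration, is a profile of evidence versus distance
along the ray with a learnable "occupied tail" length behind the
sensed point; the learning described above stretched that tail to a
meter or two.

    import numpy as np

    def ray_profile(distances, hit_range, empty_weight=-0.4,
                    hit_weight=2.0, tail_length=0.1):
        """Log-odds evidence at each sample distance along a ray.
        tail_length (meters) sets how far occupied evidence extends
        behind the sensed point."""
        evidence = np.where(distances < hit_range, empty_weight, 0.0)
        in_tail = ((distances >= hit_range) &
                   (distances <= hit_range + tail_length))
        return np.where(in_tail, hit_weight, evidence)

    d = np.linspace(0.0, 3.0, 61)    # sample every 5 cm out to 3 m
    profile = ray_profile(d, hit_range=1.2, tail_length=1.5)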

We trace much of the odd nature of the optimum solutions to
deficiencies in the sensor configuration of our stock data, which
includes an office scene and a more elaborate laboratory scene. In
particular, horizontal two-camera stereo prevents the ranging of clean
horizontal features such as shelves and lintels. Blank surfaces like
walls also cannot be ranged. Also, though the learning does well in
making the grid map match the views of the scene, it currently makes a
mess of areas out of view: behind every object one finds a meter of
occupied roughness.

One solution to the latter problem is to have more views from more
directions. Our laboratory data set does this to some extent: it has
front, side, back and diagonal view directions (the office data set
contains only frontal views), and indeed some portions of the lab data
are much cleaner than the office scene. But we intend to do much
better.

We've built a sensor head with three cameras and a textured light
source. These are intended to allow the program to see all edge
directions, and also blank surfaces. We will arrange for the program
to take both textured and untextured image triplets from each
position: the textured images are used for stereoscopic matching, the
regularly lighted ones for coloring. At each position we will take
images in four compass directions (our cameras have 90 degree fields
of view, so this gives us almost complete horizontal coverage), giving
cross views of much of the scene. In addition we have arranged a test
area with mounts for overhead views. The overhead views of the test
area will not be used in the map-building stage of the program (since
a wandering robot would not have such available), but will be used to
contribute color (and color variance) to the scene. Effectively, the
program will be penalized for making mistakes in parts of the map
hidden from the ground-level viewpoints. We expect this to teach the
program to avoid depositing debris in the unseen spaces. For
instance, though our present best maps are quite good, they have a
dense "roof" of debris at the upper edge of the camera fields of view,
which the overhead views will heavily penalize.
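
In outline, the overhead views enter only the scoring pass, not the
map-building pass. A sketch of that split, with the map-building and
scoring routines left abstract since their internals are sketched
above:

    def build_and_score(ground_views, overhead_views,
                        build_map, score_map):
        # Build the grid from robot-height views only; a wandering
        # robot would not have overhead imagery available.
        grid = build_map(ground_views)
        # Score against all views, so debris hidden from the ground
        # still incurs a color-variance penalty from overhead.
        return grid, score_map(grid, ground_views + overhead_views)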

Current Plan

We have completed building our new sensor setup, and will use it to
collect new data almost immediately. Although it can be used on our
Cye robot, we have opted to take our first data from a manually moved
camera stand built around a very precise optical bench mount. We
position this stand by lining up calibrated notches in its base with
an array of precisely marked floor dots. This allows the camera
positions to be known to a millimeter or two, allowing us to separate
the robot fine-positioning problem from the map-building problem. And
when we do begin to solve for camera positions, we will have a
reference to judge how well the program is doing.

We expect our new data to bring us further towards photorealism and,
more importantly, towards extremely reliable 3D maps. We have begun a
parallel effort to build the second software layer of our original
proposal, the recognition layer. It extracts paths, localization,
basic architectural features and some object identifications from the
maps.

Technology Transition

By the end of the project we plan to produce a prototype for
self-navigating robots that can be used in large numbers for transport,
floor cleaning, patrol and other applications.