Caution! Robot Vehicle!
Hans P. Moravec
Robotics Institute
Carnegie Mellon University
Pittsburgh, Pennsylvania 15213
August 1990
\section{Introduction}
A special road sign bearing the legend of the title greeted
visitors to the Stanford Artificial Intelligence Laboratory during the
time it was housed in the starship (unconvincingly disguised as the
Donald C. Power building) that parked on a Stanford hill from the
mid-sixties to the mid-eighties. The sign, near the periphery of SAIL's
grounds, referred to the Stanford Cart, a guerrilla research project
near the periphery of John McCarthy's core interests, but motivated by
his desire for autonomous vision-guided (as opposed to co-ordinated
wire-guided) automatic cars. In a 1969 essay, {\em Computer Controlled
Cars} \cite{JMC69}, John suggested that the power of a PDP-10
(in a smaller package) was adequate for the job. I think John still
favors this estimate. John is guided by a strong and principled
intuition that has proven itself correct in very many things. But in
this paper I will present accumulating experimental evidence that
hints that in this one opinion John's intuition misled him by more
than a few orders of magnitude.
\section{Cartography}
How can one expect to interpret an image sequence at the many
frames per second rate necessary for driving, on a one million
instruction per second machine? A reasonable picture of the road by
itself is an array of a million numbers. To even {\it touch} each of
these pixels takes several seconds---and doing anything substantial at
least several times longer. Driving imagery includes traffic,
obstacles, road signs and other features that appear swiftly in all
parts of the image, and often call for swift response. One answer
often voiced in the early days was that only a small fraction of the
image need be examined---with sufficient cleverness most of it could be
eliminated as uninteresting {\it a priori}. In 1971 Rod Schmidt used
this approach in the first Cart thesis \cite{RAS71}. With Rod's
program (about 200 Kbytes of tight assembly code, close to the upper
size limit in those days) the Cart, moving at a very slow walking
pace, visually tracked a white tape line on the ground. The program
contained a predictor for the future position of the line based on its
past position. It digitized about 10\% of the full image around the
predicted line position, and applied a specialized operator to once
again find the line in those bounds. This location served as the next
incremental input to both the predictor and a steering servo
calculation. Using about half the power of the PDP-10, the program
could handle one image a second, and was able to follow a line for
about 50 feet at a time---if the line was unbroken, didn't curve too
much and was clean and uniformly lit. Rod noted that handling even
simple exceptions, such as brightness changes caused by shadows, would
require several times as much computation to search over alternative
interpretations, and detecting and responding to obstacles, road signs
and other hazards promised to be much more complicated.
In the early 1970s computer vision was less than ten years
old, and almost all of it was of the ``blocks world'' type, which reduced an
image to a list of geometric edges before doing anything else. It was
quite inappropriate for outdoor scenes containing few simple edges,
but many complicated shapes and color patterns. A major exception to
the blocks world approach was a project begun at SAIL with impetus from
Joshua Lederberg and John's enthusiastic support. It was to look for
changes on the surface of Mars that might indicate life. Using
digital images from Mariners 4, 6, 7 and 9, the project worked to
register, in geometry and color, views of the same regions taken at
different times, so that any differences could be detected. Since the
spacecraft locations were known only approximately, the image
registration process was to be guided by surface features themselves.
Lynn Quam and associates developed a collection of statistical,
intensity-based comparison, search and transformation methods that did
the job \cite{PDQ71}. Since they dealt with complex natural scenes,
those methods also seemed appropriate for interpreting imagery from an
outdoor vehicle. Since NASA was then considering the possibility of a
semi-autonomous roving robot explorer for Mars (20 years later they
still are), there was a double bond between the Cart concept and the
Mars group. I arrived in late 1971, an enthusiast for space and
robots, and quickly adopted the Cart from Bruce Baumgart, its foster
father. The Cart didn't have much of a research reputation, but
discussions with Lynn produced a plan where I would provide a working
vehicle (a non-trivial project given the shoestring construction) and
PDP-10 resident driving control software. Lynn and company would
adapt their image methods for visual navigation. By 1973 I was having
a good time building and test-driving new remote control hardware and
software, when a remote control misstep crashed the poor Cart off a
small loading ramp. Months of low budget repairs left its TV
transmitter still broken, and led me to beg John to invest several
thousand dollars for a replacement. He agreed, but insisted that I
demonstrate my ability to do vision programming.
Real time was not an issue in interpreting Mars images. The
missions were several years apart, and, until the Mariner 9 orbiter,
each produced only a few dozen images. The Mars group could afford to
run search programs for hours at a time to find exceedingly precise
and dense matches over large image areas. A Cart-driving program
might forgo this precision and coverage in exchange for speed. It
seemed many tasks could be accomplished with just two basic image
operators---one to pick out a good collection of distinctive local
regions across a scene (here called {\it features}), and another to
find them in different views of the same area. Three dimensional
locations could then be determined by triangulation, obstacles
detected, and the motion of the robot deduced. I set about to find
fast implementations of these ideas. Working mostly with spatially
compressed images, and cleverly coded in assembler, my operators were
able to pick out a few dozen good features in one image and reacquire
them in another using about ten seconds of computer time. In 1975 I
built a program around them that controlled the Cart's heading by
tracking horizon features on the roads around SAIL. The program would
repeatedly digitize a frame and, in fifteen seconds, determine the
horizontal displacement of features on the (usually tree-lined)
boundary between ground and sky since the last frame, calculate a
steering correction, and drive the robot up to ten meters. It did its
unambitious task quite well, and was fun to watch. But it was
intended as a mere practice for the main event, a much more ambitious
program that could drive the Cart through an obstacle field by
visually tracking its surroundings in three dimensions---to build a
map, identify obstacles, plan safe routes and, most difficult,
visually monitor and servo the robot's motion from the apparent motion
of those surroundings. I decided to approach this task in full three
dimensions from the start, hoping eventually to run the robot on the
rolling adobe terrain outside the lab (a vain hope).
The Cart carried a single camera, suggesting the use of
forward motion of the vehicle to provide a stereoscopic baseline for
triangulating features. Its motor control was very imprecise, so the
motion would have to be solved simultaneously with the three
dimensional position of tracked objects. The Mars team had a similar
problem, and Don Gennery had already written a least squares ``camera
solver'' for it. I struggled with this approach through early 1977.
The program would take a picture, and choose up to a hundred features.
It would then drive the robot forward about a meter, stop, take
another picture, and search for the same features in the second image.
Then it would invoke the camera solver to find the camera displacement
and the feature positions. Despite much hacking, the program's error
rate never dropped below about one wrong motion solution in four. At
that rate the robot could move about four meters before becoming
confused about its position relative to everything
else---discouraging. The camera solver's answer was one that
minimized an error expression involving initial estimates and
calculated values of the feature positions and the camera
displacement, each weighted by position uncertainty. It also had a way
of pruning features with aberrant positions, as might be produced by
incorrect matches in the images. It worked well for high quality
spacecraft images of an almost two dimensional surface, with few
matching errors and good {\it a priori} camera position estimates. My
data, from noisy TV images of a nearby very three dimensional scene,
with plenty of perspective distortion from frame to frame, was
something else. Ten to twenty percent of the feature matches were
wrong, often because an area chosen in a first image had, in a second
image, been eclipsed, or had its appearance changed, by point-of-view
effects. Cart movements, which produced the stereo baselines, could
be controlled and estimated with only about 50\% precision. Also, the
feature position accuracy in my low resolution images was modest,
compounding the serious limitation that forward motion stereo
degenerates for points near the camera axis. The combination of many
outright bad points and large uncertainties made finding the right
answer a chancy proposition. It was necessary to track about one
hundred features to achieve even this performance, consuming several
minutes of computer time. Improving my already very good matcher by
handling its statistics more thoroughly, or alternatively widening the
camera solver search, in hope of catching the correct solution more
often, would increase the run time severalfold, one by multiplying the
inner loop time, the other by increasing the number of iterations of
the outer loop. Instead I chose to add some robot hardware to reduce
the computational uncertainties.
Multiple cameras or a repositionable camera on the robot would
permit true stereoscopy, allowing three dimensional locations of
features relative to the robot to be determined at each stop.
Mismatched features between stops might then be pruned, before solving
for robot motion, by exploiting the rigid motion constraint, i.e. that
the mutual distances between pairs of features should remain unchanged
by a move. Vic Scheinman, a steadfast friend of the Cart, found in
his basement a mechanism able to slide the camera about 60 cm from
side to side. Motorized, this provided a fine stereo baseline.
Errors were further reduced by exploiting the redundancy of nine
pictures taken across this track. The final result, first
sufficiently debugged in October of 1979, was a program that would
track about thirty small image features at a time to visually servo
the robot through indoor clutter, mapping it and avoiding obstacles,
using ten minutes of computing per meter of travel. In five hours it
would arrive at a requested destination at the opposite end of a
thirty meter room, succeeding in about three traverses in four.
Outdoors the harsh contrast between sunlight and shadow overwhelmed
the vidicon camera and degraded the success rate \cite{HPM80}.
\section{Fast Cars}
In 1977 Japan's Mechanical Engineering Laboratory demonstrated
a stereo-vision guided autonomous automobile that could follow well
defined roads for distances of about 50 meters at speeds up to 30
km/h, using highly specialized hardware occupying a rack on the
passenger side of a small car \cite{MEL89}. Two television cameras
were mounted, one above the other on the car's front grill, and
oriented so that their fast scan direction was vertical. Their video
signals were electronically differentiated, to detect brightness
changes, then quantized into binary bit streams. The streams from the
two cameras were matched at various offsets by a tapped shift register
and a bank of binary logic comparators. When properly adjusted for
local conditions, this circuit, doing the equivalent of about 50 MIPS
of computing, provided an indication of the distance of about eight
major visual discontinuities, such as road embankments and obstacles,
thirty times per second. The range indications were sampled about ten
times per second by a $1/4$ MIPS minicomputer programmed to keep the
vehicle on road and veer around obstacles.
In 1984, as part of its Strategic Computing Initiative, DARPA initiated
an overambitious program called ``Autonomous Land Vehicles'' (ALV) that
promised stealthy robot crawlers to do reconnaissance, sabotage and
perhaps combat on a battlefield. A decade of computer vision work had
convinced the managers that stereo vision was too hard a problem for their
time frame, but they guessed that the rest of the perception and
navigation problem was tractable. Before being abandoned five years
later, the project had financed a half dozen small experimental
vision-guided vehicles and two large ones. A large rough-terrain vehicle
at Martin Marietta in Denver \cite{ALV89} was equipped with about 50 MIPS
of computer power, color television cameras and a scanning laser
rangefinder that provided, twice a second, a 128 by 256 array of distance
measurements across the field of view, doing by physics what was
impractical by computation. A similar machine, based on a large Chevy
van, was constructed at Carnegie Mellon University \cite{CET90}. By the
end of the project the ``ALV'' and the ``NavLab'' were both driving
down dirt roads at speeds up to 50 km/h, but usually much slower, tracking
road boundaries with color based image operators, and stopping for
obstacles detected by a minimal processing of the laser range data. Both
were able to do this as long as the road boundaries were relatively well
defined and without major discontinuities. But the simple road
identification operators achievable with 50 MIPS were often fooled, and
both vehicles were unlikely to stay on a road for a whole kilometer. The
ALV program had specified off-road navigation as a subsequent goal, but
the first phase results left few avenues for that.
A project in Germany begun in 1984 \cite{DED90} produced a van
that sometimes drives autonomously on the Autobahn at up to 100 km/h
guided by the output of a single monochrome camera. The camera image
goes to an array of up to a dozen specialized image processors, each
with an effective computing rate of about 10 MIPS that permits simple
operations, such as convolving an image patch, to proceed at a full 60
frames per second. Each processor is programmed to keep a single
small image window on a feature in the scene (just like self-steering
TV guided bombs and missiles). The features are pointed out to the
system manually at the beginning of an autonomous run. Typically one
is the left edge of the highway or the lane, another the right edge.
These are tracked very much in the way Rod Schmidt's program tracked
its white line. Other features are chosen on license plates or other
distinctive marks on traffic ahead and to the sides of the van. Using
motion prediction techniques, the image processors are able to
maintain their visual locks for many minutes. Their output goes to
another processor that servos the vehicle to stay in the lane, and to
keep a safe distance from the other traffic.
In none of the above systems is it advisable for the human
supervisor to stray far from the manual override button---these
simple-minded machines are very easily confused by such common driving
events as shadows, road stains, lampposts, stopped cars or sudden
curves. Recent work by several of the research groups has begun to
rely on a combination of dead reckoning and satellite navigation for
primary guidance, with sensing being demoted to the single task of
slowing or stopping the vehicle when an obstacle blocks the way. With
this simplified approach, reliable high speed ``playbacks'' of human
driven routes have been demonstrated, adequate, perhaps, for
controlling the repetitive trips of ore trucks in strip mines.
\section{Night Crawlers}
The availability of cheap microprocessor ``brains'' since 1980
encouraged dozens of individuals and groups worldwide to build or
acquire small mobile robots. In hobby efforts, small companies,
industrial and government labs, high schools, university undergraduate
and graduate projects, programmable vehicles were built and
operated. Low budget machines relied on contact, sonar or infrared
proximity detectors and fractional MIPS processors; the more expensive
versions carried TV cameras or laser rangefinders of various kinds,
and often several 1 MIPS processors. A few had onboard manipulators. A
half dozen companies offered hobby robots costing a few thousand
dollars and controlled by eight bit processors. The majority were
abandoned within a few years, having achieved, at best, feats similar
to the 1950s toylike pre-computer light-seeking turtles of W. Grey
Walter and Norbert Wiener \cite{WGW61,NW65} or the wall-socket-feeding
Hopkins Beast of 1965. Several more advanced machines with sonar or
optical range sensors built two dimensional maps of their surroundings
using a blocks-world-like edge-based representation. These could work
in real time with clean measurements from robots moving slowly in
simple office environments, but were overwhelmed in cluttered spaces,
or where artifacts such as specular deflection of the ranging beam
produced a high rate of range errors. A small number of projects with
camera equipped robots applied stereo, range stripe and shading-based
scene interpretation methods from computer vision research---methods
that consume minutes or hours of computer time. Most have returned to
mainstream vision research, having decided that a robot is an
expensive and inconvenient way to obtain a few pictures.
A few startup companies in the 1980s attempted to deliver
autonomous mobile robots for industrial (as opposed to toy or
experimental) markets. Still struggling are Denning Mobile Robotics
of Wilmington, Massachusetts, Cybermotion of Roanoke, Virginia and
Transitions Research of Danbury, Connecticut. All produce battery
powered machines roughly human in scale weighing a few hundred pounds,
controlled by about one MIPS of computer power, costing about
\$50,000. They navigate and detect obstacles with various optical and
acoustic sensors, but not computer vision. The companies have
addressed applications in building security, factory parts delivery,
TV studio camera transport, floor cleaning, warehouse inventory,
hospital and office mail delivery---anywhere where navigating around
is half the battle. These markets have proven difficult thus far, and
most sales have been to mobile robot research groups. Most familiar to
me is Denning, founded in 1982 with the idea of making robot security
guards (or roving burglar alarms) to patrol and detect intruders in
large warehouses or office suites. In 1983 the company decided on a
shape like an oil drum, with three driven wheels and a steering
arrangement that ganged the wheels and a sensing ``head''. Obstacle
detection was by a belt of 24 inexpensive Polaroid sonar rangefinders.
An area photodiode, which indicates the position of a spot of light,
provided navigation keyed to infrared beacons mounted at the ends of
hallways. Control was by a 1 MIPS Motorola 68000 microprocessor and a
few smaller processors. In several years of evolutionary development
that addressed increasingly subtle failure modes, the company
demonstrated machines that patrolled aisles and hallways for minutes,
hours, days and eventually months at a time without human
intervention. A facility-specific guidance program drives the robot
down beacon equipped hallways and integrates wheel rotations to
navigate beaconless corners. The sonar readings are averaged in a
simple way to squeeze through doorways and avoid obstacles
\cite{MK86}. Each morning the program guides the robot to ``sleep'' in
a recharging bay. In 1989, for AF Associates, a maker of TV studio
equipment, Denning added a much more precise navigation instrument
that relies on retroreflective tapes in the environment sensed by a
horizontally spinning laser/detector on the robot. Triangulation from
the angular position of three such tapes gives the robot's position to
better than one centimeter. Several operatorless TV cameras in
national news studios now move on bases that are fat Denning robots.
The same guidance methods have been combined in successful
demonstrations of vacuuming and wet scrubbing robots, in work with
Windsor Industries, a maker of industrial cleaning machines.
To date, robot navigators that attempt to model the world
comprehensively are caught in a dilemma---either they reduce noisy
sensor data too selectively and too uncritically, and so are easily
confused and ``brittle'', or they consume hours in statistical
deliberations, and so are unusably slow. The most successful
machines, such as the road vehicles of the previous section, and the
commercial robots of the last paragraph, make do with minimal or no
models, keeping a short path between sensors and effectors. Sensor
glitches in these simple-minded robots cause momentary behavioral
transients, but are quickly compensated by subsequent inputs. Rodney
Brooks' group at MIT has taken this ``reflexive'' approach to new
highs of complexity \cite{ROD89}. Rod's small robots carry simple
sensors and about 1 MIPS of microprocessor power, programmed with
multiple interacting layers of reflexive programming by means of a
special workstation-resident compiler. The result is complex and
moderately competent behavior, resembling that of insects. The devices
are particularly engaging because most of the navigational problem
solving necessarily involves physical probings and scurryings rather
than internal data manipulation.
My own lab's research has taken an intermediate course. In the
early 1980s my students streamlined the Cart obstacle program (which,
with its three dimensional point maps still had a distinct blocks
world flavor) by exploiting constraints (like the knowledge that the
vehicle moves only two dimensionally) to produce a program that ran
about ten times as fast \cite{CET84}. Its navigational accuracy was
increased by the same factor by more elaborate modeling of geometric
uncertainties \cite{LHM89}. But its 3-in-4 room crossing success rate
was hardly changed. The same report \cite{LHM89} describes further work in multiple
view stereo vision that could, in principle, allow a denser and more
robust world map---but at the usual price of hours-long processing per
image set. In 1984 we changed our approach when we accepted a
contract from Denning to do map-based navigation using their
obstacle-detecting sonar ring. Each sonar transducer emits a $30^\circ$
beam and reports the first echo it hears, leaving a great uncertainty
about the lateral position of the detected object. We dealt with this
by modeling the robot's knowledge of its surroundings as a spatial
occupancy probability function (a {\it map}), represented as a
discrete grid. A sonar reading was itself such a distribution (in
this case called a {\it sensor model}), which was projected onto the
appropriate part of the map, raising and lowering values there. We
were surprised in 1985 when an {\it ad hoc} implementation gave us a
robot program able to build maps of its surroundings and cross
cluttered rooms with almost perfect reliability \cite{HPM88,AE89}.
Rasterizing space with probability values is potentially expensive,
but with a coarse two-dimensional grid and optimized coding, our
program processed 10 sonar readings per second on the 1 MIPS Denning
robot. A key navigational step, matching two maps of the same area,
took 3 seconds. In recent developments, we have used the approach
with stereo vision data, and devised a learning procedure for tuning
sensor models. The best versions of our program can build two
dimensional maps in real time on 10 MIPS processors. The approach
extends naturally to three dimensional maps, which can be considered
stacks of about 100 two dimensional maps, but the computational cost
rises to 1,000 MIPS.
\section{Reflection}
Twenty-five years of vision and robotics experience has given rather
consistent results: 1 MIPS can extract only the most trivial real time
measurements from live imagery---tracking a white line or a white spot on a
mottled background is near its upper bound. 10 MIPS can track a complex
gray-scale patch---smart bombs, cruise missiles and German vans attest.
100 MIPS can follow a moderately complex and changeable feature like a
road boundary---as the DARPA ALV effort demonstrated. 1,000 MIPS might be
adequate to give coarse-grained three dimensional spatial
awareness---suggested by several low resolution stereo vision programs and
my occupancy grid experiences. 10,000 MIPS should be able to locate three
dimensional objects in clutter---suggested by several ``bin-picking'' and
fine grain stereo vision demonstrations, which were able to accomplish the
task in an hour at 1 MIPS. There is not much robotics data beyond this
point---with available computing power, research careers are too short to
endure the necessary experiments.
Are these numbers out of line? The more sophisticated vision
programs interpret scenes by searching a space of alternative
interpretations, evaluating candidates by computing statistical
expressions on millions of pixels. These are truly enormous
computations---combinatorial searches on the outside, megascale
numerical evaluations on the inside, and call for extreme cleverness
(and many cut corners). Naive implementations are often a hundred
times slower than the estimates in the last paragraph. For a sense of
perspective, consider the inverse problem of computer graphics, easier
because it requires no search over alternatives. For smooth
real-time animated video, 1 MIPS suffices to make simple line
drawings---Spacewar at SAIL. 10 MIPS can animate complex colored two
dimensional shapes or simple three dimensional models---CAD programs
in current workstations. 100 MIPS is enough for cartoonlike
presentations of three dimensional scenes with a few thousand
facets---specialized graphical workstations like the Silicon Graphics
Iris. 1,000 MIPS can generate scenery with tens of thousands of
facets, just adequate for daytime aircraft flight simulators---the Evans
and Sutherland CT5. Realistic imagery will require several orders of
magnitude more, to judge from ``almost real'' single frames shown by
Pixar and others, which take many hours at 1,000 MIPS.
What about animal vision systems? The first stages of human
vision occur in the retina, the best understood major part of the
vertebrate nervous system, whose product is neatly packaged in the
optic nerve. The optic nerve is a bundle of a million fibers, each
carrying the results of an edge or motion computation that, in an
efficient computer implementation, would require execution of at least
100 instructions. With an effective frame rate of 10 per second, the
net computing equivalent of the whole retina is 1,000 MIPS
\cite{HPM88mc}. The rest of the visual system is thousands of times
larger, so human visual processing power may exceed the equivalent of
1,000,000 MIPS, a number fully consistent with the experiences of
robotics vision research.
Frankly, it would take a miracle to bridge the gap between the
research experience and John's 1 MIPS guess for the difficulty of
robot car vision. Mathematicians have speculated about the
mathematical oversights that led Fermat to make his famous, almost
certainly mistaken, marginal note. I will rudely do the same with
John's estimate. John had in mind the automation of reasoning when
he coined ``Artificial Intelligence''. When electronic computers were
new, their most notable property was a prodigious ability to do
arithmetic---superhuman by a factor of thousands. AI was the attempt
to harness this powerful engine to other mental tasks. Reasoning
programs cut computer power down to human size, but no further, and
set the calibration on John's intuition. You can always tell the
pioneers---they're the ones face down in the trail with the arrows in
the back. To change the metaphor, let me describe the view from this
giant's shoulders. Rational thought---reasoning---is a recent
evolutionary invention, with little time or competition to perfect it.
We don't do it very well, and it doesn't take much of a machine to
match us. Arithmetic is even worse. But seeing and moving around,
finding food and escaping enemies, has been a life or death issue
since before we had a backbone, and the competition has been
fierce. We are enormously optimized in those functions, and it will
take a powerful machine to match us there. In \cite{HPM88mc} I've
made my own calibration, which is that a whole human mind is the
equivalent of 10,000,000 MIPS. Now, when doing arithmetic, I achieve
about $10^{-8}$ MIPS effective. Playing chess, maybe 0.001 MIPS, as
calibrated by good chess programs. Walking home, dodging traffic,
maybe 1,000,000 MIPS. Grandmasters seem to find ways to harness the
more powerful parts of their nervous systems for their tasks---by
visualizing, feeling, hearing, perhaps mnemonizing, with different
tasks giving different amounts of maximum purchase. The best
lightning calculators still do only about $10^{-7}$ MIPS of
arithmetic, but grandmaster chess machines suggest that best human
performance there is worth about 10,000 MIPS.
Do these estimates relegate John's self-driving cars into the
indefinite future? I think not. Electronic computers have become
1,000 times more powerful every twenty years since they appeared a
half century ago. Affordable 100 MIPS machines are almost here, and
1,000 MIPS will be available by the end of the decade. Specialized
circuitry can provide a hundredfold speedup of systematic computations
like those in vision and spatial modeling, allowing an effective power
of 100,000 MIPS by then---a match for the visual system of a monkey.
With a little help from external sources like satellite navigation and
digitized maps, perhaps an auto-auto can find its own way to
celebration $110_8$.
\begin{thebibliography}{}
\bibitem{JMC69} John McCarthy. \newblock Computer Controlled Cars.
\newblock In SAIL computer directory [ESS,JMC] files {\em CAR.ESS}
(1968) and {\em CAR.TEX} (1975).
\bibitem{PDQ71} Lynn H. Quam. \newblock {\em Computer Comparison of
Pictures}. \newblock Stanford AI Memo AIM-144, Stanford Computer
Science Department, July, 1971.
\bibitem{RAS71} Rodney A. Schmidt. \newblock {\em A Study of the
Real-Time Control of a Computer Driven Vehicle}. \newblock Stanford
AI Memo AIM-149, Stanford Computer Science Department, August, 1971.
\bibitem{HPM80} Hans P. Moravec. \newblock {\em Obstacle Avoidance
and Navigation in the Real World by a Seeing Robot Rover}. \newblock
Stanford AI Memo AIM-340, Stanford Computer Science Department, May,
1980.
\bibitem{MEL89} T. Yatabe, T. Hirose and S. Tsugawa. \newblock
Driving Control Method for Automated Vehicle with Machine Vision.
\newblock {\em Journal of Mechanical Engineering Laboratory}; vol.43,
no.6; Nov. 1989; pp. 267-75.
\bibitem{ALV89} S. J. Hennessy and R. H. King. \newblock Future
mining technology spinoffs from the ALV program. \newblock {\em IEEE
Transactions on Industry Applications}; vol.25, no.2; March-April
1989; pp. 377-84.
\bibitem{CET90} Charles E. Thorpe, editor. \newblock {\em Vision and
Navigation: The Carnegie Mellon Navlab}. \newblock The Kluwer
International Series in Engineering and Computer Science. \newblock
Boston: Kluwer Academic Publishers, 1990.
\bibitem{DED90} E. D. Dickmanns, B. Mysliwetz and T. Christians.
\newblock An integrated spatio-temporal approach to automatic visual
guidance of autonomous vehicles. \newblock {\em IEEE Transactions on
Systems, Man and Cybernetics}; vol.20, no.6; Nov.-Dec. 1990;
pp. 1273-84.
\bibitem{WGW61} W. Grey Walter. \newblock {\em The Living Brain}.
\newblock Middlesex: Penguin Books, 1961.
\bibitem{NW65} Norbert Wiener. \newblock {\em Cybernetics, or Control
and Communication in the Animal and the Machine}. \newblock Cambridge:
MIT Press, 1965.
\bibitem{MK86} Mark Kadonoff, F. Benayad-Cherif, A. Franklin,
J. Maddox, L. Muller, B. Sert and H. Moravec. \newblock Arbitration
of Multiple Control Strategies for Mobile Robots. \newblock {\em SPIE
conference on Advances in Intelligent Robotics Systems}, Cambridge,
Massachusetts, October 26-31, 1986. \newblock In SPIE Proceedings Vol
727, paper 727-10.
\bibitem{ROD89} Rodney A. Brooks. \newblock A robot that walks;
emergent behaviors from a carefully evolved network. \newblock
{\em Neural Computation}; vol.1, no.2; Summer 1989; pp. 253-62.
\bibitem{CET84} Charles E. Thorpe. \newblock {\em FIDO: Vision and
Navigation for a Mobile Robot}. \newblock Carnegie Mellon Computer
Science report CMU-CS-84-168, Autumn 1984.
\bibitem{LHM89} Larry H. Matthies. \newblock {\em Dynamic Stereo
Vision}. \newblock Carnegie Mellon University Computer Science report
CMU-CS-89-195, October 1989.
\bibitem{HPM88} Hans P. Moravec. \newblock Certainty Grids for Sensor
Fusion in Mobile Robots. \newblock {\em AI Magazine}, Summer 1988, pp
61-77.
\bibitem{AE89} Alberto Elfes. \newblock Using Occupancy Grids for
Mobile Robot Perception and Navigation. \newblock {\em IEEE Computer
Magazine}, special issue on Autonomous Intelligent Machines, June
1989, pp. 46-58.
\bibitem{HPM88mc} Hans Moravec. \newblock {\em Mind Children: The
Future of Robot and Human Intelligence}. \newblock Cambridge:
Harvard University Press, 1988.
\end{thebibliography}
\end{document}