The
goal of this project is to design and build a
multi-person tracking system using a network of
stereo cameras. Our system, which stands in an
ordinary conference room is able to track people,
estimate their trajectories as well as different
characteristics (e.g. size, posture).

Motivation:

Systems
which can track and understand people have a wide
variety of commercial applications. It is predicted
that computers of the future will interact more
naturally with humans than they do now. Instead of
the desktop computer paradigm with humans
communicating by typing, computers of the future
will be able to understand human speech and
movements. Our system demonstrates the capabilities
of a solely vision-based system for these ends.

Our
Approach:

We
have developed a system that can perform dense, fast
range-based tracking with modest computational
complexity. We apply ordered disparity search
techniques to prune most of the disparity search
computation during foreground detection and
disparity estimation, yielding a fast,
illumination-insensitive 3D tracking system. When
tracking people, we have found that rendering an
orthographic vertical projection of detected
foreground pixels is a useful model.

A
”plan view” image facilitates correspondence in
time since only 2D search is required. Previous
systems would segment foreground data into regions
prior to projecting into a plan-view, followed by
region-level tracking and integration, potentially
leading tosub-optimal segmentation and/or object
fragmentation. Instead, we develop a technique that
altogether avoids any early segmentation of
foreground data. We merge the plan-view images from
each view and estimate over time a set of
trajectories that best represents the integrated
foreground density. The figure on the left shows
three people are standing in a room, though not all
are visible to each camera. Foreground points are
projected onto a ground plane. Ground plane points
from all cameras are then superimposed into a single
data set before clustering the points to find person
locations.

Detecting
locations of users in a room using multiple views
and plan-view integration.

Demo
1 (9.85 MB):Shows server
module detecting locations of people in a room using
multiple views. The posture of a person is color
coded (sitting = red, standing = green). An active
camera tracks the tallest person in the room.