One real-time application developed by Jebara and Pentland [32]
is the automatic real-time 3D face tracking system shown in
Figure 11. An automatic initialization module finds
the face, locating eye, nose, and mouth coordinates in under a
second. These coordinates are then used to initialize 8 normalized
correlation tracking squares (i.e. sum-squared-distance minimization
[22]) on the face.
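
For concreteness, the sketch below shows a translation-only SSD
template search of the kind each correlation square performs. The
function name ssd_track and its arguments are illustrative rather
than taken from [32], and the actual squares additionally rotate and
scale.

    import numpy as np

    def ssd_track(frame, template, prev_xy, search_radius=8):
        """Find the best match for `template` in `frame` near `prev_xy`
        by minimizing the sum of squared differences (SSD)."""
        th, tw = template.shape
        px, py = prev_xy
        best_xy, best_ssd = prev_xy, np.inf
        for dy in range(-search_radius, search_radius + 1):
            for dx in range(-search_radius, search_radius + 1):
                x, y = px + dx, py + dy
                if x < 0 or y < 0:
                    continue  # candidate window falls outside the image
                patch = frame[y:y + th, x:x + tw]
                if patch.shape != template.shape:
                    continue
                ssd = np.sum((patch.astype(float) - template.astype(float)) ** 2)
                if ssd < best_ssd:
                    best_ssd, best_xy = float(ssd), (x, y)
        return best_xy, best_ssd  # the SSD doubles as a per-tracker error level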

Figure 11: Real-Time 3D Face Tracker

Each square can translate, rotate, and scale, and is therefore
equivalent to two 2D point features (Figure 12(a)-(c)). The
resulting 16 features are fed into the SfM algorithm, which recovers
16 rigid 3D points. This estimated rigid 3D model is then reprojected
onto the image plane to generate a set of 16 rigidly constrained 2D
points, which are used to relocate the individual trackers for the
next frame. The trackers estimate an instantaneous trajectory yet are
not permitted to follow through with it (as they would in a
nearest-neighbor tracking framework). Instead, this estimate is
passed to the SfM, which computes the corresponding rigid trajectory
and repositions the trackers along this rigid 'path' for the next
frame in the sequence. Thus, instead of letting each square track
individually, the SfM couples them all, forcing them to behave as if
they were glued onto a rigid 3D body (i.e. a 3D face). Furthermore,
the 8 trackers output an error level which is used in the R
(measurement noise covariance) matrix of the SfM Kalman filter to
adaptively weight good features more heavily than bad ones in the 3D
estimates. Feature errors are mapped into a Gaussian uncertainty in
localization by an initial perturbation analysis which computes each
tracker's error sensitivity under small displacements.
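
A minimal sketch of this per-frame feedback loop is given below. The
objects trackers, sfm_filter, and camera are hypothetical stand-ins
for the 8 correlation squares, the SfM Kalman filter, and the
projection model, so only the structure of the coupling is
illustrated.

    import numpy as np

    def track_one_frame(frame, trackers, sfm_filter, camera):
        """One pass of the coupled tracking loop (structure only)."""
        # 1. Each correlation square proposes two 2D point features
        #    (encoding its translation, rotation and scale) plus an error
        #    level such as its residual SSD.
        points, errors = [], []
        for t in trackers:
            pts, err = t.measure(frame)          # pts: array of two 2D points
            points.append(pts)
            errors.append(err)
        z = np.concatenate([np.asarray(p).reshape(-1) for p in points])

        # 2. A perturbation analysis maps each error level to a localization
        #    variance: unreliable squares get large entries in R, reliable
        #    squares get small entries.
        variances = np.repeat([t.error_to_variance(e)
                               for t, e in zip(trackers, errors)], 4)
        R = np.diag(variances)                   # adaptive measurement noise

        # 3. The SfM Kalman filter fuses all measurements into a rigid 3D
        #    structure and pose estimate.
        structure, pose = sfm_filter.update(z, R)

        # 4. Reproject the rigid 3D points and relocate every square there,
        #    so the trackers move as if glued onto a single rigid face.
        rigid = camera.project(structure, pose).reshape(len(trackers), 2, 2)
        for t, pts in zip(trackers, rigid):
            t.relocate(pts)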

Figure 12: Correlation Tracking and Feedback

The end result is a much more stable tracking framework (operating at
30Hz). If some trackers are occluded or fail, the others pull them
along via the imposed rigidity constraint. The feedback from the
adaptive Kalman filter maintains a sense of 3D structure and enforces
global collaboration among the separate 2D trackers. Thus, tracking
remains stable for minutes rather than the seconds achieved when no
SfM feedback is used. Figure 12(d) depicts the stability under
occlusion, where a mouth tracker and an eye tracker are distracted by
the presence of the user's finger. Similarly, in Figure 12(e), the
mouth tracker is distracted by deformation (smiling), since the mouth
no longer resembles the closed mouth with which the template was
initialized. Tracking remains stable under these conditions thanks to
the feedback loop.

The algorithm also re-initializes when it detects that it has lost the
face as in Figure 13. This detection is performed
via the so-called "Distance-from-Face-Space" calculation, which
essentially computes the probability of a face pixel image with
respect to a constrained Gaussian distribution [40]. While multiple
real and synthetic tests show very strong convergence, we have also
used the system extensively in the real-time application settings
described above, where it behaved consistently and reliably.
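
Reference [40] formulates this in terms of an eigenspace (PCA)
decomposition of face images; the sketch below illustrates the
reconstruction residual (the distance from face space) that underlies
such a calculation, using illustrative names
(distance_from_face_space, mean_face, eigenfaces). A large residual
suggests the tracked region no longer resembles a face and triggers
re-initialization.

    import numpy as np

    def distance_from_face_space(patch, mean_face, eigenfaces):
        """Reconstruction residual of `patch` outside the eigenface subspace.

        `eigenfaces` is an (n_pixels, n_components) orthonormal PCA basis
        and `mean_face` the mean training face, both flattened.
        """
        x = patch.reshape(-1).astype(float) - mean_face   # center pixel vector
        coeffs = eigenfaces.T @ x                # coordinates within face space
        residual = x - eigenfaces @ coeffs       # component outside face space
        return float(residual @ residual)        # large value: probably not a face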