Abstract

Interactive control of human characters would allow the intuitive control of characters in computer/video games, the control of avatars for virtual reality, electronically mediated communication or teleconferencing, and the rapid prototyping of character animations for movies. To be useful, such a system must be capable of controlling a lifelike character interactively, precisely, and intuitively. Building an animation system for home use is particularly challenging because the system should also be low cost and not require a considerable amount of time, skill, or artistry to assemble.

This thesis explores an approach that exploits a number of different spatial-temporal constraints for interactive animation control. The control inputs from such a system will often be low dimensional, containing far less information than actual human motion, so they cannot be used directly for precise control of high-dimensional characters. However, natural human motion is highly constrained; the movements of the degrees of freedom of the limbs or facial expressions are not independent. Our hypothesis is that the knowledge about natural human motion embedded in a domain-specific motion capture database can be used to transform underconstrained user input into realistic human motions. The spatial-temporal coherence embedded in the motion data allows us to control high-dimensional human animations with low-dimensional user input.

We demonstrate the power and flexibility of this approach through three different applications: controlling detailed three-dimensional (3D) facial expressions using a single video camera, controlling complex 3D full-body movements using two synchronized video cameras and a very small number of retro-reflective markers, and controlling realistic facial expressions or full-body motions using a sparse set of intuitive constraints defined throughout the motion. For all three systems, we assess the quality of the results by comparing them with those created by a commercial optical motion capture system. We demonstrate that all three systems create animation of a quality comparable to that of commercial motion capture systems while requiring less expense, time, and space to capture the user's input.