Computer vision is often treated as a problem of pattern recognition, 3D reconstruction, or image processing. While these all play supporting roles, our view is that the goal of computer vision is to infer what is not in the picture. The goal is to recognize the unseen. This is different from the Aristotelian view that “vision is knowing what is where by looking.” We see vision as the process of inferring the causes and motivations behind the images that we observe; that is, we want to infer the story behind the picture.

The most interesting stories involve people. Consequently, our research focuses on understanding humans and their actions in the world. We aim to recover human behavior in detail, including human-human interactions, and human interactions with the environment.

Humans interact with each other and manipulate the world through their bodies, faces, hands and speech. If computers are to understand humans and our behavior, then they are going to have to understand much more about us than they currently do. For example, they need to recognize when we are picking up something heavy and might need help. They need to understand when we are distracted. They need to understand that changes in our behavior may signal medical or psychological changes.

To address this, we are developing the datasets, tools, models, and algorithms to recover human movement in unconstrained scenes at a level not previously possible. From single images or videos, we estimate full 3D body pose, including the motion of the face and the pose of the hands. We also recover the 3D structure of the world, its motion, and the objects in it so that human movement can be placed in context.

This is quite different from previous work in which the human body is treated in isolation, removed from the world around it, and 3D scene analysis happens on static scenes without humans. We see the interesting space as the one where people are present in, and interacting with, the 3D world. By building 3D models of people and how they move, we are able to place them in context and reason about the physics behind their behavior.

To advance this agenda, Perceiving Systems combines computer vision with machine learning and computer graphics. For example, our computer graphics models of the body enable us to generate training data for machine learning methods, which improve our computer vision algorithms. These improved algorithms give us better data with which to improve our graphics models, leading to a virtuous cycle.

This cycle is producing better and better virtual humans. We see the virtual human as more than a useful artifact. We see it as a testbed for evaluating our models of human behavior. If we can simulate a virtual human in a virtual world behaving in ways that are indistinguishable from a real human, then we assert that we have captured something about what it means to be human. This forces us to go beyond capturing human movement to modeling the causes of human movement.

We want to have an impact beyond the academic discipline of computer vision. Consequently, we develop applications in medicine and psychology in collaboration with medical colleagues. We have also spun off two companies that are using our 3D body model technology. One of these, Body Labs Inc., was acquired by Amazon in 2017. We also make our code and data available open source or under license, and our SMPL body model is now in wide use. Finally, we are responsible for, or contribute to, widely used datasets and evaluation benchmarks that help push the state of the art and provide a platform for industry to understand what works, how well, and why.

To date, we have learned the most realistic 3D models of the human body. Our approach learns the shape and pose deformation of a 3D mesh from thousands of detailed 3D scans. Over the years we have built many such models, but SMPL \cite{SMPL:2015} has now become a de facto standard in the field for research on human pose.
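To make the idea concrete, the following is a schematic sketch (not the released SMPL code) of how a SMPL-style statistical body model combines a template mesh with learned linear shape and pose blend shapes; the arrays here are random placeholders, and the dimensions, names, and helper function are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
# SMPL uses a 6890-vertex mesh; shape/pose dimensions below are illustrative.
N_VERTS, N_SHAPE, N_POSE = 6890, 10, 207

template = rng.standard_normal((N_VERTS, 3))             # mean body mesh (placeholder data)
shape_dirs = rng.standard_normal((N_VERTS, 3, N_SHAPE))  # learned shape blend shapes
pose_dirs = rng.standard_normal((N_VERTS, 3, N_POSE))    # learned pose-dependent corrections

def posed_template(betas, pose_feats):
    """Add linear shape and pose blend-shape offsets to the template mesh."""
    v = template.copy()
    v += shape_dirs @ betas      # (N,3,NS) @ (NS,) -> (N,3) shape offsets
    v += pose_dirs @ pose_feats  # pose features (e.g., rotation matrices minus identity)
    return v

betas = rng.standard_normal(N_SHAPE) * 0.03  # small shape variation
pose_feats = np.zeros(N_POSE)                # rest pose: no pose-dependent correction
verts = posed_template(betas, pose_feats)
print(verts.shape)  # (6890, 3)
```

In the full model, the corrected template would then be articulated by skeletal skinning; the sketch shows only the linear blend-shape step that is learned from scans.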

As humans, we influence the world through our bodies. We express our emotions through our facial expressions and body posture. We manipulate and change the world with our hands. For computers to be full partners with humans, they have to see us and understand our behavior. They have to recognize our facial expressions, our gestures...

Humans and animals live in, and interact with, the 3D world around them. To understand humans, then, we must understand the surfaces that support them and the objects with which they interact. To that end, we develop methods to estimate the structure and motion of the world from a single image, from video, or from multiple images. We...

Much of our work focuses on capturing or estimating human movement. For this we seek metrically accurate 3D movement with increasing levels of detail. We are interested, however, in more than the movement of the joints, the facial muscles, the fingers, etc. What we really seek is what is behind human movement; tha...

Our research combines computer vision, computer graphics, and machine learning. We are driven by problems related to understanding humans and their behavior. To solve these, we often need to develop new machine learning tools. We focus on deep learning with a twist: we combine deep learning with strong models or physical constraints...
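A common instance of this combination is fitting a parametric body model to network-predicted 2D keypoints: a data term measures reprojection error while a learned prior keeps the pose plausible. The sketch below is illustrative only (not our released code); the projection model, prior, and weight are simplifying assumptions:

```python
import numpy as np

def project(joints3d, f=1000.0):
    """Simple perspective projection of 3D joints with a camera at the origin."""
    return f * joints3d[:, :2] / joints3d[:, 2:3]

def objective(joints3d, keypoints2d, pose, pose_mean, prior_weight=1e-3):
    """Reprojection data term plus a Gaussian pose prior (strong-model constraint)."""
    reproj = np.sum((project(joints3d) - keypoints2d) ** 2)  # image evidence
    prior = np.sum((pose - pose_mean) ** 2)                  # keeps pose plausible
    return reproj + prior_weight * prior
```

Minimizing such an objective lets the deep network supply image evidence while the model constrains the solution to the space of plausible human poses.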

To understand human and animal movement, we want to capture it, model it, and then simulate it. Most methods for capturing human motion are restricted to laboratory environments and/or limited volumes. Most do not take into account the complex and rich environment in which humans usually operate. Nor do they capture the kinds of eve...

Our bodies and our health are intertwined. The question we ask is: how can we leverage our models of the human body to detect and treat disease? To answer this, we collaborate with doctors and psychologists to relate body shape and movement to health.
Specifically, we explore how body shape is perceived by people...

Datasets with ground truth have driven many of the recent advances in computer vision. They allow evaluation and comparison so the field knows what works. They also provide training data to machine learning methods that are hungry for data. Code is equally important as it supports reproducible research and enables the field to...

Our focus is on vision-based perception in multi-robot systems.
To perform a set of perception-driven tasks, a team of network-connected robots with vision sensors requires two fundamental functionalities: i) autonomous navigation in the environment and ii) vision-based ...

Our goal is to understand the process of perception, to learn the representations that allow complex reasoning about visual input, inferring actions and predicting their consequences. We seek fundamental principles, algorithms and implementations for solving this task. In the past two years, we have made significant progress in this...

Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments, and to use this understanding to design future systems.