We learn an articulated, 3D, statistical shape model of animals (SMAL) that can represent quadrupeds of different species using very little training data (top). We fit SMAL to a set of uncalibrated images, estimating pose, shape, and vertex displacements, and recover 3D textured meshes for a wide range of species (bottom).

In the past 15 years impressive advances have been made in capturing, modeling and tracking the human body. Animals have received much less attention, despite many applications in biomechanics, biology, neuroscience, robotics, and entertainment. The main reason for the lack of 3D animal models is that the experience in modeling the human body cannot be easily applied to animals: animals are not cooperative, and it is not possible to bring thousands of them into a lab for 3D scanning. In this project we develop methods to learn 3D articulated statistical shape models that can represent a wide variety of species in the animal kingdom, allowing intra- and inter-species analysis of 3D shape and the automatic, non-invasive assessment of animal shape from images.

Challenges in building the model include defining shape correspondences and integrating data captured with different modalities, including images and video. The result is SMAL (Skinned Multi-Animal Linear), a 3D articulated statistical shape model able to represent animal shapes across species: big cats, dogs, cows, horses, zebras, and hippos. SMAL can be fitted to images to estimate animal shape and pose. When an animal is not well represented in the SMAL shape space, we can still capture its 3D shape and pose with the SMALR method (SMAL with Refinement), which accurately recovers a 3D textured mesh from a small set of uncalibrated, non-simultaneous images of the animal.
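Concretely, a SMAL-style model can be sketched as a template mesh plus linear shape blend shapes, posed with linear blend skinning. The sketch below uses toy dimensions and random placeholder data in place of the learned model; all names and sizes are illustrative assumptions, not the actual SMAL parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions; the real SMAL model has thousands of vertices and ~33 joints.
V, J, B = 200, 4, 10                    # vertices, joints, shape basis size

v_template = rng.normal(size=(V, 3))          # mean quadruped template (toy)
shape_dirs = rng.normal(size=(B, V, 3)) * 0.01  # linear shape directions (toy)
weights = rng.dirichlet(np.ones(J), size=V)   # skinning weights, rows sum to 1

def shaped_vertices(beta):
    """Intrinsic shape: template deformed by shape coefficients beta."""
    return v_template + np.einsum("b,bvc->vc", beta, shape_dirs)

def posed_vertices(beta, joint_transforms):
    """Linear blend skinning: blend per-joint rigid 4x4 transforms."""
    v = shaped_vertices(beta)
    v_h = np.concatenate([v, np.ones((V, 1))], axis=1)          # homogeneous
    per_joint = np.einsum("jab,vb->jva", joint_transforms, v_h)  # (J, V, 4)
    blended = np.einsum("vj,jva->va", weights, per_joint)        # (V, 4)
    return blended[:, :3]

beta = rng.normal(size=B)
identity = np.tile(np.eye(4), (J, 1, 1))     # rest pose: identity per joint
rest = posed_vertices(beta, identity)        # equals shaped_vertices(beta)
```

With identity joint transforms the skinning reduces to the shaped template, which is a quick sanity check when implementing any blend-skinned model.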

Today animal motion is mostly captured indoors, for domestic species, with marker-based systems. To address this we are exploiting the 3D articulated shape model to develop a markerless motion capture system that does not require an a priori 3D model of the subject, allowing us to capture the articulated motion of wild animals in their natural environment.

Animals are widespread in nature and the analysis of their shape and motion is important in many fields and industries. Modeling 3D animal shape, however, is difficult because the 3D scanning methods used to capture human shape are not applicable to wild animals or natural settings. Consequently, we propose a method to capture the detailed 3D shape of animals from images alone. The articulated and deformable nature of animals makes this problem extremely challenging, particularly in unconstrained environments with moving and uncalibrated cameras. To make this possible, we use a strong prior model of articulated animal shape that we fit to the image data. We then deform the animal shape in a canonical reference pose such that it matches image evidence when articulated and projected into multiple images. Our method extracts significantly more 3D shape detail than previous methods and is able to model new species, including the shape of an extinct animal, using only a few video frames. Additionally, the projected 3D shapes are accurate enough to facilitate the extraction of a realistic texture map from multiple frames.
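The refinement step described above can be sketched as an objective over per-vertex displacements applied in the canonical pose: a data term comparing the articulated, projected mesh against image evidence in each frame, plus smoothness and magnitude regularizers. The sketch below substitutes an identity articulation, a toy orthographic projection, and random placeholder geometry; every name and weight here is an illustrative assumption, not the actual SMALR implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
V = 50
v_canonical = rng.normal(size=(V, 3))        # toy canonical-pose mesh
edges = [(i, i + 1) for i in range(V - 1)]   # toy mesh connectivity

def project(v):
    """Hypothetical orthographic camera: drop the depth coordinate."""
    return v[:, :2]

# Two "frames" observing the same displaced shape. The real method would
# re-pose (articulate) the deformed mesh per frame before projecting.
true_dv = rng.normal(size=(V, 3)) * 0.1
observations = [project(v_canonical + true_dv) for _ in range(2)]

def objective(dv, lam_smooth=1e-3, lam_mag=1e-3):
    """Image-evidence term plus regularizers on the displacements."""
    data = sum(np.mean((project(v_canonical + dv) - obs) ** 2)
               for obs in observations)
    smooth = sum(np.sum((dv[i] - dv[j]) ** 2) for i, j in edges)
    return data + lam_smooth * smooth + lam_mag * np.sum(dv ** 2)
```

The displacements that generated the observations score lower than leaving the canonical shape untouched, which is the property any such refinement objective must have; in practice the minimization would run over many frames with a differentiable renderer or silhouette distance.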

There has been significant work on learning realistic, articulated, 3D models of the human body. In contrast, there are few such models of animals, despite many applications. The main challenge is that animals are much less cooperative than humans: the best human body models are learned from thousands of 3D scans of people in specific poses, which is infeasible with live animals. Consequently, we learn our model from a small set of 3D scans of toy figurines in arbitrary poses. We employ a novel part-based shape model to compute an initial registration to the scans. We then normalize their pose, learn a statistical shape model, and refine the registrations and the model together. In this way, we accurately align animal scans from different quadruped families with very different shapes and poses. With the registrations to a common template we learn a shape space representing animals including lions, cats, dogs, horses, cows and hippos. Animal shapes can be sampled from the model, posed, animated, and fitted to data. We demonstrate generalization by fitting the model to images of real animals, including species not seen in training.
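The shape-space step amounts to PCA on the registered, pose-normalized meshes: subtract the mean, keep the top principal directions, and new animal shapes are linear combinations of those directions. A minimal sketch with synthetic stand-in data (real inputs would be the registered toy-figurine scans in a common template layout):

```python
import numpy as np

rng = np.random.default_rng(2)
N, V, B = 30, 100, 5    # registered scans, template vertices, basis size (toy)

# Synthetic stand-in for registered, pose-normalized scans: a low-rank
# signal plus a little noise, flattened to (N, 3V).
mean_shape = rng.normal(size=(V * 3,))
latent = rng.normal(size=(N, B))
basis_true = rng.normal(size=(B, V * 3))
scans = mean_shape + latent @ basis_true + rng.normal(size=(N, V * 3)) * 0.01

# PCA via SVD of the mean-centered data; the top rows of Vt are the
# learned shape directions.
mu = scans.mean(axis=0)
U, S, Vt = np.linalg.svd(scans - mu, full_matrices=False)
shape_basis = Vt[:B]                               # (B, 3V)

def sample_animal(beta):
    """Generate a shape from coefficients beta and reshape to (V, 3)."""
    return (mu + beta @ shape_basis).reshape(V, 3)

new_shape = sample_animal(rng.normal(size=B))
```

Because the rows of `shape_basis` are orthonormal, projecting the centered scans onto the basis and back reconstructs them up to the noise level, which is the usual check that the chosen basis size captures the training shapes.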
