It is absolutely awesome -- amazing. I'm one of the toughest critics of
face capture, and even I have to admit, these guys have nailed it. This is the first virtual
human animated sequence that completely bypasses all my subconscious warnings. I get the
feeling of Emily as a person. All the subtlety is there. This is no hype job, it's the real
thing ... I officially pronounce that Image Metrics has finally built a bridge across the
Uncanny Valley and brought us to the other side.

With cast and crew from Benjamin Button and Avatar, Digital Emily was part of the Academy of Motion Picture Arts and Sciences' "Acting in the Digital Age" event, 4/22/2010

Introduction:

Over the last few years our lab has been developing a new high-resolution realistic face scanning process using our light stage systems, which we first
published at the 2007 Eurographics Symposium on Rendering. In early 2008 we were approached by
Image Metrics about collaborating with them to create a realistic animated digital actor as a demo for their
booth at the approaching SIGGRAPH 2008 conference. Since we'd gotten pretty good at scanning actors in different facial poses and Image Metrics has some really neat
facial animation technology, this seemed like a promising project to work on.

Image Metrics chose actress Emily O'Brien to be the star of the project. She plays Ms. Jana Hawkes on "The Young and the Restless" and was nominated
for a 2008 daytime Emmy award. Emily came by our institute to get scanned in our Light Stage 5 device on the afternoon of March 24, 2008. The image to the left shows Emily in the light stage during a scan, with all 156 of its white LED lights turned on.

Our previous light stage processes used to capture digital actors for films such as Spider Man 2, King Kong, Superman Returns, Spider Man 3, and Hancock captured hundreds of images of the actor's face from every lighting direction one at a time. This allowed for very accurate facial reflectance to be recorded and simulated, though it required high-end motion picture cameras, involved capturing a great deal of data, and required a custom face rendering system based on our SIGGRAPH 2000 paper. Nonetheless, studios such as Sony Pictures Imageworks achieved some notable virtual actor results using these techniques.

Captured Images:

Our most recent process requires only about fifteen photographs of the face under different lighting conditions as seen to the right to capture the geometry and reflectance of a face.
The photos are taken from a stereo pair of off-the-shelf digital still cameras, and since so few images are required, everything can be captured quickly in
"burst mode" in under three seconds, before the images even need to be written to the compact flash cards.

Most of the images are shot with essentially every light in the light stage turned on, but with different gradations of brightness. All of the light stage lights have
linear polarizer film placed on top of them, affixed in a particular pattern of orientations, which lets us measure the specular and subsurface reflectance components
of the face independently by changing the orientation of a polarizer on the camera.

The top row shows Emily's face under four spherical gradient illumination conditions and then a point-light condition, and all of these top images are cross-polarized
to eliminate the shine from the surface of her skin (her specular component). What's left is the skin-colored "subsurface" reflection, often called the "diffuse"
component: this is light which scatters within the skin enough to become depolarized before re-emerging. The rightmost image is lit by a frontal point light,
and is also cross-polarized to remove the specular reflection.

The middle row shows parallel-polarized images of the face, where the polarizer on the camera is rotated so that the specular reflection returns, and in double strength
compared to the subsurface reflection. We can then see the specular reflection on its own by subtracting the first row of images from the second row.

Separating Subsurface and Specular Reflection:

Here is a closeup of the "diffuse-all" image of Emily. Every light in the light stage is turned on to equal intensity, and the polarizer on the camera is oriented to block the specular reflection from every single one of the polarized LED light sources. Even the highlights of the lights in Emily's eyes are eliminated.

This is about as flat-lit an image of a person's face as you could possibly photograph. And it's almost the perfect image to use as the diffuse texture map for the face if you're building a virtual character. The one problem is that it's polluted to some extent by self-shadowing and interreflections, making the concavities around the eyes, under the nose, and between the lips somewhat darker and slightly more color-saturated than they should be. Depending on how you're doing your renderings, this is either a bug or a feature. For real-time rendering, it can actually add to the realism if this effect of "ambient occlusion" is effectively already "baked in". If new lighting is being simulated on the face using a global illumination technique, then it doesn't make sense to calculate new self-shadowing to modify a texture map that already has self-shadowing present. In this case, you can use the actor's 3D geometry to compute an approximation to the effects of self-shadowing and/or interreflections, and then divide these effects out of the texture image.
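As a rough illustration of that last step, here is a minimal Python sketch that divides an occlusion estimate out of a single texture pixel. It assumes linear (not gamma-encoded) pixel values and a per-pixel occlusion factor that has already been computed from the scanned geometry; the function name, the scalar occlusion model, and the clamping floor are all hypothetical choices made for the example. A single scalar only addresses the darkening from self-shadowing, not the extra color saturation from interreflections.

```python
def remove_baked_occlusion(texel, occlusion, min_occlusion=0.05):
    """Divide an estimated occlusion factor out of one linear RGB texel.

    texel: (r, g, b) in linear light, photographed under uniform lighting.
    occlusion: estimated fraction of the lighting sphere visible from this
        point (1.0 = fully unoccluded), computed from the 3D geometry.
    min_occlusion: hypothetical floor that keeps deep creases from blowing up.
    """
    visible = max(occlusion, min_occlusion)
    return tuple(channel / visible for channel in texel)


# A texel under the nose that sees only half of the lighting sphere.
print(remove_baked_occlusion((0.20, 0.12, 0.10), 0.5))
```

Texels in open areas of the face (occlusion near 1.0) pass through almost unchanged, while texels in concavities are brightened.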

This image also shows the makeup dots we put on Emily's face which help us to align the images in the event there is any drift in her position or expression over the fifteen images; they are relatively easy to remove digitally. Emily was extremely good at staying still for the three-second scans and many of her datasets required no motion compensation at all. We have already had some success at acquiring this sort of data in real time using high-speed video [Ma et al. 2008].

This image of Emily is also lit by all of the light stage lights, but the orientation of the polarizer has been turned 90 degrees, which allows the specular reflections to return. You can see a sheen on her skin, and the reflections of the lights are now evident in her eyes. In fact, the specular reflection is seen at double the strength of the subsurface (diffuse) reflection, since the polarizer on the camera blocks about half of the unpolarized subsurface reflection.

This image shows the combined effect of specular reflection and subsurface reflection; to model the facial reflectance we would really like to observe the specular reflection all on its own. To do this, we can simply subtract the diffuse-only image from this one.

Taking the difference between the diffuse-only image and the diffuse-plus-specular image yields this image of just the specular reflection of the face. The image is essentially colorless since this light has reflected specularly off the surface of the skin, rather than entering the skin and having its blue and green colors significantly absorbed by skin pigments and blood before reflecting back out.
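The subtraction can be sketched per pixel as below. This is only an illustration, assuming the two photographs are linear, aligned, and equally exposed; the function name is made up for the example. Both photos see the subsurface light through the camera polarizer at the same strength, so it cancels in the difference.

```python
def separate_reflection(cross_pixel, parallel_pixel):
    """Split one pixel into subsurface and specular parts.

    cross_pixel: (r, g, b) from the cross-polarized photo (subsurface only).
    parallel_pixel: (r, g, b) from the parallel-polarized photo
        (subsurface plus specular).
    """
    # Negative differences can only come from noise or misalignment; clamp.
    specular = tuple(
        max(p - c, 0.0) for p, c in zip(parallel_pixel, cross_pixel)
    )
    subsurface = tuple(cross_pixel)
    return subsurface, specular


subsurface, specular = separate_reflection((0.20, 0.10, 0.10), (0.50, 0.40, 0.40))
print(specular)
```

In this example the skin-colored subsurface values differ per channel, while the recovered specular values are equal across channels, matching the colorless look of the specular-only image.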

This image provides a useful starting point for building a digital character's specular intensity map, or "spec map". Essentially, it shows for each pixel the intensity of the specular reflection at that pixel. However, the specular reflection becomes amplified near grazing angles, such as at the sides of the face, as predicted by Fresnel's equations; we generally model and compensate for this effect but also tend to ignore regions of the face at extreme grazing angles. The image also includes some of the effects of "reflection occlusion." The sides of the nose and innermost contour of the lips appear to have no specular reflection since self-shadowing prevents the lights from reflecting in these angles.
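One way such a compensation could look is sketched below. It is a hedged example rather than a description of our actual processing: it uses the textbook Fresnel reflectance of a dielectric for unpolarized light (the light stage illumination is polarized, so this is a simplification), an assumed skin-like index of refraction of 1.4, and a hypothetical cutoff for discarding grazing-angle pixels.

```python
import math


def fresnel_unpolarized(cos_theta, ior=1.4):
    """Fresnel reflectance of a dielectric in air for unpolarized light.

    cos_theta: cosine of the angle between the surface normal and the
        incident (equivalently, for a mirror reflection, the view) direction.
    ior: index of refraction; 1.4 is an assumed, skin-like value.
    """
    cos_i = min(max(cos_theta, 0.0), 1.0)
    sin_t_sq = (1.0 - cos_i * cos_i) / (ior * ior)  # Snell's law
    cos_t = math.sqrt(1.0 - sin_t_sq)
    r_s = ((cos_i - ior * cos_t) / (cos_i + ior * cos_t)) ** 2
    r_p = ((cos_t - ior * cos_i) / (cos_t + ior * cos_i)) ** 2
    return 0.5 * (r_s + r_p)


def compensate_specular(intensity, cos_theta, ior=1.4, min_cos=0.3):
    """Rescale a measured specular intensity to its normal-incidence value.

    Returns None for pixels closer to grazing than min_cos (a hypothetical
    cutoff), mirroring the idea of ignoring extreme grazing angles.
    """
    if cos_theta < min_cos:
        return None
    gain = fresnel_unpolarized(cos_theta, ior) / fresnel_unpolarized(1.0, ior)
    return intensity / gain


print(compensate_specular(0.5, 1.0))   # facing the camera: unchanged
print(compensate_specular(0.5, 0.4))   # side of the face: scaled down
print(compensate_specular(0.5, 0.1))   # extreme grazing angle: ignored
```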

Some of our lab's most recent work [Ghosh et al. 2008] has shown that this sort of polarization difference image also contains effects of single scattering, where the light enters the skin but scatters exactly once off some element of the skin before reflecting to the camera. This light picks up some of the skin's melanin color, adding a little color to the image. However, the image is dominated by the specular component, which will allow us to reconstruct high-resolution facial geometry.

Going back to the full set of Emily images, we have subtracted the entire first row from the entire second row to produce a set of specular-only images of the face under different illumination conditions. The images of the face under the gradient illumination conditions will allow us to compute surface orientations per pixel.

Building the Specular Normal Map:

The four images of the specular reflection under the gradient illumination patterns let us derive a high-resolution normal map for the face. If we look at one pixel across this four-image sequence, its brightness in the X, Y, and Z images divided by its brightness in the fully-illuminated image uniquely encodes the direction of the light stage reflected in that pixel. This tells us the reflection vector for the pixel, and from the camera calibration we also know the view vector.

Computing the vector halfway between the reflection vector and the view vector yields a surface normal estimate for the face based on the specular reflection. Here we see the face's normal map visualized in the standard RGB = XYZ color map. The normal map contains detail at the level of skin pores and fine wrinkles.
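The two steps described in this section can be sketched for a single pixel as follows. The sketch assumes that each gradient pattern ramps linearly from 0 to 1 across the sphere along its axis, so that a brightness ratio in [0, 1] maps to a direction component in [-1, 1]; that mapping, the default view vector, and all function names are assumptions made for the example.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(c / length for c in v)


def specular_normal(grad_x, grad_y, grad_z, full, view=(0.0, 0.0, 1.0)):
    """Estimate a surface normal for one pixel of the specular-only images.

    grad_x, grad_y, grad_z: brightness under the X, Y, and Z gradients.
    full: brightness under the fully-on pattern (must be > 0).
    view: unit vector from the surface point toward the camera.
    """
    # Ratio in [0, 1] -> direction component in [-1, 1] (assumed ramp).
    reflection = normalize(
        tuple(2.0 * g / full - 1.0 for g in (grad_x, grad_y, grad_z))
    )
    # The normal bisects the reflection and view directions.
    return normalize(tuple(r + v for r, v in zip(reflection, view)))


def encode_rgb(normal):
    """Map a unit normal to the RGB = XYZ visualization in [0, 1]."""
    return tuple(0.5 * c + 0.5 for c in normal)


# A pixel that reflects light arriving from the +X side of the stage.
n = specular_normal(1.0, 0.5, 0.5, 1.0)
print(n, encode_rgb(n))
```

Dividing by the fully-illuminated brightness is what removes the dependence on the specular intensity at the pixel, leaving only the direction.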

Deriving High-Resolution Geometry:

LOW-RES GEOMETRY:

The last set of images in the scanning process is a set of color fringe patterns which let us robustly form pixel correspondences between the left and right viewpoints of the face. From these correspondences and the camera calibration, we can triangulate a 3D triangle mesh of Emily's face. However, these images of the face show the subsurface facial reflectance, which originates beneath the surface of the skin and blurs the incident illumination. As a result, the geometry is relatively smooth and misses the skin texture detail that we would like to see in our scans.
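For readers unfamiliar with triangulation, here is a minimal sketch of one common approach, the midpoint method, applied to a single correspondence. It assumes the camera calibration has already been used to turn each matched pixel into a ray (a camera center plus a direction) in a shared world frame; the function name and inputs are hypothetical, and this is not necessarily the method used in our pipeline.

```python
def triangulate_midpoint(origin_a, dir_a, origin_b, dir_b):
    """Return the point midway between the closest points of two rays.

    origin_*: camera centers; dir_*: ray directions through the matched
    pixels (any length, not parallel), all in one calibrated world frame.
    """
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    w = tuple(p - q for p, q in zip(origin_a, origin_b))
    a, b, c = dot(dir_a, dir_a), dot(dir_a, dir_b), dot(dir_b, dir_b)
    d, e = dot(dir_a, w), dot(dir_b, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; cannot triangulate")
    s = (b * e - c * d) / denom  # parameter of the closest point on ray A
    t = (a * e - b * d) / denom  # parameter of the closest point on ray B
    point_a = tuple(p + s * x for p, x in zip(origin_a, dir_a))
    point_b = tuple(p + t * x for p, x in zip(origin_b, dir_b))
    return tuple(0.5 * (x + y) for x, y in zip(point_a, point_b))


# Two cameras two units apart, both looking at a point five units away.
print(triangulate_midpoint((-1.0, 0.0, 0.0), (1.0, 0.0, 5.0),
                           (1.0, 0.0, 0.0), (-1.0, 0.0, 5.0)))
```

Repeating this for every matched pixel pair gives a point cloud that can then be connected into a triangle mesh.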

HIGH-RES GEOMETRY:

We add in the skin texture detail by essentially embossing the specular normal map onto the 3D mesh.

To do this, a high-resolution version of the mesh is created and the vertices of each triangle are allowed to move forward and back until they best exhibit the same surface normals as the normal map. Our lab first described this process on the web in some work involving Light Stage 2 back in 2001, though back then we were using normal maps built from the diffuse facial reflection observed in traditional light stage data. The result is a very high-resolution 3D scan, with different skin textures clearly observable in different areas of the face.

Image Metrics planned out thirty-three facial expressions for us to capture Emily in, based loosely on Paul Ekman's Facial Action Coding System. There are a lot of things going on with her mouth and a number of things happening with her eyes - Emily did a great job staying still for all of them. Two of the scans - one with eyes closed and one with eyes open - were acquired from the two sides of the face as well as from the front, as seen in the insets. This allowed us to merge together a 3D model of the face covering from ear to ear.

Building a digital actor from scans of multiple facial expressions is itself a commonly practiced technique - we used it ourselves in 2004 when we scanned actress Jessica Vallot in about 40 facial expressions for our Animated Facial Reflectance Fields project, and going further back, ILM acquired multiple 3D scans of actress Mary Elizabeth Mastrantonio to create the animated water creature in The Abyss.

This particular scan of Emily shows a variety of skin textures on her forehead, cheeks, nose, lips, and chin.

If you click the image to the right, the textures and their variety become even more evident on the rendering of the 3D geometry.

The fourteen images circulating to the left show a sampling of the high-resolution scans taken of Emily in different facial expressions. A lot goes on in a face as it moves!

Observing Dynamic Skin Behavior:

The first scan above (A) shows Emily pulling her mouth to one side, and an interesting pattern of skin buckling develops across the top of her lip. This kind of dynamic behavior would take an especially talented digital artist to model realistically.

Just as dramatic an effect is the stretching of the skin texture on her cheek. The skin pores greatly elongate and become shallower, looking almost nothing like the skin pore texture observed for the same cheek in the neutral scan. This was a skin phenomenon we hadn't observed before, and one that should enhance the realism of virtual characters if it can be reproduced faithfully in a digital character.

Emily's skin pore detail in the neutral scan (B), showing no skin pore elongation - a qualitatively different appearance than the stretched cheek texture in the previous image.

The last scan above (C) has some interesting skin detail as well. Emily was asked to raise her eyebrows, stretching her eyelids over her eyes. The fraction-of-a-millimeter resolution of the scan allowed us to make out the fine capillaries under her eyelid.

When you think of skin wrinkling and buckling, the forehead is one of the first parts of the face that comes to mind. Here's Emily's forehead appearance from a neutral scan.

When Emily raises her eyebrows, there's not a lot of skin buckling due to her relatively elastic skin. But on a fine scale, the skin pores disappear and are replaced with fine lines. On a larger scale, the bridge of her nose narrows significantly compared to the neutral scan.

When Emily lowers her eyebrows, you can see the areas where the muscles attach beneath the skin. The bridge of her nose expands as well.

When Emily was asked to shut her eyes hard - like Masi Oka does in the TV show Heroes before his character teleports - she developed bulging muscles in her forehead, a smoother forehead skin texture, and an even thicker bridge of her nose.

Rendering with Hybrid Normal Maps:

A significant benefit of the photographically-based face scanning process is that we capture perfectly aligned texture maps in addition to the high-resolution 3D geometry. We can in fact do more than visualize Emily's scans as grey-shaded models.

Here is a scan with the diffuse texture maps applied using a lambert material. There is no advanced skin shading or global illumination being performed, so the renderings look chalky and notably unlike skin.

In addition to the diffuse texture map, our scanning process also provides the specular intensity map and a set of normal maps. Part of the specular normal map which we saw earlier is shown to the left.

As it turns out, we can also estimate normals in a similar manner from any one of the color channels of the diffuse reflection of the face as seen above. Since these normal maps are calculated from light which has scattered beneath the surface of the skin, they blur the surface detail compared to the specular normal map. The red channel has the most blur since red light scatters the furthest within skin, while the blue channel preserves the most detail, but still far less than the specular normal map.

The fact that the diffuse normal maps blur skin detail can be a useful feature rather than a shortcoming, since they essentially measure the 1st-order response of the skin to illumination. In particular we can use all of these normal maps to realistically render the face with a real-time, local shading model called hybrid normal rendering, presented in [Ma et al. 2007].

With hybrid normal rendering, we render the diffuse component of the skin as three different Lambertian cosine lobes, one for each color channel, each driven by the corresponding diffuse normal map and modulated by the diffuse color map. In addition, we render the shine of the skin as a specular lobe driven by the specular normal map and modulated by the specular intensity map. The renderings above use an implementation of hybrid normal map rendering in Maya 8.0, and the technique is almost trivial to implement in a real-time pixel shader. The hybrid normal rendering produces a believable "skin-like" quality in the renderings, and encodes some of the photometric effects of self-shadowing and interreflected light as well. The technique won't produce light bleeding into sharp shadows - that would require subsurface scattering simulation - but it seems appropriate for most common lighting environments.
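To make the structure of the shading model concrete, here is a small per-pixel sketch in Python rather than in a shader language. It follows the description above (one Lambertian lobe per color channel plus one specular lobe), but the specific specular lobe, a Blinn-Phong term with a made-up exponent, is an assumption chosen for brevity, as are the function names and the single white directional light.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def normalize(v):
    length = dot(v, v) ** 0.5
    return tuple(c / length for c in v)


def shade_hybrid(diffuse_normals, specular_normal, albedo, spec_intensity,
                 light_dir, view_dir, shininess=50.0):
    """Shade one pixel lit by a single unit-intensity directional light.

    diffuse_normals: three unit normals, from the red, green, and blue
        diffuse normal maps.
    specular_normal: unit normal from the specular normal map.
    albedo: (r, g, b) diffuse color; spec_intensity: spec-map value.
    light_dir, view_dir: unit vectors pointing away from the surface.
    shininess: hypothetical Blinn-Phong exponent standing in for the lobe.
    """
    half = normalize(tuple(l + v for l, v in zip(light_dir, view_dir)))
    # Gate the highlight so it vanishes when the light is behind the surface.
    facing = 1.0 if dot(specular_normal, light_dir) > 0.0 else 0.0
    lobe = max(dot(specular_normal, half), 0.0) ** shininess
    specular = facing * spec_intensity * lobe
    # One Lambertian cosine lobe per color channel, each with its own normal.
    return tuple(
        a * max(dot(n, light_dir), 0.0) + specular
        for a, n in zip(albedo, diffuse_normals)
    )


up = (0.0, 0.0, 1.0)
tilted = (0.6, 0.0, 0.8)  # the blurrier red normal leans away from the light
print(shade_hybrid((tilted, up, up), up, (0.8, 0.5, 0.4), 0.2, up, up))
```

Because the red, green, and blue lobes use different normals, the shading falls off at slightly different rates per channel, which is where the soft, skin-like color variation comes from.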

Some of our most recent work [Ghosh et al. 2008] also shows how to obtain a per-region specular roughness map and use that for rendering as well.

Scanning Emily's Teeth:

We did one more piece of 3D scanning for the Emily project: a plaster cast of Emily's teeth provided by Image Metrics, adapting our 3D scanning techniques to work with greater accuracy in a smaller scanning volume. Here is a photo of the cast on the left and a rendering of Emily's digital teeth model on the right.

Image Metrics: Building the Animated Face:

The Image Metrics team took the set of our high-resolution face scans and within the course of a few months created a fully rigged, animatable face model for Emily. At the first demo we visited, just a couple of months after delivering our scans, it was amazing to see someone manipulate the animation controls and have the digital Emily's facial rig move into facial positions completely consistent with the scans we had provided. This was no small feat - it required adding digital eyes, rigging the skin around the eyes, adding the teeth, and creating a rig that not only replicated the scans faithfully but also did reasonable things for the infinite variety of intermediate positions Emily's face could produce - especially while speaking!

To demonstrate digital Emily, Image Metrics shot HD video of the real Emily delivering lines of dialog in a lit studio setup. In addition to the master shot, Image Metrics also shot frontal video of the real Emily's performance to supply to their video-based facial animation system. They then used their animation system to generate facial animation curves which would make the digital Emily deliver a faithful rendition of the performance.

Image Metrics also tracked Emily's face in the master shot, set up a shader to render Emily's face using subsurface scattering (hybrid normal rendering would be cool to try next time), and replicated the studio lighting environment using an HDR light probe image of the set and image-based lighting. They then replaced the real Emily's face with a 100% digital Emily face driven by her facial performance, frame for frame, rotoscoping her fingers as necessary when she moved her hands in front of her face. Emily's facial performance was not an easy one to match, with a variety of subtle and extreme expressions and emotions. Nonetheless, the result was a realistic live action version of the digital Emily character which many people found to be entirely convincing, even after several viewings.

The Digital Eye: Paul Debevec
In this month's edition of "The Digital Eye" on VFXWorld.com, Peter Plantec chats with Paul Debevec about his latest research projects involving realtime 3-D display and capturing and rendering human faces, including a new skin rendering technique.
By Peter Plantec - Posted on May 21, 2008

"Emily is a truly monumental achievement, recreating every nuance of human facial expression, even though what you're
actually looking at is the face of a digital actor. Created through a partnership with USC's Institute for Creative
Technologies (ICT), the team's primary objective was to create a completely convincing, animated computer-generated face,
and I think they succeeded." - Paul Strauss, Technabob.com

Leah D'Emilio learns about a whole new approach to visual effects in film. Dr. Paul Debevec, innovator of HDR photography and creator of photogrammetry used in "The Matrix," takes us through USC's Institute for Creative Technologies to explore the future in virtual filmmaking and special effects.