We present a learning-based approach for synthesizing facial geometry at medium and fine scales from diffusely-lit facial texture maps. When applied to an image sequence, the synthesized detail is temporally coherent. Unlike current state-of-the-art methods, which assume "dark is deep", our model is trained on measured facial detail captured using polarized gradient illumination in a Light Stage. This enables us to produce plausible facial detail across the entire face, including regions where previous approaches may incorrectly interpret dark features, such as moles, hair stubble, and occluded pores, as concavities. Instead of directly inferring 3D geometry, we propose to encode fine details in high-resolution displacement maps, learned by a hybrid network that combines a state-of-the-art image-to-image translation network with a super-resolution network. To effectively capture geometric detail at both mid and high frequencies, we factorize the learning into two separate sub-networks, enabling the full range of facial detail to be modeled. Results from our learning-based approach compare favorably with a high-quality active facial scanning technique, yet require only a single passive lighting condition and no complex scanning setup.
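As context for how a displacement map drives the final geometry: each vertex of the base mesh is offset along its unit normal by the displacement value sampled for it. A minimal sketch of this embossing step (the function and values are illustrative, not the paper's implementation):

```python
import numpy as np

def emboss(vertices, normals, disp_values):
    """Offset each vertex along its unit normal by its sampled displacement.

    vertices, normals: (N, 3) arrays; disp_values: (N,) array of scalar
    displacements sampled from the displacement map via the mesh UVs.
    """
    unit_n = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    return vertices + disp_values[:, None] * unit_n

# Tiny illustrative example: three vertices at the origin.
verts = np.zeros((3, 3))
normals = np.array([[0.0, 0.0, 2.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
d = np.array([0.5, -0.25, 1.0])
out = emboss(verts, normals, d)
```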

Experimental Results

We evaluate the effectiveness of our approach on input textures from a variety of subjects and expressions. We show the synthesized geometries embossed by the medium-scale details alone and by the 1K and 4K combined multi-scale (both medium- and high-frequency) displacement maps, with the input textures and base mesh shown in the first and second columns, respectively. As the results show, our method faithfully captures both medium- and fine-scale geometry. The final geometry synthesized using the 4K displacement map exhibits mesoscale geometry on par with active facial scanning. None of these subjects were used in training the network, which demonstrates the robustness of our method to a variety of texture qualities, expressions, genders, and ages.

We validate the effectiveness of the geometry detail separation by comparing against an alternative solution that does not decouple medium and high frequencies. The displacement map learned by this alternative fails to capture most of the high-frequency detail while introducing artifacts at medium frequencies, which is evident in the embossed geometry. Our method, in contrast, faithfully replicates both medium- and fine-scale details in the resulting displacement map.
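The decoupling above can be illustrated with a simple band separation: a low-pass filter of the displacement map yields a medium-frequency component, and the residual carries the high-frequency detail, with the two bands summing back to the original. A minimal sketch assuming a Gaussian low-pass (the paper's sub-networks learn this factorization; the filter and sigma here are illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def split_displacement(disp, sigma=4.0):
    """Split a displacement map into medium- and high-frequency bands.

    The Gaussian-blurred component carries medium-scale shape; the residual
    carries fine detail such as pores. sigma is an illustrative choice, not
    a value from the paper.
    """
    medium = gaussian_filter(disp, sigma=sigma)
    high = disp - medium
    return medium, high

disp = np.random.default_rng(0).standard_normal((256, 256))
medium, high = split_displacement(disp)
```

Because the high band is defined as the residual, recombining the two bands reconstructs the input map exactly.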

We also assess the effectiveness of the proposed super-resolution network in our framework. The result reconstructed using the super-resolution network significantly outperforms its counterpart in faithfully replicating mesoscopic facial structures.
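For reference, the non-learned alternative for producing a 4K map from a 1K map is plain interpolation, which can only smooth existing content rather than synthesize the missing mesoscopic structure that a learned super-resolution network can hallucinate. A sketch of such a bicubic baseline (parameters illustrative):

```python
import numpy as np
from scipy.ndimage import zoom

def upsample_bicubic(disp_low, factor=4):
    """Naive baseline: cubic-spline upsampling of a low-resolution
    displacement map. Unlike a learned super-resolution network, this
    cannot recover fine detail absent from the input."""
    return zoom(disp_low, factor, order=3)

# Illustrative 64x64 stand-in for a 1K map, upsampled 4x.
disp = np.random.default_rng(0).standard_normal((64, 64))
up = upsample_bicubic(disp)
```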

About VGL
The ICT Vision & Graphics Laboratory develops new techniques for creating and displaying photorealistic computer graphics of people, objects, and environments.
We specialize in developing image-based methods for acquiring shape, reflectance, and motion from digital photography and video.
The results are computer-generated virtual models which look and behave as realistically as possible, viewable from any viewpoint and in any illumination condition.