Viewpoint

Noriko Kurachi

Computer graphics, like other scientific fields, has made great strides, thanks to landmark developments at various times throughout its history. Established on innovative discoveries made during the 1970s and 1980s, the field of CG was itself something of a revolution. Since its establishment, CG has made steady progress toward producing ever more interesting graphics, created with ever greater efficiency.

Computer graphics research is conducted in two realms: academic and industrial. The academic perspective tends to place priority on new theoretical concepts, while the industrial perspective focuses on applying the theories in production. These two perspectives require different kinds of creativity, and the progress of CG technologies relies on the collaboration between them. Here we examine some of the landmark discoveries that came out of the interplay of these two branches, and take a look at where the road ahead might lead.

In most cases, revolutionary developments in computer graphics begin with experimental work that attempts, from a physically based point of view, to bridge some new component with graphics. An example of this is the bridging of photographs and graphics, which later came to be called “image-based” technologies. Computer graphics traditionally needed lengthy computation times in order to produce a high-level photorealistic look that is truly comparable with (and even indistinguishable from) photographs. The essential idea of image-based technologies is to replace such computation with photographs by directly using them for image synthesis. This approach began with the recovery of 3D geometry and textures of objects from photographs, followed by the recovery of real-world lighting from photographs, as detailed in the paper “Rendering Synthetic Objects into Real Scenes: Bridging Traditional and Image-Based Graphics with Global Illumination and High Dynamic Range Photography” by Paul Debevec (SIGGRAPH 1998). The basic theories that support image-based modeling and rendering were already popular in other fields of computer science, such as computer vision; however, image-based lighting was a major new invention in the field of CG. This development was supported by an earlier invention called High Dynamic Range Imaging (HDRI).

In photographs taken with an ordinary digital camera, the pixel values that exceed a specific threshold are clamped; therefore, the brighter regions of the photographs often fail to represent the physically accurate brightness of the scene. HDRI is a technique that converts ordinary photographs to special photographs in which every pixel value indicates physically accurate brightness. In the converted photographs, each pixel holds exactly the same brightness as that in the real scene. Therefore, in image-based lighting, the pixels of the background plate act as if they were the light sources scattered in the background scene, thereby accurately approximating the lighting from the environment.
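The conversion described above can be illustrated with a minimal sketch. It assumes a camera whose pixel values are linear in the collected light and a set of bracketed exposures of the same scene; real HDRI pipelines also recover the camera's response curve, which is omitted here. The function name `merge_exposures` and its thresholds are hypothetical, chosen only for illustration.

```python
def merge_exposures(exposures, saturation=0.98, floor=0.02):
    """Estimate relative scene radiance for one pixel.

    exposures: list of (pixel_value, exposure_time_seconds) pairs, with
    pixel_value in [0, 1] and assumed to be linear in collected light.
    """
    total = 0.0
    count = 0
    for value, seconds in exposures:
        # Skip clamped (too bright) and noisy (too dark) samples.
        if floor <= value <= saturation:
            total += value / seconds
            count += 1
    if count == 0:
        # Every sample was unusable; fall back to the shortest exposure.
        value, seconds = min(exposures, key=lambda e: e[1])
        return value / seconds
    return total / count


# A bright pixel: clamped at 1.0 in the two longer exposures.
bright = [(1.0, 1 / 30), (1.0, 1 / 250), (0.5, 1 / 2000)]
# A dim pixel in the same three shots.
dim = [(0.6, 1 / 30), (0.072, 1 / 250), (0.009, 1 / 2000)]

bright_radiance = merge_exposures(bright)
dim_radiance = merge_exposures(dim)
```

In the longest exposure alone, the two pixels would read 1.0 and 0.6, which badly understates how much brighter the first one is. Dividing each usable sample by its exposure time restores that ratio, and this is what lets the pixels of a background plate stand in for light sources.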

HDRI enabled the use of photographs to synthesize new images—in essence, leading to the true bridging of photographs and computer graphics. Image-based lighting was the breakthrough that really leveraged HDRI, and it was followed by other image-based techniques that switched to HDRI to produce a photorealistic look much more efficiently than traditional, physically based rendering methods, which require complicated computation to simulate lighting behavior.

This revolution in CG, which took place from the mid-1990s to 2000, happened to coincide with the period when computer graphics began to dominate visual effects in Hollywood films. VFX vendors came to feel the necessity of producing a higher level of photorealism in a large number of shots. Image-based technologies filled this demand. The result was well received, and rapidly spread in filmmaking. The best-known example of this adoption is The Matrix series, but there are many Hollywood movies wherein image-based technologies were used. This adoption led practitioners to revise the original image-based technologies from a practical viewpoint, which in turn advanced them from both the academic and industrial perspectives.

Photorealism Comes of Age

As we entered the 21st century, various kinds of “bridging” appeared, with the purpose of producing a super-photorealistic look for specific kinds of objects. This movement was driven by the requirements of the industry, where humanlike characters appear en masse in video games and films. As a result, research related to human expressions began to flourish. For example, in the case of skin or hair, realism is largely affected by the behavior of light, which repeatedly scatters inside skin layers or hair volume. So by incorporating research done in medical optics or cosmetics, CG developed new and efficient physically based models that approximate this complex light behavior.

The modern architecture of hardware rendering, focused on the GPU, allowed these techniques to run faster. However, in order to increase the realism that physically based models produce, researchers added methods that derive the model parameters by analyzing photographs of real human skin and hair. Beginning in late 2008, physically based rendering methods became remarkably prevalent in the visual effects field and drew attention to human expression. Because it is difficult to create such fine detail using purely rule-based methods, techniques that measure detailed realism from photographs received greater consideration.

Image-based technologies contributed to the recovery of the global appearance of digital humans, as well. Consider LightStage, a system in which a set of point-light sources (which can be switched “on” and “off”) are placed along a uniform grid over a dome, and a human sitting in the center of the dome is illuminated by these light sources. A set of photographs taken under each lighting environment (lit by each light source) enables us to capture how the person’s appearance changes according to the changes in lighting and viewing direction. This information enables the reproduction of the subject’s appearance from an arbitrary viewing direction under arbitrary lighting conditions. Starting with Spider-Man 2 (2004), this system came to be used in a variety of film projects.
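The relighting step can be sketched in a few lines. Because light transport is linear, a photograph of the subject under any mix of the dome lights is a weighted sum of the photographs taken with each light switched on alone. The sketch below covers only lighting changes for a single fixed camera; the function name `relight` and the tiny two-pixel images are illustrative assumptions, not the actual LightStage software.

```python
def relight(basis_images, light_weights):
    """Combine one-light-at-a-time photographs into a relit image.

    basis_images: list of images, one per dome light; each image is a flat
    list of pixel intensities captured with only that light switched on.
    light_weights: intensity of each dome light in the target environment.
    """
    if len(basis_images) != len(light_weights):
        raise ValueError("need exactly one weight per basis image")
    pixel_count = len(basis_images[0])
    result = [0.0] * pixel_count
    for image, weight in zip(basis_images, light_weights):
        for i in range(pixel_count):
            # Light transport is linear, so contributions simply add up.
            result[i] += weight * image[i]
    return result


# Three hypothetical dome lights, each photographed as a two-pixel image.
basis = [[0.2, 0.0], [0.1, 0.3], [0.0, 0.4]]
relit = relight(basis, [1.0, 0.5, 2.0])
```

In practice the weights would be sampled from an HDR environment map in the direction of each dome light, which is how a captured performance can be placed under the lighting of a different scene.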

Another example is a method called “video motion capture,” which can be thought of as an extension of image-based modeling. Image-based modeling recovers 3D geometry of an object from multiple photographs taken by multiple still cameras from different camera angles. Video motion capture uses multiple video cameras that shoot the performer from different viewing directions; it then recovers the 3D pose of the performer at each frame of the video. Compared to traditional optical or magnetic motion-capture methods, video motion capture has fewer limitations regarding the capture environment (space, costumes, and so forth).
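The geometric building block shared by image-based modeling and video motion capture is triangulation: the same feature seen from two calibrated cameras defines two rays, and the 3D position lies where they (nearly) meet. The sketch below uses the simple midpoint method and assumes the camera positions and ray directions are already known; production systems fit whole body models over many cameras, so this is an illustration of the principle rather than the method any particular film used. The name `triangulate_midpoint` is hypothetical.

```python
def triangulate_midpoint(origin_a, dir_a, origin_b, dir_b):
    """Return the 3D point midway between the closest points of two rays."""
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    w = [p - q for p, q in zip(origin_a, origin_b)]
    a, b, c = dot(dir_a, dir_a), dot(dir_a, dir_b), dot(dir_b, dir_b)
    d, e = dot(dir_a, w), dot(dir_b, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; depth cannot be recovered")
    # Ray parameters that minimize the distance between the two rays.
    t = (b * e - c * d) / denom
    s = (a * e - b * d) / denom
    closest_a = [p + t * k for p, k in zip(origin_a, dir_a)]
    closest_b = [p + s * k for p, k in zip(origin_b, dir_b)]
    return [(m + n) / 2.0 for m, n in zip(closest_a, closest_b)]


# Two cameras two units apart, both seeing the same joint of a performer.
joint = triangulate_midpoint((0.0, 0.0, 0.0), (1.0, 1.0, 5.0),
                             (2.0, 0.0, 0.0), (-1.0, 1.0, 5.0))
```

Repeating this for every tracked feature in every video frame yields a moving 3D pose, with no markers or special suit required.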

Starting with Arthur and the Invisibles (2006), video motion capture became actively adopted in Hollywood film projects. From the academic perspective, as seen in recent SIGGRAPH paper sessions (especially since 2008), many ideas for capturing human appearance and motion using image-based approaches are published each year. This indicates that research in this direction is becoming the trend in image-based technologies.

Beauty Factors: Skin and Hair

Skin and hair are significant elements in the creation of realistic and believable expressions of human characters. Key to producing a higher degree of realism in these elements is knowing how to accurately and efficiently compute the scattering of light inside of skin and hair—a process known as “volume scattering.”

A physically based representation of volume scattering is described by a radiative transfer equation. Solving this equation is computationally expensive, and as a result, various methods that attempted to approximate it emerged and found their way into film and game projects.

However, the radiative transfer equation that had been used stands on the assumption that the structure of the scattering media can be approximated by spherical particles, whereby the behavior of the scattering media is independent of the direction of the light propagation (that is, it looks the same to light traveling in different directions). This assumption limits the physical structure of the scattering media to being isotropic, whereas materials such as skin, hair, and cloth have an anisotropic scattering structure. At SIGGRAPH 2010, Wenzel Jakob presented the paper “A Radiative Transfer Framework for Rendering Materials with Anisotropic Structure,” which aimed at removing such a limitation. This paper generalized a radiative transfer equation derived from oriented, non-spherical particles, and then proposed a new volume-scattering model that can represent the anisotropic structure of scattering media with physical accuracy. The technique may greatly increase the realistic appearance of skin, hair, cloth, and other important volumetric or translucent materials that have an anisotropic structure.
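For readers who want to see the distinction in symbols, one common way to write the classical radiative transfer equation is shown below, followed by a schematic of the anisotropic generalization. The notation is an assumption made here for illustration (L is radiance, σ_t and σ_s are the extinction and scattering coefficients, f_p is the phase function, and Q is emission); the paper's own formulation should be consulted for the exact definitions.

```latex
% Classical form: coefficients do not depend on the direction of travel,
% and the phase function depends only on the angle between directions.
\[
(\omega \cdot \nabla) L(x, \omega)
  = -\sigma_t(x)\, L(x, \omega)
  + \sigma_s(x) \int_{S^2} f_p(\omega' \cdot \omega)\, L(x, \omega')\, d\omega'
  + Q(x, \omega)
\]

% Anisotropic form (schematic): extinction and scattering now vary with
% direction, and the phase function depends on both directions separately.
\[
(\omega \cdot \nabla) L(x, \omega)
  = -\sigma_t(x, \omega)\, L(x, \omega)
  + \sigma_s(x, \omega) \int_{S^2} f_p(x, \omega' \to \omega)\, L(x, \omega')\, d\omega'
  + Q(x, \omega)
\]
```

The only change is the added dependence on ω in the coefficients and the phase function, which is what allows a fiber or weave to attenuate light differently along and across its length.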

At SIGGRAPH Asia 2009, two methods that attempted to recover the details of hair using image-based approaches were published. One proposed a methodology for capturing the small-scale structure of real hair from photographs; the other attempted to recover the color of real hair using the data acquired from photographs.

Now Hear This

So far we have looked at bridging photographs and computer graphics, especially those techniques that have been important to the film and game industries. Now let’s turn our attention to a new bridging technology that may lead to a remarkable contribution in the future: the bridging of sound and graphics, often called “sound rendering.” Such projects are still in the experimental stages, and no practical implementations from an industrial perspective have yet been seen. But the potential is apparent judging from recent SIGGRAPHs, particularly at the 2009 conference, where a technique perfectly synchronized the physically based sound of water with the animation created by a fluid simulation. This application (“Harmonic Fluids” by Changxi Zheng and Doug L. James) attracted strong interest from a wide range of people, and it was followed by the synthesis of fracture sounds (“Rigid-body Fracture Sound with Precomputed Soundbanks,” also from Zheng and James), as well as other impressive, physically based sound rendering methods that appeared at this year’s SIGGRAPH.

Sound rendering itself has a long history. The term seems to have originated in the paper “Sound Rendering,” released in 1992. The paper proposed a general framework to adequately synchronize sound (recorded or synthesized) with CG animation (hand-drawn or procedural). Until then, most sound research had targeted fields such as virtual reality, with the goal of providing a realistic virtual environment; the paper’s concept was therefore new in the sense that it gave first priority to the synchronization of sound and CG animation. It also suggested that ideas in CG rendering (such as raytracing) would be useful in sound synthesis because of the analogy between light and sound, which, although recognized in physics, had not yet been emphasized in CG.

CG needed to wait for the dawn of a new century before it could welcome physically based sound synthesis approaches. In 2001, two early, physically based sound-rendering methods were published at SIGGRAPH. One method attempted to synthesize the sound produced by deformable solid objects through the simulation analysis of a physically based deformation (“Synthesizing Sounds from Physically Based Motion” by James O’Brien, Perry Cook, and Georg Essl). In 2000, a course presentation related to audio synthesis was held at SIGGRAPH, bringing together researchers in the fields of audio synthesis and computer graphics and spawning the work that resulted in 2001.

Another method aimed at synthesizing rigid-body impact sounds to make them act as a digital sound foley (“Foley Automatic: Physically-based Sound Effects for Interactive Simulation and Animation”) was presented by Kees van den Doel, Paul Kry, and Dinesh Pai at SIGGRAPH 2001. Impact sounds were driven by contact forces, which were computed using dynamic simulations or physically based procedural methods. Introducing the concept of modal synthesis, impact sounds were modeled by a set of frequency-dependent bases that contributed to generating a sound foley in real time. These works were followed by a method that enabled the synthesis of aerodynamic sound caused by vortices, such as those created behind a stick-like object placed in a flow (“Real-time Rendering of Aerodynamic Sound using Sound Textures based on Computational Fluid Dynamics” by Yoshinori Dobashi, Tsuyoshi Yamamoto, and Tomoyuki Nishita at SIGGRAPH 2003). Physical information around these vortices was computed first using fluid simulation, which was recorded in textures and used at run-time to synthesize the sound in real time.
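Modal synthesis, as described above, can be sketched very compactly: an impact excites a handful of vibration modes, and each mode rings as an exponentially damped sinusoid. The mode table below (frequencies, damping rates, gains) is made up for illustration; in a real system those values would come from analyzing the object's geometry and material, and the impulse from the contact forces of the simulation.

```python
import math


def modal_impact(modes, impulse, duration=0.5, sample_rate=44100):
    """Synthesize an impact sound as a sum of damped sinusoids.

    modes: list of (frequency_hz, damping_per_second, gain) tuples.
    impulse: strength of the contact force exciting every mode.
    """
    sample_count = int(duration * sample_rate)
    samples = []
    for n in range(sample_count):
        t = n / sample_rate
        value = 0.0
        for frequency, damping, gain in modes:
            envelope = gain * math.exp(-damping * t)
            value += envelope * math.sin(2.0 * math.pi * frequency * t)
        samples.append(impulse * value)
    return samples


# Hypothetical modes for a small struck object.
modes = [(440.0, 6.0, 1.0), (1130.0, 9.0, 0.5), (2210.0, 14.0, 0.25)]
samples = modal_impact(modes, impulse=0.8, duration=0.25)
```

Because each sample is only a short sum over modes, the cost per impact is small, which is what makes this kind of model attractive for interactive use.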

All the above works successfully introduced physically based insights into the sound-modeling process; however, the process by which sound radiated from vibrating surfaces was often ignored or approximated using simple formulas that had large limitations, such as ignoring important wave diffraction effects. The solution for this problem was provided in “Precomputed Acoustic Transfer” (by Doug L. James, Jernej Barbic, and Dinesh K. Pai and published at SIGGRAPH 2006). The method aimed to accurately simulate the sound radiating from vibrating rigid objects. Essentially, simulating sound radiation requires solving the wave equation, which is a costly computation. Therefore, the method introduced virtual sound sources (called multipoles), which approximate the solution of such complicated equations. The sound produced by multipoles can be represented by very simple functions, and once those virtual sound sources are placed during a pre-processing stage, the computation during run-time sound rendering is just the summation of these simple functions—which can be done at real-time rates.
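The run-time half of this idea can be sketched as follows. The sketch uses only monopoles, the simplest kind of multipole, and assumes their positions and complex strengths were already fitted offline for one vibration frequency; the fitting step itself, which is the hard part, is not shown. The function name `pressure_amplitude` and the source values are hypothetical.

```python
import cmath
import math


def pressure_amplitude(listener, sources, wavenumber):
    """Evaluate |p| at the listener from precomputed monopole sources.

    sources: list of (position, complex_strength) pairs fixed offline.
    wavenumber: k = 2 * pi * frequency / speed_of_sound.
    """
    total = 0j
    for position, strength in sources:
        r = math.dist(listener, position)
        # Free-space field of a point source at distance r.
        total += strength * cmath.exp(1j * wavenumber * r) / r
    return abs(total)


k = 2.0 * math.pi * 440.0 / 343.0  # 440 Hz in air at roughly 343 m/s

single = pressure_amplitude((2.0, 0.0, 0.0), [((0.0, 0.0, 0.0), 1 + 0j)], k)

# Two equal and opposite sources cancel for a listener equidistant from both.
pair = [((1.0, 0.0, 0.0), 1 + 0j), ((-1.0, 0.0, 0.0), -1 + 0j)]
cancelled = pressure_amplitude((0.0, 3.0, 0.0), pair, k)
```

The second case hints at why a few well-placed sources can reproduce directional effects: their fields interfere, so loudness depends on where the listener stands, not only on distance.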

As this is a very generic approach, it applies to a large variety of phenomena where sound radiates from vibrating surfaces. In fact, it was an important contribution to the computation of the sound radiation of water (where sound radiates over two different fluid layers: water and air) and fractures (where the topology of the surface dramatically changes), which emerged later.

Synthesizing water sounds was especially challenging because it required a number of breakthroughs, such as learning to accurately and efficiently compute the complex sound radiation over the interface of the water and air. In one of these breakthrough developments, tiny air bubbles in water, called “acoustic bubbles,” which act as oscillators, were used as water-sound sources, and the sound radiation from these sources was computed in parallel. Sound radiation in the water domain was computed first, and then sound radiation in the air domain was computed so that, on the interface, the normal velocity of the sound wave matched the one already computed in the water domain.
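To give a feel for why bubbles make good sound sources, the sketch below evaluates the classic Minnaert resonance formula for a spherical air bubble in water. The text above does not name this formula, so treat it as background physics rather than a description of the published method; the default constants (sea-level pressure, the density of water, the heat-capacity ratio of air) are standard textbook values.

```python
import math


def minnaert_frequency(radius_m, pressure_pa=101325.0,
                       density_kg_m3=998.0, gamma=1.4):
    """Resonant frequency in Hz of a spherical air bubble in water."""
    stiffness = 3.0 * gamma * pressure_pa / density_kg_m3
    return math.sqrt(stiffness) / (2.0 * math.pi * radius_m)


one_mm = minnaert_frequency(0.001)
half_mm = minnaert_frequency(0.0005)
```

A bubble with a 1 mm radius rings at roughly 3.3 kHz, and halving the radius doubles the pitch, which matches the everyday impression that fine spray sounds higher than a large glug.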

Again, virtual sound sources, such as multipoles, were used to approximate the sound waves. What’s interesting is that all these procedures were included in the same fluid simulation that produces the graphics. This enabled a much higher level of synchronization than in previous sound-rendering works. The graphics generated using a cutting-edge fluid simulation were incredibly realistic, and the accompanying sound, which was produced at each step by algorithms working together with those that were producing the graphics, greatly increased the realism and believability of the animation. It was the beauty of physically based synchronization, and it strongly impressed the audience.

The work on fracture sounds included another challenge. The progress of sound rendering had focused on making the sound produced by one object or phenomenon more realistic, but when the number of sound-producing objects is increased, there are all kinds of added challenges. The first challenge was to synthesize fracture sounds that start with one rigid-body sound model and then rapidly produce a large number of sound models in unpredictable ways. Even after this problem was solved, there still remained more complex cases, such as coupling multiple types of objects and phenomena (fluid, solid, and so forth). Ultimately, the challenge is to satisfactorily synchronize sound and graphics in any arbitrary scene.

The series of sound works, such as water and fracture, were planned as part of a project called “Sound Rendering” at Cornell University. Feedback from various industry sectors led the researchers to conclude that traditional methodologies, such as using recorded sounds, are still prevalent and that the revolution needs time. However, sound rendering is evolving rapidly, as we have seen, and what people expect of computer graphics and the industry is also steadily changing. The process can therefore be viewed as a journey toward the point where the bridging of sound and graphics yields fruitful applications.

CG researchers always have been enthusiastic when it comes to making magical phenomena happen. Many magical scenes we see in films and games today were brought about by image-based technologies that started with the concept of bridging photographs and CG. It will be exciting to see what kind of magic can be achieved in the future through undiscovered methods of bridging technologies.