HOW IT WORKS; Automated Avatars Take the Strain Out of Animation

By MATT LAKE

Published: January 9, 2003

THE next time you watch a movie with a lot of computer-generated special effects or animation, stick around at the end for the credits and wait for the names of the character animators to appear. The list will be huge.

That is because characters like Shrek (from the movie of the same name) and the talking Sorting Hat (in ''Harry Potter and the Sorcerer's Stone'') have to be realistic, especially when they speak, and to achieve that level of realism takes a lot of work.

In movies, characters are meticulously built from scratch using 3-D modeling software -- first creating a skeletal and muscular structure, then adding a layer of padding to simulate body fat and finally a texture layer to imitate skin. The result can be a very realistic-looking figure -- but one contained in a data file that is so overstuffed it takes hours of computer processing time to animate.

Outside of movies, lip-synched animations are also appealing -- as avatars, or graphic icons representing real people online or in other electronic realms. Because of this, several software developers have been working on tools to make animation cheaper, more automated and less of a drain on processing power. The goal is to make animations with so little data that they can easily be sent not just over dial-up Internet connections but even over cellphones.

This kind of animation has become such a hot area that the Moving Picture Experts Group (the multimedia compression group behind the MP3 audio standard) has published a specification for facial and body animation that developers can adhere to. The MPEG-4 working group on face and body animation defined 68 separate features that can be moved to simulate normal facial gestures, including speaking.

The group's chairman, Eric Petajan, a former Bell Labs researcher, emphasizes that in this type of animation, the face and the animation are two separate entities. ''You start with a model,'' said Dr. Petajan, now chief scientist at Face2face Animation of Summit, N.J., which uses the MPEG-4 standard. ''It doesn't need to look like a person. It could be a talking dog or a talking pile of trash. In MPEG-4, the animation is kept separate from the face itself. Any face can receive an animation stream.''
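To make that separation concrete, here is a minimal sketch in Python. It is not the MPEG-4 format itself; the parameter names and the two toy face models are invented for illustration. The point it shows is the one Dr. Petajan describes: the stream carries only named parameter values, so any model that understands those names can be driven by it.

```python
# A minimal sketch, not the MPEG-4 reference format: the face model and the
# animation stream are kept separate, and the stream carries only named
# parameter values. The names and models below are invented for illustration;
# the real standard defines 68 facial animation parameters.

NEUTRAL_DOG = {"jaw_open": 0.0, "left_brow_raise": 0.0, "mouth_corner_up": 0.0}
NEUTRAL_TRASH_PILE = {"jaw_open": 0.0, "left_brow_raise": 0.0, "mouth_corner_up": 0.0}

# One frame of an animation stream: how far each feature deviates from neutral.
frame = {"jaw_open": 0.6, "mouth_corner_up": 0.2}

def apply_frame(model, frame):
    """Return a pose: the model's neutral values plus this frame's deviations."""
    return {name: model[name] + frame.get(name, 0.0) for name in model}

# The same stream can drive either face, because it never describes geometry.
print(apply_frame(NEUTRAL_DOG, frame))
print(apply_frame(NEUTRAL_TRASH_PILE, frame))
```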

Pulse Entertainment of San Francisco takes a similar approach with its Veepers product, which is used at entertainment Web sites, in talking electronic greeting cards at Flowgo.com and in animations at Budweiser.com. ''You have a model of the universal face and fit a character's features to it,'' said Young Harvill, a co-founder of the company. ''Take a photograph or texture map and apply it to a face, and then deform it into different facial gestures.''
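A rough sketch of that fitting step, with hypothetical feature names and pixel coordinates rather than anything from Pulse's software, might record where each of the generic model's feature points falls on a character's photograph, then shift those points to form a gesture:

```python
# A minimal sketch of the "universal face" idea: a generic model carries named
# feature points, and fitting a character means recording where those points
# fall on a photograph or texture map. Names and coordinates are illustrative.

GENERIC_MODEL_POINTS = ["left_eye", "right_eye", "nose_tip", "mouth_left", "mouth_right"]

def fit_character(photo_points):
    """Pair each generic feature with its pixel location on the character's image."""
    return {name: photo_points[name] for name in GENERIC_MODEL_POINTS}

# Pixel coordinates picked off a hypothetical 256x256 photograph of a character.
character = fit_character({
    "left_eye": (92, 110), "right_eye": (164, 110), "nose_tip": (128, 150),
    "mouth_left": (100, 190), "mouth_right": (156, 190),
})

def deform(fitted, gesture):
    """Apply a gesture: shift each mapped point by its named offset, if any."""
    moved = {}
    for name, (x, y) in fitted.items():
        dx, dy = gesture.get(name, (0, 0))
        moved[name] = (x + dx, y + dy)
    return moved

# A smile might pull the mouth corners up (negative y is up in image coordinates).
print(deform(character, {"mouth_left": (0, -6), "mouth_right": (0, -6)}))
```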

With this kind of animation, the model that is being animated resides in the computer that plays it -- and that ''computer'' could be a cellphone. The animation instructions as well as sound are delivered or streamed to the player. The combined stream of data is tiny in comparison with streaming digital video. But it is still divided into frames, just as film is, to ensure that the animation and sound sync up properly. ''Every frame tells the model how to deviate from the neutral pose,'' Dr. Petajan said.
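A back-of-envelope sketch suggests why the cue stream stays so small. The figures below are assumptions for illustration, not measurements from Face2face or Pulse, and a real stream would send only the parameters that change in each frame.

```python
# A back-of-envelope sketch of why a cue stream is tiny next to streaming video.
# The figures are assumptions for illustration, not measurements from either company.

FRAMES_PER_SECOND = 25        # animation frames, timed like film so sound stays in sync
PARAMETERS_PER_FRAME = 68     # MPEG-4 defines 68 movable facial features
BYTES_PER_PARAMETER = 2       # assume each value fits in 16 bits, before any compression

# Worst case: every parameter sent in every frame. Real streams send only what changes.
cue_bytes_per_second = FRAMES_PER_SECOND * PARAMETERS_PER_FRAME * BYTES_PER_PARAMETER

# Assume a modest compressed video stream of roughly 40 kilobytes per second.
video_bytes_per_second = 40_000

print(f"animation cues: {cue_bytes_per_second:,} bytes per second")
print(f"streaming video: {video_bytes_per_second:,} bytes per second")
print(f"video needs roughly {video_bytes_per_second // cue_bytes_per_second}x more data")
```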

Dr. Petajan's company, Face2face, creates its animation instructions by analyzing digital video of real people's performances -- a sophisticated version of the motion-capture techniques used by game developers and other animators. Pulse Entertainment's technology, on the other hand, generates cues for facial gestures from the spoken words themselves.

''We take the energy of the speech -- the number of words, how fast they're spoken and how loud they are,'' Mr. Harvill said, ''and use it to predict how someone might move their head. We synthesize motion from speech.''
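A minimal sketch of that idea, a guess at the general approach rather than Pulse's actual algorithm, might measure how loud the speech is over short windows and turn the louder stretches into larger head movements:

```python
# A minimal sketch of "synthesizing motion from speech": measure the loudness of
# short windows of a recording and map it to a head-movement cue. This is an
# illustration of the general idea, not Pulse Entertainment's algorithm.

import math
import struct
import wave

def short_term_energy(path, window_seconds=0.04):
    """Yield the average energy of each short window of a mono 16-bit WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        window = int(rate * window_seconds)
        samples = struct.unpack(f"<{wav.getnframes()}h", wav.readframes(wav.getnframes()))
    for start in range(0, len(samples) - window, window):
        chunk = samples[start:start + window]
        yield math.sqrt(sum(s * s for s in chunk) / window)

def head_nod_cues(path, max_tilt_degrees=8.0):
    """Map louder windows to larger head tilts (an illustrative mapping)."""
    energies = list(short_term_energy(path))
    peak = max(energies) or 1.0
    return [round(max_tilt_degrees * e / peak, 2) for e in energies]

# Each value is a per-frame cue that could be streamed alongside the audio, e.g.:
# print(head_nod_cues("line_of_dialogue.wav"))
```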

When the labor-intensive modeling is taken out of animation, animated characters can begin to appear in places they haven't before. An obvious application is in online entertainment sites. Sales and service Web sites can also benefit from a friendly animated face on the screen. Online chat is another possibility, with animated emoticons providing both entertainment and anonymity for the chatters. ''Visual anonymity is important,'' Dr. Petajan said. ''You don't want your kids to be seen on the Internet. Or your bad hair days.''

The animation streams are so small that cellphones could eventually display animated faces on-screen, lip-synching to the phone message you are hearing on your headset.

These smaller-scale animation methods might eventually find their way into the movies. Any model can lip-sync and move to an animation stream -- even a model created for a blockbuster film.

''We're big fans of Harry Potter, but the animation for that hat could easily run into hundreds of thousands of dollars in artistic time and effort,'' Mr. Harvill said. ''The trick is to find models that you could refit to another use. Once the major motion picture studios get tired of rebuilding everything from scratch, model fitting is a nice way to go.''

Chart: ''Something in the Way It Moves''

Conventional computer-animation techniques can produce realistic characters and effects, but the amount of data involved can strain computer systems. Some companies have developed a newer type of animation that requires less processing power. In this technique, a 3-D model is modified by a stream of animation cues. The amount of data involved is so small that it is suitable for use over dial-up Internet connections.

1) BUILDING THE FACE -- Rather than creating a model for each new character, this kind of animation takes a generic face model and wraps a unique face around it by mapping points on a bitmap image to the model.

2) ANIMATION POINTS -- Key points on the face are manipulated to create the animation. They are clustered around the mouth, jaw and eyes to make lip-synching and nonverbal expressions as realistic as possible. The MPEG-4 animation standard includes a set of facial parameters, incorporating groups of these points. They are normalized for a neutral face; changing their values moves the points, animating the model.

3) ADDING SOUND -- The lip-synching process begins with a digital voice recording. The animation software analyzes it and extracts phonemes, the sounds that make up speech, including fricatives (F and V sounds), vowel sounds and plosives (B and P sounds). The software contains a database of facial animation elements, called visemes, that move the lips and jaw to match the phonemes.

4) ADDING EXPRESSIONS -- Cues for other expressions, like a raised eyebrow or a smile, can be added manually or generated by the software from the voice analysis. Some animation software analyzes video of an actor speaking the lines to generate nonverbal cues. ANGER: The eyebrows and corners of the mouth are pulled down. HAPPINESS: The eyebrows and corners of the mouth are raised.

5) PUTTING IT ALL TOGETHER -- In a typical application, the model would be downloaded to a cellphone at the start of the animation. Then the soundtrack and animation cues could be streamed to the phone. The cues deform the model at the animation points, and the software makes smooth transitions between different expressions.

(Source: Face2face Animation)
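A minimal sketch of the phoneme-to-viseme lookup described in step 3 of the chart above, using an invented table rather than any real product's database:

```python
# A minimal sketch of mapping extracted phonemes to visemes, the mouth shapes
# that drive the lips and jaw. The phoneme symbols and the tiny table below are
# illustrative; a real system would use a much larger viseme database.

VISEME_TABLE = {
    "f": "lower_lip_to_teeth",   # fricatives like F and V
    "v": "lower_lip_to_teeth",
    "b": "lips_pressed",         # plosives like B and P
    "p": "lips_pressed",
    "aa": "jaw_open_wide",       # open vowel sounds
    "iy": "lips_spread",
}

def visemes_for(phonemes):
    """Return the viseme sequence for a phoneme sequence, defaulting to neutral."""
    return [VISEME_TABLE.get(p, "neutral") for p in phonemes]

# Phonemes as a speech analyzer might extract them for the word "beef":
print(visemes_for(["b", "iy", "f"]))
# ['lips_pressed', 'lips_spread', 'lower_lip_to_teeth']
```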