Changing the Face of Motion Capture – Film School, Games and VFX

At work behind the scenes of movies and games, motion capture is one of the most dynamic, competitive industries in media and entertainment. Developers and facilities are continuously innovating to make the process of transferring real performances to the digital world faster, easier and more accurate.

Motion capture systems are now diverse in their approaches, and the equipment grows more intelligent and more accurate year by year. Recently, several companies have announced innovations that are changing the face, and the moves, of motion capture.

Performance Capture Studio at Vancouver Film School

Vancouver Film School (VFS) and new performance capture company Mimic Performance Capture opened their commercial Performance Capture Studio in Gastown, Vancouver on 18 May 2017. Together they have built one of Canada's largest dedicated studios, measuring 906 cubic metres, or 32,000 cubic feet.

The capture volume is built three stories below street level, making it completely sound-proof and unaffected by external vibration, noise and light. It is built with three stunt beams - I-beams equipped with wires, rappel gear for dangerous stunts - spanning the entire space and 40 Vicon Vantage cameras. The studio also has a 5,000-square foot workshop and prop room where props and set pieces can be customised for shoots.

Even before its opening, the facility received bookings from major gaming and film/TV production companies including Gearbox, Capcom, Waterproof Studios and House of Cool. Mimic’s founder Graham Qually noted that until now, opportunities in the Vancouver area for large-scale, up-to-date motion capture have been limited.

Students, Clients and Research

The Performance Capture Studio’s capacity for stunt work, voice work, camera systems, new methods and applied research will also support collaboration between VFS and Mimic on the development and delivery of specialised curriculum and training workshops for VFS programs, staff and students. Mimic will mentor students on projects involving performance capture, and will work with VFS' animation, game, film and acting programs to identify best practices and processes for using performance capture at the school.

Following their training, VFS alumni will have first priority for all commercial performance capture employment opportunities at the new studio. In a sense, the students can consider their first day at VFS as their first day on a job.

Mimic has invested in Vicon Vantage cameras, which Graham has used for many years, and in Vicon’s new VFX mocap data processing software, Shōgun, launched at FMX in May 2017. Graham and Jeff Ovadya, Sales Director at Vicon, talked about Vicon’s recent developments in motion capture that make it faster, more accurate and more straightforward to use during performance sessions.

Automated Calibration

While 40 cameras may not seem like many for such a large capture volume, Graham and Jeff explained that fewer cameras are needed now, owing to much more frequent, automated calibration and to new solving algorithms built into processing software like Shōgun. Jeff said, “The result is more solved data with fewer cameras. Calibration and algorithms have also contributed to better handling of actor interactions.

"Earlier on, the integrity of every frame of the captured performance relied on the position of the markers. Calibration is essentially about proportion, which can change relative to the camera quite easily. Because calibration could only be done once, the integrity was compromised whenever a marker moved or was covered up, or the actor adopted a challenging pose, like lying flat on the ground or huddling up, or was partly obscured.

"Therefore the more often you can calibrate, the better the solve, because the motion happens much sooner after a calibration. Now, automated calibration takes less than a minute without interrupting the actor’s work. The computer can keep track of the markers even when they are occluded, because their relationships with the other markers are continuously re-established, and the algorithms allow, if necessary, very accurate extrapolation of missing marker positions."
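The extrapolation Jeff describes can be illustrated with a simple rigid-body idea: if a marker's offset is recorded in a local frame defined by three other markers on the same body segment, an occluded marker can be rebuilt whenever those three remain visible. The sketch below is a minimal, hypothetical illustration of that principle, not Vicon's actual algorithm; all positions are made up for the example.

```python
import math

def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])
def unit(a):
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)

def segment_frame(p0, p1, p2):
    """Orthonormal frame attached to three markers on one rigid segment."""
    x = unit(sub(p1, p0))
    z = unit(cross(x, sub(p2, p0)))
    y = cross(z, x)
    return p0, (x, y, z)

def to_local(p, origin, axes):
    """Express world point p in the segment's local frame (calibration step)."""
    d = sub(p, origin)
    return tuple(dot(d, a) for a in axes)

def to_world(local, origin, axes):
    """Rebuild the world position of an occluded marker from its stored offset."""
    return tuple(origin[k] + sum(c * a[k] for c, a in zip(local, axes))
                 for k in range(3))

# Calibration: record the hidden marker's offset while all markers are visible.
origin, axes = segment_frame((0, 0, 0), (1, 0, 0), (0, 1, 0))
offset = to_local((0.3, 0.2, 0.5), origin, axes)

# A later frame: the segment has turned 90 degrees and moved. The marker is
# occluded, but its position follows from the three still-visible markers.
origin, axes = segment_frame((1, 2, 3), (1, 3, 3), (0, 2, 3))
print(to_world(offset, origin, axes))  # ~(0.8, 2.3, 3.5)
```

Because the offsets are re-derived from whichever markers are currently visible, frequent recalibration keeps the stored relationships fresh, which is why more frequent calibration improves the solve.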

For high-action feature films, real-time solving for visualisation through a game engine is often considered a critical step because it promptly shows the actors what they are doing, and keeps the director in control while still on set. For these reasons, Disney and ILM have been contributing to Vicon’s research.

Self-Aware Cameras

Vicon Vantage cameras have an intelligent, self-aware design, built to work within an animation and film production workflow while recording precise motion data and practical status information. Each camera’s sensors continuously monitor its performance, giving Mimic’s staff visual feedback through an on-board camera display, in the software and on Vicon’s Control app for mobile and tablet devices.

Control connects wirelessly to Vicon’s software, streaming camera data to your device. So, as Mimic monitors sessions with Shōgun, the team receives real-time metrics on camera and system health. Cameras that have fallen or need to be repositioned can be recalibrated in a few minutes while performances continue to be captured, and labelling and solving remain accurate.

Real-Time Shōgun

The main characteristics that stand out in Shōgun are its real-time performance, its ability to capture multiple actors undertaking complex interactions such as folding arms, hugs and stunt work with props, and its direct support for the major real-time game engines including Unreal, Unity, CryEngine, Havok (Microsoft) and others. Shōgun also records data direct to disk, which makes on-set review and direct visualisation of the final scene almost instantaneous. The time-saving for post-production is considerable and will allow Mimic to think ahead about the remaining pipeline instead of spending a lot of time checking what has already been recorded.

Shōgun starts labelling actors automatically as soon as they enter a T-pose, and then processes each frame of the range of motion as it is captured, dynamically calibrating the subject in the background and resulting in a proportionally accurate skeleton. As they monitor the session on screen, users see a mesh laid over each actor, unique to that individual. It incorporates details about the markers and visualises the solving quality, and is precisely scaled to fit to each actor during live subject calibration.

The labelling and solving inside Shōgun represents a substantial change in what a motion capture platform can do in real time, as Jeff explained above. Complex scenes like fight sequences and stunt work involving multiple actors become more feasible due to this level of real-time performance.

The fact that all data is being written in real time directly to disk during sessions is also significant because the data no longer has to be fully re-processed after capture in order to make an accurate assessment of a session’s success. The data can be reprocessed from raw, but depending on the results, this step may not be necessary.

DI4D Faces the Future

In February 2017, Dimensional Imaging secured a £900k investment from Percipient Capital and existing shareholders. DI4D is one of the VFX industry’s top providers of facial performance capture for movies, video games and VR, as well as medical and scientific research.

At the time of the investment announcement, the company’s CEO Dr Colin Urquhart spoke to Digital Media World about some of the activities and events in the entertainment industry that led to the funding. Just recently, we contacted him again to find out how the company might use those funds, especially in response to greater interest in its facial performance capture for movie VFX and video games.

“Both applications continue to push for more realistic facial animation, and to develop new facial capture pipelines. In many ways, marker based systems appear to have reached the limit of fidelity they can deliver, but audiences still expect greater realism with each new game title or film,” said Colin.

Pixel Recognition

“Tracking markers are too sparse to give dense enough data to meet demands now, and facial rigs cannot adequately compensate for gaps in the data. DI4D’s systems track pixels instead. Based on photogrammetry, they record the position of each pixel in images from cameras positioned at different angles to the subject - a stereo pair of images captured with a calibrated pair of cameras is processed automatically to derive a dense 3D model, calculating a point in 3D space for every foreground pixel."

By recognising each pixel from each image very accurately and using the relative position data to locate the pixels in 3D space, the system can build up an accurate 3D map of the subject. The ‘4D’ in the company's name DI4D refers to the addition of a fourth dimension, time, to 3D capture: sequences of 3D models are recorded with synchronised video cameras.
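The stereo principle Colin describes can be sketched for a single pixel. For a rectified, calibrated camera pair, the horizontal shift (disparity) of a matched pixel between the two images determines its depth. The function below is a textbook triangulation sketch, not DI4D's implementation, and all parameter values are illustrative.

```python
def triangulate(u_left, v_left, u_right, f, baseline, cx, cy):
    """Recover a 3D point from a matched pixel in a rectified stereo pair.

    f        -- focal length in pixels (illustrative value, not a DI4D spec)
    baseline -- horizontal distance between the two cameras, in metres
    cx, cy   -- principal point (image centre) in pixels
    """
    disparity = u_left - u_right  # horizontal shift between the two views
    if disparity <= 0:
        raise ValueError("matched point must lie in front of both cameras")
    z = f * baseline / disparity  # depth falls as disparity grows
    x = (u_left - cx) * z / f
    y = (v_left - cy) * z / f
    return x, y, z

# A pixel matched at column 345 in the left image and 295 in the right,
# with a 1000-pixel focal length and a 10 cm baseline:
print(triangulate(345, 250, 295, f=1000, baseline=0.1, cx=320, cy=240))
# (0.05, 0.02, 2.0) -- a point 2 metres from the cameras
```

Repeating this for every matched foreground pixel is what yields the dense 3D model per frame that marker-based systems cannot provide.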

Up until the present, DI4D’s processing software has focussed on generating 3D models from their dense capture, and on tracking. Colin mentioned that some of their funding could be used to develop software for solving data to rigs for customers.

Face and Body

Colin said, “Regarding animation rig development, we had a fair amount of communication with animators at Remedy Entertainment, who licensed Dimensional Imaging's 4D facial performance capture software when developing their Xbox One title Quantum Break. We learned from that experience, but a team like Blur Studio has a well-defined workflow and pipeline and carried on with the data pretty much on their own.

“The ideal is to capture facial data during the full body performance capture instead of dedicated, sit-down capture sessions. Obviously, the full body scenario gives a more natural result. For that reason, Technoprops and DI4D collaborated in 2016 on a stereo 2K helmet mounted camera system, the DI4D HMC System, that is compatible with DI4D’s facial performance capture software.”

DI4D worked with Blur on two trailers for Halo Wars 2: the official trailer launched at E3 2016 and the origin story of Atriox. The DI4D HMC system was used in the Atriox trailer to capture some of the facial performance of the character Isabelle.

Scanning Pioneers

After launching in 2003, Dimensional Imaging became a pioneer in scanning techniques when they developed their DI3D system of camera arrays to scan and capture highly accurate 3D facial likenesses. The process of capturing human likenesses extremely close to real life in order to create believable virtual characters has traditionally been very time consuming. As one of the first developers to recognise the need for improved capture workflows, EA became a major customer and worked with Dimensional Imaging to develop a fast, very accurate capture system. They used the DI3D system for athletes in the FIFA game series from 2009.

Scanning with camera arrays is now a standard technique for data capture, of course. DI3D facial image capture is a passive, stereo photogrammetry-based system that outputs accurate, extremely high resolution, full colour 3D surface images using standard digital stills cameras. When synchronised video cameras are used instead, it can record sequences of a moving subject at up to 60fps.

Synchronised video cameras are used to capture sequences of stereo pairs of images, which are processed to automatically output a sequence of 3D models. Multiple stereo pairs of cameras can be used to capture multiple overlapping 3D models, which are then merged automatically into a single 3D model per frame. If colour cameras are used, then a per frame colour texture map can be applied to the 3D model sequence.

Innovation

As well as calculating these 3D models per frame, the DI4D Processing Software also calculates an optical flow map for each image to the previous and subsequent image in the sequence. This dense optical flow data is used in the DI4D Tracking Software to track landmarks, or the vertices of a fixed topology mesh, through a sequence.
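Optical flow, at its simplest, is a search for where each small patch of one frame has moved to in the next. The sketch below tracks a single landmark between two frames by exhaustive patch matching over a small search window; DI4D's dense, per-pixel flow is far more sophisticated, so treat this only as an illustration of the idea, with all names and values invented for the example.

```python
def track_point(prev, cur, x, y, patch=2, search=3):
    """Follow the landmark at (x, y) from frame `prev` into frame `cur`.

    Frames are 2D lists of brightness values. The displacement chosen is
    the one that minimises the sum of squared differences (SSD) between
    the patch around (x, y) in `prev` and a shifted patch in `cur`.
    """
    def ssd(dx, dy):
        s = 0
        for j in range(-patch, patch + 1):
            for i in range(-patch, patch + 1):
                a = prev[y + j][x + i]
                b = cur[y + dy + j][x + dx + i]
                s += (a - b) ** 2
        return s

    best = min(((ssd(dx, dy), dx, dy)
                for dy in range(-search, search + 1)
                for dx in range(-search, search + 1)),
               key=lambda t: t[0])
    return x + best[1], y + best[2]
```

Chaining this frame to frame is how a landmark, or a mesh vertex, can be carried through a whole sequence; dense flow does the same for every pixel at once.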

DI4D has always been an innovative company. Rather than remaining static or isolated, it has helped partners and clients solve project-related problems and take advantage of new equipment, and continues to seek image processing software improvements and higher digital camera resolution. Colin said, “When we started, we were capturing stills at 6MP and video at 1MP. Now we just keep buying the best cameras. We have also doubled the size of our team since receiving the funding. This is a transition period for motion capture, a time for significant change.” DI4D wants to be a driver of this change.

Going Live with Faceware Technologies

Faceware Technologies is another, very different, facial capture software developer. The company launched an Interactive division just before SIGGRAPH in 2016, and since then has released Faceware Live real-time facial motion capture and animation software, now upgraded to version 2.5. Faceware Live produces and re-targets facial animation interactively by automatically tracking a performer’s face and instantly applying the performance to a facial model in Autodesk MotionBuilder or in the Unity or Unreal engine display environment.

As a live system, Faceware Live works in a much simpler way than DI4D’s pixel recognition and photogrammetric 3D surface capture, but still does not use markers. It only requires one camera for tracking, which can range from an onboard computer video or webcam to Faceware’s proprietary GoPro or Pro HD Headcam Systems, and most other video capture devices.

Peter Busch, the vice president of business development at Faceware Technologies, said that due to the wider use and on-going development of computer graphics, virtual, augmented and mixed-reality, and graphics processing, the company has seen more interest in live CG performances and the ability to drive digital characters in real time. “To deliver this kind of service, the user needs completely stable face tracking that can track facial movements across a range of different conditions,” he said.

Version 2.5 can detect different types of faces with different skin tones, heavy facial hair, glasses and so on in different lighting conditions. Once the main facial features have been detected and calibrated, the software captures 180 degrees of motion with less jitter than its previous version.

Under Control

Animators can tune the live streaming animation to emphasise or limit a specific motion, such as raising an eyebrow or smiling. These fine tuning adjustments are done pose by pose in real time and, like all settings, can be saved to a profile. Animators are still able to isolate certain controls to work on areas of the face that need more attention.

Faceware Live Server is the real-time facial tracking software that streams live tracking data from a camera or image sequence into one of Faceware Live’s clients, that is, Unreal Engine 4, MotionBuilder or Unity, to drive live animation. Most importantly, before any tracking can start, Live Server calibrates the distinctive features of the actor’s face to optimise the tracking quality for a given video or image sequence. This takes only about one second.

Live is actually quite flexible. For example, instead of tracking live video, you can track a sequence of images to gain more control, or trigger calibration through the software’s command line to partly automate capture set-ups.

Animators can also simulate data without relying on a live camera feed to help them set up their characters. Live 2.5 now includes animation preview characters that can be used to determine how certain types of motions might look on various facial structures. Depending on the results, this means an animator can adjust the virtual character to make the most of the live data. This functionality is reassuring when you want to re-target data to a creature or to a character that is much older or younger than the actor.