Into The Nitty Gritty Of Performance Capture With Tintin’s Joe Letteri

Today’s the official UK release day of Spielberg’s The Adventures of Tintin: The Secret of the Unicorn. We’re finally at the end of the boy reporter’s very long journey to the silver screen.

Though he had intended to make a Tintin movie for decades, and though he's a renowned traditionalist, Spielberg ultimately opted to realise this picture with the relatively modern tools of performance capture. To try to understand why, I thought it would be good to know as much about motion and performance capture as possible.

Therefore, when I sat down with the film’s VFX supervisor Joe Letteri this week, I was on a mission to get the key facts spelled out, and the misconceptions tidied away. Seeing as Letteri is the director of Weta Digital and an alum of ILM, going right back to their pioneering work on James Cameron’s The Abyss; and seeing as he’s a firm believer in the potential of motion and performance capture, there really wasn’t anybody better to give me a state of the art address on this medium.

So here, then, in the words of Joe Letteri, is a lesson in performance capture.

Animation vs. performance

The way I work, I always consider anything that is moving as animation. That’s not essentially a technical definition but I never thought of [performance capture] as anything different. Since we started coming up with ideas of motion capture and performance capture, back when we were working with Gollum, to me, they’ve all been animation techniques.

We had the argument back on Gollum – should we have animators sitting and animating everything, or should we try to record the performance? Well, even then, the animators would still rely on computers.

It’s not like it was in the 30s, where you’d do key frames and then somebody would go in and draw the in-betweens; now you’re using some algorithm set by some mathematician you probably don’t even know to determine what your in-betweens will be. So there’s no such thing as pure animation in CG. In fact, probably the closest you would get to pure animation these days is something like stop motion – the only medium where every frame is set by somebody by hand.
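The in-betweening Letteri describes can be made concrete with a small sketch. This isn't any studio's actual tool – just a minimal illustration of how an interpolation curve (here, the common "smoothstep" ease-in/ease-out) decides every frame between two keyframes, the job the assistant animator once did by hand:

```python
# Illustrative only: computer in-betweening between two keyframe values.
# The easing curve below is a standard "smoothstep"; real animation
# packages offer many such curves, editable by the animator.

def ease_in_out(t: float) -> float:
    """Smoothstep easing: slow start, slow finish."""
    return t * t * (3.0 - 2.0 * t)

def inbetweens(key_a: float, key_b: float, frames: int) -> list[float]:
    """Generate every frame value between two keyframes."""
    values = []
    for i in range(frames + 1):
        t = i / frames                      # normalised time, 0..1
        s = ease_in_out(t)                  # the "algorithm" picks the shape
        values.append(key_a + (key_b - key_a) * s)
    return values

# e.g. a joint rotating from 0 to 90 degrees over four in-between steps:
print(inbetweens(0.0, 90.0, 4))  # prints [0.0, 14.0625, 45.0, 75.9375, 90.0]
```

Change the easing function and every in-between changes with it – which is exactly why Letteri argues there's no "pure" hand animation in CG.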

Go back to any of the old Disney films where they used rotoscoping, which was essentially trying to capture the actor’s movement because it’s realistic and it’s very clear when you’ve got it right or not. I say “What’s wrong with using actors?” I like using actors. They bring a real spontaneity to it and that, to me, translates to reality.

Creating faces

When you’re on a stage where you’re doing the body it’s actually pretty straightforward, right? It’s all biomechanical. You put a marker here and a marker here and say “right, that’s 15 degrees” but on your face, there are no fixed points of reference so it really is infinitely variable, especially when you add dialogue in or, say, somebody falling off a cliff screaming where their face is fluttering. You can’t really account for every variation ahead of time.
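The "biomechanical" body capture Letteri mentions really is simple geometry: given three marker positions, the joint angle between them falls out of a dot product. A hedged sketch – marker names and coordinates below are illustrative, not from any real rig:

```python
import math

def joint_angle(a, b, c) -> float:
    """Angle in degrees at marker b, formed by the segments b->a and b->c."""
    ab = [a[i] - b[i] for i in range(3)]
    cb = [c[i] - b[i] for i in range(3)]
    dot = sum(ab[i] * cb[i] for i in range(3))
    norm = math.dist(a, b) * math.dist(c, b)
    return math.degrees(math.acos(dot / norm))

# Toy positions (metres) for shoulder, elbow and wrist markers:
shoulder = (0.0, 1.4, 0.0)
elbow    = (0.3, 1.1, 0.0)
wrist    = (0.6, 1.4, 0.0)
print(round(joint_angle(shoulder, elbow, wrist)))  # prints 90
```

That "marker here, marker here, that's 15 degrees" directness is what makes body capture tractable – and what faces, with no fixed points of reference, refuse to offer.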

But there are probably around a thousand or so key expressions [for each character] that we lock into and then we can work around those.

We look at what is going on with the live action performance, and we try to teach the computer to understand what the actor’s intent was and how to translate that to the character. If the actor gave you exactly the same expressions [as we modelled ahead of time], the computer could give you exactly the same expression back on the character. But the fact is that the human face is infinitely flexible, so that in fact never happens.

But it can get close, and we get the computer to give us the best answer that it knows. But then animators go in and work out those expressions. Even the ones the computer starts out with, because it needs something to start out with, were done by animators. This is all done by an animation and modelling team that makes all of these things and says “This is Tintin smiling, this is Tintin sad.” It’s all really artistically driven and we use the computer as a way to learn these so that once we figure them all out, we don’t have to go back and create it all again by hand.
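The "thousand or so key expressions" idea maps onto what the industry generally calls blendshapes: each modelled expression is stored as vertex offsets from a neutral face, and a captured frame is approximated as a weighted mix of them. A toy sketch of that combination step – the three-vertex "face", shape names and weights are invented for illustration, not anything from the film's pipeline:

```python
# Illustrative blendshape mixing: a captured frame becomes a weighted
# combination of artist-made expression offsets applied to a neutral face.

def blend(neutral, shapes, weights):
    """Add each expression's weighted vertex offsets onto the neutral face."""
    face = list(neutral)
    for name, w in weights.items():
        for i, offset in enumerate(shapes[name]):
            face[i] += w * offset
    return face

neutral = [0.0, 0.0, 0.0]               # toy three-vertex "face"
shapes = {
    "smile":      [0.2, 0.0, -0.1],     # offsets from neutral, per vertex
    "brow_raise": [0.0, 0.3,  0.0],
}
# A frame that's mostly a smile with a slight brow raise:
print(blend(neutral, shapes, {"smile": 0.8, "brow_raise": 0.25}))
```

The hard part Letteri describes – teaching the computer which weights match the actor's intent on each frame – is the solving step that animators then refine by hand.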

The state of the art

The way we captured this movie was a little bit beyond what we did for Avatar, but it was essentially the same thing. There are a few things we refined. What we use to capture the data is actually pretty robust and pretty accurate. But you still have to use artists in there. We say “Let’s not overdo it. Let’s get enough of what the artists need and then we take it from there.”

If you really want to get down to it, you’re really just talking about a technical distinction about what your input tool is. Is it a system of cameras that is recording your movements or is it a mouse that is recording them?

We’ve got a good basic understanding of a lot of the physical phenomena now, so if you were to put a lot of research effort anywhere, it would be into faces. That’s still the most unknown. They’re the thing that we, as human beings, know the best, but it’s really not very well understood how the face does what it does and what makes the expressions. I would keep working on the expressions and the faces. I know we can do something even more realistic.