The Least Mysterious of All Crafts

For one, even if it is well written and well acted, you can often undermine the weight of someone’s words by running away, swinging your sword or throwing a grenade while they are being spoken, unless all control is taken away during the scene. You will see developers stuff you into a tram or some other apparatus that provides a fictional basis for restricting your freedom of movement and action for the duration of the conversation while still offering at least a token sense of interactivity (perhaps you can look around, for example). But this solution isn’t ideal; besides feeling artificial, it often bores people. Watch someone play during these sections and they’re moving their camera in little circles, zooming in on a guy’s nose or getting a snack.

Secondly, spoken lines in games are often saddled not just with story exposition, which can be clunky in even the best films, but with gameplay instruction, too. Imagine a movie that contained not just the background of its fictional premise but also tried to work in some hints on how to operate your television. No matter how cleverly the instruction is disguised as something happening in the game’s fiction (calibrating your sensors, or whatever), it does not actually fool anyone.

Dialogue is a tool. It has to be, in order to help ameliorate some of the communication problems inherent to games. Designers employ it to direct and inform the player, often by conveying objectives, and this is where it struggles: there is no making instructions like “take out the anti-aircraft guns” or “destroy the generators” interesting. There is no innate interest in lines like these; they are conceived as gameplay objectives, nothing else, and trying to gussy them up just makes them more confusing. The player just wants to know: what do I do next? Get to the extraction point, that’s what. So we often hear those types of lines.

Communication is also why your opponents will unwisely shout to each other, “I lost him! Where did he go?”, the instant you crouch behind a crate. The soldiers are stupid to behave this way, you think. But you still want to know whether you’re in immediate danger or whether it’s safe enough to reload and recover your health. In real life, people who fight each other try not to telegraph anything, and they try to neutralize their targets before those targets have a chance to recognize what is happening to them. But the player of a video game needs this chance to respond and recover, so we place some of the responsibility for creating it on the dialogue.

And in games with frequent combat, the dialogue is usually handled by two different pieces of technology: the completely dynamic “barks,” or battle chatter, which are triggered in a somewhat randomized fashion in reaction to game events, and the scripted mission-support dialogue. If you play many games, you have probably heard these two systems step on each other: “Nice job, rookie, let’s get to the warehouse DIE, ASSHOLES! –unlock that door.”

There is a further complication to all of this: spatialization. In games our agents often traverse great distances. A game designer can place a sound file on a game object, and the game’s audio system can situate that sound file inside its model of three-dimensional space. But while it can fade the sound or apply filters to it, it cannot dynamically change the quality of a human voice in intensity or performance.

Our speech in real life is a remarkably complex and variable thing. A myriad of factors (distance, context, emotion) modulate the way we sound. But you cannot record several different versions of a line and fade between them the way you can with engine sounds or gunfire. In the recording booth, actors can pitch their performance to a distant target, but chances are that in the game the distance will be quite different from what was imagined.

You will often hear lines of dialogue that come in loud and clear, as if the character who spoke them were right next to you, but he is not actually around, and you have to check your minimap to see where he is: dozens of meters away, or in the next room. Or a character begins a line and suddenly takes off, running full speed ahead of you into the level, her voice, recorded as if she were standing still, trailing unnaturally into the distance. The only easy solution to this is essentially a cop-out: having these voices come at you over the radio, crackling with static even in the future, or the “mystical voice in your head” fantasy equivalent.

Even tightly controlled situations, such as the interactive conversation systems in role-playing games, have their share of challenges. Tom Bissell, in Extra Lives, describes how actor Jennifer Hale had to read hundreds of lines convincingly, yet with enough tonal consistency to allow a believable conversation to be assembled out of them on the fly. Hale’s performance is remarkable indeed, and for me one of the most enjoyable parts of the Mass Effect series. But it has no arc; it simply can’t, given the way the game works. Her tone at hour one is her tone at hour fifty.

Now is the part where it seems obvious to conclude with a look forward to some magical future technology that will come along and solve everything for us. In a decade’s time, maybe we will be able to take actors’ performances and pestle them into a kind of meta-reading of the line, which we can simply query for the appropriate version: tell the player to set the charges with 5 urgency, 2 annoyance and 2.5 flirtiness. This does not seem out of the range of possibility given what we can do with audio right now. At the same time, though, I can’t help but think that would be unnatural in its own way.