Overwatch: Play by Sound

Overwatch is my personal favourite game right now, so I just had to cover it in some aspect or another. A fantastic talk at the Game Developers Conference 2016 called “Overwatch – The Elusive Goal: Play by Sound” by Scott Lawlor and Thomas Neumann is what I will mostly use to explain the technical side of implementation in this article.

(along with my 400+ hours of play time in the game)

Scott Lawlor and Thomas Neumann talk about how, in the early stages of design, they were given the task of “being able to play with sound alone” by the Game Director Jeff Kaplan, and how the sound design should give the player as much relevant information as possible.

They organise the main goals for the audio team into 5 categories, or “pillars”, of importance:

Pillars:

A Clear Mix

Pinpoint Accuracy

Gameplay Information

Informative Hero Voice Over

Pavlovian Response

Each of these topics deserves its own in-depth discussion, but I would like to focus mainly on the mixing system and the gameplay information, as these 2 link to each other quite well.

Overwatch is a First Person Shooter based around two teams of six players.

All the game modes are fairly similar from a sound point of view, so this shouldn’t change anything in the overall discussion. In the most intense moments of action there are 12+ sounds happening all at once in a very small area of combat. Without appropriate mixing this would sound overwhelming and cause the “A Clear Mix” pillar to fail at these moments, which are very common whilst playing the game.

The mixing process for most games is done either in engine or through a piece of middleware such as Audiokinetic’s Wwise™. Custom software can also be developed to fit a specific need for any area of a game, and the audio team developed a few good examples to explore further.

The first is a simple program that gives each piece of audio an “importance” value ranging from 0 to 120, with 120 being the most important. Pictured above is an example of what may be considered when assigning these values in real time. The software then organises this data into 4 different “buckets” of importance depending on the number value: High, Medium, Low and Cull. The example below shows which of these 4 buckets each sound is currently in, along with the character and the importance number, and then how many sounds have been put into each bucket.


The developers grouped the audio into these buckets because, without a clear winner for what is most important, the mix would be muddied by too many important things happening at one time. They limit this by allowing only 1 high priority sound at any one time, with the rest sorted into the other buckets.
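The bucketing idea can be sketched in a few lines of Python. This is only a minimal illustration, not the team's tool: the `Sound` class, the `assign_buckets` function and the Medium and Low limits are all made up for the example. The only details taken from the talk are the 0 to 120 range, the four bucket names and the single High slot.

```python
from dataclasses import dataclass

BUCKET_ORDER = ["High", "Medium", "Low", "Cull"]
# Only the single "High" slot comes from the talk; the Medium and Low
# limits are hypothetical. "Cull" has no limit and takes whatever is left.
BUCKET_LIMITS = {"High": 1, "Medium": 3, "Low": 6}


@dataclass
class Sound:
    hero: str
    name: str
    importance: int  # 0 (least important) to 120 (most important)


def assign_buckets(sounds):
    """Rank sounds by importance and fill the buckets from the top down."""
    ranked = sorted(sounds, key=lambda s: s.importance, reverse=True)
    buckets = {bucket: [] for bucket in BUCKET_ORDER}
    for sound in ranked:
        for bucket in BUCKET_ORDER:
            limit = BUCKET_LIMITS.get(bucket)  # None means unlimited (Cull)
            if limit is None or len(buckets[bucket]) < limit:
                buckets[bucket].append(sound)
                break
    return buckets


if __name__ == "__main__":
    playing = [
        Sound("Lucio", "ultimate voice line", 120),
        Sound("Pharah", "rocket", 90),
        Sound("Mercy", "footsteps", 20),
    ]
    for bucket, members in assign_buckets(playing).items():
        print(bucket, [(s.hero, s.importance) for s in members])
```

Ranking first and then filling from the top guarantees there is never more than one High sound, whatever the individual importance numbers are.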

This bucket value is the data that is actually sent to Wwise, which then adjusts volume and filtering parameters accordingly.

A Real-Time Parameter Control (RTPC) is used in Wwise to change the makeup gain on the sound depending on which of the 4 buckets the sound has been placed in.

Each category of sound in the game has its own specific values for how much or how little gain is applied to the sound in game.

An Ultimate ability, for example, will have the highest importance when activated. This is true for both friendly and enemy Ultimate ability usage.
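As a rough sketch of that per-category mapping, the table below turns a bucket into a gain offset. The category names and every decibel value are invented for illustration, and nothing here is a Wwise API call. In the real game the bucket value drives a parameter inside Wwise.

```python
# Hypothetical gain offsets in decibels, per sound category and bucket.
GAIN_DB = {
    "ultimate_voice": {"High": 6.0, "Medium": 3.0, "Low": 0.0, "Cull": -96.0},
    "footsteps": {"High": 3.0, "Medium": 0.0, "Low": -6.0, "Cull": -96.0},
}


def gain_db(category, bucket):
    """Look up the gain offset for a sound category in a given bucket."""
    return GAIN_DB[category][bucket]


def db_to_linear(db):
    """Convert decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


if __name__ == "__main__":
    for bucket in ("High", "Medium", "Low", "Cull"):
        db = gain_db("footsteps", bucket)
        print(bucket, db, round(db_to_linear(db), 4))
```

Keeping one row per category is what lets, say, footsteps drop away faster than an Ultimate voice line when both land in the same bucket.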

A clever way of telling whether the ability being used is friendly or not is to have two different voice lines: one team hears one distinct line and the other team hears a different one. An example:

When the character Lucio uses his Ult (Ultimate ability):

allies hear “Oh, let’s break it down!”

enemies hear “Drop the beat!”

Allied lines will still be quieter than enemy lines for the reasons above, but both are very high in importance.
There are 23 different heroes/playable characters in the game, and the same character can be used on both teams at the same time, so this simple difference helps the player understand what is going on in the fight with audio cues alone.

Of these heroes, 6 speak a language other than English, drawn from all across the globe, and then there is Bastion the Omnic (Omnics are the robots of the Overwatch universe), who just talks in a series of beeps and boops (think Wall-E for an example).
The non-English speakers use the same idea as above: they say their voice line in, e.g., French from the enemy perspective and in English for allied players.
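The ally/enemy split boils down to a lookup keyed on who is listening. Here is a minimal sketch using the Lucio lines quoted earlier. The `ULT_LINES` table and the `ultimate_line` function are hypothetical names, not anything from the game's code.

```python
# Voice lines as quoted in this article; the data layout is an assumption.
ULT_LINES = {
    "Lucio": {
        "ally": "Oh, let's break it down!",
        "enemy": "Drop the beat!",
    },
}


def ultimate_line(hero, caster_team, listener_team):
    """Pick the Ultimate voice line a given listener should hear."""
    relation = "ally" if caster_team == listener_team else "enemy"
    return ULT_LINES[hero][relation]


if __name__ == "__main__":
    print(ultimate_line("Lucio", caster_team="red", listener_team="red"))
    print(ultimate_line("Lucio", caster_team="red", listener_team="blue"))
```

Because the choice depends on the listener rather than the hero, the same lookup still works when both teams are running the same character.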

“Pavlovian conditioning: A method to cause a reflex response or behaviour by training with repetitive action.”

This example supports the Pavlovian Response pillar: the voice lines are the same every time you hear them, so players learn them quickly. That is very important, as Ult usage is key to winning games in Overwatch, hence the need to communicate it clearly every time.

Repetitive sounds in games are usually thought of as a bad idea these days, as they may become annoying or be seen as lazy development. The developers felt it was more important that players could instantly know what something is, just by listening for a fraction of a second, than to risk confusing them.

But repetition isn’t the only way to teach players the sounds in game.
Each hero is very different in all aspects, be it nationality, age, race, size, weapon type or the materials worn on their feet. This diversity in the characters’ attributes helps the player discern, from audio alone, who is walking up behind them, ready to strike. This is very important information for winning a fight, as you may be dying or getting the upper hand depending on audio alone. The same holds true for a lot of online First Person Shooter games, but here you know which specific character will be walking round that corner and whether it is even a winnable fight.

This gameplay mechanic can be countered by crouch walking, which mutes your movement sounds at the expense of going a lot slower than walking speed. Again, this is a common gameplay mechanic in other titles, but worth noting.
Knowing who is nearby is crucial, but what about where they are on the map?

Instead of using the standard occlusion settings in Wwise, the team customised a ray tracing technique from another part of the game, the AI path finding for the hero Pharah (who flies around the maps). This means the rays cast for the audio can go around corners from the source to the listener, and the length of that route is measured to determine volume, instead of going through walls in a straight line. The audio is filtered with a high pass filter when behind walls and large obstacles as needed.
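To show why the route length matters, here is a toy comparison between a straight line through a wall and the shortest open route around it. It swaps in a plain breadth-first search on a small grid as a stand-in for the game's path finding data, and the `attenuation` curve is a made-up inverse-distance rolloff, so treat every name and number as an assumption.

```python
import math
from collections import deque

# "#" is a wall; S marks the sound source and L the listener.
GRID = [
    "S.#..",
    "..#..",
    "..#.L",
    ".....",
]


def path_distance(grid, start, goal):
    """Shortest open route in cells between two (row, col) points."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None  # the listener cannot be reached at all


def attenuation(distance, reference=1.0):
    """Hypothetical inverse-distance rolloff, clamped to 1.0 near the source."""
    return min(1.0, reference / max(distance, reference))


if __name__ == "__main__":
    source, listener = (0, 0), (2, 4)
    straight = math.dist(source, listener)
    around = path_distance(GRID, source, listener)
    print("straight line:", round(straight, 2), "gain", round(attenuation(straight), 3))
    print("around corner:", around, "gain", round(attenuation(around), 3))
```

The route around the wall is longer than the straight line, so the sound comes out quieter, which matches how it would actually reach the listener through the open gap.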

The team also developed a custom quad delay to give realistic tails to the sounds heard in game. Its 4 delay channels are assigned to the 4 surround sound speakers, if the user has that setup. Every frame, 4 rays are cast, one along each of the 4 45 degree angles, and the distance each ray travels is measured in in-game meters. Each meter of distance corresponds to 1 millisecond of delay time for that channel of audio. This technique is useful for understanding the space you are in whilst playing, and it helps with the realistic and immersive feel of the game world too.
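The distance-to-delay rule is easy to try out. In the sketch below the listener stands in a simple rectangular room, which is an assumption made for the example, as are the speaker names, the angle assigned to each one and both function names. Only the 1 meter to 1 millisecond rule and the 4 diagonal rays come from the talk as described above.

```python
import math

# Hypothetical mapping of the 4 diagonal rays to surround channels.
DIAGONALS_DEG = {
    "front_right": 45,
    "front_left": 135,
    "rear_left": 225,
    "rear_right": 315,
}
MS_PER_METER = 1.0  # 1 in-game meter of ray length gives 1 ms of delay


def ray_distance(x, y, width, depth, angle_deg):
    """Distance from (x, y) to the wall of a width x depth room along a ray."""
    dx = math.cos(math.radians(angle_deg))
    dy = math.sin(math.radians(angle_deg))
    # Diagonal rays always have non-zero dx and dy, so the divisions are safe.
    tx = ((width - x) if dx > 0 else x) / abs(dx)
    ty = ((depth - y) if dy > 0 else y) / abs(dy)
    return min(tx, ty)


def quad_delays_ms(x, y, width, depth):
    """Delay time in milliseconds for each of the 4 channels."""
    return {
        name: ray_distance(x, y, width, depth, angle) * MS_PER_METER
        for name, angle in DIAGONALS_DEG.items()
    }


if __name__ == "__main__":
    # Listener standing close to the left wall of a 10 m x 10 m room.
    for channel, delay in quad_delays_ms(2.0, 5.0, 10.0, 10.0).items():
        print(channel, round(delay, 2), "ms")
```

Standing near a wall shortens the delays on that side only, which is the cue that tells the player which way the space opens up.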

Overwatch also supports Dolby Atmos speaker setups, in a 9.1.2 configuration.
This support enables the player to hear directly above and below them in addition to a 9.1 speaker array. At the time of writing this technology is relatively new and has not been explored to its fullest in gaming. I’m sure it would be a very immersive experience. They also support Dolby Atmos for stereo headphones, which works just fine. It should be said that this is the best option for listening to the game with standard equipment, thanks to the better stereo field differentiation.

To summarise

Points I haven’t focused on are the music, the more standard User Interface sounds and hero dialogue. These help communicate a lot of additional information about objectives, Ultimate ability readiness and other helpful features. The dialogue system also automatically and randomly chooses some context-sensitive hero-to-hero story/lore dialogue. This brings a sense of fun and world building/immersion in the universe that is great to hear at the start of rounds (when you are basically just waiting for 1 minute for the action to start). It helps fill these gaps in the action nicely.

That’s all for now, thank you for reading.

Calum Grant