Spectral Analysis: Interview with Jeff Macpherson

Thanks to Brad Dyck for contributing this interview.

For the past two decades, Electronic Arts’ FIFA franchise has been the best-selling sports video game series of all time, and Audio Director Jeff Macpherson has been overseeing the sound of the games since 2006.

Creating a smooth, realistic commentary system along with dynamic, exciting crowds is no small feat, and I recently spoke at length with him about just how this is accomplished. Join us below as we discuss the challenges and satisfaction that come with designing audio for a world-class sports title.

THE TEMPORAL SHIFT

BD: How long have you been Audio Director for FIFA?

JM: I’ve been on FIFA since 2001, so it’s been 12 years, and I’ve been Audio Director for about 6 of those, about half that time.

BD: What were you doing when you first started?

JM: When I very first started, I had moved to Vancouver for post production gigs or anything in media because there was always lots back then, movies, TV and games. Games were my number one choice so I really pounded the pavement to get into game studios because I had no experience. My first job was in QA, I did it for a little while and then I was a junior sound designer/audio artist on FIFA when I got grabbed by the guy who I kept bugging every day. I was given a shot to do that and then ended up owning the commentary side of things because the guy who was doing that left. I just got thrown into it and just started going through there and just worked my way up.

BD: How has the technology changed since you started doing it?

JM: Aw man, like crazy. I know it’s an old cliché but man it changes fast. And I’m not just talking about the consoles. The changes in the consoles are not the biggest changes, it’s more the behind-the-scenes stuff. I’m moving offices right now and I was in that edit suite for 11 years so there’s all sorts of stuff in there from throughout the years. I’ve got all these DAT tapes with all this source on it, then an old DAT player and zip drives with SCSI cables – all this stuff for just getting the amount of data that audio needs. Moving it around from machine to machine used to be a real undertaking.

Now we just use a Fast Spec service, Dropbox or anything and boom, you got 10GB and it’s in a studio in London within an hour and it’s back in your hands in no time. There’s no physical media being used, there’s no SCSI cables, there’s no mismatch between Macs and PCs. Back then we were all using ProTools on Macs and just operating on a network alone you’d be grabbing files and destroying resource forks, if you remember what those are from old Mac stuff. All of a sudden your files wouldn’t work properly because they just happened to touch a PC network in transit. Things like that were crazy and nowadays it’s just like, out of the box surround sound, HDMI cable, HDTV, wireless controllers, everything’s on a network, everything’s fast, everything’s big, good to go, the work environment is just sped up like crazy.

BD: I imagine it probably took a while to even have enough memory to put in the commentary system.

JM: Yeah, if you want to talk about the tech side on the consoles, disc footprint is a big one for audio and RAM is the same thing because the audio is played out of buffers for the most part. It’s not streamed directly from the media so the larger the buffer is, the less you have to stream, the less you’re hitting that resource and the faster your game will perform. So we’re able to have way more voices and audio out of RAM now. We still stream the commentary with reasonably sized buffers and lots of stuff we just load into RAM and go from there. We do asynchronous loading, there’s all sorts of technology that has gotten us to the point where we can have a whole lot of stuff happening at the exact same time without sacrificing quality and resolution.

Memory we do hog up a lot and it’s a fight. Any audio guy on any game team has to fight. Anybody has to fight for resources but particularly audio does because it’s trying to come up against visuals and visuals are the holy grail everywhere, especially in games. Everyone wants best graphics. And, no word of a lie, someone is going to tell you it’s more important that there’s a bead of sweat on that player who is a hundred feet away than it is that you get an extra stream to play higher resolution crowds or something like that. So there’s a whole PR job of convincing game development teams that audio is important and why it’s important.

BD: Probably with FIFA too, the physics get a lot of attention.

JM: Yeah, you don’t mess with gameplay on FIFA. That’s what sells the boxes and everyone knows you don’t mess with gameplay. So when they need extra cycles from the CPU, they get it. They’re typically not too heavy on RAM but their CPU usage is up there so we have to be careful that we don’t spike. And as we (in audio) get more sophisticated with the logic side of things, our CPU usage is starting to really climb which is going to be a problem in the future.

COMMENTARY – RECORDING

BD: How has the process of recording the commentary changed over the years? Do you have a variety of methods as far as how scripted it is and how you coach them?

JM: When we started we were just like everybody else, kind of new at it, not doing it the smart way, just brute force. We would usually hire a script writer, tell them what we need, they’d write the script and we’d go and have our talent read the script like actors. It was okay, it wasn’t very good though particularly because sportscasters/commentators are not actors so it’s not their normal scenario. So what we did about seven years ago was we switched from written lines to adlibs. So we’ll describe a scenario really specifically then we’ll give them a couple example lines on paper that they’re not going to read, they’re just there to set the tone and the length so they can get an understanding of what we’re looking for. Then they adlib as many iterations as they can, we’ll give them a guide saying we’re looking for about 10 of these, we’re looking for about 15 variations of this, we want about 2 takes of this and let them go. If they give us extra, great, if they don’t get there and they’re running out we don’t force them to do 10, we just go with whatever they’ve got in their head.

There’s a whole bunch of benefits to doing this, namely, you get to capture their character. You’re not writing lines for Martin Tyler to sound like him. You actually have him so he can sound like himself and by letting him adlib you preserve his character and his wording. Right away it’s a more authentic and better performance. You don’t have to pay for a writer. Your example lines don’t have to be as good so you don’t have to spend as much time on those and you can spend more time doing other things. So it’s just a huge win across the board. And the talent really likes doing it, they’ve come to really prefer it as opposed to walking in with a phone book and just saying, “Read this.” They like the interaction and thinking about the sport, stuff that they’re experts about and are passionate about.

BD: Have you ever tried to have them watch actual footage and record it?

JM: Yeah, we have a whole bunch. We’ve had them do actual whole games of FIFA (the game) and comment to it. It’s not ideal because the pacing is too fast for them, it throws them off and you end up with a delivery to suit the game that may not be what they’re like in real life. At the same time, having them commentate to a full video game and then coming back and just using it as reference, researching it, watching it and listening to how they treated it is invaluable because it showed us sort of a template for what they’re going to do. In terms of useable content, it took way longer to get stuff and the performance wasn’t much better so it was just a diminishing return. There are cases where we do that though, especially things that require an immediate reaction like a penalty shot or something. If we want to get a bunch of penalty shot goals, saves and misses, we put together a big highlight reel that’s all random and they don’t know what’s coming up so they can say “Oh, they missed it!” or “Oh, it went in!” and we can get it a little more believable than just reading it off a piece of paper. I also put crowds into their headphones, just a low murmur of crowds.

BD: Just so they emote more?

JM: Yeah, because that’s what they’re used to. They can hear it when they’re at the match commentating and it helps – especially if they close their eyes or if they’re into it – to remove them from a studio in the center of London, to be where they need to be psychologically.

BD: How difficult is it to oversee localization, for instance, ensuring that Arabic commentary flows as well as the English?

JM: It’s really difficult, it’s really, really difficult. I mean, as evidenced by how many times have you played or watched anything translated to English that was done well? At the beginning when I started we did it all. I used to go to Europe and go to Italy, the UK, Germany, France, Spain, all these places and go and record their commentary. I didn’t speak the language, I’d hire an interpreter. And hopefully that interpreter was telling the truth about whether it was a good or bad performance.

BD: You’d have to really trust your contacts I guess (laughs).

JM: Yeah, it was weird because, you know how big soccer is around there. Half the time your interpreter would be star struck about being in a recording studio with this famous talent so he’d be too scared to tell you the truth. You’d really have to get the right interpreters.

BD: They’d have to tell them they’re not being exciting enough…

JM: Yeah, so that was tough but at least it was there. It was a bit brute force, it wasn’t efficient but we could get good stuff. Then we developed a localization studio in Madrid that handles all the localization.

They have some real fantastic engineers there but it’s hard to convince somebody of the value of spending money on testing and talent direction.

To get things done, a lot of the time you have to be able to say, “Doing this will result in selling this many copies of the game” and it’s really intangible. It’s really hard to say, “The users in Saudi Arabia bought more copies this year because Arabic was slightly better.”

BD: There’s no direct evidence.

JM: There’s no way to capture that data so accounting and stuff like that is really just based on return on investment. You just have to make your case, do the best you can, and what I do is just choose my battles. We have a priority of regions, what we call EFIGS, or now we call them MEFIGS – Mexico, England, Italy, Germany, Spain and France – those are the ones we care about. That’s where we sell 90% of our games. Those are the people who care the most so those are the people who get the highest standards. Starting with those, if they are all 80-90% of what English is, then we’re good. Then we look at Czech, Arabic, Dutch, Swedish, all that stuff after that and try to do it there.

COMMENTARY – IMPLEMENTATION

BD: Do you ever speak with the teams that work on Madden or NHL to compare ideas or how the commentary works between them?

JM: Yeah, specifically on commentary we now have a task group that goes across all EA Sports. All the speech people get together and we definitely all work together. We share best practices and technology to help each other out so I think what’s going to happen in the next few years is you’re going to see the quality of all EA Sports commentary rise up in a substantial way. I think there’ll be a tangible difference and in terms of audio in general this is the second year we’ve had this group called Audioworks. We don’t actually report to our game teams anymore, we report to a central department and everyone in audio is in the same central department. It allows us to move around, help out and share a little bit better so I think it’s going to bring the quality up – the whole, ‘greater than the sum of its parts’ thing.

BD: I imagine it’s quite difficult to have every individual line match with the preceding one and the following one, can you explain how you manage that?

JM: That’s probably the hardest part. It was easier in the past because even though it’s an interactive, non-linear thing, it was very linear, in essence. We designed, we knew what was going to happen and stuff would trigger. But, nowadays, there’s a whole speech commentary AI system that sits in between the game and the actual commentary. It’s doing its AI stuff, making decisions, calling on things and it’s unpredictable. It’s like, emergent behavior. It’s loosely based on AI networks, linguistics stuff and it is…just imagine a big, bubbling cauldron and whatever comes up is just as much a surprise to me as it is to you. And likewise, the next thing that comes up is, so the huge challenge is figuring out how to make things match in…intensity is the big one but also make sense logically so it’s not like, a disjointed person talking about one idea, then another idea, then another idea.

So what we do now is, we have these contexts that we just continue to add rules to. In this situation do this, in this situation don’t do this. And we try to limit the types of things that can happen in situations and try to control that flow without, sort of, hardwiring the system. So it now requires a whole lot of just playing. When you hear one of those bad situations you have to dig in and find out why that happened and where can you insert some sort of a rule that prevents that from happening. It’s getting better every year. I mean, it’s still bad, still when I play it I’m like, “Oh God,” but if I was to play a game from 2 years ago that “Ugh” moment would be way more times out of 100 than it is now. And the goal obviously is for there to be none but, you know, if we get there we won’t have jobs either (laughs).

BD: Are there layers of intensity then? I imagine you don’t want a really exciting moment to be followed by doing a deadpan sort of commentary.

JM: The commentary taps into a system that determines the level of intensity for the crowd and we often use that as a filter, just to say, “If you’re going to say this you can only say it if the crowd intensity is below a certain level.” So if the crowd’s pumped and you just said something exciting, the chances of you saying something unexciting following that are slim to none because the crowd’s still pumped. As the crowd starts to trail off the commentary gets the ability to tone it down and start talking about other things. It starts to open up the available dialogue. So we use this filter to sort of, as it gets more exciting we close it down to just the really exciting stuff and then we open it back up.

That said, there are times where it is really appropriate. One of the big things that I think about when thinking about commentary is music. You have to resolve a melody, especially with a build-up play in a sports game. If you just build up, build up and leave it, it’s like, “Okay, what’s happening next?” It doesn’t make sense to us. So there’s always got to be that resolution whether it’s a payoff, whether it’s disappointment, or it’s shifting gears to something else and so it’s the resolution that can sometimes be really unintense. “He takes the shot! Oh, and that was really wide.” There’s a situation where if the shot goes really wide, you don’t want him to still be excited, matching that intensity would actually be incorrect. So in those cases you want to circumvent the rules and filters you’ve put in place, make exceptions and stuff like that so it ends up being a sort of messy system of rules that are all very simple but when they begin to interact in the real world, in the wild, you just don’t know what’s going to happen.

If everything has tight enough parameters around it then it should play at the right time and if everything is playing at the right time, then everything should make sense. You might have to get in there and tune it a bit but that’s the ideal situation so rather than handcrafting, which is what we used to do up until 2006/2007, now we do the whole emergent behavior kind of methodology where it’s all based on context and it’s data driven.
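To make the idea of context rules, the crowd-intensity filter and its exceptions concrete, here is a minimal Python sketch. It is an illustration only, not EA's system: the `Line` fields, the context tags, the 0 to 1 intensity scale and most of the example lines are invented for this sketch (only the "really wide" line comes from the interview).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    text: str
    context: str                     # hypothetical situation tag, e.g. "shot_wide"
    min_intensity: float = 0.0       # crowd intensity window in which the line may play
    max_intensity: float = 1.0
    ignores_intensity: bool = False  # exception: a "resolution" line bypasses the filter


def eligible_lines(lines, context, crowd_intensity):
    """Return the lines allowed for this context at the current crowd intensity."""
    return [
        line for line in lines
        if line.context == context
        and (line.ignores_intensity
             or line.min_intensity <= crowd_intensity <= line.max_intensity)
    ]


# Invented example data; a real system would load thousands of these from a database.
LINES = [
    Line("What a strike!", "shot_on_target", min_intensity=0.6),
    Line("He has had a quiet afternoon so far.", "open_play", max_intensity=0.4),
    Line("Patient football from the home side.", "open_play", max_intensity=0.7),
    Line("Oh, and that was really wide.", "shot_wide",
         max_intensity=0.3, ignores_intensity=True),
]

pumped = eligible_lines(LINES, "open_play", crowd_intensity=0.9)  # filter closes down
calm = eligible_lines(LINES, "open_play", crowd_intensity=0.2)    # filter opens back up
wide = eligible_lines(LINES, "shot_wide", crowd_intensity=0.9)    # exception applies
```

With the crowd pumped, none of the low-key open-play lines are eligible; once the intensity drops, both become available again; and the deflated "really wide" line is still allowed at high intensity because it is flagged as an exception. Each rule is simple on its own, which is the point JM makes: the surprises come from how many such rules interact.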

BD: What sort of challenges do the audio programmers have to deal with?

JM: A lot. It’s hard because people don’t generally appreciate the complexity of audio and I’m not talking about making audio, I’m talking about implementing audio in a real time sports game that plays back in a way that’s accurate, compelling, entertaining and all that stuff. I think it’s easily as sophisticated as the gameplay engine is itself. Because every person on Earth knows when someone says something stupid, do you know what I mean?

BD: Yeah.

JM: It’s like, you can get away with some glitches here and there in other things but if you say something that’s dead wrong – like, “That was a great save,” but the ball went in the net, then you’re just panned. It doesn’t matter what else you do, after that your audio gets a 0 out of 10. So you have to be right all the time. Then you have to look at, “Ok, so I’m an audio programmer, I have to give the commentary system triggers, parameters, cues, information hints.” For the most part it’s straightforward. What’s the score, where is the position on the field, what’s the velocity of that shot, there are thousands of things you can figure out. Then, there are a whole bunch of things that are completely subjective.

BD: The nuances of human language…

JM: Yeah, and more than that, analysis and perception. If you and I see a tackle go in on the screen – one guy tackles another guy – we immediately know how hard that tackle feels by looking at it. We go “Oooh!” if it was bad and they smash into each other. You could tell if it was a cynical, nasty tackle on one guy’s part or if they just bumped into each other by accident. Or if they hit hard but nobody really got hurt, they just kept going and it was no big deal. The code doesn’t know that. It knows things but it doesn’t know what a human would think about what that tackle looked like.

So while our FIFA game is very sophisticated in commentary there are certain areas if you were to go, pick it up right now after having told you this and try tackling all over the field, you’ll realize there’s nothing. It’s just, “Oh, there’s a challenge for the ball there,” because we can’t accurately say whether a tackle was hard or not so we just don’t say anything. And it’s a big shortcoming. But you have to try and convince – in this case gameplay – to spend 30 days of one of your engineer’s time to provide the analysis for a tackle so that we can talk about it better.

BD: So with the physics, in order to solve that problem you would have to have an interplay between it and the audio. Like when a player falls on another it would track, for instance, where a player’s arm goes and try to determine what is funny about that.

JM: Exactly, they updated the physics in the game with new collision stuff so now if you go in for a tackle it’s just completely unpredictable. It’s great, these bodies fly into each other and they sort of behave like real bodies, it’s getting better but they don’t have any info. We used to have canned animations so you knew if a guy flew and fell down on the ground. It’s bad, okay, so we can say, “Ooh!” In fact, tackle commentary used to be a lot better than it is now even though tackles are a lot better now.

Because it’s like a sandbox of physics now people have priorities and when you have to try and convince someone to spend their time on it you only get so much. You have to choose your battles and as more and more games become more and more physics-based you have to sort of ‘get in on the ground floor’ to use an expression, with the people building these features so that they build the stuff that you need then rather than trying to have someone go back in after the fact and add some hooks for you, which is next to impossible. They don’t want to introduce knock on bugs with the code that they know works. It’s very hard for you to convince someone it’s a priority and it’s very hard to have them put other features on the chopping block in order to support an audio feature.

CROWDS

BD: I also wanted to talk about crowds a little bit, have you done impulse recordings for the specific stadiums that you use?

JM: Yep, we sure have. We originally went around and shot responses for a bunch of different stadiums and you can buy some responses for some of the big stadiums as well, we have a library of a bunch. A bunch of the ones you’ll hear in the game are authentic and then we use those and mix it with others for all our other stadiums and try to get close enough. It’s really cool, I really like that stuff.
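For readers unfamiliar with impulse responses: a stadium's recorded response can be applied to a dry sound by convolution. The Python sketch below shows the basic operation, plus one possible reading of "mix it with others" as a weighted blend of two responses. Both function names and the tiny sample values are made up for illustration, and a production engine would more likely use an FFT-based method than this direct loop.

```python
def convolve(dry, impulse_response):
    """Direct-form convolution: each input sample adds a scaled copy of the response."""
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, sample in enumerate(dry):
        for j, tap in enumerate(impulse_response):
            out[i + j] += sample * tap
    return out


def blend_responses(ir_a, ir_b, weight_a):
    """Weighted mix of two impulse responses, zero-padding the shorter one."""
    length = max(len(ir_a), len(ir_b))
    a = list(ir_a) + [0.0] * (length - len(ir_a))
    b = list(ir_b) + [0.0] * (length - len(ir_b))
    return [weight_a * x + (1.0 - weight_a) * y for x, y in zip(a, b)]


click = [1.0, 0.0, 0.0]      # a single-sample "dry" click
measured_ir = [1.0, 0.5]     # toy stand-in for a measured stadium response
other_ir = [0.0, 1.0, 0.5]   # toy stand-in for a second response

wet = convolve(click, measured_ir)
approx_ir = blend_responses(measured_ir, other_ir, weight_a=0.5)
```

Convolving a single click with a response simply reproduces that response, which is why a short, sharp sound is a common way to capture how a space rings.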

BD: In real life soccer games the crowd noise is pretty constant, I was wondering if that makes it hard to have dynamics in the actual game?

JM: Well, it’s easier for us because we can fake them, we can just make it have dynamics but the hard part is making it authentic. We have all this amazing, amazing content, we get all this great 5.1 source from broadcasters and stadium recordings with an array of 30+ mics on discrete tracks that we can remix and it’s awesome. Our guy, you could sit down in the middle of his room and he could show you stuff, he’ll put it on 5.1, blast it out of ProTools and you’re just like “Ho-ly shit.” Then you can just solo a track, remix it and all of a sudden you’re in the section with the band, it’s just amazing.

But our game, like any video game, it’s fast and exciting so if you just put it all in and make some nice behavior/logic and let it play itself it’ll just be buzzing the whole time – it’ll be pumped, exciting and sound great. But obviously, as you know, with any audio you just get fatigued if it’s pinned the whole time. You need to have your troughs and valleys if you’re going to have your peaks so how do you artificially create them in a game that’s so fast paced? There’s a whole lot of psychology around dipping parts of the crowd and suppressing things and bringing things in. You’re trying to tell a narrative with the crowd as well as the commentary and you’re trying to reinforce the emotions people are feeling so as long as you’re sticking to those principles you can cheat a little bit.

BD: How many layers are there to the crowds?

JM: The crowd has quite a few layers, principally you’re looking at – there’s a crowd bed, a basic murmur and it’s got a few levels of intensity. There are lots of big, huge, long loops for all different types of stadiums, big, small, different places around the world. Then there are reactions, your Ohhs, your Awws, your cheers, your goal reactions and reactions for stuff on the pitch. Then there’s what we call salt and pepper, which are individual callouts, horns, whistles, and just people calling out. And those are really good to make it sound like you’re in different places in the world. Take a nice, normal crowd bed, layer in certain types of horns, yells and stuff like that. Here it sounds like England, you get a band or something over here, if you do some whistles and other things it sounds like Greece.

BD: I was playing in Brazil yesterday, I noticed lots of drumming.

JM: Yeah, Brazil’s one of the places we got tons of really good content, authentic matches from Brazil. Some of the teams are really good. The chants and the songs they sing are a big part of it too which includes a lot of the drumming stuff as well so those are always constantly going. There are away fans and home fans. They’re separate, they’re panned and they react properly depending on what’s going on on the field. And then there’s a bunch of other layers of bits and pieces, some glue, there’s an anticipation thing and it all comes together to create what’s supposed to sound like a cohesive, coherent crowd. It’s really tons of elements and tons and tons of voices because almost everything is in quad, I think.
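The layering JM describes (a bed with a few intensity levels, reactions, "salt and pepper" callouts, and separately panned home and away chants) could be sketched roughly as follows. This is a hypothetical Python illustration: the asset names, the three bed levels and the pan values are assumptions, not details from the game.

```python
BED_LEVELS = ["bed_low", "bed_medium", "bed_high"]  # hypothetical loop names


def pick_bed(intensity):
    """Map a 0..1 crowd intensity onto one of the bed loops."""
    clamped = min(max(intensity, 0.0), 1.0)
    index = min(int(clamped * len(BED_LEVELS)), len(BED_LEVELS) - 1)
    return BED_LEVELS[index]


def crowd_layers(intensity, reaction=None, callouts=(), home_chant=None, away_chant=None):
    """Build the list of (layer, asset, pan) entries that play at this moment."""
    layers = [("bed", pick_bed(intensity), 0.0)]
    if reaction is not None:
        layers.append(("reaction", reaction, 0.0))
    for callout in callouts:  # regional flavour: horns, whistles, drums
        layers.append(("salt_and_pepper", callout, 0.0))
    if home_chant is not None:
        layers.append(("chant_home", home_chant, -0.5))  # pan values are arbitrary
    if away_chant is not None:
        layers.append(("chant_away", away_chant, 0.5))
    return layers


goal_moment = crowd_layers(0.9, reaction="goal_cheer",
                           callouts=["drums"], home_chant="home_song")
```

In this sketch the regional character comes entirely from the callout and chant layers, so the same neutral bed can be reused for different parts of the world, as described above.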

BD: I noticed that in FIFA World Cup 2010 it accurately depicted the use of vuvuzelas, how did you know ahead of time that they were going to be used?

JM: We didn’t know, we were just told through people who know these things. But we knew, and we were looking at the qualifiers of all the matches in South Africa and it was extremely annoying.

BD: (laughs) I was going to ask if it hurt you as a sound person to put it in every match.

JM: Well, I’m not sure if you’re aware but we were just like, “This is going to be the worst for some people.” So in the options we put in a volume slider just for the vuvuzelas. In a lot of the reviews a lot of people really appreciated it. Some people want the authentic experience. That’s what it sounded like when they watched the game, especially with the World Cup because it’s an event. More people buy that who don’t play FIFA because it’s just part of the experience of the World Cup. So they maybe want to play it with them on because that’s how they heard it on TV that same day or if they’re playing with their friends. Whereas if you’re someone who is going through a big long campaign and if you’re playing hundreds of matches it sounds like a swarm of bees and it really took away from any kind of dynamics because they’re constant. So we did add some dynamics, we did duck them when things happen but in reality we kept them very authentic and gave you the ability to turn them off.

BD: Yeah, that’s the best solution.

JM: Because you couldn’t not have it. If you didn’t have it you would be accused of not being authentic. “If it’s in the game, it’s in the game,” – it’s kind of our motto but if you just always had it then you piss off a lot of people in the goal of being authentic and we’re not just trying to be authentic, we’re trying to be entertaining, it’s a game. So, we give them the choice and that way they can’t complain.
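The combination of a user volume slider and automatic ducking can be sketched in a few lines of Python. This is only an illustration of the idea: the function name, the 0 to 1 slider range and the default duck amount are assumptions made for the example.

```python
def vuvuzela_gain(user_volume, event_active, duck_amount=0.5):
    """Final gain for the vuvuzela layer: the user's slider, ducked during key events."""
    clamped = min(max(user_volume, 0.0), 1.0)  # slider range assumed to be 0..1
    duck = 1.0 - duck_amount if event_active else 1.0
    return clamped * duck


full = vuvuzela_gain(1.0, event_active=False)   # authentic, constant drone
ducked = vuvuzela_gain(1.0, event_active=True)  # dips while something happens on the pitch
muted = vuvuzela_gain(0.0, event_active=True)   # player has turned them off
```

Because the slider multiplies the ducked level, a player who sets it to zero hears no vuvuzelas at all, while everyone else keeps the authentic sound with a little added dynamics.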

It’s kind of the same reason why we give people the choice to put in their own chants. We recognize that we can’t get to everything, we recognize that you’re a hardcore supporter of your 3rd division club and we respect that. We’re probably never going to get to putting chants in for Scunthorpe United, you know what I mean? But you might get them and you might be able to put them in and share them with your friends. And the other side of that is that most of the chants aren’t safe for a PG rated game so there’s an option for you guys to put in whatever content you like that we can’t license, for whatever reasons. And the cool thing about user created content is, it’s a multi-pronged thing.

From a business side of things, user created content makes the game stickier – that’s a word they use. You’re going to be a lot more invested if you’ve invested time creating stuff, putting it in your game and creating your experience that’s unique. You’re going to be more inclined to continue with this game next time it comes out especially if you can carry that stuff forward. At the same time, you’re able to put stuff in that wasn’t there so that’s good. And the third benefit is people complain a lot less because if someone tries to complain, “Oh well, they don’t have this,” someone’s going to say, “Well, you can do it.” So you end up getting better reviews by opening up your toolset to people. You end up getting people more invested in the game and you get people putting stuff in that you could never pay to license. So it’s a super awesome thing in gaming philosophy. I’m all for giving the user all the tools for gameplay, audio, visuals, everything. Let people create and share their experiences and let it loose in the wild. That’s the awesome stuff in my opinion.

INTO THE FUTURE

BD: So I guess you feel that there’s no shortage of work to be done in audio for the next few years.

JM: Yeah, as the games get bigger and better, the job gets bigger and the ability to get better is there but the difficulty goes way up. It’s a lot harder to make a game sound at the same level of quality as it did on PS2 in terms of behavior. In this case I’m not talking about fidelity, obviously we can have crisp, awesome assets now so the games can sound super good. We can put in nice, high-res assets but I’m talking about the implementation, the behavior, the playback, that kind of stuff. It’s a lot harder, a lot more challenging and in fact, makes games now have a lot more bugs than they used to have.

BD: It’s interesting, you think that with technology, you kind of worry that our jobs may become obsolete but in actuality there’s more stuff to do.

JM: It’s totally true that the more advances there are, the more there is to do. Like you said, you do worry about technology making things obsolete or technology making things at least easy enough so that it can be outsourced to other countries, those are always concerns but hopefully that doesn’t happen soon.

BD: I attended a talk by Art Director Rick Stringfellow recently where he talked about the importance of context in EA sports games and it sounds like the new audio features in FIFA 13 such as real time updates, sideline reporters etc. reflect what they were trying to build towards in the next few years – which was creating more actual world situations to deliver more emotion, in part through audio.

JM: Yeah, we were part of a presentation, a sort of strike force advanced team with him and his group. We talked a lot about that and audio is ahead of the curve on all that stuff so I think they were looking to do a lot of that.

It’s funny to be in the audio department because you get to see a lot since the dependencies are so big. You need to see what’s going on in marketing because you’re giving them assets for gameplay videos, obviously presentation and all this stuff. You get to see how all these people are working on all these different things and they don’t get to talk to each other. They can but they’re busy so they don’t. So the audio guys have a very interesting and unique perspective on these games because they end up seeing the bigger picture more than most people on a project. So it’s kind of funny to watch it unfold when you can predict where it’s going and it gets there, I don’t know why I’m going there but it’s just one of those weird things.

BD: It’s probably because we’re at the end too so we see everything that precedes it, which is usually every other department.

JM: Exactly, yeah, I call that the leaves of the dependency tree. We are at the very end of every dependency so nobody depends on us for anything, which is a double-edged sword. You can be ignored because nobody needs anything from you to do their job, and you can’t break anyone else’s job, but at the same time – and this is what’s really great – by being sort of a black box you’re kind of left alone and people aren’t poking around with their opinion of how it should be done. So you have the ability to do some really cool stuff and have nobody really bother you as long as you’re just on time. Which is great, I love the niche space that audio occupies because you get a lot of freedom.

BD: You like to be left alone to some extent (laughs).

JM: Yeah, it’s part of that PR job that I was talking about. The Audio Directors should be really on top of it so that they’re explaining to producers why you need audio, why you want audio, how it’s cheap and it provides a whole lot of emotional bang for the buck for the end user. Keeping people excited about it and understanding that it’s needed is key because then it keeps the money flowing, it keeps the staff numbers up and that’s the way you can do really, really cool stuff which is all we want to do.

4 Comments

First of all, this is a really good insight into the audio process for FIFA.

After years of playing it and working with sound recording, I think there’s a key thing missing in FIFA. In the crowds you can hear Ohhs and Awws and more, as JM said, but where’s the crowd chanting and rooting for their home (or away) team? I’ve never heard it in FIFA, and I think that would be something pretty interesting to put in the game. It could give the game a bit more emotion.

True, but they provide the possibility to put your own audio files in the game, so you can easily achieve that. There are dozens of chants for each team, and most of them are against the opposing team or offensive… I think that is why they let you decide; otherwise the ESRB rating would go up, and for a sports game that isn’t a plus for sure.