Abstract

This chapter reviews game audio from a Quality of Experience point of view. It describes cross-modal interaction of auditory and visual stimuli, re-introduces the concept of plausibility, and discusses issues of interactivity and attention as the basis for the qualitative, high-level salience model proposed here. The model is substantiated by experimental results indicating that an interaction or task located in the audio domain clearly influences the perceived audio quality. Cross-modal influence, with the interaction or task located in a different (for example, visual) domain, is possible, but is significantly harder to predict and evaluate.

Introduction

Perceived quality in game audio is not a question of audio quality alone. Because audio is usually only one part of an overall game concept consisting of graphics, physics, artificial intelligence, user input, feedback, and so forth, it has been considered to play a relatively minor role in the overall experience that a game provides. Consequently, a lot of effort has been put into providing near photo-realistic representations of (virtual) game scenarios to the player, but comparatively little into audio. Interestingly, this assessment has had to be revised in recent years. Learning from other artistic fields like cinema, in which storytelling is a central means of providing “user experience”, game developers have come to understand that audio can trigger emotions and provide additional information that is otherwise hard to convey. Today, although audio budgets are still limited compared to those for other aspects of game engineering, audio in games receives more attention from game developers than ever before.

But there is more to audio in games than just emotional support for a story. Most games are user-centered and non-linear, as opposed to the linear storytelling of traditional, non-interactive content presentation. Therefore, the audio has to be manipulated in real time depending on the player's actions. Real-time processing of audio can become computationally very demanding, which is a problem for complex game scenarios. This constraint has introduced the concept of plausibility: the main goal in game audio is not to have an audio simulation as exact and close to reality as possible, but to render audio that is plausible in the game scenario and that provides an overall quality impression that matches the other aspects of the game.

One fact well known from home cinema applications is that improved video quality can also increase the subjectively perceived audio quality, and that the reverse effect also exists (Beerends & De Caluwe, 1999). It is therefore worth asking whether these effects can be exploited to increase the subjectively perceived overall quality of a game without actually increasing the computational load. Instead of just rendering more details (equivalent to a higher simulation depth), focusing on those details that are actually relevant in a certain context could provide a much higher Quality of Experience (QoE) (see Farnell, 2011, for a discussion of relevancy and redundancy in procedural audio design).

The central question is, therefore: which stimuli in a game scenario are of most importance? Can information that is difficult and cost-intensive to convey in one modality be presented in another modality with less effort but similar perceptual impact? What role does interactivity play in the perception of quality? What are the technical parameters that can influence the perceived quality of a game, and which other factors exist that potentially dominate the perceptual process?

This chapter aims to identify and discuss general quality criteria in multimedia application systems with a focus on games. These criteria comprise technical as well as human factors. In order to understand these factors, the first section touches upon the mechanisms of human perception: well-known facts about visual and auditory perception are summarized briefly.

The second section discusses cross-modal influences, that is, interaction between auditory and visual stimuli in the perceptual apparatus, and cross-modality in general. It surveys the most widely accepted theories of how audio-visual (bimodal) perception is achieved in the human brain. Bimodal perception is far more complex than just adding the results of auditory and visual processing and is therefore worth an extended discussion. This is followed by examples of effects in bimodal perception (based on research in the fields of psychology and cognitive sciences) that can be relevant in the context of game audio.

The third section discusses the concept of auditory and audio-visual plausibility. It briefly compares the requirements for exact (room) acoustic simulations with those for real-time rendering and details the resulting constraints for computer games.

The next section gives an overview of issues related to interactivity, such as latency, user input, and perceptual feedback. Interactivity is closely related to the generation of presence, defined as the “perceptual illusion of non-mediation”, or simply the feeling of “being there”. The concept of presence is discussed as an indirect measure of perceived quality.