AES Technical Committee

Coding of Audio Signals

This committee has as its scope the theory, study, research, practice, engineering, and other standing issues involving the coding (i.e. bit rate reduction) of audio signals, especially but not limited to methods that use knowledge of human perception in the coding process.

The goal of this committee is to further work, knowledge and interest in subjects within its scope, and to increase awareness of the methods, problems, and standards within the AES. Within its scope, the committee will provide an informal technical forum for discussion of scientific and engineering problems as well as non-technical matters, participate in convention programs at the invitation of Convention Chairs, and work with the editor of the JAES to distribute accurate and timely information to the membership.

Audio Coding CD-ROM

A CD-ROM on audio coding artifacts was prepared by the members of the AES Technical Committee on Coding of Audio Signals. This is the first educational/tutorial CD-ROM presented by the AES Technical Council on a particular topic, combining background information with specific audio examples. To allow the audio excerpts to be reproduced on high-quality home playback equipment, the disc can also be played on any standard audio CD player. Available via AES publications!

Emerging Technology Trends

Audio coding has emerged as a critical technology in numerous audio applications. In particular, it is a key component of mobile multimedia applications in the consumer market. Examples include wireless audio broadcast, Internet radio and streaming music, music download, storage and playback, mobile audio recording and Internet-based teleconferencing. Example platforms include digital audio broadcast radio receivers, portable music players, mobile phones and personal computers. From this, a variety of implications and trends can be discerned:

• Digital distribution of content is offered to the consumer in many formats with varying quality/bitrate trade-offs, depending on the application context. This ranges from very compact formats (e.g. MPEG HE-AACv2 and MPEG USAC) for wireless mobile distribution to perceptually transparent, scalable-to-lossless and lossless formats for regular IP-based distribution (e.g. MPEG AAC, HD-AAC and ALS).

• The frontiers of compression have been pushed further, allowing carriage of full-bandwidth signals at very low bit rates, to the point where recent coding systems are considered appropriate for some broadcasting applications, particularly relatively expensive wireless communication channels such as satellite or cellular channels. While such technology predominantly makes use of parametric approaches (at least in part) to achieve the highest possible quality at the lowest bit rates, it is typically not designed to deliver “transparent” audio quality (i.e. quality at which the original and the encoded/decoded audio signal cannot be perceptually distinguished even under the most rigorous listening conditions). Nevertheless, “entertainment quality” services over wireless channels have been very successful. Examples of audio coding that facilitates these new markets include MPEG HE-AACv2 and MPEG USAC.

• Transform-based audio coding schemes have been exploited to their full potential (quality vs. bitrate). As such, new paradigms must be explored to gain further compression efficiency.

• There is a consistent trend toward hybrid coding techniques that employ parametric modeling to represent aspects of a signal, where the parametric coding techniques typically are motivated by aspects of human perception. The core of most successful audio coders is still largely based on a classic filterbank-based coding paradigm, in which the quantization noise is shaped in the time/frequency domain to exploit (primarily) simultaneous masking in the human auditory system. However, the recent success of parametric extensions to the core audio codec, in both market deployment and standardization, illustrates this tendency:

o Audio bandwidth extension technology replaces the explicit transmission of the signal’s high-frequency part (e.g. by sending quantized spectral coefficients) with a parametric synthesis of the high-frequency spectrum at the decoder side, based on the transmitted low-frequency part and some parametric side information that captures the most relevant aspects of the original high-frequency spectrum. This exploits the lower perceptual acuity of the human auditory system in the high-frequency region. An example is MPEG HE-AAC.

o Parametric stereo techniques enable rendering of several output channels at very low bitrates. Instead of a full transmission of all channel signals, the stereo / multi-channel sound image is re-synthesized at the decoder side based on a transmitted downmix signal and parametric side information that describes the perceptual properties (cues) of the original stereo / multi-channel sound scene. Examples are MPEG Parametric Stereo (for coding of two channels) and MPEG Surround (for full surround representation).

o Parametric coding of audio object signals provides, similarly to parametric coding of multi-channel audio, a very compact representation of a scene consisting of several audio objects (e.g. musical instruments, talkers, etc.). Rather than transmitting discrete object signals, the (downmixed) scene is transmitted, plus parametric side information describing the properties of the individual objects. At the decoder side, the scene can be modified by the user according to his/her preference; e.g., the level of a particular object can be attenuated or boosted. A recent example of such a technology is MPEG Spatial Audio Object Coding (SAOC).
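The bandwidth-extension principle described above can be sketched in a few lines. This is a toy illustration, not the actual algorithm of any standardized codec: the band layout, the per-band RMS gains, and all function names are assumptions made for the example. The encoder transmits only the low half of the spectrum plus one gain per high band; the decoder patches the low band upward and rescales it to the transmitted gains.

```python
import numpy as np

def encode_bwe(signal, n_fft=1024, n_param_bands=8):
    """Toy bandwidth extension: keep the low half of the spectrum,
    plus one RMS gain parameter per high-frequency band."""
    spec = np.fft.rfft(signal[:n_fft])
    half = len(spec) // 2
    low = spec[:half]                       # transmitted explicitly
    high = spec[half:half + half]           # summarized only by gains
    bands = np.array_split(high, n_param_bands)
    gains = np.array([np.sqrt(np.mean(np.abs(b) ** 2)) for b in bands])
    return low, gains

def decode_bwe(low, gains, n_fft=1024):
    """Patch the low band into the high band, then rescale each
    patched band to the transmitted RMS gain."""
    patched = np.array_split(low.copy(), len(gains))  # spectral patch
    rescaled = []
    for b, g in zip(patched, gains):
        rms = max(np.sqrt(np.mean(np.abs(b) ** 2)), 1e-12)
        rescaled.append(b * (g / rms))
    spec = np.concatenate([low, np.concatenate(rescaled)])
    return np.fft.irfft(spec, n=n_fft)

# Usage: a synthetic harmonic signal with low and high partials
t = np.arange(1024) / 48000.0
x = sum(np.sin(2 * np.pi * f * t) for f in (440, 880, 3000, 9000))
low, gains = encode_bwe(x)
y = decode_bwe(low, gains)
```

The low band round-trips exactly; the high band is only perceptually plausible, which is precisely the trade-off the parametric approach accepts.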

• Audio coding has successfully entered the world of telecommunication, providing low-delay high-quality codecs that enable natural sound for teleconferencing and video-conferencing. Such codecs deliver full bandwidth and high quality, not only for speech material but also for any type of music and environmental sound, enabling applications such as tele-teaching for music. They support spatial reproduction of sound (stereo or even surround), which can greatly increase the ease of communication in conferences between several partners.

• For broadcast-only applications where delay is not a constraint, further compression efficiency can be gained by exploiting large algorithmic delays, or even multi-pass algorithms in the case of “off-line” audio coding.

• Significant progress has been made on the challenge of developing a truly universal coder that delivers state-of-the-art performance for all kinds of input signals, including music and speech. Hybrid coders, such as MPEG USAC (Unified Speech and Audio Coding), have a structure combining elements from the speech and the audio coding architectures and, over a wide range of bitrates, perform better than coders designed for only speech or only audio.

• Also, the role of higher-level psychoacoustics and perception is becoming increasingly important in audio coding. Detection of auditory objects in an audio stream, separation into auditory (as opposed to acoustic) objects, and storage and manipulation as auditory objects are beginning to play a role. This will be an important and ongoing area of research.

• Solid-state and hard-drive-based storage for audio has become extremely inexpensive, and consumer Internet connection speeds reach into the Mb/s range. When such resources are available, music streaming, download and storage applications no longer require state-of-the-art audio compression. Instead, consumers are operating well-known perceptual coders at higher bit rates (lower compression) to achieve “perceptually transparent” compression of music, since the additional increment in resources required for such operating points is relatively inexpensive. For example, consumers are opting to use MPEG Layer III (MP3) or MPEG AAC at rates of 256 kb/s or higher to code their music libraries for their portable music players.
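The economics behind that choice are easy to quantify. As a back-of-the-envelope sketch (the 4-minute track length and 1000-track library are assumed round numbers, not figures from this document):

```python
# Storage needed to code a music library at various bit rates.
# Assumptions: 1000 tracks, 4 minutes (240 s) per track.
def library_size_gb(bitrate_kbps, tracks=1000, seconds_per_track=240):
    bits = bitrate_kbps * 1000 * seconds_per_track * tracks
    return bits / 8 / 1e9  # bits -> gigabytes

for rate_kbps in (64, 128, 256):
    print(f"{rate_kbps:>3} kb/s -> {library_size_gb(rate_kbps):.2f} GB")
# 256 kb/s needs ~7.7 GB, four times the storage of 64 kb/s,
# yet still cheap relative to modern solid-state or hard-drive capacity.
```

This is why the marginal cost of moving to a "perceptually transparent" operating point is, as noted above, relatively inexpensive for the consumer.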

• Processor speed has continued to increase at a tremendous pace. Even with the low-power restrictions imposed by battery powered portable devices, the quantity of CPU cycles potentially available for audio processing is large. Present audio coders work in a fraction of available CPU capacity, even for multichannel coding, and new research may be needed to discover how to use the additional CPU cycles and memory space. Some possibilities are improved psychoacoustic models and sophisticated acoustic scene analysis. Seen overall, the research in audio coding is moving to the extremes, both toward lowest bit rates (very lossy compression using parametric coding extensions) and highest bitrates (noiseless/lossless coding for high resolution audio at high sampling rates/resolutions), as well as the more complex high-level processing (scene analysis and sound field synthesis of various sorts).

• There is considerable research activity exploring audio presentation that is more immersive than the pervasive consumer 5.1 channel audio systems. One might apply the label of "3D Audio" to such explorations, since the common thread is the use of many loudspeakers positioned around, above and below the listener.
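Several of the trends above rest on the classic filterbank paradigm described earlier: transform coefficients are quantized with a per-band step size chosen so that the quantization noise stays below the masking threshold. The sketch below is a deliberately simplified illustration; the hard-coded step sizes stand in for a real psychoacoustic model, which is not included, and the two-band frame is a made-up example.

```python
import numpy as np

def quantize_band(coeffs, step):
    """Uniform quantization; the error per coefficient is bounded
    by half the step size, so noise power grows with the step."""
    return np.round(coeffs / step) * step

# Toy frame of transform coefficients split into two "bands"
rng = np.random.default_rng(0)
frame = rng.normal(size=64)
loud_band, quiet_band = frame[:32] * 10.0, frame[32:]

# Per-band step sizes: a coarser step is tolerable in the loud band,
# where simultaneous masking hides more quantization noise.
out_loud = quantize_band(loud_band, step=0.5)
out_quiet = quantize_band(quiet_band, step=0.05)

err_loud = np.max(np.abs(out_loud - loud_band))    # bounded by 0.25
err_quiet = np.max(np.abs(out_quiet - quiet_band)) # bounded by 0.025
```

A real coder derives the step sizes per frame from a psychoacoustic model rather than fixing them, but the bit-saving mechanism is the same: spend fewer bits where the ear tolerates more noise.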

Meeting Report:

These documents do not necessarily express the official position of the AES on the issues discussed at these meetings, and only represent the views of committee members participating in the discussion. Any unauthorized use of these publications is prohibited. Authorization must be obtained from the Executive Director of the AES: Email, Tel: +1 212 661 8528, Address: 551 Fifth Ave., Suite 1225, New York, New York 10176, USA.