Modeling of an MPEG Audio Layer-3 Encoder in Ptolemy


Patrick Brown
EE382C Embedded Software Systems
May 10, 2000

Abstract

MPEG Audio Layer-3 is a standard for the compression of high-quality digital audio. It has rapidly become very popular and has found widespread application in devices such as portable, all-electronic music players and in Internet audio. Many algorithms for Layer-3 audio encoding are currently in use. Their performance varies greatly, but the average encoding rate is approximately one second of digital audio encoded per second on a typical Windows-based desktop computer. While this rate is acceptable to most personal computer users, some applications demand a much higher encoding rate. For example, a typical radio station has thousands of CDs, each containing up to 74 minutes of digital audio. If the radio station decides to convert all of its audio to MPEG Layer-3, taking 74 minutes to encode each CD is unacceptable. The goal of this project is a formal model of an MP3 encoder that exposes the parallelism in the encoding algorithm and so facilitates scaling the algorithm to a multi-processor implementation, where much greater throughput is achievable.

Context

MPEG (Moving Picture Experts Group) Audio Layer-3 [1], more commonly known as MP3, is part of the set of standards known as MPEG-1, which was approved by the International Organization for Standardization (ISO) in November 1992 [2]. The primary focus of this standard is the compression of high-quality, synchronized audio and video to a data rate of approximately 1.5 Mbps [3]. This standard consists of three main parts: system, video, and audio. Within the audio portion of the standard, there are three layers. Layer-3 provides the highest compression at a given sound quality. Table 1 [4] shows some of the common compression ratios available with MP3 compression, organized by the relative quality of the resulting audio.

Sound Quality   Bandwidth     Mode     Bit-Rate    Compression Ratio
Telephone       2.5 kHz       mono     8 kbps      96:1
AM Radio        7.5 kHz       mono     32 kbps     24:1
FM Radio        11 kHz        stereo   kbps        26:1-24:1
Near CD         15 kHz        stereo   96 kbps     16:1
CD Quality      Over 15 kHz   stereo   112+ kbps   Up to 12:1

Table 1: MP3 compression ratios for various output sound qualities.

In MP3, as with other source coding standards, the decoder is rigidly defined, whereas great flexibility exists in the design of the encoder. Many freeware, shareware, and commercial encoders exist, some of which are open source. The formal model will be based on the L.A.M.E. encoder [5], because it is open source, freely distributable, efficient, and produces good sound quality. As with most signal processing applications, the encoding algorithm contains a large amount of parallelism. A major benefit of building the formal model is the exposure of this parallelism, because this is what will make the algorithm scalable to multiple processors.

Objectives

The formal model of the MPEG Layer-3 encoder will be a dataflow graph consisting of various blocks, or actors, each containing a portion of the L.A.M.E. source code. The actors will exist within the SDF (Synchronous Data Flow) model of computation. In this model, or domain, each actor has a fixed number of input and output ports. Each of these ports receives or sends a fixed number of tokens of data, such as a single integer or floating-point number, or a matrix of values. There is no notion of time within this domain. This domain is well suited to the modeling of an MP3 encoder, because input audio is processed sequentially, 576 samples at a time, as quickly as possible, with no consideration of time or other factors that may be present in other domains.

In addition to exposing the inherent parallelism, the formal modeling also allows retargeting of the algorithm to different implementations. Therefore, it will be possible to apply the algorithm, which was originally a C program written to run on a single general-purpose processor, to a wide variety of platforms, such as a multiple-processor workstation or custom hardware containing multiple DSPs.

Converting the C code to C++ and importing it into SDF actors introduces additional processing overhead, because the domain must provide a means for communication between actors. In the original algorithm, this was simply done by function calls. Scheduling the algorithm on multiple processors also introduces overhead, because processors must spend time sending and receiving data. However, the nature of the algorithm is such that the amount of inter-actor and inter-processor communication is very small. This is the main reason that a formal model of the encoder is beneficial.

Implementation and Modeling

[Figure 1 blocks: PCM Audio Input; Time to Frequency Mapping Filter Bank; Psychoacoustic Model; Noise Allocation, Quantizer, and Coding; Bit Stream Formatting; Encoded Bit Stream]
Figure 1: Block diagram representation of an MP3 encoder.

The formal model was constructed using Ptolemy [6]. Layer-3 is a very complex encoding scheme; the actual ISO standard [2] is nearly 200 pages long, and the L.A.M.E. encoder source is approximately 20,000 lines in length. Unfortunately, it was not possible to implement a fully functional encoder due to this complexity and the time constraints of the project. Figure 2 shows a screen shot of the implemented model.
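The SDF property described under Objectives, that every port moves a fixed number of tokens per firing, is what allows a schedule to be computed before the graph runs. The following is a minimal Python sketch of that calculation (solving the balance equations for the number of firings per actor); it is not Ptolemy code. The actor names, the simple chain topology, and the per-port token rates are hypothetical simplifications chosen for illustration. Only the figures 576, 18, and 32 are taken from the text.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetitions(actors, arcs):
    """Return the smallest integer firing counts that balance every arc.

    arcs is a list of (producer, tokens_produced, consumer, tokens_consumed).
    The graph is assumed to be connected.
    """
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for src, produced, dst, consumed in arcs:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * produced / consumed
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * consumed / produced
                changed = True
            elif src in rate and dst in rate:
                # Both ends known: the arc must be balanced.
                if rate[src] * produced != rate[dst] * consumed:
                    raise ValueError("inconsistent token rates")
    # Scale the fractional rates up to the smallest integer solution.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()), 1)
    return {actor: int(r * scale) for actor, r in rate.items()}


# Hypothetical single-channel chain; the rates are assumptions, not the
# rates of the actors in the model described in this report.
ACTORS = ["source", "filterbank", "regroup", "mdct"]
ARCS = [
    ("source", 1, "filterbank", 32),     # one sample out, 32 samples in
    ("filterbank", 32, "regroup", 576),  # 32 subband samples out, 576 in
    ("regroup", 576, "mdct", 18),        # 576 samples out, 18 per MDCT firing
]
schedule = repetitions(ACTORS, ARCS)
print(schedule)
```

Under these assumed rates, one iteration of the graph fires the source 576 times, the filterbank 18 times, the regrouping actor once, and the MDCT 32 times, which matches the 576-sample processing unit mentioned above.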

Figure 2: The model

The model includes the most important filtering actors. The model does not include the final stages of the algorithm, which involve noise allocation and bit stream formatting to produce the actual MP3 output file. These portions of the algorithm do not involve a large amount of parallel data and, therefore, would benefit least from formal modeling. A truly complete model of an encoder would include these blocks at the far right of the graph, in place of the two red blocks in Figure 2. These red blocks are currently actors that plot the output of the filters in the frequency domain.

The top and bottom halves of the graph represent the two channels (left and right) of audio. The actors on the far left are source actors, included for the purpose of simulation. They generate an input signal composed of three sine waves at different frequencies with two Gaussian random noise sources added. This input provides a wide range of frequency content, similar to what CD-quality music might contain.

Immediately to the right of the source actors is the polyphase filterbank. Barely visible in Figure 2 are the delays (shown in green) on each arc before each of the 18 filters in the filterbank. Each delay is a different length, so that every filter receives the same input signal, but shifted by 32 samples. As described mathematically in [8], each of the 18 filters produces one sample in each of the 32 frequency bands. The large block following the polyphase filter rearranges the samples into 32 separate frequency bands, each consisting of 18 samples.

Each band is then operated on by an MDCT (Modified Discrete Cosine Transform) actor. This actor requires additional input data from the psychoacoustic model. Both the polyphase filterbank and the MDCT operations are lossy, but the quality lost due to the MDCT is insignificant when compared to the polyphase filterbank [9]. However, the MDCT introduces aliasing, which is the result of the overlapping of the frequency subbands. The next column of actors performs the aliasing reduction butterfly, as described in [7]. The butterfly actor also requires data from the psychoacoustic model.
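The rearrangement step described above, from 18 filter outputs of 32 samples each to 32 bands of 18 samples each, amounts to a transpose. The following Python sketch illustrates only that data movement, assuming each filter output is available as a list of 32 values. The function name `regroup` and the list of per-filter delays are hypothetical illustrations, not names or values taken from the Ptolemy model or the L.A.M.E. source.

```python
SLOTS = 18   # polyphase filters per channel, from the text
BANDS = 32   # frequency bands, from the text

# Assumption: filter k sees the input delayed by 32 * k samples, so the
# 18 filters together cover 18 * 32 = 576 input samples.
delays = [BANDS * k for k in range(SLOTS)]


def regroup(slot_outputs):
    """Turn 18 filter outputs of 32 samples into 32 bands of 18 samples."""
    if len(slot_outputs) != SLOTS or any(len(s) != BANDS for s in slot_outputs):
        raise ValueError("expected 18 filter outputs of 32 samples each")
    return [list(band) for band in zip(*slot_outputs)]


# Tag every sample with (filter index, band index) so the reordering is visible.
slot_outputs = [[(k, b) for b in range(BANDS)] for k in range(SLOTS)]
bands = regroup(slot_outputs)
print(len(bands), len(bands[0]))   # 32 18
print(bands[5][:3])                # [(0, 5), (1, 5), (2, 5)]
```

After regrouping, each of the 32 lists holds the 18 samples of one frequency band, so each can be handed to its own MDCT actor independently of the others.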

In looking at Figure 2, it is apparent that the model does, in fact, expose a large amount of parallelism. All 18 of the polyphase filters can operate on the input data simultaneously. The rest of the actors can operate on the frequency subbands independently of each other. Therefore, there is a tremendous potential for scheduling onto multiple processors.

One of the fundamental laws in parallel computing is known as Amdahl's Law [10], and it helps to illustrate why this algorithm is a good candidate for parallel execution. Let α represent the fraction of the algorithm that can be executed in parallel. Let P be the total number of processors. S_P is then the gain in performance, or speed-up, due to using P processors, and is given by the following formula:

$$S_P = \frac{1}{\alpha / P + (1 - \alpha)}$$

Figure 3: Amdahl's Law
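As a quick numerical check of the formula, the following Python sketch evaluates the speed-up for a few values of α and P. The function name and the sample values are illustrative assumptions and are not measurements from this project.

```python
def amdahl_speedup(alpha, processors):
    """Speed-up S_P = 1 / (alpha / P + (1 - alpha)) from Amdahl's Law."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    if processors < 1:
        raise ValueError("at least one processor is required")
    return 1.0 / (alpha / processors + (1.0 - alpha))


# Illustrative values only: compare a half-parallel and a mostly parallel algorithm.
for alpha in (0.5, 0.9, 0.99):
    row = ", ".join(f"P={p}: {amdahl_speedup(alpha, p):.2f}" for p in (2, 8, 32))
    print(f"alpha={alpha}: {row}")
```

With eight processors, for example, the formula gives a speed-up of about 1.8 for α = 0.5 but about 4.7 for α = 0.9, which is why a large parallel fraction matters.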

Figure 3 shows the speed-up, S, plotted versus α. The curve shifts up and to the left as more processors are added. The important aspect of the graph is that α must be large to obtain a significant improvement by increasing P. This is because the serial fraction, 1 - α, is not reduced by adding processors, so the speed-up can never exceed 1/(1 - α). In practice, applications with little parallelism also waste large amounts of time communicating between processors, effectively canceling the benefits of using the additional processors. As demonstrated by this project, the MP3 encoding algorithm has a very high parallel fraction, α.

Further Work

The model created for this project comprises a large portion of a fully functional MP3 encoder. Future work on this model should include the addition of the final stages of the encoder: noise allocation and bit stream formatting. This would allow the simulation to produce an actual MPEG Layer-3 output file.

Also, multiprocessor scheduling techniques should be investigated; Ptolemy has the capability to schedule SDF graphs onto multiple processors. To do this effectively, the user must allocate enough delay on the appropriate arcs to maximize the utilization of each processor. Further research in this area could determine the optimum combinations of buffer sizes and number of processors, and possibly even further division of the blocks into smaller actors.

Finally, the L.A.M.E. source code includes several NSP (native signal processing) kernels for performing some of the most processor-intensive math routines. The encoder could be benchmarked on various architectures, using these kernels, to determine the merit of employing NSP in MP3 encoding.
