Analysis and implementation of ARM926EJ-S optimization in an MPEG-4 software decoder

The MPEG-4 video compression standard has attracted wide attention since its introduction. In recent years, implementing MPEG-4 players in embedded applications has become a research focus for many manufacturers. With the arrival of a digitized, networked, globally integrated information age, multimedia information including sound, graphics, data, pictures, and images is transmitted and processed together, and the key to this lies in compression technology, which is why video compression and the development of its standards matter. Because the MPEG-4 system is large and requires a great deal of data processing, implementing MPEG-4 software decoding on ARM requires thorough optimization of the original algorithm to reach the desired performance. This study presents a pure software method for optimizing an MPEG-4 decoding algorithm on the ARM926EJ-S microprocessor. Through software optimization of the decoding algorithm, the playback speed of a QVGA-format MPEG-4 stream on the ARM9 platform rises from the original 10 f/s to 37 f/s, which fully meets the requirement for smooth playback and has high practical value. At present, video technology is applied very widely, for example in online video conferencing, online video e-commerce, online government services, online shopping, online education, telemedicine, online seminars, online showrooms, personal chat, and online video consultation.

2 Development platform and time-consumption analysis

This study uses an integrated development platform based on the ARM926EJ-S microprocessor, running the Linux operating system and connected to a 320×240 (QVGA) LCD display. Linux is a Unix-like operating system whose kernel is also named "Linux", and it is one of the best-known examples of free and open-source software development. Strictly speaking, the word Linux refers only to the Linux kernel, but it is customarily used for the whole operating system built on that kernel together with the tools and libraries of the GNU project. Linux takes its name from the computer enthusiast Linus Torvalds. The ARM926EJ-S microprocessor is clocked at 190 MHz. It has a 5-stage integer pipeline, supports the 32-bit ARM instruction set, the 16-bit Thumb instruction set, and the DSP extension instructions, and provides both a data cache and an instruction cache, which gives it strong instruction and data processing capability.

The algorithm flowchart of MPEG-4 SP (Simple Profile) decoding is shown in Fig. 1. Before any optimization, the MPEG-4 decoding code is first ported to the development platform. The computational load and time consumption of each decoding module are then analyzed to identify the key targets for optimization. This paper uses an AVI stream of 376 934 B as the test sequence. The stream contains 95 frames in total, of which 8 are I frames and 87 are P frames. The time-consumption analysis measured before optimization is shown in Table 1: decoding and playing the whole test sequence takes 10.05 s, a playback speed of only 9.5 f/s.
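The per-module timing described above can be collected with a small accumulator around each decoding stage. The following is only a minimal sketch under stated assumptions: the `PROFILE` macro, the `module_ticks` array, and the helper names are hypothetical and not taken from the original decoder. Only the module names (VLD, Iscan, Iquant) come from the text. `clock()` measures CPU time, which is a coarse but portable choice.

```c
#include <assert.h>
#include <time.h>

/* Decoder stages to be timed (names follow the text; the enum is hypothetical). */
enum { MOD_VLD, MOD_ISCAN, MOD_IQUANT, MOD_COUNT };

/* Accumulated CPU ticks per module, summed over all frames. */
static clock_t module_ticks[MOD_COUNT];

/* Run `call` and add its CPU time to the accumulator of module `mod`. */
#define PROFILE(mod, call)                       \
    do {                                         \
        clock_t t0_ = clock();                   \
        call;                                    \
        module_ticks[(mod)] += clock() - t0_;    \
    } while (0)

/* Accumulated time of one module in seconds. */
static double module_seconds(int mod)
{
    assert(mod >= 0 && mod < MOD_COUNT);
    return (double)module_ticks[mod] / CLOCKS_PER_SEC;
}

/* Average playback speed in frames per second (0 if no time was measured). */
static double frames_per_second(int frames, double seconds)
{
    return seconds > 0.0 ? frames / seconds : 0.0;
}
```

With the figures reported above, `frames_per_second(95, 10.05)` evaluates to roughly 9.45, which matches the stated playback speed of about 9.5 f/s.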

ARM (Advanced RISC Machines) is a well-known company in the microprocessor industry. It designs a large number of high-performance, low-cost, low-power RISC processors, together with related technology and software. Its technology is characterized by high performance, low cost, and low energy consumption, and it suits many fields, for example embedded control, consumer and educational multimedia, DSP, and mobile applications. The main task in implementing an MPEG-4 decoder in software on ARM is to improve the decoding speed while still achieving the desired picture playback quality.

3 Optimization of the MPEG-4 decoding algorithm on the ARM926EJ-S

The MPEG-4 software decoder takes the open-source XVID source code as its reference. The C source code of XVID is ported to the ARM platform, and on this basis the decoder is optimized and its decoding and playback performance is tested. The optimization proceeds mainly in 3 respects:

1 Adjust the software structure and program flow of the XVID source code to suit ARM.

2 For modules with a heavy computational load and high time consumption, write assembly functions to replace the C modules and raise the efficiency of program execution.

3 Look for fast or parallel algorithms.

3.1 Structural optimization of the software

ARM resources are very limited, so the software structure should minimize memory accesses and increase the cache hit rate in order to raise the efficiency of program execution.

3.1.1 Merging appropriate modules to reduce memory accesses

In the source code before optimization, the software structure for decoding the macroblocks of I frames and P frames is shown in Fig. 2. In this procedure, for an inter macroblock, the three steps of variable-length decoding (VLD), inverse scan (Iscan), and dequantization (Iquant) involve 3 reads of the Block buffer, 2 writes of the Block buffer, and 1 write of the Data buffer. Source code here means the uncompiled, human-readable text files written according to the rules of a programming language, also called the source program. Its final purpose is to be translated by a compiler into binary instructions that the computer can execute.

After merging, inverse scan and dequantization are carried out immediately after VLD has read and processed the data, and the dequantized data is stored in the Block buffer. The whole process performs only one read/write pass over the Block buffer. This not only removes two read/write operations but also avoids allocating a Data buffer. In addition, for P frames, performing dequantization immediately after VLD skips the processing of a large number of zero values, which is one of the key reasons for the merge.
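The effect of the merge can be illustrated with a simplified sketch. This is not the XVID code. It assumes that VLD has already produced a list of hypothetical `RunLevel` events (a run of zeros followed by one nonzero level), that the standard 8×8 zigzag scan is used, and that dequantization follows an H.263-style inter rule. All function and type names are illustrative. `decode_separate` models the unmerged three-pass flow with an extra `data` buffer, and `decode_merged` performs inverse scan and dequantization as each coefficient is decoded, touching only nonzero coefficients.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified VLD output: `run` zeros, then one nonzero level. */
typedef struct { uint8_t run; int16_t level; } RunLevel;

/* Standard 8x8 zigzag scan: scan position -> raster position. */
static const uint8_t zigzag[64] = {
     0,  1,  8, 16,  9,  2,  3, 10, 17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34, 27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36, 29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46, 53, 60, 61, 54, 47, 55, 62, 63
};

/* H.263-style inter dequantization of one coefficient (assumed rule). */
static int16_t dequant_inter(int16_t level, int quant)
{
    assert(quant > 0);
    if (level == 0) return 0;
    int mag = quant * (2 * (level < 0 ? -level : level) + 1);
    if ((quant & 1) == 0) mag -= 1;          /* even quantizer: subtract 1 */
    if (mag > 2047) mag = 2047;              /* saturate the magnitude */
    return (int16_t)(level < 0 ? -mag : mag);
}

/* Unmerged flow: VLD, inverse scan, and dequantization as three passes. */
static int decode_separate(const RunLevel *ev, int n, int quant, int16_t block[64])
{
    int16_t data[64];                        /* extra intermediate buffer */
    int pos = 0;
    memset(block, 0, 64 * sizeof(int16_t));
    for (int i = 0; i < n; i++) {            /* pass 1: levels in scan order */
        pos += ev[i].run;
        if (pos > 63) return -1;             /* corrupt input */
        block[pos++] = ev[i].level;
    }
    for (int i = 0; i < 64; i++)             /* pass 2: inverse scan */
        data[zigzag[i]] = block[i];
    for (int i = 0; i < 64; i++)             /* pass 3: dequantization */
        block[i] = dequant_inter(data[i], quant);
    return 0;
}

/* Merged flow: one pass, no data buffer, zero coefficients are never touched. */
static int decode_merged(const RunLevel *ev, int n, int quant, int16_t block[64])
{
    int pos = 0;
    memset(block, 0, 64 * sizeof(int16_t));
    for (int i = 0; i < n; i++) {
        pos += ev[i].run;
        if (pos > 63) return -1;             /* corrupt input */
        block[zigzag[pos++]] = dequant_inter(ev[i].level, quant);
    }
    return 0;
}
```

Both functions produce the same 64 coefficients for the same input, but the merged version writes each nonzero coefficient to the block once. Since most coefficients of a P-frame block are typically zero, the per-block work drops accordingly.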

Likewise, the AC/DC prediction and the dequantization in I frames are merged. The method is: Add_acdc pMB, i,