Simulated Annealing for Fast Motion Estimation Algorithm in H.264/AVC

Zhiru Shi, W.A.C. Fernando and A. Kondoz

[1] I-Lab, CVSSP, University of Surrey, Guildford, United Kingdom

1. Introduction

The H.264/AVC video coding standard [1] was developed by the Joint Video Team of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). By utilizing several new techniques, such as advanced intra prediction, variable block size motion estimation (ME), integer transformation and an in-loop deblocking filter, H.264/AVC achieves significant compression gain compared with previous video coding standards. It is now widely applied to many types of visual services, for example Digital Multimedia Broadcasting, mobile phone video and High Definition (HD) video delivery. In the near future, holographic video and Super-HD video are expected to reach the consumer market. Such large video content requires higher coding efficiency while keeping the encoder complexity at an acceptable level. Therefore, new techniques are needed to reduce the computational complexity so that real-time video encoding and delivery services for large video content become feasible.

In particular, Block-Matching Motion Estimation (BMME) with the Full Search (FS) algorithm [2] is the main computational burden in H.264/AVC, because it exhaustively searches all possible blocks within the search window using a Lagrangian cost. Although the FS algorithm obtains the optimum motion vector (MV) in most cases, it consumes more than 80% of the total computational complexity. Thus, a fast and efficient motion estimation algorithm is required for H.264/AVC. Two major approaches have recently been studied to overcome this problem. One employs fast mode decision algorithms to skip unnecessary block modes in the variable block size checking process [3, 4]. The other utilizes Fast Motion Estimation (FME) search algorithms to reduce unnecessary search points [5-11].

Various algorithms have been proposed to reduce the number of search points in FME. Motion adaptive search (MAS) [5] utilizes motion activity information to adjust the search strategy. In the Variable Step Search (VSS) algorithm [6], the motion search range is determined by the degree of correlation between neighbouring motion vectors. In the Multi-Path Search (MPS) algorithm [7], all eight neighbours around the origin of the search window are examined to find candidate points. This algorithm has good rate-distortion performance, but its computational complexity reduction is limited. To tackle this drawback, the directional gradient descent search (DGDS) algorithm [8] was developed. It searches the error surface in eight directions using directional gradient descent. The search patterns in each stage depend on the minima found in the eight directions, and thus the global minimum can be traced more efficiently.

The hybrid multi-hexagon-grid search (UMHexagonS) algorithm [9] was adopted in the H.264/AVC reference software JM because it significantly reduces the computational complexity with only a little degradation in rate-distortion performance. UMHexagonS takes advantage of four kinds of MV prediction to decide the initial search point, i.e. the Median Prediction (MP), the Uplayer Prediction (UP), the Corresponding-block Prediction (CP) and the Neighbouring Reference-picture Prediction (NRP). After selecting the best initial point, it employs the unsymmetrical-cross search pattern and the uneven-hexagon-grid search pattern, which are shown in Figure 1 as Step 2 and Step 3-2. In these uneven search patterns, there are more horizontal search points than vertical ones. This is based on the common assumption that movement in the horizontal direction is larger than that in the vertical direction. However, the motion characteristic of each video sequence is unique and may also change with time. Therefore, with this horizontal-heavy pattern, UMHexagonS can lose accuracy and waste search effort.

Figure 1.

Search process of UMHexagonS algorithm

The Predictive Intensive Direction Searching (PIDS) algorithm was proposed in [10] to solve the problem caused by uneven search patterns by using an adaptive search pattern. In the PIDS algorithm, the correlation between the predicted motion vector and the optimal motion vector was investigated. The study revealed that the probability of the predicted and optimal motion vectors lying in the same directional region is at least 75%. Based on this finding, the PIDS algorithm exploits the predicted MV direction to decide the intensive search direction. Thus, intensive-direction search and coarse-direction search are selected adaptively for different regions. One example of the PIDS search process is depicted in Figure 2. As the uneven search pattern changes according to the predicted motion vector of each block, PIDS searches more precisely than UMHexagonS and achieves a larger computation reduction.

Figure 2.

Search pattern of PIDS, example of intensive search in d1

However, the adaptive intensive search selection of the PIDS algorithm is limited to directional regions. With a fixed number of search points in each direction, it cannot adjust the search range for different motion scenes. In [11], a statistical analysis of the MV distribution was carried out. A large number of global minima lie near the search centre, especially at the zero MV (0, 0), while a certain percentage of optimal MVs lie outside a radius of 10 pels. This indicates that most predicted and optimal MVs have high locality correlation. Meanwhile, some irregular MVs can hardly be predicted well due to poor correlation. In this chapter, the direction and distance correlations between the predicted MV and the optimal MV are investigated. MV correlation statistics are calculated for each frame as its motion characteristic. With this information, the intensive and coarse search regions are adaptively changed for each block. The Simulated Annealing concept [12, 13] is employed to control the search process and to adaptively choose the intensive search region. After this Introduction, the chapter is organized into five more sections as follows.

Section 2 statistically analyses the MV direction and distance correlation characteristics; block-matching motion estimation is also described in that section. Section 3 gives an overview of the simulated annealing and simulated quenching algorithms. Based on these analyses, the proposed Simulated Annealing Adaptive Search (SAAS) algorithm is presented in Section 4. The experimental results are given and discussed in Section 5. Finally, Section 6 draws the conclusion.

2. Statistical analysis of MV correlation characteristic

Because of the consistency of objects and of motion, MVs are highly correlated in both the spatial and temporal domains. The MV prediction technique is therefore adopted in H.264/AVC to improve ME efficiency. A predicted motion vector, $mv_{pred}$, is generated from previously coded neighbouring motion vectors, and the motion vector difference (MVD), i.e. the difference between the current vector and the predicted vector, is encoded and transmitted.

For regions with smooth motion of a moving background or uniform motion of rigid objects, there normally exists a very high correlation between the predicted and optimal motion vectors, so the BMME search algorithm only needs to check a few points to find the optimal position. In scenarios with poor motion vector correlation, such as complex and irregular motion, more candidate points need to be checked. Therefore, the MV correlation characteristic affects the search strategy chosen in the BMME algorithm. In order to adaptively select an appropriate search pattern, the MV correlation is statistically analysed in this section.

Figure 3.

Direction and distance classification of MV correlation $p(MVC_{d_i,g_j})$

To describe the MV correlation characteristic sufficiently, the MV correlation statistics are calculated in two respects: the motion vector directional correlation and the motion vector distance correlation. Combining these two correlations, the search window is divided into 8 direction regions $d_i$ and a group of octagon grids $g_j$, as illustrated in Figure 3. Normally, the motion content of each video sequence is unique and the scene changes with time, so the MV correlation characteristic varies across sequences and over time. The analysis of MV correlation is therefore frame based to improve accuracy: the predicted and optimal motion vectors of the previous frame are utilized for the current coding frame.

The direction difference $d_{MVD} \in [-180^\circ, 180^\circ]$ between the optimal MV direction $d_{best}$ and the predicted MV direction $d_{pred}$ is classified into 8 classes, with boundaries at $\pm 22.5^\circ$, $\pm 67.5^\circ$, $\pm 112.5^\circ$ and $\pm 157.5^\circ$, as illustrated in Figure 3. The statistical calculation is carried out by examining the MV directional distribution over these classes, which yields the distribution probabilities $p(MVC_{d_i})$ of the MV directional correlation. If the MV directional correlation is high, $d_{best}$ lies forward or backward of $d_{pred}$, as shown in Figure 4. Class $d_1 \in (-22.5^\circ, 22.5^\circ)$ and class $d_2 \in (-180^\circ, -157.5^\circ) \cup (157.5^\circ, 180^\circ)$ indicate the forward and backward directions and normally have higher probabilities than the other classes.

Figure 4.

Two situations of $d_{best}$ lying along $d_{pred}$

The MV distance correlation is measured by the distance between the global minimum point and the search centre, which is known as the motion vector difference (MVD). Several octagon grids $g_j$ are utilized to categorise the MV distance correlation, as such a circle-approximating pattern describes the MVD distribution more accurately. The interval between neighbouring octagon grids is 4 pels, as shown in Figure 3. To evaluate the characteristic of the MV distance correlation, the MV distance correlation probabilities $p(MVC_{g_j})$ are calculated. Considering both direction and distance, the MV correlation probabilities $p(MVC_{d_i,g_j})$ for the current coding frame are defined as:

where $d_i$ ($i = 1, \dots, 8$) are the directional classes and $g_j$ ($j = 1, \dots, sr/4$) are the octagon grids within the search range $sr$. The MV correlation $p(MVC_{d_i,g_j})$ represents the probability that the optimal MV is found in the class $(d_i, g_j)$.
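
As a minimal sketch of how such frame-level statistics could be gathered, the code below classifies each block of the previous frame into a direction class and a grid index and normalises the counts into probabilities. Several choices are assumptions of this sketch rather than statements from the chapter: the angle is measured between the MVD and the predicted MV, the numbering of the six classes other than $d_1$ and $d_2$ is arbitrary, Euclidean distance stands in for the octagon grid, and the probability is estimated as a joint histogram. All function names are hypothetical.

```python
import math
from collections import Counter

GRID_STEP = 4  # pels between neighbouring grids, as stated in the text


def direction_class(d_mvd_deg):
    """Map an angle difference in [-180, 180] degrees to a class index 1..8.

    d1 (forward) and d2 (backward) follow the text; the numbering of the
    remaining six 45-degree sectors is an assumption of this sketch.
    """
    a = abs(d_mvd_deg)
    if a <= 22.5:
        return 1
    if a >= 157.5:
        return 2
    sector = int((a - 22.5) // 45)  # 0, 1 or 2
    return 3 + 2 * sector + (0 if d_mvd_deg > 0 else 1)


def angle_diff(vec, ref):
    """Signed angle from ref to vec in degrees, wrapped to [-180, 180)."""
    a_vec = math.degrees(math.atan2(vec[1], vec[0]))
    a_ref = math.degrees(math.atan2(ref[1], ref[0]))
    return (a_vec - a_ref + 180.0) % 360.0 - 180.0


def grid_index(mvd, search_range):
    """Grid index 1..search_range/4; Euclidean distance approximates the octagon."""
    dist = math.hypot(mvd[0], mvd[1])
    return min(max(1, math.ceil(dist / GRID_STEP)), search_range // GRID_STEP)


def mv_correlation(pairs, search_range=32):
    """pairs: iterable of (best_mv, pred_mv) taken from the previous frame."""
    counts = Counter()
    for best, pred in pairs:
        mvd = (best[0] - pred[0], best[1] - pred[1])
        if not any(mvd) or not any(pred):
            continue  # direction is undefined; skipped in this sketch
        d = direction_class(angle_diff(mvd, pred))
        counts[(d, grid_index(mvd, search_range))] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}
```

Sorting the resulting dictionary by value in descending order gives the region ordering that the later search steps rely on.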

Table 1.

One example is given in Table 1, which shows the MV correlation characteristic of the 10th frame of the “coastguard” CIF video sequence. For better understanding, this frame is shown in Figure 5. It can be observed that the fast-moving boats introduce some fast and irregular motion, while camera panning generates smooth movement in the background. According to Table 1, more than 35% of optimal MVs are detected in the directional class $d_2$. Classes $d_5$ and $d_6$ also contain a large percentage of optimal MVs, which implies that the motion of this frame is directionally irregular. The distance correlation, in contrast, is quite stable: 90% of optimal MVs lie within a radius of 8 pels. Considering both directional and distance correlation, only 3 regions, i.e. $(d_2,g_1)$, $(d_2,g_2)$ and $(d_6,g_1)$, have a probability of more than 10% of containing the optimal position. Meanwhile, 21 of the 64 regions have MV correlation probabilities of more than 0.1%. This suggests that intensive search is needed only in these regions, while the remaining regions can be searched coarsely or even skipped entirely.

Figure 5.

The 10th frame of CIF video sequence “coastguard”.

A further illustration is given in Figure 6. The directional division describes the direction difference between the predicted motion vector and the optimal motion vector. The octagon grid partitions represent the distance difference between the two vectors. This division is not defined in the image pel domain; it represents the unique motion character of each frame. Direction class $d_1$ covers $\pm 22.5^\circ$ of direction difference between the predicted motion vector and the optimal motion vector. For each macroblock (MB), the predicted motion vector determines the initial direction $d_1$, and the division pattern is then rotated accordingly. As the motion correlation differs from frame to frame, the division pattern also differs among frames. Since the predicted MVs differ from macroblock to macroblock, the search pattern also changes adaptively.

Based on the above statistical analysis, the SAAS algorithm is proposed, which provides a more accurate approach to obtaining the optimal motion vector. As in the PIDS algorithm, the number of search points in each division is adaptively adjusted. However, more computational complexity can be saved, as the intensive search areas are divided more precisely with the help of the octagon grids.

Figure 6.

3.1. Simulated annealing algorithm

Simulated annealing (SA) [13] is a probabilistic method for finding the global minimum of an optimization problem. It works by emulating the physical process in which a liquid is cooled slowly so that the atoms are able to line themselves up and form a pure crystal. The crystal can be seen as the minimum energy state of the system. SA is especially suitable for large-scale problems in which the global minimum is hidden among several local minima. Motion estimation is such an optimization problem: it searches for the optimal motion vector with the minimum RD cost. However, most fast motion estimation search algorithms follow the steepest descent and go downhill as far as they can, as shown in Figure 7. Hence, these algorithms are easily trapped in a local minimum.

Figure 7.

Uphill and downhill searching on rate-distortion surface

To avoid the disadvantage stated above, the SA algorithm can be viewed as a good solution for motion estimation search, since occasional uphill moves help the process escape from local minima. The so-called Boltzmann probability distribution, defined in equation (3),

$$P(E) \propto \exp\left(-\frac{E}{kT}\right) \qquad (3)$$

where $E$ is the energy of a state and $k$ is the Boltzmann constant, expresses that a system at temperature $T$ has its energy probabilistically distributed among all different energy states. Even at low temperature, there is a chance for the system to get out of a local energy minimum. Therefore, the system sometimes goes uphill as well as downhill. However, the lower the temperature, the smaller the chance of any significant uphill move taking place. The basic elements of simulated annealing are as follows:

A finite solution space S (set of states).

An objective function $E(s)$ (the analogue of energy) at state $s$, whose minimization is the goal of the procedure.

A neighbourhood structure $N(s)$.

A nonincreasing function $T$ called the cooling schedule, which controls the annealing procedure; $T(t)$ is called the temperature at time $t$.

Given the above elements, the SA search for the minimum energy state $s_0$ proceeds as follows:
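
As a minimal sketch of the generic procedure (not the exact formulation used in this chapter), the loop below draws a random neighbour, always accepts downhill moves and accepts uphill moves with the Boltzmann probability $\exp(-\Delta E / T)$. The function and parameter names are illustrative assumptions.

```python
import math
import random


def simulated_annealing(energy, neighbours, s0, temperature, steps, rng=None):
    """Generic SA loop over a finite solution space.

    energy      : objective function E(s)
    neighbours  : function returning the list N(s) of neighbouring states
    temperature : nonincreasing cooling schedule T(t)
    """
    rng = rng or random.Random(0)
    current = best = s0
    for t in range(steps):
        T = temperature(t)
        candidate = rng.choice(neighbours(current))
        delta = energy(candidate) - energy(current)
        # Downhill moves are always taken; uphill moves are taken with
        # Boltzmann probability, which shrinks as the temperature drops.
        if delta <= 0 or (T > 0 and rng.random() < math.exp(-delta / T)):
            current = candidate
        if energy(current) < energy(best):
            best = current  # remember the lowest-energy state visited
    return best
```

For example, with `energy = lambda x: (x - 3) ** 2` on the integers 0 to 10, neighbours `x - 1` and `x + 1` clipped to that range, and a geometric schedule such as `lambda t: 10 * 0.95 ** t`, the search settles on the state 3 once the temperature has dropped.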

3.2. Simulated Quenching algorithm

An SA solution usually requires a large number of function evaluations to find the global minimum, which makes the process quite slow. This is its main disadvantage when used in a fast motion estimation algorithm. To speed up the algorithm, the Simulated Quenching (SQ) methodology was proposed. Like SA, the SQ algorithm resembles the cooling of molten metal through annealing. The analogy remains the same as for SA, except that the annealing schedule reduces the temperature quickly. The cooling rate thus becomes one of the important parameters governing the successful working of SQ.

In fast motion estimation, video content and motion character change all the time, so it is quite difficult to find a single cooling scheme for such a complicated application. In the proposed SAAS algorithm, the annealing schedule is chosen adaptively according to the MV correlation probabilities. For a frame with steady motion and high MV correlation, the large MV correlation probabilities tend to be concentrated in a few divided regions. In this case, a faster annealing schedule will safely lead to the global optimum. A slower annealing schedule is chosen when the frame has more irregular motion and the MV correlation distribution is flat. The proposed SAAS algorithm with its adaptive cooling scheme is specified in the next section.

4. Proposed SAAS algorithm

The PIDS algorithm adaptively selects the intensive and coarse search regions in a directional partition. However, with a fixed number of search points in each direction area, it cannot adjust the search range for different motion scenes. To tackle this drawback, the search pattern in the SAAS algorithm is no longer restricted to certain directional regions, but is adaptively selected from more finely divided regions based on the MV correlation statistics. The flow chart of the SAAS algorithm is depicted in Figure 8. For each frame, the ME search pattern is determined by the MV correlation statistics. For each block, 24 directional candidates are employed to determine the initial class $d_1$, as shown in Figure 9. Then, the search window division is carried out based on the ME search pattern of the frame. An example with $d_1 = c_4$ in the 10th frame of coastguard is shown in Figure 10. In order to avoid being trapped in a local minimum, a Simulated Annealing based methodology is adopted to handle the uphill and downhill searches, where the MV correlation probabilities are set as the temperature parameter to control the annealing process adaptively.

Figure 8.

Flow chart of the proposed SAAS algorithm

4.1. Dynamic update of MV correlation probability

In the SAAS algorithm, it is very important to keep the MV correlation probabilities accurate, not only because they are the crucial element for the search region partition and the annealing schedule, but also because the motion characteristic of each video sequence is unique and the MV correlation probabilities change all the time. A pre-processing step is conducted to reveal the motion correlation characteristic of each frame.

The MV correlation probabilities $p(MVC_{d_i,g_j})$ are calculated with equation (2) from the MV directional and distance correlation probabilities. In order to obtain a more accurate MV correlation characteristic, the first octagon grid starts from 2 pixels so that the points near the centre are excluded. This is mainly because the MV directional difference is meaningless when the optimal point is close to the centre point. After the calculation, the MV correlation probabilities are sorted in descending order together with their corresponding regions $(d_i,g_j)$, each of which denotes the region in direction $d_i$ and grid $g_j$. A parameter $temp(d_i,g_j)$, which affects the annealing schedule as well as the acceptance condition, is also assigned from the MV correlation probabilities $p(MVC_{d_i,g_j})$.

4.2. Step 1: Initial search point decision

The initial search point is selected from the four prediction models defined in UMHexagonS. Based on the analysis in the previous section, vectors around the initial search point have a high probability of being the optimal MV. Therefore, we define a large diamond search with 8 search points around the start search point, similar to that shown in Figure 2 as step 1. In contrast to the 25-point rectangular full search in UMHexagonS, this large diamond search reduces the computational requirement without sacrificing accuracy. The point with the minimum rate-distortion cost is determined as the initial search point.
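
The large diamond step can be sketched as follows. The layout of the 8 points is an assumption here (the classic large diamond pattern with points at distance 2 on the axes and at the four unit diagonals), and `cost` is a hypothetical callable that returns the rate-distortion cost of a candidate MV.

```python
# Offsets of the 8 points around the centre (assumed classic large diamond layout).
LARGE_DIAMOND = [(0, -2), (-1, -1), (1, -1), (-2, 0),
                 (2, 0), (-1, 1), (1, 1), (0, 2)]


def large_diamond_search(cost, start):
    """Return the candidate with the lowest cost among the start point and its 8 neighbours."""
    candidates = [start] + [(start[0] + dx, start[1] + dy) for dx, dy in LARGE_DIAMOND]
    # min() keeps the first minimum, so ties are resolved in favour of the start point.
    return min(candidates, key=cost)
```

This evaluates 9 candidates in total, compared with the 25 points of the rectangular search mentioned above.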

Figure 9.

Candidate directions for $d_1$ determination

4.3. Adaptive partition of search area

After the initial search point is obtained, the search area needs to be divided based on the MV correlation probabilities and the predicted MV. 24 candidate directions $(c_1, c_2, \dots, c_{24})$, three times as many as in PIDS, are employed, as indicated in Figure 9. The candidate direction $c_i$ with the minimum angular difference to $d_{pred}$ is chosen as the initial search direction $d_1$. The directional regions, with boundaries at $\pm 22.5^\circ$, $\pm 67.5^\circ$, $\pm 112.5^\circ$ and $\pm 157.5^\circ$, are laid out relative to the initial search direction $d_1$. Then the octagon grids are utilized to divide the search window into $8 \times sr/4$ regions, where $sr$ is the search range. In this way, the search window is adaptively partitioned. The coordinate of each region is represented by $(d_i,g_j)$, and a parameter $temp(d_i,g_j)$ is assigned to each region $(d_i,g_j)$ as its index. One more example is shown in Figure 10, which shows how the search window divisions are adjusted when the initial direction is $d_1 = c_4$ in the 10th frame of coastguard. Compared with the search window partition in Figure 6, where $d_1 = c_1$, the whole search pattern changes with the predicted MV. In the next step, the simulated annealing search is conducted on the different search regions $r(d_i,g_j)$, with the parameter $temp(d_i,g_j)$ acting as the cooling scheme.
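
A minimal sketch of the direction snapping is given below. It assumes that the 24 candidates are evenly spaced at 15° intervals, with $c_1$ at 0° and the indices increasing counter-clockwise; the chapter defines the actual layout in Figure 9, so this numbering and the function names are illustrative only.

```python
import math

NUM_CANDIDATES = 24                  # c1..c24, as in the text
STEP_DEG = 360.0 / NUM_CANDIDATES    # 15 degrees, assuming even spacing


def initial_direction(pred_mv):
    """Return (candidate index, candidate angle) nearest to the predicted MV direction."""
    angle = math.degrees(math.atan2(pred_mv[1], pred_mv[0])) % 360.0
    k = int(round(angle / STEP_DEG)) % NUM_CANDIDATES
    return k + 1, k * STEP_DEG


def region_boundaries(d1_angle):
    """Absolute angles of the eight class boundaries once the pattern is rotated to d1."""
    offsets = (22.5, 67.5, 112.5, 157.5)
    return sorted({(d1_angle + sign * o) % 360.0 for o in offsets for sign in (1, -1)})
```

Under these assumptions, a predicted MV of (1, 1) snaps to the fourth candidate at 45°, and the class boundaries are shifted by the same angle.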

Figure 10.

Search area division by directions and grids in SAAS, example of d1=c4, the 10th frame of coastguard

4.4. Step 2: Simulated annealing search

4.4.1. Objective function and solution space

In order to employ the simulated annealing search in the BMME algorithm, the SA elements are defined in this section in terms of motion estimation concepts. The search for the optimal MV is performed using the predicted MV as the centre of the search window. To select the candidate with the least rate-distortion cost, the Lagrangian cost function [14] is defined as follows:

$$J(mv, \lambda_M) = SAD(s, c(mv)) + \lambda_M \cdot R(mv - mv_{pred})$$

where $mv$ is the candidate motion vector and $mv_{pred}$ is the motion vector predicted from neighbouring blocks. $s$ and $c$ are the source video and the reconstructed video, respectively. SAD represents the sum of absolute differences between the block in the current frame and the block in the reference frame. $R(\cdot)$ represents the number of bits used to encode the motion information, obtained by table lookup, and $\lambda_M$ is the Lagrangian multiplier set according to the quantization parameter (QP), which is given by

The divided regions $r(d_i,g_j)$ in the search window form the solution space, which is indexed by the MV correlation probabilities rather than by spatial neighbourhood. The order of regions with decreasing MV correlation probabilities for the 10th frame of the coastguard sequence is shown in Table 2. This mechanism can be seen as a random selection from the solution space. Compared with a simple downhill search in continuous space, this scheme intensively searches the regions with higher MV correlation probabilities first. For the regions with lower probabilities, coarse search or early termination is applied.
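
The rate-distortion cost used as the objective function can be sketched as below. Two parts are assumptions and are not taken from this chapter: the rate term uses the length of the signed Exp-Golomb code of each MVD component in place of the encoder's lookup table, and the multiplier uses the setting commonly cited for the JM software, $\lambda_M = \sqrt{0.85 \cdot 2^{(QP-12)/3}}$. Blocks are plain lists of pixel rows and all function names are hypothetical.

```python
import math


def se_golomb_bits(v):
    """Length in bits of the signed Exp-Golomb code for the integer v."""
    code_num = 2 * abs(v) - (1 if v > 0 else 0)
    return 2 * ((code_num + 1).bit_length() - 1) + 1


def sad(cur_block, ref_block):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_c, row_r in zip(cur_block, ref_block)
               for a, b in zip(row_c, row_r))


def motion_lambda(qp):
    """Assumed JM-style multiplier: sqrt(0.85 * 2^((QP - 12) / 3))."""
    return math.sqrt(0.85 * 2.0 ** ((qp - 12) / 3.0))


def rd_cost(cur_block, ref_block, mv, mv_pred, qp):
    """Distortion plus lambda-weighted rate of the motion vector difference."""
    rate = se_golomb_bits(mv[0] - mv_pred[0]) + se_golomb_bits(mv[1] - mv_pred[1])
    return sad(cur_block, ref_block) + motion_lambda(qp) * rate
```

Because the rate term grows with the MVD, candidates far from the predicted MV must offer a clearly lower SAD to be selected, which is consistent with searching the regions near the centre first.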

Table 2.

MV search region order in the 10th frame of sequence coastguard

4.4.2. Annealing schedule

The annealing schedule is one of the crucial parameters of the SA process. If the temperature of the system drops too fast, the advantage of SA, namely convergence to the global optimum, is lost. On the other hand, a cooling process that is too slow reduces the efficiency of the fast search algorithm. Moreover, it is quite difficult to set a fixed annealing schedule for changing video content. In the SAAS algorithm, the sorted MV correlation probabilities $p(MVC_{d_i,g_j})$ are assigned to the corresponding annealing parameters $temp(d_i,g_j)$ of the regions $r(d_i,g_j)$ to control the annealing schedule.

$temp(d_i,g_j)$ is a set of parameters in the pixel domain for a particular block, while $p(MVC_{d_i,g_j})$ is a relative parameter at frame level. With this adaptive annealing schedule, the cooling speed changes with the video content and motion correlation, which governs the successful working of the SA procedure.

To improve the search efficiency, SAAS performs a different number of iterations at different temperature levels. Inside each region $r(d_i,g_j)$, the MV search is performed randomly along the direction $d_i$ in the range $[g_{j-1}, g_j]$. The number of search points $NumS(d_i,g_j)$ in the division $r(d_i,g_j)$ is determined by a pair of thresholds, temp_high and temp_low.

After experiments with more than 50 different sequences, we empirically determined temp_high = 0.3 and temp_low = 0.15. These thresholds provide satisfactory performance in different motion scenarios. With this mechanism, SAAS automatically applies intensive search in the regions with high MV correlation and selects fewer search points in regions with lower correlation.

4.4.3. Minimum accepted condition

The minimum acceptance condition in SA is based on the Boltzmann probability distribution. Referring to equation (3), there is a high probability of performing an uphill search when the difference in the cost function $E$ is small and the temperature $T$ is high. Using the Boltzmann concept, SAAS utilizes the following SA condition.

With $\rho$, the SA condition is directly proportional to $temp(d_i,g_j)$ and inversely proportional to the difference in the cost function $E$. If $E(best) > E(r(d_i,g_j))$, the region $r(d_i,g_j)$ is directly identified as the current global optimum. Otherwise, the SA condition still allows occasional upward moves. As $\rho$ is controlled by $temp(d_i,g_j)$, the chance of an upward move is smaller for divisions with lower $temp(d_i,g_j)$.

4.4.4. Termination condition

It is impractical to conduct the SA search on every partition, as the search window is partitioned into $8 \times sr/4$ regions (64 for a search range of 32). Moreover, the majority of regions have low MV correlation probabilities, as shown in Table 1. For these reasons, it is appropriate to limit the total number of searched regions (NumSR) and to apply an early termination condition. Two termination conditions are given: one is based on the temperature status and the other on the number of searched regions.

If either of these termination conditions is satisfied, the SA search stops and the algorithm proceeds to the Extended Hexagon Search step, which was introduced in UMHexagonS, to refine the local optimum. Otherwise, the SA search proceeds to the next region in the order of decreasing parameter temp.
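
The overall control flow of this step can be sketched as follows. The acceptance test is the classic Metropolis rule $\exp(-\Delta E / T)$ with $T$ proportional to $temp(d_i,g_j)$, used here as a stand-in for the chapter's $\rho$-based condition, whose exact form is not reproduced. The values of `temp_stop`, `max_regions` and `scale` are hypothetical tuning parameters, and `region_cost` is a hypothetical callable that returns the best RD cost found inside a region.

```python
import math
import random


def region_ordered_search(regions, region_cost, temp_stop=0.001,
                          max_regions=16, scale=100.0, rng=None):
    """Visit regions in order of decreasing temp; return ((region_id, cost), searched).

    regions     : list of (region_id, temp) pairs, temp taken from p(MVC)
    region_cost : callable giving the best RD cost found inside a region
    temp_stop, max_regions, scale : hypothetical tuning values (temp_stop > 0)
    """
    rng = rng or random.Random(0)
    current = None
    searched = 0
    for region_id, temp in sorted(regions, key=lambda r: r[1], reverse=True):
        # Termination: temperature too low, or enough regions already searched.
        if temp < temp_stop or searched >= max_regions:
            break
        cost = region_cost(region_id)
        searched += 1
        if current is None:
            current = (region_id, cost)
            continue
        delta = cost - current[1]
        # Metropolis rule: always accept improvements, sometimes accept uphill moves.
        if delta <= 0 or rng.random() < math.exp(-delta / (scale * temp)):
            current = (region_id, cost)
    return current, searched
```

Because the regions are visited in order of decreasing temp, the probability of an uphill move shrinks as the search proceeds, which mirrors a cooling schedule without a separate temperature variable.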

4.5. Step 3: Extended hexagon-based search

A large hexagon search pattern and a small diamond search pattern, modified from UMHexagonS, are employed in this step. The large hexagon pattern has six search locations, while the small diamond search pattern has four points. The large hexagon pattern in step 3-1 is applied recursively, with its centre moved each time, until the location with the minimum rate-distortion cost lies at the centre of the hexagon. After this, the small diamond pattern in step 3-2 is applied recursively until the location with the minimum rate-distortion cost is at the centre of the pattern. Finally, this point is taken as the motion vector of the current block. In contrast to UMHexagonS, our Extended Hexagon-based search is limited to the single search region $r(d_i,g_j)$ that contains the optimal MV. Compared with UMHexagonS, this centre-based refinement of the optimal MV can therefore obtain the optimal MV with fewer search points.
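
The two-stage refinement can be sketched as below, assuming the usual hexagon layout with two horizontal points at distance 2 and four points at (±1, ±2). `cost` is a hypothetical RD cost callable, and `inside` is a hypothetical predicate that restricts candidates to the selected region $r(d_i,g_j)$.

```python
# Six-point large hexagon and four-point small diamond (assumed layouts).
LARGE_HEXAGON = [(-2, 0), (2, 0), (-1, -2), (1, -2), (-1, 2), (1, 2)]
SMALL_DIAMOND = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def pattern_descent(cost, centre, pattern, inside):
    """Move the pattern centre to its best neighbour until the centre itself is best."""
    while True:
        candidates = [(centre[0] + dx, centre[1] + dy) for dx, dy in pattern]
        candidates = [p for p in candidates if inside(p)]
        # The centre is listed first, so ties keep the centre and the loop terminates.
        best = min([centre] + candidates, key=cost)
        if best == centre:
            return centre
        centre = best


def extended_hexagon_search(cost, start, inside=lambda p: True):
    """Coarse hexagon descent followed by small diamond refinement."""
    centre = pattern_descent(cost, start, LARGE_HEXAGON, inside)
    return pattern_descent(cost, centre, SMALL_DIAMOND, inside)
```

Each move strictly lowers the cost, so the descent stops as soon as no neighbour inside the region improves on the current centre.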

Table 3.

Results of the proposed SAAS compared with those of UMHexagonS and PIDS in terms of average search points reduction (%) and motion estimation time reduction (sec) (QP=28)

5. Experimental results

In this section, experiments conducted with the H.264/AVC reference Joint Model (JM) software version 16.1 are reported. We compared the proposed SAAS algorithm with the FS, PIDS and UMHexagonS algorithms in terms of computational complexity (speed, measured by ME time and average search points (ASP)) and rate-distortion performance (PSNR and bit rate). Several commonly used sequences, covering a wide range of motion characteristics, were taken into consideration.

Figure 11.

Rate-Distortion performance comparison of FS, UMHexagonS, PIDS and SAAS at various QPs

The group of pictures (GOP) structure was IPPP, in which only the first frame was coded as an I frame and the first P frame was coded using UMHexagonS. The sequences were tested at 30 fps (frames per second). The Context-Adaptive Variable Length Coding (CAVLC) entropy coder was used for all simulations, with 5 reference frames. A search range of 32 and a quantization parameter of 28 were used. The simulations were run on a PC with a 2.44 GHz CPU and 8 GB of RAM.

For the complexity comparison, the proposed algorithm is compared with the hybrid UMHexagonS adopted by the H.264/AVC reference software. Two different measurements are used to assess computational efficiency: the average number of search points required and the encoding time. The results are presented in Table 3. As shown in Table 3, SAAS needs 48-63% fewer search points than UMHexagonS and saves 21% of the encoding time on average. Since it adjusts the search pattern more precisely, SAAS requires on average 45% fewer search points than PIDS.

Table 4.

Results of the proposed SAAS compared with those of FS, UMHexagonS and PIDS in terms of PSNR gain (dB) and bit-rate degradation (%) (QP=28)

Considering the large reduction in computational complexity, the quality degradation is very small. The rate-distortion performance is presented in Table 4. In all cases, FS outperforms the other algorithms in image quality and bit rate. Compared with FS, the average PSNR degradation of the proposed algorithm is only 0.010 dB. In terms of bit rate, SAAS has a slightly higher degradation than PIDS and UMHexagonS, with an average bit-rate degradation of 0.64%. Further information can be obtained from Figure 11, which compares the rate-distortion performance of FS, UMHexagonS, PIDS and SAAS at different QPs (16, 20, 24, 28, 32, 36 and 40). Figure 12 compares the simulation results against the frame number for the Coastguard video sequence. It clearly reveals the superiority of SAAS over UMHexagonS in computational reduction: more than 50% of the search points are saved, while the PSNR and bit-rate performance are very similar. From the above results, it can be confirmed that the SAAS algorithm is capable of dramatically reducing the computational burden with negligible degradation in RD performance.

Figure 12.

6. Conclusion

This chapter presents a novel fast motion estimation algorithm, the Simulated Annealing Adaptive Search (SAAS) algorithm. As the MV field is highly correlated, the proposed algorithm takes advantage of MV correlation information, which is calculated statistically and plays a significant role in the SAAS process. In the SA search step, the search region is adaptively divided and the divisions are searched in descending order of their MV correlation probabilities. Furthermore, by utilizing the Boltzmann probability concept, the acceptance or rejection of a minimum in each SA search is controlled by this correlation information. The experimental results demonstrate that more than 48% of ASP and 21% of ME time can be saved, while maintaining a similar bit rate without loss of picture quality.