Abstract

H.264 Advanced Video Coding (AVC) has been extended to Scalable Video Coding (SVC). SVC serves diverse electronic devices, such as personal computers, HDTV, SDTV, IPTV, and full-HDTV, whose users demand different scalings of the same content. Scaling may be required in resolution, frame rate, and quality, and to accommodate heterogeneous networks and varying bandwidth. Scaling increases encoding time and computational complexity during mode selection. In this paper, a fast mode decision algorithm based on likelihood mode decision (LMD) is proposed to reduce encoding time and computational complexity. LMD is evaluated under both temporal and spatial scalability. The results show that LMD performs well compared with previous fast mode decision algorithms in terms of encoding time, PSNR, and bit rate. Compared with the full search method, LMD achieves a 66.65% saving in encoding time with a 0.05% loss in PSNR and a 0.17% increase in bit rate.

1. Introduction

H.264 Scalable Video Coding (SVC), an extension of H.264 Advanced Video Coding (AVC), allows a single encoding to be decoded in multiple ways [1] to meet the requirements of various devices. SVC retains all the features of AVC and, in addition, provides a multilayer approach and efficient coding. The multilayer approach consists of a base layer and one or more enhancement layers. The base layer carries the most essential information in the form of a bit stream. The bit stream can be partitioned into subset bit streams [1], known as enhancement layers. Each subset bit stream carries only the essential information of the video, with redundant and less essential information removed; this less essential information is predicted from the base layer and the already coded enhancement layers [1]. The base layer bit stream provides low resolution, low frame rate, or low quality, and higher resolution, frame rate, or quality is obtained by adding enhancement layer bit streams.

The distinctive feature of SVC, referred to as scalability, allows video content to be decoded even from a partial bit stream. SVC supports three types of scalability: spatial, temporal, and quality. Spatial scalability refers to the resolution, or dimensions, of the video. Temporal scalability refers to the number of frames per second. Quality, or SNR, scalability refers to the fidelity of the video, measured by the PSNR (peak signal-to-noise ratio). Temporal scalability in SVC is achieved through hierarchical B-picture prediction. Each frame of a video is divided into macroblocks (MBs), and each MB can be partitioned into blocks in several ways, each partitioning corresponding to a distinct coding mode. Mode decision searches for the best mode, referred to here as the prime mode, for each MB. SVC uses three types of frames: I (intra) frames, P (predicted) frames, and B (bidirectionally predicted) frames. I frames carry the most essential information and require all the modes in a macroblock to be coded. P frames carry less essential information than I frames and require fewer modes to be coded. B frames carry the least essential information and require very few modes to be coded.
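The text names hierarchical B pictures without describing the prediction structure. As a minimal sketch, assuming a dyadic hierarchy (a common configuration for hierarchical B pictures), each picture's temporal level can be derived from its position in the GOP; discarding the highest level halves the frame rate, which is how a 30 frames per second layer and a 15 frames per second layer can coexist in one bit stream. The functions `temporal_level` and `extract_temporal_layer` are hypothetical names, not JSVM APIs.

```python
from typing import Iterable, List


def temporal_level(poc: int, gop_size: int = 16) -> int:
    """Temporal level of a picture in a dyadic hierarchical-B GOP (assumed structure).

    Key pictures (poc divisible by gop_size) are level 0; each halving of the
    prediction distance adds one level, up to log2(gop_size).
    """
    if gop_size <= 0 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a positive power of two")
    max_level = gop_size.bit_length() - 1  # log2(gop_size)
    offset = poc % gop_size
    if offset == 0:
        return 0
    # Count trailing zero bits of the offset inside the GOP.
    trailing_zeros = (offset & -offset).bit_length() - 1
    return max_level - trailing_zeros


def extract_temporal_layer(pocs: Iterable[int], max_level: int,
                           gop_size: int = 16) -> List[int]:
    """Keep only the pictures whose temporal level does not exceed max_level."""
    return [p for p in pocs if temporal_level(p, gop_size) <= max_level]


if __name__ == "__main__":
    gop = list(range(1, 17))  # one GOP of 16 pictures following key picture 0
    print([temporal_level(p) for p in gop])
    # Dropping the highest level (4) keeps 8 of 16 pictures: 30 fps -> 15 fps.
    print(len(extract_temporal_layer(gop, max_level=3)))
```

Each removed level halves the number of pictures per GOP, so a GOP size of 16 yields five temporal levels.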

In earlier standards, frames are divided into fixed-size macroblocks of 16×16 pixels. In H.264/AVC, variable block sizes of 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4 are available, and each way of partitioning the macroblock corresponds to its own coding mode. The prime mode for an MB or block is decided by a rate distortion cost (RDC) function with a Lagrangian parameter. Computing the RDC involves the integer transform, quantization, and entropy coding in both the forward and backward processes. The RDC is computed for all modes of a macroblock, and the mode with the minimum value is selected as the prime mode. SVC defines nine intra modes for prediction of an INTRA 4×4 block, four intra modes for prediction of an INTRA 16×16 macroblock, seven inter modes (INTER 16×16, INTER 16×8, INTER 8×16, INTER 8×8, INTER 8×4, INTER 4×8, and INTER 4×4), and one SKIP mode [2]. For motion estimation in the enhancement layer, the BL_PRED and QPEL_REF modes are added to the base layer modes. A full search method determines the ideal motion vector difference (MVD) between the current frame and the previous frame using the RDC. The MVD is the difference between the predicted motion vector and the actual motion vector. This computationally intensive rate distortion optimization increases encoding time and complexity, which has motivated the development of many fast mode decision algorithms. Many such algorithms have been proposed for AVC, which is less complex than SVC, but only a few have been proposed for SVC to save time in selecting the prime mode. These algorithms are discussed in the next section.
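The RDC-based full search can be sketched as follows. This minimal Python sketch assumes the common Lagrangian form J = D + λ·R, with distortion D (for example, the sum of squared differences) and rate R in bits already measured for each candidate mode. The λ(QP) expression 0.85·2^((QP−12)/3) is the approximation commonly used in H.264 reference software for mode decision and is included only as an illustrative assumption. `ModeCost`, `rd_cost`, and `full_search_mode` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ModeCost:
    mode: str          # e.g. "SKIP", "INTER_16x16" (hypothetical labels)
    distortion: float  # D, e.g. SSD between original and reconstructed MB
    rate_bits: float   # R, bits for header, motion vectors and residual


def lagrangian_lambda(qp: int) -> float:
    """Assumed mode-decision multiplier: 0.85 * 2^((QP - 12) / 3)."""
    return 0.85 * 2.0 ** ((qp - 12) / 3.0)


def rd_cost(c: ModeCost, lam: float) -> float:
    """Lagrangian rate distortion cost J = D + lambda * R."""
    return c.distortion + lam * c.rate_bits


def full_search_mode(candidates: Sequence[ModeCost], lam: float) -> str:
    """Full search: evaluate every candidate and return the mode with minimum J."""
    if not candidates:
        raise ValueError("at least one candidate mode is required")
    return min(candidates, key=lambda c: rd_cost(c, lam)).mode


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    cands = [
        ModeCost("SKIP", distortion=1200.0, rate_bits=1.0),
        ModeCost("INTER_16x16", distortion=800.0, rate_bits=30.0),
        ModeCost("INTER_8x8", distortion=600.0, rate_bits=70.0),
    ]
    for lam in (0.0, 10.0, 100.0):
        print(lam, full_search_mode(cands, lam))
```

A larger λ (higher QP) favours cheap modes such as SKIP, while λ = 0 picks the lowest-distortion partition. The expense lies in obtaining D and R: every candidate must be transformed, quantized, and entropy coded, which is what fast mode decision algorithms try to avoid.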

2. Related Work

A method that reduces the complexity of determining the prime mode in H.264/AVC, and thereby saves encoding time, is proposed in [3]. It uses Lagrangian optimization of rate and distortion cost to decide the prime mode, reducing encoding time while maintaining coding efficiency. A motion activity-based mode decision (MAMD) algorithm is proposed in [4] to speed up coding by reducing the number of candidate modes: candidate modes are skipped based on motion vectors, so they need not be coded, which reduces time. In [5], the number of candidate modes in the enhancement layer is significantly reduced by exploiting the relation between the base layer and the enhancement layer; however, base layer modes are still chosen by the full search process. A probability-based coding mode decision algorithm is proposed for H.264/AVC in [6]. It selects the mode with the maximum probability, based on the correlation between adjacent blocks and the current block, and saves considerable encoding time. An early termination of mode decision for enhancement layer MBs is proposed in [7]: if an MB is found to be all zeros, the previous MB can be chosen and mode decision is terminated early.

In [8], the enhancement layer MB mode is determined using Bayes' theorem. The algorithm employs a Markov process, and a likelihood analysis based on this process determines the mode of a macroblock early, saving time. In [9], the correlation between adjacent MBs of the base layer and the colocated MBs of the enhancement layer is used to predict the mode in the enhancement layer. A fast mode decision algorithm using selective interlayer residual prediction based on the Lagrangian RDC is proposed in [10]; the Lagrangian parameter involved in this prediction reduces coding time while an appropriate mode is decided. Classification-based intra-inter mode decision is proposed in [11], in which each frame is coded or skipped based on an intra-inter coding decision for rate control. This approach was developed for video over networks and then applied to scalable video coding. In [12], the relation between an enhancement layer MB and its colocated base layer MB is used for mode decision. Inter mode prediction for temporal scalability is proposed in [13]; the method compares the pixel values of the current MB with those of a reference block using statistical analysis. In our previous work [14], a desired mode list is constructed for predicting the mode in the base layer, and the mode of the enhancement layer is predicted based on the correlation between the current frame and the reference frame. Fast video streaming over the Internet is achieved using a mathematical model in [15]; the model maximizes the information rate from the streaming server to the client, enabling delay-free playback.

Although each of these algorithms reduces the encoding time needed to decide the prime mode, none of them matches the PSNR and bit rate of the full search method. Moreover, the algorithms were compared with one another only in terms of encoding time, without regard to the computational complexity involved. Therefore, we propose a fast mode decision algorithm with low computational complexity that reduces encoding time. The proposed algorithm uses the likelihood mode decision method described in the next section.

3. Likelihood Mode Decision

In SVC, mode decision is based on the rate distortion cost (RDC): in the full search method, the mode with the minimum RDC is selected as the prime mode for each MB. However, computing the RDC of every mode in an MB is computationally expensive and consumes considerable encoding time. The question addressed here is how to decide the prime mode early and avoid encoding unnecessary modes. In this section, a likelihood mode decision (LMD) algorithm is proposed that decides the prime mode for I frames of the enhancement layer. P/B frames, which are derived from I frames, need less attention; because they hold less essential information, a selective prediction of modes can be used to obtain their prime mode. I frames of the base layer are more important and follow the standard full search algorithm, while P/B frames of the base layer use an enhanced selective prediction of certain modes.

The likelihood model is evaluated below to show the importance of likeliness for inter mode prediction. Adjacent MBs and the current MB are highly likely to be coded in the same mode. The average likeliness measured for different MBs of the test video sequences under various quantization parameters is given in Table 1.

Table 1: Macroblock average likeliness under various QP.

Let $MB_t(x, y)$ denote the MB in the $t$th frame whose top-left pixel is at $(x, y)$, so that the current coding MB is $MB_t(x, y)$. Candidate MBs around the current block are available in the current frame $t$ and the previous frame $t-1$, including the colocated MB $MB_{t-1}(x, y)$, as shown in Table 1. Each of these MBs has its own average likeliness of sharing the mode of the current coding MB. From the observation, inter modes tend to share the same mode with at least five of these MBs, and these five MBs, taken in order, decide the prime mode of an MB. We therefore define the MB set

$$\mathcal{S} = \{ MB^{(1)}, MB^{(2)}, MB^{(3)}, MB^{(4)}, MB^{(5)} \}, \qquad (1)$$

where each $MB^{(i)}$ is either an MB $MB_t(x', y')$ of the current frame adjacent to the current coding MB or the previous colocated MB $MB_{t-1}(x, y)$. We define the adjacent mode set by

$$\mathcal{M} = \{ m(MB) \mid MB \in \mathcal{S} \}, \qquad (2)$$

where $m(MB)$ denotes the encoding mode of an MB. The likelihood that a mode $m$ is the prime mode is approximated by the likelihood model

$$P(m) \approx \kappa \, N(m), \qquad (3)$$

where $N(m)$ is the occurrence measure of the event that an MB in $\mathcal{S}$ is coded in mode $m$, and $\kappa$ is a constant that is the same for all modes. If $m \notin \mathcal{M}$, then $N(m) = 0$, so $P(m)$ is zero and $m$ is unlikely to be the prime mode. Under the likelihood criterion (3), the few inter modes in $\mathcal{M}$ are the most likely to be the prime mode.
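The likelihood criterion (3) amounts to counting how often each mode occurs among the five reference MBs. The following minimal sketch assumes that N(m) is the number of reference MBs coded in mode m and, as an illustrative choice, sets κ to one over the number of reference MBs so that the values sum to one. The mode labels and the five neighbour modes in the example are hypothetical inputs; the actual positions of the five MBs come from Table 1.

```python
from collections import Counter
from typing import Dict, Sequence


def mode_likelihood(neighbour_modes: Sequence[str]) -> Dict[str, float]:
    """Approximate P(m) = kappa * N(m) for each mode in the adjacent mode set.

    neighbour_modes: encoding modes of the reference MBs (five in LMD).
    N(m) is taken as the occurrence count of m; kappa = 1 / len(neighbour_modes)
    is an assumed normalisation. Modes absent from the result have N(m) = 0,
    hence likelihood 0.
    """
    if not neighbour_modes:
        return {}
    kappa = 1.0 / len(neighbour_modes)
    return {m: kappa * n for m, n in Counter(neighbour_modes).items()}


if __name__ == "__main__":
    # Hypothetical modes of the five reference MBs.
    modes = ["INTER_16x16", "SKIP", "INTER_16x16", "INTER_16x8", "INTER_16x16"]
    p = mode_likelihood(modes)
    print(p)                        # likelihood of each mode in the adjacent mode set
    print(p.get("INTER_8x8", 0.0))  # mode outside the set -> 0.0
```

Because κ is identical for all modes, ranking by P(m) is the same as ranking by the raw counts N(m); the normalisation only makes the values readable as fractions.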

Table 2 shows the result of applying the likelihood criterion to the video sequences. Inter modes in the adjacent mode set have a high likelihood of being the prime mode, whereas inter modes outside that set have a low likelihood. Figure 1 compares the likelihood criterion with the observed inter mode distribution of the video sequences.

Table 2: Intermode distribution based on average likeliness.

Figure 1: Intermode distribution over average likeliness.

For sequences with both fast and slow motion, a high percentage of prime modes falls within the adjacent mode set. From this experimental observation, we conclude that the search for the prime mode should begin with the adjacent mode set, which significantly reduces encoding time compared with the full search method. Slow motion sequences also show higher likeliness than fast motion sequences, indicating that the likelihood model works best for sequences with small motion vector differences.

3.1. Implementation of LMD Algorithm

A desired set of inter modes with maximum likelihood is built from the adjacent mode set using (3). The inter modes in this set are sorted in descending order of likelihood. The inter modes not in the adjacent mode set are then appended to the desired set in the order SKIP, Inter 16×16, Inter 16×8, Inter 8×16, and subblocks. The mode with the maximum likelihood is taken as the prime mode. The LMD algorithm is applied to I frames of the enhancement layer, which involve more interlayer prediction. This likelihood model further reduces encoding time and enables early termination. Figure 2 shows the flowchart of the LMD algorithm. The algorithm first checks for inter modes and then examines the first mode of the desired set for the macroblock. If the first mode is SKIP, no further mode needs to be encoded. If it is one of the Inter partitions tested in Figure 2, the prime mode is the mode with maximum likelihood among the Direct mode and the corresponding Inter modes shown there; otherwise, all subblock modes are checked. The prime mode for intra modes is decided by minimum RDC.

Figure 2: Flowchart of likelihood mode decision algorithm.
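The construction of the desired mode list and the choice of the prime mode can be sketched as follows. This is a minimal sketch under stated assumptions: the mode labels are hypothetical, ties in likelihood are broken by the default order, and the default order for modes outside the adjacent mode set is assumed to be SKIP, Inter 16×16, Inter 16×8, Inter 8×16, then sub-8×8 blocks. The per-branch refinement of Figure 2 (Direct versus Inter candidates) is not modelled.

```python
from collections import Counter
from typing import List, Sequence

# Assumed fallback order for modes outside the adjacent mode set.
DEFAULT_ORDER = ["SKIP", "INTER_16x16", "INTER_16x8", "INTER_8x16", "SUB_8x8"]


def desired_mode_list(neighbour_modes: Sequence[str],
                      default_order: Sequence[str] = DEFAULT_ORDER) -> List[str]:
    """Adjacent-set modes by descending occurrence, then the remaining modes."""
    counts = Counter(neighbour_modes)
    rank = {m: i for i, m in enumerate(default_order)}
    # Higher N(m) first; equal counts fall back to the default order.
    likely = sorted(counts, key=lambda m: (-counts[m], rank.get(m, len(rank))))
    remaining = [m for m in default_order if m not in counts]
    return likely + remaining


def lmd_prime_mode(neighbour_modes: Sequence[str]) -> str:
    """The head of the desired list (maximum likelihood) is taken as the prime mode."""
    return desired_mode_list(neighbour_modes)[0]


if __name__ == "__main__":
    neighbours = ["INTER_16x8", "SKIP", "INTER_16x8", "INTER_16x16", "SKIP"]
    print(desired_mode_list(neighbours))
    print(lmd_prime_mode(neighbours))
```

With no usable neighbours, the list degenerates to the default order, so SKIP is tried first; no RDC is computed for inter modes in this path, which is where the time saving comes from.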

A selective prediction is performed for P/B frames of the enhancement layer. Since P/B frames carry less information than I frames, they can be coded with a less exhaustive mode search. Figure 3 shows the flowchart of the proposed fast mode decision algorithm with LMD. The RDCs of the modes of the top, left, and top-right neighboring macroblocks are computed, and the prime mode is the one with the minimum RDC among these modes. Meanwhile, a standard search algorithm is used for the base layer. If the MB being encoded belongs to an I frame of the base layer, the mode with the minimum RDC over all modes is selected; for P/B frames, the mode with the minimum RDC is selected from BL_PRED, SKIP, and a subset of the Inter partition modes shown in Figure 3.

Figure 3: Flowchart of fast mode decision algorithm for SVC.
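The enhancement-layer selective prediction for P/B frames restricts the RDC evaluation to the modes already chosen by the top, left, and top-right neighbours. In this minimal sketch, `rd_cost_of` is a hypothetical callback standing in for the encoder's RDC computation, and falling back to a caller-supplied candidate list when no neighbour is available (for example, at the top-left corner of a frame) is our assumption rather than something the text specifies.

```python
from typing import Callable, Optional, Sequence


def selective_prediction(neighbour_modes: Sequence[Optional[str]],
                         rd_cost_of: Callable[[str], float],
                         fallback_modes: Sequence[str]) -> str:
    """Pick the minimum-RDC mode among the top, left and top-right neighbours' modes.

    neighbour_modes: modes of the top, left and top-right MBs (None if unavailable).
    rd_cost_of: hypothetical callback returning the RDC of coding the current MB in a mode.
    fallback_modes: assumed candidate list used when no neighbour is available.
    """
    # Deduplicate so each candidate mode's RDC is computed at most once.
    candidates = sorted({m for m in neighbour_modes if m is not None})
    if not candidates:
        candidates = list(fallback_modes)
    return min(candidates, key=rd_cost_of)


if __name__ == "__main__":
    # Illustrative costs only.
    costs = {"SKIP": 50.0, "INTER_16x16": 40.0, "INTER_16x8": 10.0, "INTER_8x8": 45.0}
    # INTER_16x8 is cheapest overall but no neighbour uses it, so it is not tested.
    print(selective_prediction(["SKIP", "INTER_8x8", "INTER_16x16"], costs.__getitem__, list(costs)))
    print(selective_prediction([None, None, None], costs.__getitem__, list(costs)))
```

The saving comes from evaluating at most three RDCs instead of the full mode set; the price, visible in the example, is that a cheaper mode not used by any neighbour is never tested, which is consistent with the small PSNR and bit rate penalty reported in Section 4.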

In summary, the algorithm applies LMD to I frames and selective prediction to P/B frames of the enhancement layer, and applies the standard full search method to I frames and enhanced selective prediction to P/B frames of the base layer.

3.2. Pseudocode
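The overall strategy selection described in Section 3.1 can be summarized by the minimal Python sketch below. The enum members and strategy names are hypothetical labels chosen for illustration, not identifiers from JSVM or from the authors' implementation; each strategy corresponds to one branch of Figures 2 and 3.

```python
from enum import Enum


class Layer(Enum):
    BASE = "base"
    ENHANCEMENT = "enhancement"


class FrameType(Enum):
    I = "I"
    P = "P"
    B = "B"


# Mode decision strategy per (layer, frame type), following Section 3.1.
STRATEGY = {
    # Base layer: min RDC over all modes for I frames.
    (Layer.BASE, FrameType.I): "full_search",
    # Base layer P/B: min RDC over BL_PRED, SKIP and a subset of Inter modes.
    (Layer.BASE, FrameType.P): "enhanced_selective_prediction",
    (Layer.BASE, FrameType.B): "enhanced_selective_prediction",
    # Enhancement layer I: desired mode list from neighbours; intra by min RDC.
    (Layer.ENHANCEMENT, FrameType.I): "likelihood_mode_decision",
    # Enhancement layer P/B: min RDC among top, left and top-right modes.
    (Layer.ENHANCEMENT, FrameType.P): "selective_prediction",
    (Layer.ENHANCEMENT, FrameType.B): "selective_prediction",
}


def decision_strategy(layer: Layer, frame_type: FrameType) -> str:
    """Return the mode decision strategy used for an MB of the given layer and frame type."""
    return STRATEGY[(layer, frame_type)]


if __name__ == "__main__":
    for layer in Layer:
        for ft in FrameType:
            print(layer.value, ft.value, decision_strategy(layer, ft))
```

Keeping the dispatch in a table makes it easy to see that only base layer I frames still pay for an exhaustive search.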

4. Experimental Results

The proposed fast mode decision algorithm was evaluated using the JSVM reference software 9.19.15 [16] with the simulation parameters listed in Table 3. The test system uses an Intel Core i3 processor with a 2.4 GHz clock speed, a 500 GB hard disk, and the Windows 7 operating system. The algorithm is tested on the Bus, City, Crew, Football, Foreman, Harbour, Mobile, and Soccer video sequences. All sequences were encoded with combined spatial and temporal scalability: the base layer uses QCIF (Quarter Common Intermediate Format) at 15 frames per second, and the enhancement layer uses CIF (Common Intermediate Format) at 30 frames per second. Three sets of base/enhancement layer quantization parameters are used, 18/22, 28/32, and 38/42, with the GOP size set to 16. The search range is set to 32.

Table 3: Simulation parameters.

The performance of each video sequence is measured based on the average encoding time in seconds using (4), average luminance peak signal to noise ratio in dB using (5), and average bit rate in kbps using (6) as stated below:
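As a minimal sketch, the percentage figures quoted in the abstract (time saving, PSNR loss, and bit-rate increase relative to full search) can be computed with the relative-change measures commonly used in fast mode decision work. These definitions are an assumption and may differ in detail from (4) to (6); the function names are hypothetical, and averaging the per-QP percentages follows the statement that reported values are averages over the quantization parameters used.

```python
from statistics import mean
from typing import Sequence


def time_saving_pct(t_full: float, t_prop: float) -> float:
    """Encoding time saved relative to full search, in percent."""
    return (t_full - t_prop) / t_full * 100.0


def psnr_loss_pct(psnr_full: float, psnr_prop: float) -> float:
    """Relative PSNR degradation with respect to full search, in percent."""
    return (psnr_full - psnr_prop) / psnr_full * 100.0


def bitrate_increase_pct(br_full: float, br_prop: float) -> float:
    """Relative bit-rate increase with respect to full search, in percent."""
    return (br_prop - br_full) / br_full * 100.0


def average_over_qps(values: Sequence[float]) -> float:
    """Average a measure over the tested QP pairs (e.g. 18/22, 28/32, 38/42)."""
    return mean(values)


if __name__ == "__main__":
    # Hypothetical per-QP encoding times in seconds: (full search, proposed).
    times = [(900.0, 300.0), (600.0, 200.0), (300.0, 100.0)]
    print(average_over_qps([time_saving_pct(f, p) for f, p in times]))
```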

The experimental results for each video sequence under different quantization parameters are shown in Table 4. Each video sequence is evaluated using three measures: encoding time, PSNR, and bit rate. These measures are obtained for the proposed algorithm and compared with those of the previous algorithms in Table 4.

Table 4: Simulation results of combined spatial and temporal scalability of various algorithms.

The proposed algorithm achieves a 66.65% saving in encoding time, with a 0.05% loss in PSNR and a 0.17% increase in bit rate, compared with the full search method. These values are averages over the quantization parameters used in the experiment. The proposed algorithm also outperforms all previous fast mode decision algorithms in encoding time, and some of them in PSNR and bit rate as well. Figure 4 compares the percentage time saving of the proposed and previous algorithms for each video sequence. All algorithms, including the full search method, need less encoding time when the base and enhancement layer quantization parameters are high, and the full search method needs the most encoding time when they are low. Figure 4 shows that the proposed algorithm saves a substantial amount of time even at low quantization parameters. On average, the proposed algorithm achieves the largest time saving among the compared works.

Figure 4: Time saving relationship among different algorithms for various values of QP.

Figures 5 and 6 show the rate distortion curves for Bus and Crew, and for Harbour and Soccer, respectively. For all sequences, every fast mode decision (FMD) algorithm suffers a small degradation in PSNR and bit rate compared with the full search algorithm. Although the proposed algorithm also degrades PSNR and bit rate, its quality remains at a level comparable to that of the other FMD algorithms. Figures 7 and 8 show the time saving curves for Football and Foreman, and for Mobile and Soccer, respectively. For each sequence, the full search method takes more encoding time at low QP and less at high QP. Although the full search method reaches its minimum encoding time at high QP, its time saving is still smaller than that of the FMD algorithms. In general, all FMD algorithms outperform the full search method even at low QP. The proposed algorithm achieves the maximum time saving for almost all video sequences. For the Soccer sequence, Kim's algorithm achieves the second-fastest encoding time after the proposed algorithm.

Figure 5: Rate distortion curves for (a) Bus and (b) Crew.

Figure 6: Rate distortion curves for (a) Harbour and (b) Soccer.

Figure 7: Time saving curves for (a) Football and (b) Foreman.

Figure 8: Time saving curves for (a) Mobile and (b) Soccer.

Three cases are compared across the base and enhancement layer quantization parameters: in case 1, time saving increases gradually as QP increases; in case 2, time saving decreases gradually as QP increases; and in case 3, time saving varies as QP increases. Case 1 includes the Harbour sequence, a slow motion sequence with few MVDs. Case 2 includes the City sequence, also a slow motion sequence but with more MVDs. Case 3 includes all other sequences, namely Bus, Crew, Football, Foreman, Mobile, and Soccer, which have an average number of MVDs. Within case 3, Foreman, a fast motion sequence, achieves the maximum time saving because of its average number of MVDs, whereas Mobile, a slow motion sequence with a similar average number of MVDs, achieves the minimum time saving because its MVDs are large.

In case 1, the enhancement layer contains fewer I frames and more P/B frames, so the likelihood model is applied to only a few I frames. In case 2, it contains more I frames and fewer P/B frames, so the likelihood model is applied to more I frames. In case 3, regardless of whether a sequence has fast or slow motion, whether it reaches the maximum or minimum time saving depends on its overall MVDs.

Hence, we conclude that the proposed algorithm achieves the maximum time saving for fast motion sequences with fewer MVDs and an average time saving for slow motion sequences with more MVDs, while the remaining sequences also achieve large savings in encoding time. Compared with the full search algorithm, PSNR decreases slightly but remains at an acceptable level, and bit rate increases slightly; this trade-off is justified by the substantial saving in encoding time.

5. Conclusion

Experimental results show that the proposed algorithm achieves faster encoding for both fast and slow motion video sequences. The evaluated measures were compared with those of the full search method and previous FMD algorithms, and the comparison shows that the proposed algorithm outperforms both in encoding time. A desired set of inter modes, built for I frames in the enhancement layer, reduces computational complexity and thus encoding time, while P/B frames of the enhancement layer, which carry less information, avoid the exhaustive full search through selective prediction, further reducing encoding time. Future work will aim to match the PSNR and bit rate of the full search method while retaining the faster encoding time.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.