Abstract: This paper presents a shape-adaptive wavelet coding technique for coding arbitrarily shaped still texture. This technique includes shape-adaptive discrete wavelet transforms (SA-DWTs) and extensions of zerotree entropy (ZTE) coding and embedded zerotree wavelet (EZW) coding. Shape-adaptive wavelet coding is needed for efficiently coding arbitrarily shaped visual objects, which is essential for object-oriented multimedia applications. The challenge is to achieve high coding efficiency while satisfying the functionality of representing arbitrarily shaped visual texture. One feature of the SA-DWT is that the number of coefficients after the transform is identical to the number of pixels in the original arbitrarily shaped visual object. Another feature is that the spatial correlation, the locality properties of wavelet transforms, and the self-similarity across subbands are well preserved. Also, for a rectangular region, the SA-DWT reduces to the conventional wavelet transform. To preserve these properties, the extensions of ZTE and EZW to coding arbitrarily shaped visual objects carefully treat “don’t care” nodes in the wavelet trees. Comparison of shape-adaptive wavelet coding with other coding schemes for arbitrarily shaped visual objects shows that shape-adaptive wavelet coding consistently achieves better coding efficiency. One implementation of the shape-adaptive wavelet coding technique has been included in the new multimedia coding standard MPEG-4 for coding arbitrarily shaped still texture. A software implementation is also available.
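
To illustrate the count-preserving property claimed above, the sketch below (our own illustration, not code from the paper, using a simple Haar-style lifting step rather than the paper's actual filters) transforms each in-shape segment of a row independently; an odd-length segment leaves its last sample in the low band, so the number of output coefficients always equals the number of object pixels:

```python
import numpy as np

def sa_dwt_1d(row, mask):
    """One level of a 1-D shape-adaptive transform (illustrative sketch).

    Each contiguous run of in-shape pixels is transformed independently
    with a Haar-style lifting step, so the number of output coefficients
    exactly equals the number of in-shape pixels -- a key SA-DWT property.
    """
    low, high = [], []
    n = len(row)
    i = 0
    while i < n:
        if not mask[i]:
            i += 1
            continue
        j = i
        while j < n and mask[j]:
            j += 1
        seg = row[i:j].astype(float)
        # Pairwise Haar lifting inside the segment; an odd-length segment
        # leaves its last sample in the low band, keeping the count intact.
        for k in range(0, len(seg) - 1, 2):
            s, d = seg[k], seg[k + 1]
            high.append(d - s)               # predict step
            low.append(s + 0.5 * (d - s))    # update step
        if len(seg) % 2 == 1:
            low.append(seg[-1])
        i = j
    return np.array(low), np.array(high)
```

Because each segment is handled in isolation, a rectangular mask makes every row one full-length segment, matching the abstract's observation that the SA-DWT reduces to the conventional transform there.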

Notes: This paper is the first to propose the landmark SA-DWT technique for coding arbitrarily shaped visual objects. It enables both scalability and object-based interactivity. The algorithm has been adopted as a core component of the MPEG-4 Still Texture Coding standard and has led to two US patents. Dr. Li is both the principal author and principal investigator of this seminal contribution.

Abstract: A basic framework for efficient scalable video coding, namely progressive fine granularity scalable (PFGS) video coding, is proposed. Similar to the fine granularity scalable (FGS) video coding in MPEG-4, the PFGS framework has all the features of FGS, such as fine-granularity bit-rate scalability, channel adaptation, and error recovery. Unlike FGS coding, however, the PFGS framework uses multiple layers of references with increasing quality to make motion prediction more accurate for improved video-coding efficiency. Using multiple layers of references with different quality also introduces several issues. First, extra frame buffers are needed for storing the multiple reconstructed reference layers, which would increase the memory cost and computational complexity of the PFGS scheme. Based on the basic framework, a simplified and efficient PFGS framework is therefore proposed; it needs only one extra frame buffer while achieving almost the same coding efficiency as the original framework. Second, there might be an undesirable increase and fluctuation of the coefficients to be coded when switching from a low-quality reference to a high-quality one, which could partially offset the advantage of using a high-quality reference. A further improved PFGS scheme eliminates this fluctuation of enhancement-layer coefficients by always using a single high-quality prediction reference for all enhancement layers. Experimental results show that the PFGS framework can improve coding efficiency by more than 1 dB over the FGS scheme in terms of average PSNR, while keeping all the original properties, such as fine granularity, bandwidth adaptation, and error recovery. A simple simulation of transmitting PFGS video over a wireless channel further confirms the error robustness of the PFGS scheme, although the advantages of PFGS have not been fully exploited.

Notes: This paper contributes a groundbreaking framework for DCT-based predictive SVC. The novel framework brings a quantum leap over MPEG-4 FGS in terms of coding efficiency. It is the first to simultaneously provide fine granularity scalability, high coding efficiency, and strong error robustness. Seven US patents have been awarded based on this work. It was instrumental to the development of the SVC standard and pointed out a new research direction for SVC. Dr. Li is the principal investigator of this milestone work.

Abstract: In this paper, image compression utilizing visual redundancy is investigated. Inspired by recent advances in image inpainting techniques, we propose an image compression framework oriented towards visual quality rather than pixel-wise fidelity. In this framework, an original image is analyzed at the encoder side so that portions of the image are intentionally and automatically skipped. Instead, some information is extracted from these skipped regions and delivered to the decoder in compressed form as assistant information. The delivered assistant information plays a key role in the proposed framework because it guides image inpainting to accurately restore these regions at the decoder side. Moreover, to take full advantage of the assistant information, a compression-oriented edge-based inpainting algorithm is proposed for image restoration, integrating pixel-wise structure propagation and patch-wise texture synthesis. We also construct a practical system to verify the effectiveness of the compression approach, in which an edge map serves as assistant information and the edge extraction and region removal approaches are developed accordingly. Evaluations have been made in comparison with baseline JPEG and standard MPEG-4 AVC/H.264 intra-picture coding. Experimental results show that our system achieves up to 44% and 33% bit savings, respectively, at similar visual quality levels. The proposed framework is a promising exploration towards future image and video compression.

Notes: This paper represents a paradigm shift in image coding by leveraging computer vision technology. Instead of coding an image pixel by pixel, the breakthrough contribution is that it encodes only inpainting parameters for regions that can be restored by inpainting. It suggests a completely new framework for lifting coding efficiency out of the current plateau. Dr. Li is the principal investigator of the project.

Abstract: This paper provides an overview of the Barbell lifting coding scheme that has been adopted as common software by the MPEG ad hoc group on further exploration of wavelet video coding. The core techniques used in this scheme, such as Barbell lifting, layered motion coding, 3-D entropy coding, and base-layer embedding, are discussed. The paper also analyzes and compares the proposed scheme with the forthcoming scalable video coding (SVC) standard, because the hierarchical temporal prediction technique used in SVC is closely related to motion-compensated temporal filtering (MCTF) in wavelet coding. The commonalities and differences between these two schemes are presented to help readers better understand modern scalable video coding technologies. Several challenges that still exist in scalable video coding, e.g., the performance of spatially scalable coding and accurate MC lifting, are also discussed. Two new techniques are presented in this paper, although they are not yet integrated into the common software. Finally, experimental results demonstrate the performance of the Barbell lifting coding scheme and compare it with SVC and another well-known 3-D wavelet coding scheme, motion-compensated embedded zero block coding (MC-EZBC).

Notes: This journal paper expands Dr. Li’s previous contributions into a unified Barbell lifting scheme that enables efficient temporal decomposition. It brings all the traditional motion compensation techniques into the new MCTF-based SVC paradigm. It marks a new milestone in SVC and was instrumental to the SVC extension of the H.264 standard.

Abstract: We present a novel 2-D wavelet transform scheme of adaptive directional lifting (ADL) for image coding. Instead of alternately applying horizontal and vertical lifting, as in current practice, ADL performs lifting-based prediction in local windows in the direction of high pixel correlation, and hence adapts far better to the orientation features of the image in local windows. The ADL transform is built on existing 1-D wavelets and is seamlessly integrated into the global wavelet transform. The predicting and updating signals of ADL can be derived even at fractional-pixel precision to achieve high directional resolution, while still maintaining perfect reconstruction. To enhance ADL performance, a rate-distortion optimized directional segmentation scheme is also proposed to form and code a hierarchical image partition adapted to local features. Experimental results show that the proposed ADL-based image coding technique outperforms JPEG 2000 in both PSNR and visual quality, with improvements of up to 2.0 dB on images with rich orientation features.
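
A minimal sketch of the directional predict step described above (our own integer-pel simplification; actual ADL supports fractional-pel directions, applies an update step, and rate-distortion optimizes the partition, all omitted here). Each block of columns tries a few candidate direction offsets and keeps the one that minimizes the residual energy of the predicted odd rows:

```python
import numpy as np

def adl_predict(img, block_w=8, offsets=(-1, 0, 1)):
    """Predict step of a vertical lifting pass with a per-block direction.

    For each block of columns, every interior odd row is predicted from
    the even rows above and below, sheared horizontally by a candidate
    offset d (np.roll wraps at the borders in this sketch). The offset
    minimizing the residual energy of the odd rows is kept per block.
    """
    h, w = img.shape
    img = img.astype(float)
    residual = img.copy()          # even rows pass through (no update step)
    chosen = []
    for x0 in range(0, w, block_w):
        best = None
        for d in offsets:
            res = img[:, x0:x0 + block_w].copy()
            for y in range(1, h - 1, 2):
                above = np.roll(img[y - 1], -d)[x0:x0 + block_w]
                below = np.roll(img[y + 1], d)[x0:x0 + block_w]
                res[y] -= 0.5 * (above + below)   # directional prediction
            energy = np.sum(res[1::2] ** 2)
            if best is None or energy < best[0]:
                best = (energy, d, res)
        chosen.append(best[1])
        residual[:, x0:x0 + block_w] = best[2]
    return residual, chosen
```

On an image with diagonal stripes, the matching diagonal offset yields exactly zero residual on the predicted rows, whereas plain vertical lifting (offset 0) leaves large residuals; this is the local orientation correlation that ADL exploits.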

Notes: This paper is the first to extend the concept of Barbell lifting to spatial prediction, yielding a novel and significantly more efficient image coding scheme.

Abstract: This paper proposes an efficient scalable coding scheme with fine-grain scalability, where the base layer is encoded with H.26L and the enhancement layer is encoded with PFGS coding. Motion compensation with adaptive block sizes has greatly contributed to the coding-efficiency improvement in H.26L. To improve the efficiency of enhancement-layer coding, an improved motion estimation scheme that uses information from both the base layer and the enhancement layer is also proposed in this paper. As H.26L significantly improves the coding efficiency of the base layer compared with MPEG-4 ASP (Advanced Simple Profile), and PFGS coding is a significant improvement over MPEG-4 FGS at the enhancement layer, the proposed scheme, which is a nontrivial extension of PFGS to H.26L, benefits from both H.26L and PFGS coding. Experiments show that the overall coding-efficiency gain of the proposed scheme is about 4.0 dB compared with MPEG-4 FGS.

Notes: This is the first work to extend the PFGS SVC scheme to the emerging H.264 video coding standard. This work directly influenced the final SVC extension of the H.264 standard.

Abstract: This paper presents an efficient video coding algorithm: three-dimensional embedded subband coding with optimized truncation (3-D ESCOT), in which coefficients in different subbands are independently coded using fractional bit-plane coding, and candidate truncation points are formed at the end of each fractional bit-plane. A rate-distortion optimized truncation scheme is used to multiplex all subband bitstreams together into a layered one. A novel motion threading technique is proposed to form threads along the motion trajectories in a scene. For efficient coding of motion threads, memory-constrained temporal wavelet transforms are applied along entire motion threads. Block-based motion threading is implemented in conjunction with 3-D ESCOT in a real video coder. Extension of 3-D ESCOT to object-based coding is also addressed. Experiments demonstrate that 3-D ESCOT outperforms MPEG-4 for most test sequences at the same bit rate.
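
The rate-distortion optimized truncation can be illustrated with a minimal Lagrangian sketch (our own illustration, not the paper's algorithm: given each code-block's candidate truncation points as (rate, distortion) pairs, pick the point minimizing D + λR; the actual scheme additionally restricts candidates to the convex hull of each block's R-D curve and sweeps λ to hit a target rate):

```python
def optimal_truncation(blocks, lam):
    """For each code-block, choose the truncation point minimizing D + lam*R.

    blocks: list of blocks, each a list of (rate, distortion) candidates
    ordered by increasing rate. Returns the chosen (rate, distortion)
    pair per block; min() keeps the earliest point on ties (lower rate).
    """
    return [min(points, key=lambda rd: rd[1] + lam * rd[0])
            for points in blocks]
```

A larger λ penalizes rate more, so blocks whose extra coding passes buy little distortion reduction are truncated earlier, which is how the multiplexer forms the layered bitstream.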

Notes: This paper marks a new generation of 3-D wavelet video coding, with lifting-based techniques for efficient temporal decomposition and 3-D ESCOT coding for rate-distortion optimization. This work has been frequently cited (more than 100 citations).

Notes: This work is the first to leverage the lifting algorithm to solve the long-standing boundary-effects problem in 3-D wavelet video coding. It also paved the way for the milestone work on the Barbell lifting algorithm.

Abstract: SMART, an acronym for scalable media adaptation and robust transport, is a suite of compression and transmission technologies for efficient, scalable, adaptive, and robust video streaming over the best-effort Internet. It consists of two indispensable parts: SMART video coding and SMART video streaming. The SMART video coding part is an efficient DCT-based universal fine granularity scalable coding scheme. Since it adopts multiple-loop prediction and drifting-reduction techniques at the macroblock level, it can achieve high coding efficiency over a wide range of bit rates. More importantly, it provides all sorts of scalabilities, that is, quality, temporal, spatial, and complexity scalabilities, in order to accommodate heterogeneous time-variant networks and different devices. The SMART video streaming part is a transport scheme that takes full advantage of the special features of scalable bitstreams. An accurate bandwidth-estimation method is first discussed as the prerequisite of network adaptation. Then, a flexible error-resilience technique and an unequal error-protection strategy are investigated to enhance the robustness of streaming SMART bitstreams. The SMART system shows excellent performance with regard to high coding efficiency, flexible channel-bandwidth adaptation, smooth playback, and superior error robustness in both static and dynamic experiments.

Notes: This paper documents the first scalable video streaming system over the Internet. It provides solid evidence for the standardization of SVC in H.264.

Abstract: The newly adopted MPEG-4 fine granularity scalability (FGS) video coding standard offers easy and flexible adaptation to varying network bandwidths and different application needs. Encryption for FGS should preserve such adaptation capabilities and enable intermediate stages to process encrypted data directly, without decryption. In this paper, we propose two novel encryption algorithms for MPEG-4 FGS that meet these requirements. The first algorithm encrypts an FGS stream (containing both the base and the enhancement layers) into a single access layer and preserves the original fine granularity scalability and error-resilience performance in the encrypted stream. The second algorithm encrypts an FGS stream into multiple quality layers, divided according to either peak signal-to-noise ratio (PSNR) or bit rate, with lower quality layers being accessible and reusable by a higher quality layer of the same type, but not vice versa. Both PSNR and bit-rate layers are supported simultaneously, so that a layer of either type can be selected on the fly without decryption. The base layer in the second algorithm may be left unencrypted to allow free viewing of the content at low quality, or content-based search of a video database, without decryption. Both algorithms are fast and error-resilient, and have negligible compression overhead. The same approach can be applied to other scalable multimedia formats.

Notes: This paper is the first to solve two key problems in scalable DRM: adaptive streaming directly on ciphertext, and layered protection. This work fills in a missing piece of a complete scalable video ecosystem.

Abstract: Most modern computers and game consoles are equipped with powerful yet cost-effective graphics processing units (GPUs) to accelerate graphics operations. Though the graphics engines in these GPUs are specially designed for graphics operations, can we harness their computing power for more general, nongraphics operations? The answer is positive. In this paper, we present our study on leveraging the GPU’s graphics engine to accelerate video decoding. Specifically, a video decoding framework that involves both the central processing unit (CPU) and the GPU is proposed. By moving the entire motion-compensation feedback loop of the decoder to the GPU, the CPU and the GPU are made to work in parallel in a pipelined fashion. Several techniques are also proposed to overcome the GPU’s constraints or to optimize the GPU computation. Initial experimental results show that significant speed-up can be achieved by utilizing the GPU power. We have achieved real-time playback of high-definition video on a PC with an Intel Pentium III 667-MHz CPU and an nVidia GeForce3 GPU.

Notes: This is the pioneering work on leveraging generic GPU power to accelerate video decoding. It opened the door to the now widely adopted GPU-assisted multimedia processing.

Abstract: The H.264 video coding standard provides considerably higher coding efficiency than previous standards, but at the cost of significantly increased complexity. In an H.264 encoder, the most time-consuming component is variable block-size motion estimation. To reduce the complexity of motion estimation, an early termination algorithm is proposed in this paper. It predicts the best motion vector by examining only one search point. With the proposed method, some of the motion searches can be stopped early, so that a large number of search points can be skipped. The proposed method can work with any fast motion estimation algorithm. Experiments are carried out with a fast motion estimation algorithm that has been adopted by H.264. Results show that significant complexity reduction is achieved while the degradation in video quality is negligible.
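
The early-termination idea can be sketched as follows (our own illustration; the names, the per-pixel threshold rule, and the fallback windowed full search are assumptions, not the paper's exact criterion or fast search). The SAD at the predicted motion vector is evaluated first, and if it is already small the search stops after that single point:

```python
import numpy as np

def motion_search_with_early_termination(cur, ref, pred_mv,
                                         search_range=7, thresh_per_pixel=2.0):
    """Sketch of early-termination motion estimation.

    cur is the current block; motion vectors are top-left offsets into
    ref. The SAD at pred_mv is checked against a threshold first; only
    if it fails do we fall back to a windowed search around pred_mv.
    Returns (mv, sad, terminated_early).
    """
    h, w = cur.shape

    def sad(mv):
        dy, dx = mv
        return int(np.abs(cur.astype(int)
                          - ref[dy:dy + h, dx:dx + w].astype(int)).sum())

    best_mv, best_sad = pred_mv, sad(pred_mv)
    if best_sad <= thresh_per_pixel * h * w:
        return best_mv, best_sad, True        # stopped after one search point

    for dy in range(max(0, pred_mv[0] - search_range),
                    min(ref.shape[0] - h, pred_mv[0] + search_range) + 1):
        for dx in range(max(0, pred_mv[1] - search_range),
                        min(ref.shape[1] - w, pred_mv[1] + search_range) + 1):
            s = sad((dy, dx))
            if s < best_sad:
                best_mv, best_sad = (dy, dx), s
    return best_mv, best_sad, False
```

When the predictor is accurate, which is the common case for smooth motion fields, all the candidate points of the fallback search are skipped; this is the source of the complexity reduction.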

Notes: This paper is the first to contribute a novel low-complexity motion estimation algorithm for H.264. It has led to the implementation of real-time H.264 encoders. It is the 12th most downloaded T-CSVT paper in 2005 (1,454 downloads).

Abstract: The new H.264 (MPEG-4 AVC) video coding standard achieves considerably higher coding efficiency than previous standards. This is accomplished mainly through the use of variable block sizes for motion compensation, multiple reference frames, and intra prediction, but also through better exploitation of the spatiotemporal correlation that may exist between adjacent macroblocks, via the SKIP mode in predictive (P) slices and the two DIRECT modes in bipredictive (B) slices. These modes, when signaled, can in effect represent the motion of a macroblock (MB) or block without transmitting the additional motion information required by other inter-MB types. This property also makes these modes highly compressible, especially in combination with run-length coding strategies. Although the SKIP mode uses the spatial correlation of motion vectors from adjacent MBs to predict its motion parameters, until recently the DIRECT mode considered only the temporal correlation of adjacent pictures. In this letter, we introduce alternative methods for generating the motion information of the DIRECT mode using spatial or combined spatiotemporal correlation. Considering that exploiting temporal correlation requires the motion and timestamp information of previous pictures to be available at both the encoder and the decoder, it is shown that our spatial-only method can reduce or eliminate such requirements while achieving similar performance. The combined methods, on the other hand, by jointly exploiting spatial and temporal correlation either at the MB or the slice/picture level, can achieve even higher coding efficiency. Finally, improvements to the existing rate-distortion optimization for B slices within the H.264 codec are also presented, which can lead to up to 16% bit-rate reduction or, equivalently, more than 0.7 dB gain in PSNR.

Notes: This work is the first to propose an efficient and low-complexity spatial DIRECT mode for coding motion information. It has been adopted in the H.264 standard. It is the 6th most downloaded T-CSVT paper in 2005 (1,903 downloads).