Depth maps give machines a new dimension, beyond texture, for sensing the world, and bring computer perception one step closer to that of human beings. Integrating depth with other information has therefore become an active area of research in computer vision and image processing, and an important ingredient in many recently studied real-world applications. This thesis focuses on two topics in this direction: 1) restoration of depth maps to improve their quality, and 2) incorporation of depth information into emerging applications for better performance.
Although various devices are available for depth map acquisition, Microsoft Kinect is widely used in many applications because of its lower cost and higher spatial resolution. However, it still suffers from problems such as missing data and noisy measurements. To tackle these problems, new registration and restoration algorithms are proposed based on the concept of joint color-depth consistency. Specifically, a new mutual information (MI) based matching method is first presented to reduce the registration errors between the color and depth cameras of Kinect. Then, two joint color-depth restoration algorithms, namely 1) a surface normal based joint bilateral filter (JBF) and 2) a superpixel-based restoration method, are proposed to inpaint the missing data in depth maps. Several novel inpainting and segmentation techniques, including joint color-depth probabilistic superpixels, probabilistic local polynomial regression (LPR) and joint color-depth matting, are also developed to improve restoration performance. Experimental results show that the proposed restoration algorithms effectively inpaint and refine the missing data in depth maps, achieving better color-depth consistency.
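As a rough illustration of the joint color-depth idea behind the restoration step, the sketch below fills missing depth pixels with a plain joint bilateral filter guided by a grayscale image. It is not the surface normal based JBF proposed in the thesis; the function name `joint_bilateral_fill`, the parameters `radius`, `sigma_s` and `sigma_r`, and the convention that `0` marks a missing depth value are all assumptions made for this example.

```python
import math


def joint_bilateral_fill(depth, guide, radius=2, sigma_s=1.5, sigma_r=10.0, missing=0):
    """Fill missing depth pixels with a weighted mean of valid neighbours.

    depth and guide are lists of rows with the same size. Each weight is the
    product of a spatial Gaussian and a Gaussian on the guide-image difference,
    so neighbours across a color edge contribute very little.
    """
    h, w = len(depth), len(depth[0])
    out = [row[:] for row in depth]  # valid pixels are copied unchanged
    for y in range(h):
        for x in range(w):
            if depth[y][x] != missing:
                continue
            num = den = 0.0
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    d = depth[ny][nx]
                    if d == missing:
                        continue  # only valid depth samples vote
                    dist2 = (ny - y) ** 2 + (nx - x) ** 2
                    diff2 = (guide[ny][nx] - guide[y][x]) ** 2
                    weight = math.exp(-dist2 / (2 * sigma_s ** 2)) * \
                        math.exp(-diff2 / (2 * sigma_r ** 2))
                    num += weight * d
                    den += weight
            if den > 0:
                out[y][x] = num / den
    return out


# Toy example: a dark foreground (depth 1000) next to a bright background (depth 2000).
guide = [[0, 0, 200, 200] for _ in range(3)]
depth = [
    [1000, 1000, 2000, 2000],
    [1000, 0, 2000, 2000],  # the 0 is a hole inside the foreground region
    [1000, 1000, 2000, 2000],
]
filled = joint_bilateral_fill(depth, guide)
```

Because the hole lies in the dark region of the guide image, the filled value follows the foreground depth rather than an average across the edge, which is the kind of color-depth consistency the abstract refers to.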
Utilizing the restored depth map, we further develop two real-world applications, namely real-time image-based rendering (IBR) and hand gesture recognition. Conventional IBR systems are usually based on off-line depth estimation and segmentation. With depth information available, it becomes possible to perform IBR in real time. However, depth uncertainty and disocclusion are two key problems to be addressed. Here, we propose a confidence-based rendering algorithm for reducing artifacts and a depth-assisted dynamic background modelling scheme for inpainting. The proposed IBR system is implemented on graphics processing units (GPUs) to achieve near real-time performance. Experimental results show that our system provides better visual quality of synthesized views than conventional depth image-based rendering (DIBR) methods.
In our hand gesture recognition application, the depth map is utilized to isolate the user's hand through appropriate thresholding, while the hand location is obtained from the body skeleton estimated by Kinect. The hand shapes, in the form of depth and color information, are then jointly represented as superpixels, which capture the shape and color of the gestures to be recognized in a more compact form. Based on this representation, a novel distance metric, Superpixel Earth Mover's Distance (SP-EMD), is proposed to measure the dissimilarity between hand gestures. Experimental results on our own dataset and two public gesture datasets show that the proposed system achieves high mean accuracy and fast recognition speed. Its superiority is further demonstrated by comparison with other conventional techniques and by two real-life applications.
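The hand isolation step described above can be sketched as a depth band around the hand joint. This is a minimal illustration only: the function name `segment_hand`, the `band_mm` tolerance, the millimetre units and the use of `0` for missing depth are assumptions for the example, not details taken from the thesis.

```python
def segment_hand(depth_mm, hand_xy, band_mm=100, missing=0):
    """Return a boolean mask of pixels whose depth lies near the hand joint.

    depth_mm is a list of rows of depth values; hand_xy is the (x, y) pixel
    position of the hand joint, e.g. taken from a skeleton tracker.
    """
    hx, hy = hand_xy
    hand_depth = depth_mm[hy][hx]
    if hand_depth == missing:
        # No depth at the joint, so nothing can be thresholded reliably.
        return [[False for _ in row] for row in depth_mm]
    return [
        [d != missing and abs(d - hand_depth) <= band_mm for d in row]
        for row in depth_mm
    ]


# Toy example: the hand (about 800 mm) is held in front of the body (about 1500 mm).
depth_mm = [
    [1500, 1500, 1500, 0],
    [1500, 810, 790, 1500],
    [1500, 800, 805, 1500],
]
mask = segment_hand(depth_mm, hand_xy=(1, 2))
```

In a full pipeline the resulting mask would still need cleanup (for example, keeping only the connected region that contains the hand joint) before the superpixel representation is computed.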

dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.rights: The author retains all proprietary rights, (such as patent rights) and the right to use in future works.