
Depth Inference and Visual Saliency Detection from 2D Images
by
Jingwei Wang
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
August 2013
Copyright 2013 Jingwei Wang

With the rapid development of 3D vision technology, recovering depth information from 2D images has become an active research topic. Current solutions depend heavily on structural assumptions about the 2D image, which limits their applicability. It remains technically challenging to develop an efficient yet general solution for generating a depth map from a single image. Furthermore, psychological studies indicate that human eyes are particularly sensitive to salient object regions within an image. Thus, it is critical to detect salient objects accurately and to segment their boundaries well, as even small depth errors in these areas lead to intolerable visual distortion. The research in this dissertation falls into two categories: depth map inference system design, and salient object detection and segmentation algorithm development.

For depth map inference, we propose a novel depth inference system for 2D images and videos. Specifically, we first adopt in-focus region detection and saliency map computation to separate foreground objects from the remaining background region. After that, a color-based grab-cut algorithm removes the background from the obtained foreground objects by modeling the background. The depth map of the background is then generated by a modified vanishing point detection method, and key-frame depth maps are propagated to the remaining frames. Finally, to meet the stringent requirements of VLSI chip implementation, such as limited on-chip memory and real-time processing, we replace some building modules with simplified versions of the in-focus region detection and the mean-shift algorithm. Experimental results show that the proposed solution provides accurate depth maps for 83% of test images, while other state-of-the-art methods achieve accuracy on only 34% of the same test images.
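The background-depth step of the pipeline above can be illustrated with a minimal sketch: a depth gradient rising from the image bottom toward the detected vanishing point, combined with a foreground mask from the grab-cut stage. The function names, the linear depth ramp, and the uniform foreground depth value are illustrative assumptions, not the dissertation's actual implementation.

```python
import numpy as np

def background_depth_from_vanishing_point(h, w, vp_row):
    """Assumed linear depth model: depth 0 at the image bottom,
    rising to 1 at the vanishing point row and above."""
    rows = np.arange(h, dtype=float)
    depth = np.clip((h - 1 - rows) / max(h - 1 - vp_row, 1), 0.0, 1.0)
    return np.tile(depth[:, None], (1, w))  # replicate column across width

def fuse_depth(bg_depth, fg_mask, fg_depth=0.1):
    """Assign segmented foreground pixels a uniform near depth;
    background pixels keep the vanishing-point gradient."""
    out = bg_depth.copy()
    out[fg_mask] = fg_depth
    return out
```

In a real system the foreground mask would come from the in-focus/saliency detection followed by grab-cut, and the background model would be refined per scene rather than assumed linear.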
This simplified solution, targeting VLSI chip implementation, has been validated for both high accuracy and high efficiency on several test video clips.

For salient object detection, inspired by the success of late fusion in semantic analysis and multi-modal biometrics, we model saliency detection as late fusion at the confidence score level. Specifically, we propose to fuse state-of-the-art saliency models at the score level in a para-boosting learning fashion. First, the saliency maps generated by these models are used as confidence scores. Then, these scores are fed into a para-boosting learner, i.e., a Support Vector Machine (SVM), Adaptive Boosting (AdaBoost), or Probability Density Estimator (PDE), to predict the final saliency map. To explore the strength of para-boosting learners, traditional transformation-based fusion strategies such as Sum, Min, and Max are also applied for comparison. Since salient object segmentation is our final goal, we further propose a novel salient object segmentation scheme using a Conditional Random Field (CRF) graph model. In this segmentation model, we first extract local low-level features, such as the output maps of several saliency models, the gradient histogram, and the position of each image pixel. We then train a random forest classifier, using ground-truth annotations, to fuse the saliency maps into a single high-level feature map. Finally, both low- and high-level features are fed into the CRF and its parameters are learned. The segmentation results are evaluated from two perspectives: region accuracy and contour accuracy. Extensive experimental comparison shows that both our salient object detection and segmentation models outperform state-of-the-art saliency models and are, so far, the closest to human eyes' performance.
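The transformation-based fusion baselines (Sum, Min, Max) mentioned above can be sketched as follows: each model's saliency map is treated as a per-pixel confidence score, and the maps are combined with a simple rule before normalization. The function name and the final min-max normalization are illustrative assumptions; the learned para-boosting fusion (SVM/AdaBoost/PDE) is not shown here.

```python
import numpy as np

def fuse_saliency(maps, rule="sum"):
    """Transformation-based late fusion of saliency maps.
    maps: list of HxW arrays in [0, 1], treated as confidence scores."""
    stack = np.stack(maps, axis=0)          # shape (num_models, H, W)
    if rule == "sum":
        fused = stack.mean(axis=0)          # average score per pixel
    elif rule == "min":
        fused = stack.min(axis=0)           # conservative: all models agree
    elif rule == "max":
        fused = stack.max(axis=0)           # liberal: any model fires
    else:
        raise ValueError(f"unknown rule: {rule}")
    # min-max normalize back to [0, 1] (assumed post-processing step)
    rng = fused.max() - fused.min()
    return (fused - fused.min()) / rng if rng > 0 else fused
```

Min fusion suppresses pixels unless every model marks them salient, while Max fusion highlights anything any single model detects; the learned fusers aim to weight models more intelligently than these fixed rules.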

