2013

Statistical models of non-rigid deformable shape have wide application in many fields,
including computer vision, computer graphics, and biometry. We show that shape deformations
are well represented through nonlinear manifolds that are also matrix Lie groups.
These pattern-theoretic representations lead to several advantages over other alternatives,
including a principled measure of shape dissimilarity and a natural way to compose deformations.
Moreover, they enable building models using statistics on manifolds. Consequently,
such models are superior to those based on Euclidean representations. We
demonstrate this by modeling 2D and 3D human body shape. Shape deformations are
only one example of manifold-valued data. More generally, in many computer-vision and
machine-learning problems, nonlinear manifold representations arise naturally and provide
a powerful alternative to Euclidean representations. Statistics is traditionally concerned
with data in a Euclidean space, relying on the linear structure and the distances associated
with such a space; this renders it inappropriate for nonlinear spaces. Statistics can,
however, be generalized to nonlinear manifolds. Moreover, by respecting the underlying
geometry, the statistical models result in not only more effective analysis but also consistent
synthesis. We go beyond previous work on statistics on manifolds by showing how,
even on these curved spaces, problems related to modeling a class from scarce data can be
dealt with by leveraging information from related classes residing in different regions of the
space. We show the usefulness of our approach with 3D shape deformations. To summarize
our main contributions: 1) We define a new 2D articulated model -- more expressive than
traditional ones -- of deformable human shape that factors body-shape, pose, and camera
variations. Its high realism is obtained from training data generated from a detailed 3D
model. 2) We define a new manifold-based representation of 3D shape deformations that
yields statistical deformable-template models that are better than the current state of the
art. 3) We generalize a transfer learning idea from Euclidean spaces to Riemannian
manifolds. This work demonstrates the value of modeling manifold-valued data and their
statistics explicitly on the manifold. Specifically, the methods here provide new tools for
shape analysis.
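
The contrast between Euclidean statistics and statistics on a manifold can be sketched with a toy example. This is not the thesis's model: it uses the 2D rotation group SO(2) as the simplest matrix Lie group, and every function name below is hypothetical. Averaging rotation matrices entry by entry gives a matrix that is no longer a rotation, while the intrinsic (Karcher) mean, computed by averaging in the tangent space and mapping back, stays on the group.

```python
import math

def rot(theta):
    """2D rotation matrix, an element of the matrix Lie group SO(2)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(a, b):
    """Product of two 2x2 matrices (composition of deformations)."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    """Transpose of a 2x2 matrix; for a rotation this is its inverse."""
    return [[a[j][i] for j in range(2)] for i in range(2)]

def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def log_angle(r):
    """Group logarithm on SO(2): the rotation angle in (-pi, pi]."""
    return math.atan2(r[1][0], r[0][0])

def euclidean_mean(rs):
    """Entry-wise average; ignores the group structure."""
    n = len(rs)
    return [[sum(r[i][j] for r in rs) / n for j in range(2)] for i in range(2)]

def intrinsic_mean(rs, iters=50, tol=1e-12):
    """Karcher mean: average in the tangent space at the current
    estimate, map back to the group, and repeat until the update vanishes."""
    mu = rs[0]
    for _ in range(iters):
        step = sum(log_angle(matmul(transpose(mu), r)) for r in rs) / len(rs)
        mu = matmul(mu, rot(step))
        if abs(step) < tol:
            break
    return mu

# Toy data (made up for illustration): rotations by 0 and 90 degrees.
samples = [rot(0.0), rot(math.pi / 2)]
flat = euclidean_mean(samples)      # determinant 0.5: not a rotation
curved = intrinsic_mean(samples)    # determinant 1: rotation by 45 degrees
```

Here the Euclidean average shrinks the shape (its determinant is 0.5), whereas the intrinsic mean is itself a valid rotation, which is the sense in which respecting the geometry gives consistent synthesis.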

Visual 3D scene understanding is an important component in autonomous
driving and robot navigation. Intelligent vehicles, for example, often
base their decisions on observations obtained from video cameras,
as they are cheap and easy to employ. Inner-city intersections represent
an interesting but also very challenging scenario in this context:
the road layout may be very complex, and observations are often noisy
or even missing due to heavy occlusions. While highway navigation
and autonomous driving on simple and annotated intersections have
already been demonstrated successfully, understanding and navigating
general inner-city crossings with little prior knowledge remains
an unsolved problem. This thesis is a contribution to understanding
multi-object traffic scenes from video sequences. All data is provided
by a camera system which is mounted on top of the autonomous driving
platform AnnieWAY. The proposed probabilistic generative model reasons
jointly about the 3D scene layout as well as the 3D location and
orientation of objects in the scene. In particular, the scene topology,
geometry, and traffic activities are inferred from short video
sequences. The model takes advantage of monocular information in
the form of vehicle tracklets, vanishing lines and semantic labels.
Additionally, the benefit of stereo features such as 3D scene flow
and occupancy grids is investigated. Motivated by the impressive
driving capabilities of humans, no further information such as GPS,
lidar, radar or map knowledge is required. Experiments conducted
on 113 representative intersection sequences show that the developed
approach successfully infers the correct layout in a variety of difficult
scenarios. To evaluate the importance of each feature cue, experiments
with different feature combinations are conducted. Additionally,
the proposed method is shown to improve object detection and object
orientation estimation performance.

Pneumoconiosis is an occupational lung disease caused by the inhalation of industrial dust. Despite increasing safety measures and better workplace environments, pneumoconiosis is considered the most common occupational disease in developing countries such as India and China. Screening and assessment of this disease are done through radiological observation of chest x-rays. Several studies have shown significant inter- and intra-reader variation in the diagnosis of this disease, which shows the complexity of the task and the importance of expertise in diagnosis.
The present study aims to understand the perceptual and cognitive factors affecting the reading of chest x-rays of pneumoconiosis patients. Understanding these factors helps in developing better image acquisition systems, better training regimens for radiologists, and better computer-aided diagnostic (CAD) systems. We used an eye-tracking experiment to study the various factors affecting the assessment of this diffuse lung disease. Specifically, we aimed to understand the role of expertise and of the contralateral symmetric (CS) information present in chest x-rays in the diagnosis and in the eye movements of the observers. We also studied inter- and intra-observer fixation consistency, along with the role of anatomical and bottom-up saliency features in attracting the gaze of observers of different expertise levels, to gain better insight into the effect of bottom-up and top-down visual saliency on the eye movements of observers.
The experiment was conducted in a room dedicated to eye-tracking experiments. Participants, consisting of novices (3), medical students (12), residents (4), and staff radiologists (4), were presented with good-quality PA chest x-rays and were asked to give profusion ratings for each of the 6 lung zones. An image set consisting of 17 normal full chest x-rays and 16 single-lung images was shown to the participants in random order. The diagnosis time and the eye movements were also recorded using a remote head-free eye tracker.
Results indicated that expertise and CS information play important roles in the diagnosis of pneumoconiosis. Novices and medical students were slow and inefficient, whereas residents and staff were quick and efficient. A key finding of our study is that the presence of CS information alone does not help improve diagnosis as much as learning how to use the information. This learning appears to be gained from focused training and years of experience. Hence, good training for radiologists and careful observation of each lung zone may improve the quality of diagnostic results. For residents, eye-scanning strategies play an important role in using the CS information present in chest radiographs; in staff radiologists, however, peripheral vision or higher-level cognitive processes seem to play a role in using the CS information.
There is reasonably good inter- and intra-observer fixation consistency, suggesting the use of similar viewing strategies. Experience helps observers develop new visual strategies based on the image content so that they can quickly and efficiently assess the disease level. The first few fixations seem to play an important role in choosing the visual strategy appropriate for the given image.
Both inter-rib and rib regions are given equal importance by the observers. Although the reading of chest x-rays is highly task dependent, bottom-up saliency is shown to play an important role in attracting the fixations of the observers. This role of bottom-up saliency seems to be greater in lower-expertise groups than in higher-expertise groups. Both the bottom-up and top-down influences on visual fixations seem to change with time. The relative role of top-down and bottom-up influences on visual attention is still not completely understood and remains a part of future work.
Based on our experimental results, we developed an extended saliency model by combining bottom-up saliency with the saliency of lung regions in a chest x-ray. This new saliency model performed significantly better than bottom-up saliency alone in predicting the gaze of the observers in our experiment. Even though the model is a simple combination of bottom-up saliency maps and segmented lung masks, it demonstrates that even basic models using simple image features can predict the fixations of the observers with good accuracy.
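
One way such a combination could look is sketched below. The abstract does not state how the two maps are combined, so the rule used here (keep bottom-up saliency inside the lung mask, down-weight it outside, then renormalize) is purely an assumption, and the function name, the `outside_weight` parameter, and the toy values are all hypothetical.

```python
def extended_saliency(bottom_up, lung_mask, outside_weight=0.0):
    """Combine a bottom-up saliency map with a binary lung mask.

    ASSUMPTION: pixels inside the mask keep their bottom-up saliency and
    pixels outside are scaled by `outside_weight`; the result is
    normalized to sum to 1. The thesis's actual rule may differ.
    Both inputs are equally sized 2D lists (rows of numbers).
    """
    if len(bottom_up) != len(lung_mask) or any(
        len(s_row) != len(m_row) for s_row, m_row in zip(bottom_up, lung_mask)
    ):
        raise ValueError("saliency map and mask must have the same shape")

    # Weight each pixel by 1 inside the lungs and outside_weight elsewhere.
    weighted = [
        [s * (1.0 if m else outside_weight) for s, m in zip(s_row, m_row)]
        for s_row, m_row in zip(bottom_up, lung_mask)
    ]

    total = sum(sum(row) for row in weighted)
    if total <= 0:
        raise ValueError("combined map has no mass; check mask and saliency")

    # Normalize so the map can be read as a fixation probability map.
    return [[v / total for v in row] for row in weighted]

# Toy 2x2 example with made-up values; the top-right pixel is outside the lungs.
bottom_up = [[0.2, 0.4], [0.1, 0.3]]
lung_mask = [[1, 0], [1, 1]]
combined = extended_saliency(bottom_up, lung_mask)
```

In this toy case the most salient pixel lies outside the lungs and is suppressed, so the predicted fixation mass moves to the lung regions.
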
Experimental analysis suggested that the factors affecting the reading of chest x-rays of pneumoconiosis are complex and varied. A good understanding of these factors helps in the development of better radiological screening of pneumoconiosis through improved training and through the use of improved CAD tools. The presented work is an attempt to gain insight into what these factors are and how they modify the behavior of the observers.

An Analysis of Successful Approaches to Human Pose Estimation, University of Augsburg, May 2012 (mastersthesis)

Abstract

The field of Human Pose Estimation is developing fast and has lately leaped forward
with the release of the Kinect system. That system reaches very good performance
for pose estimation using 3D scene information; however, pose estimation
from 2D color images is not yet solved reliably. There is a vast number of publications
trying to reach this aim, but no compilation of important methods and
solution strategies. The aim of this thesis is to fill this gap: it gives an introductory
overview of important techniques by analyzing four current (2012) publications
in detail. They are chosen such that, during their analysis, many frequently used
techniques for Human Pose Estimation can be explained. The thesis includes two
introductory chapters with a definition of Human Pose Estimation and exploration
of the main difficulties, as well as a detailed explanation of frequently used methods.
A final chapter presents some ideas on how parts of the analyzed approaches can
be recombined and shows some open questions that can be tackled in future work.
The thesis is therefore a good entry point to the field of Human Pose Estimation
and enables the reader to get an impression of the current state of the art.
