The emergence of multi-view capture systems has yielded a tremendous amount of video sequences. The task of capturing spatio-temporal models from real-world imagery (4D modeling) should arguably benefit from this wealth of visual information. To achieve highly realistic representations, both geometry and appearance need to be modeled with high precision. Yet, despite the great progress in geometric modeling, the appearance aspect has not been fully explored and visual quality can still be improved.
I will explain how we can optimally exploit the redundant visual information of the captured video sequences to provide a temporally coherent, super-resolved, view-independent appearance representation. I will further discuss how to exploit the interdependency of geometry and appearance as separate modalities to enhance visual perception, and finally how to decompose appearance representations into intrinsic components (shading and albedo) and super-resolve them jointly to allow for more realistic renderings.
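To make the idea of exploiting redundant multi-view observations concrete, here is a minimal, hypothetical sketch (not the speaker's actual method): several views observe the same surface point, and their color samples are fused into one view-independent value, weighted by how frontally each view sees the surface. The function name and weighting scheme are illustrative assumptions.

```python
import numpy as np

# Hypothetical illustration: fuse redundant multi-view observations of one
# surface point (texel) into a single view-independent color. Each view i
# contributes a color sample c_i and a weight w_i; here the weight is the
# cosine between surface normal and viewing direction, so frontal views
# count more than grazing ones.

def fuse_texel(colors, normals_dot_views):
    """colors: (N, 3) RGB samples; normals_dot_views: (N,) cos-angle weights."""
    w = np.clip(np.asarray(normals_dot_views, dtype=float), 0.0, 1.0)
    if w.sum() == 0.0:
        return np.zeros(3)  # no view sees the texel reliably
    return (np.asarray(colors, dtype=float) * w[:, None]).sum(axis=0) / w.sum()

# Three views of the same texel: the near-grazing view (weight 0.1)
# barely influences the fused color.
colors = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
weights = [0.9, 0.8, 0.1]
print(fuse_texel(colors, weights))
```

A real pipeline would additionally align the samples sub-pixel-accurately across frames, which is what enables super-resolution rather than mere averaging.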

Variational image processing translates image processing tasks into optimisation problems.
The practical success of this approach depends on the type of optimisation problem and on the properties of the ensuing algorithm. A recent breakthrough was to realise that old first-order optimisation algorithms based on operator splitting are particularly suited for modern data analysis problems. Operator splitting techniques decouple complex optimisation problems into many smaller and simpler sub-problems.
In this talk I will revisit the variational segmentation problem and a common family of algorithms for solving such optimisation problems. I will show that operator splitting leads to a divide-and-conquer strategy that yields simple and massively parallel updates suitable for GPU implementations. The technique decouples the likelihood from the prior term and makes it possible to estimate the likelihood from data with a data-driven model, for example using deep learning. A different decoupling strategy, combined with general consensus optimisation, leads to fully distributed algorithms especially suitable for large-scale segmentation problems. Motivating applications are 3D yeast-cell reconstruction and segmentation of histology data.
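The decoupling idea can be sketched on a toy problem. The following is a minimal ADMM example (a standard operator-splitting algorithm, not necessarily the one in the talk) for minimizing a quadratic "likelihood" term plus an l1 "prior": the splitting turns the joint problem into two closed-form, elementwise-parallel sub-problems linked by a consensus constraint.

```python
import numpy as np

# Operator splitting via ADMM for: min_x 0.5*||x - d||^2 + lam*||x||_1.
# The splitting x = z decouples the problem into
#   (1) a quadratic "likelihood" sub-problem, solved in closed form, and
#   (2) a "prior" sub-problem whose prox is elementwise soft-thresholding.
# Both updates are trivially parallel over pixels/elements.

def admm_l1_denoise(d, lam=0.5, rho=1.0, iters=100):
    x = np.zeros_like(d)
    z = np.zeros_like(d)
    u = np.zeros_like(d)  # scaled dual variable
    for _ in range(iters):
        # likelihood sub-problem: argmin_x 0.5*||x-d||^2 + rho/2*||x - z + u||^2
        x = (d + rho * (z - u)) / (1.0 + rho)
        # prior sub-problem: prox of the l1 norm = soft-thresholding
        v = x + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
        # dual ascent on the consensus constraint x = z
        u = u + x - z
    return z

d = np.array([2.0, 0.1, -1.5, 0.05])
print(admm_l1_denoise(d))  # small entries are shrunk to zero
```

For this separable problem the exact minimizer is elementwise soft-thresholding of `d`; ADMM recovers it, and the same template generalizes to segmentation when the prior is a total-variation term whose prox is evaluated on the GPU.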

In my talk I will present my work on 3D mapping using lidar scanners. I will give an overview of the SLAM problem and its main challenges: robustness, accuracy, and processing speed. Regarding robustness and accuracy, we investigate a better point cloud representation based on resampling and surface reconstruction, and we demonstrate how it can be incorporated into an ICP-based scan matching technique. We also elaborate on globally consistent mapping using loop closures. Regarding processing speed, we propose integrating our scan matching into a multi-resolution scheme, together with a GPU-accelerated implementation using our programming language Quasar.
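As background for the scan-matching discussion, here is a minimal point-to-point ICP sketch in NumPy: it alternates nearest-neighbour correspondence search with a closed-form least-squares rigid alignment (Kabsch/SVD). This is only the textbook baseline; the talk's pipeline adds resampling, surface reconstruction, multi-resolution processing and GPU acceleration, none of which are shown here.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t aligning src to dst (Kabsch)."""
    cs, cd = src.mean(0), dst.mean(0)
    H = (src - cs).T @ (dst - cd)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflection
    R = Vt.T @ D @ U.T
    return R, cd - R @ cs

def icp(src, dst, iters=30):
    cur = src.copy()
    for _ in range(iters):
        # correspondence step: brute-force nearest neighbour in dst
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        matched = dst[d2.argmin(1)]
        # alignment step: closed-form rigid transform to the matched points
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
    return cur

# Toy data: a 3x3x3 grid, and a slightly rotated + translated copy of it.
dst = np.array([[x, y, z] for x in range(3) for y in range(3) for z in range(3)],
               dtype=float)
a = 0.05
Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
               [np.sin(a),  np.cos(a), 0.0],
               [0.0,        0.0,       1.0]])
src = dst @ Rz.T + np.array([0.05, -0.02, 0.01])
aligned = icp(src, dst)
print(np.abs(aligned - dst).max())  # residual is tiny after convergence
```

The brute-force correspondence step is O(N^2); real systems replace it with a k-d tree, and robustness to the nearest-neighbour errors at coarse alignment is exactly what the improved point cloud representation targets.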

In this talk we will address the problem of 3D reconstruction of rigid and deformable objects from a single depth video stream. Traditional 3D registration techniques, such as ICP and its variants, are widespread and effective, but sensitive to initialization and noise due to the underlying correspondence estimation procedure. Therefore, we have developed SDF-2-SDF, a dense, correspondence-free method which aligns a pair of implicit representations of scene geometry, e.g. signed distance fields, by minimizing their direct voxel-wise difference. In its rigid variant, we apply it to static object reconstruction via real-time frame-to-frame camera tracking and posterior multi-view pose optimization, achieving higher accuracy and a wider convergence basin than ICP variants. Its extension to scene reconstruction, SDF-TAR, carries out the implicit-to-implicit registration over several limited-extent volumes anchored in the scene and runs simultaneous GPU tracking and CPU refinement, with a lower memory footprint than other SLAM systems. Finally, to handle non-rigidly moving objects, we incorporate the SDF-2-SDF energy into a variational framework, regularized by a damped approximately Killing vector field. The resulting system, KillingFusion, is able to reconstruct objects undergoing topological changes and fast inter-frame motion in near-real time.
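The core implicit-to-implicit idea can be illustrated in one dimension (a toy reduction, not the actual SDF-2-SDF system, which optimizes a full 6-DoF pose with Gauss-Newton): align two signed distance functions by minimizing their direct voxel-wise squared difference over the pose parameter, with no explicit correspondences.

```python
import numpy as np

# Toy 1D illustration of correspondence-free implicit registration:
# two SDFs of the same segment, one shifted by an unknown offset, are
# aligned by minimizing the direct voxel-wise squared difference.

xs = np.linspace(-2.0, 2.0, 401)          # the "voxel grid"

def sdf(center):
    return np.abs(xs - center) - 0.5      # SDF of a segment of half-width 0.5

ref = sdf(0.0)                            # reference volume

def energy(t):
    live = sdf(0.3 - t)                   # live volume, unknown true offset 0.3
    return np.sum((ref - live) ** 2)      # voxel-wise SDF difference

# Brute-force search over the single pose parameter (a real system would
# run Gauss-Newton over rotation and translation instead).
ts = np.linspace(-1.0, 1.0, 2001)
best = ts[np.argmin([energy(t) for t in ts])]
print(best)  # recovers the offset, ~0.3
```

Because the energy compares distance values everywhere on the grid, it stays smooth and well-behaved even where explicit point correspondences would be ambiguous, which is the intuition behind the wider convergence basin.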

Recently, deep learning has proved successful also on low-level vision tasks such as stereo matching. Another recent trend in this field is represented by confidence measures, which become increasingly effective when coupled with random forest classifiers or CNNs. Despite their excellent accuracy in outlier detection, few other applications rely on them.
In the first part of the talk, we'll take a look at the latest proposals in terms of confidence measures for stereo matching, as well as at some novel methodologies exploiting these very accurate cues.
In the second part, we'll talk about GC-Net, a deep network currently representing the state of the art on the KITTI datasets, and its extension to motion stereo processing.
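As a concrete example of a classic hand-crafted confidence cue (one of the simple measures that learned approaches combine, not GC-Net itself), here is a left-right consistency check: a left-image disparity is trusted only if the right image's disparity map agrees once the pixel is warped across.

```python
import numpy as np

# Left-right consistency: warp each left pixel into the right view using its
# disparity, read the right disparity there, and flag the pixel as unreliable
# if the two disparities disagree by more than tau.

def lr_consistency_mask(disp_left, disp_right, tau=1.0):
    """disp_*: (H, W) disparity maps; returns a boolean confidence mask."""
    h, w = disp_left.shape
    xs = np.arange(w)[None, :].repeat(h, axis=0)
    # column where each left pixel lands in the right image
    xr = np.clip(np.rint(xs - disp_left).astype(int), 0, w - 1)
    dr = np.take_along_axis(disp_right, xr, axis=1)
    return np.abs(disp_left - dr) <= tau

# Toy maps: constant disparity 2, with one outlier injected in the left map.
dl = np.full((2, 8), 2.0)
dr = np.full((2, 8), 2.0)
dl[0, 5] = 6.0
mask = lr_consistency_mask(dl, dr)
print(mask[0])  # the outlier pixel is flagged as unreliable
```

Measures like this one are exactly the cues that random forests and CNNs learn to fuse into a single, far more accurate confidence estimate.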

In recent years, commodity 3D sensors have become easily and widely available. These advances in sensing technology have spawned significant interest in using captured 3D data for mapping and semantic understanding of 3D environments. In this talk, I will give an overview of our latest research in the context of 3D reconstruction of indoor environments. I will further talk about the use of 3D data in the context of modern machine learning techniques. Specifically, I will highlight the importance of training data, and how we can efficiently obtain labeled and self-supervised ground-truth training datasets from captured 3D content. Finally, I will show a selection of state-of-the-art deep learning approaches, including discriminative semantic labeling of 3D scenes and generative reconstruction techniques.

Our world is dynamic and three-dimensional. Understanding the 3D layout of scenes and the motion of objects is crucial for successfully operating in such an environment. I will talk about two lines of recent research in this direction. One is on end-to-end learning of motion and 3D structure: optical flow estimation, binocular and monocular stereo, direct generation of large volumes with convolutional networks. The other is on sensorimotor control in immersive three-dimensional environments, learned from experience or from demonstration.

I'll present my master's thesis, "Biquadratic Forms and Semi-Definite Relaxations".
It deals with biquadratic optimization programs (which are NP-hard in general) and examines a condition under which there exists an algorithm that finds a solution to every instance of the problem in polynomial time. I'll present a counterexample showing that this is not possible in general, and address the question of what happens if further knowledge about the variables over which we optimise is exploited.
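For concreteness, a biquadratic program and one possible semi-definite relaxation can be written in the standard form found in the literature (generic notation; the thesis's exact formulation may differ):

```latex
% Biquadratic program over unit spheres (NP-hard in general):
\max_{x \in \mathbb{R}^n,\; y \in \mathbb{R}^m}
  \; b(x,y) \;=\; \sum_{i,j,k,l} a_{ijkl}\, x_i\, y_j\, x_k\, y_l
\quad \text{s.t.} \quad \|x\|_2 = \|y\|_2 = 1.

% A semi-definite relaxation: lift z = x \otimes y and set Z = z z^{\top},
% so that b(x,y) = \langle A, Z \rangle and \operatorname{tr}(Z) = 1;
% dropping the non-convex rank-one constraint on Z yields the convex program
\max_{Z \succeq 0} \; \langle A, Z \rangle
\quad \text{s.t.} \quad \operatorname{tr}(Z) = 1,
```

where in practice additional linear structure constraints on $Z$ (inherited from the Kronecker form of $z$) are added to tighten the relaxation. The question of when such a relaxation is exact is precisely the kind of condition the thesis examines.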

In this talk I am going to present the work we have been doing at the Computer Vision Lab of the Technical University of Munich, which started as an attempt to better deal with videos (and therefore the time domain) within neural network architectures.

In this talk I will present the portfolio of work we conduct in our lab, and discuss three recent bodies of work in more detail. The first is our work on learning 6D object pose estimation and camera localization from RGB or RGB-D images. I will show that by utilizing the concepts of uncertainty and learning to score hypotheses, we can improve on the state of the art. Secondly, I will present a new approach for inferring multiple diverse labelings in a graphical model. Besides guaranteeing an exact solution, our method is also faster than existing techniques. Finally, I will present recent work in which we show that popular auto-context decision forests can be mapped to deep ConvNets for semantic segmentation. We use this to detect the spine of a zebrafish in cases where little training data is available.

Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments, and to use this understanding to design future systems.