Image and Video Signal Content Analysis

Convolutional neural networks generate feature vectors that can be easily binarized using a sigmoid hashing function. The problem with this saturating function is that its gradient vanishes in the saturated regions. In a deep neural network, the error signal from the final layer can then vanish almost entirely as it is propagated back during the backpropagation step. One approach to avoiding this problem is to use a sigmoid with an adaptive slope. In this thesis, the student will mainly explore different approaches to solving this problem and test the effectiveness of using a sigmoid with a variable slope. Basic knowledge of deep learning is beneficial.
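The saturation effect described above can be illustrated with a minimal sketch. It assumes the slope-parameterized logistic function s(x) = 1 / (1 + exp(-slope * x)); the names `sigmoid`, `sigmoid_grad`, `binarize` and the parameter `slope` are illustrative and are not taken from any particular hashing method.

```python
import math


def sigmoid(x, slope=1.0):
    """Logistic function with a slope parameter: 1 / (1 + exp(-slope * x))."""
    z = slope * x
    # Evaluate in a numerically stable way for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_grad(x, slope=1.0):
    """Derivative with respect to x: slope * s * (1 - s)."""
    s = sigmoid(x, slope)
    return slope * s * (1.0 - s)


def binarize(x, slope=1.0):
    """Hash bit obtained by thresholding the sigmoid output at 0.5."""
    return 1 if sigmoid(x, slope) >= 0.5 else 0


if __name__ == "__main__":
    # Compare the gradient near zero and in the saturated region
    # for a gentle, a standard and a steep slope.
    for slope in (0.1, 1.0, 10.0):
        grads = [sigmoid_grad(x, slope) for x in (0.0, 2.0, 8.0)]
        print(slope, grads)
```

With a gentle slope, the gradient at a large activation stays noticeably above the value obtained with slope 1, while a steep slope pushes the output closer to a binary code. One possible strategy, stated here only as an assumption, is to start training with a gentle slope and increase it gradually.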

The goal of this research is to generate a 3D model (a solid 3D model or a dense point cloud) from a set of captured photos and then render virtual views from it. The rendered virtual views can then be used as an additional reference for motion compensation in video coding. Assume we have a moving camera capturing images from different viewpoints. These images are used as inputs for 3D reconstruction, camera parameter estimation and the generation of a sparse or dense point cloud of the captured scene. In this way, the 2D visual information is converted to its equivalent 3D data (2D → 3D). This 3D information can be employed for predicting missing or future frames (by projecting 3D to 2D if the camera poses are known), for synthesizing novel views that have not been seen by the camera (Virtual/Augmented Reality and Free Viewpoint TV), and for localization and mapping in robotics and self-driving vehicles. The figures below show the concept of virtual view synthesis and a dense point cloud generated from a video sequence captured from a moving car. In this research, we focus on predicting missing/future frames.
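The 3D → 2D projection step mentioned above can be sketched with a standard pinhole camera model. This is a minimal illustration under stated assumptions: no lens distortion, a world-to-camera rotation matrix and translation vector as the pose, and intrinsics given by focal lengths `fx`, `fy` and principal point `cx`, `cy`. The function name `project_point` is hypothetical.

```python
def project_point(point_world, rotation, translation, fx, fy, cx, cy):
    """Project a 3D world point to pixel coordinates with a pinhole camera.

    rotation is a 3x3 world-to-camera matrix given as a list of rows,
    translation is a 3-vector. Returns (u, v, depth), or None if the
    point lies behind the camera.
    """
    # World -> camera coordinates: X_c = R * X_w + t
    xc, yc, zc = (
        sum(rotation[i][j] * point_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if zc <= 0.0:
        return None
    # Perspective division followed by the intrinsic mapping to pixels.
    u = fx * xc / zc + cx
    v = fy * yc / zc + cy
    return u, v, zc


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(project_point((1.0, 2.0, 4.0), identity, (0.0, 0.0, 0.0),
                        100.0, 100.0, 50.0, 40.0))
```

Applying this projection to every point of the reconstructed cloud, using the pose of a missing or future frame, gives the pixel positions at which that frame would observe the scene.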

There are many challenging topics in this area. Some of them are listed below:

(1) Camera Calibration and Point Cloud Generation using SfM/SLAM. First, the intrinsic and extrinsic camera parameters should be estimated from a video sequence captured by a moving monocular camera. These parameters should then be used to estimate a semi-dense point cloud of the captured scene. The main approaches to solving this problem are Structure from Motion (SfM) and Simultaneous Localization and Mapping (SLAM).

(2) Image-Based Rendering using Point Cloud. A dense point cloud, the camera poses and a set of already known images (from real cameras) are given, and a novel view should be synthesized.

(3) Point Cloud and 3D Mesh Reconstruction for Video Coding. The aim is to generate a dense point cloud and then convert it to a 3D mesh. In this research, the limitations of the video coding pipeline (e.g. the hierarchical coding structure) should be considered.

(4) Low Complexity 3D Model-based Motion Compensation for Video Coding. Computational complexity is one of the major issues in 3D reconstruction. In this research, the computational complexity of point cloud-based virtual view synthesis will be studied and solutions to reduce it will be investigated (solutions like using a sparse point cloud, a coarser mesh, ...).

(5) A Statistical Analysis of 3D Model-based Video Coding. This topic is more closely related to video coding and focuses on analyzing the contribution of the synthesized 3D model-based prediction to motion compensation.
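To make topics (2) and (4) more concrete, the following sketch renders a virtual view from a point cloud by z-buffered point splatting and thins the cloud with a voxel grid to reduce the rendering cost. It is only one possible illustration, not the method prescribed by these topics. It assumes that the points are already expressed in the coordinate frame of the virtual camera and that each point is a tuple (x, y, z, value), where value is a color or intensity; `voxel_downsample` and `render_points` are hypothetical helper names.

```python
import math


def voxel_downsample(points, voxel_size):
    """Keep one point (the first encountered) per cubic voxel."""
    kept = {}
    for p in points:
        key = tuple(math.floor(c / voxel_size) for c in p[:3])
        kept.setdefault(key, p)
    return list(kept.values())


def render_points(points, fx, fy, cx, cy, width, height):
    """Splat camera-space points (x, y, z, value) into an image.

    A z-buffer keeps the nearest point per pixel. Pixels that receive
    no point remain None and would need hole filling afterwards.
    """
    depth = [[math.inf] * width for _ in range(height)]
    image = [[None] * width for _ in range(height)]
    for x, y, z, value in points:
        if z <= 0.0:
            continue  # point is behind the virtual camera
        u = round(fx * x / z + cx)
        v = round(fy * y / z + cy)
        if 0 <= u < width and 0 <= v < height and z < depth[v][u]:
            depth[v][u] = z
            image[v][u] = value
    return image


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 2.0, "far"), (0.0, 0.0, 1.0, "near"),
             (0.5, 0.0, 1.0, "right")]
    for row in render_points(cloud, 2.0, 2.0, 1.0, 1.0, 3, 3):
        print(row)
```

A larger voxel size leaves fewer points to project, which lowers the cost of synthesis at the price of more holes in the rendered view. This trade-off is the kind of question that topic (4) is concerned with.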