Description

In the last decade, digital libraries and web video portals have become increasingly popular, and the amount of video data available on the World Wide Web (WWW) is growing rapidly. According to the official statistics report of the popular video portal YouTube, more than 400 hours of video are uploaded every minute. Efficiently retrieving video data on the web or within large video archives has therefore become an important and challenging task.

In our current research, we focus on video analysis and multimedia information retrieval (MIR) using Deep Learning techniques. Deep Learning (DL), a relatively new area of machine learning (since 2006), has already influenced a wide range of multimedia information processing tasks. Recently, DL-based techniques have achieved substantial progress in fields including Computer Vision, Speech Recognition, Image Classification and Natural Language Processing (NLP).

Topics in this seminar:

Human identity verification using deep facial representation
In modern face recognition, the conventional pipeline consists of four stages: face detection -> frontal face alignment -> facial representation -> classification. Convolutional Neural Networks (CNNs) have taken the computer vision community by storm, significantly improving the state of the art in many applications. In this project, we will develop a solution for face verification based on a deep facial model. Existing frontal face alignment methods should be studied, and an efficient implementation is expected.
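To make the last stage of this pipeline more concrete, the following minimal sketch shows one common way to turn two facial representations into a verification decision: compare the embeddings by cosine similarity and accept the pair if the score exceeds a threshold. The function names, the threshold of 0.5 and the toy 4-dimensional embeddings are assumptions made for illustration only; in practice the embeddings would come from a trained CNN and the threshold would be tuned on validation data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_identity(emb_a, emb_b, threshold=0.5):
    """Hypothetical decision rule: accept the pair if similarity >= threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold


# Toy embeddings, invented for this example (not output of a real model).
alice_1 = [0.9, 0.1, 0.3, 0.2]
alice_2 = [0.8, 0.2, 0.4, 0.1]
bob = [0.1, 0.9, 0.1, 0.7]

print(same_identity(alice_1, alice_2))  # two images of the same toy identity
print(same_identity(alice_1, bob))      # images of two different toy identities
```

Cosine similarity is only one possible choice here; a learned classifier on top of the embedding pair is an equally valid alternative.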

Indoor human activity recognition
The number of surveillance cameras, the importance of video analytics, the storage time for surveillance data and the strategic value of video surveillance are all increasing significantly. Indoor human activity recognition is an important part of event detection in surveillance videos. LIRIS provides a typical human activity recognition dataset, which contains grayscale, RGB and depth videos showing people performing various everyday activities (discussing, making telephone calls, handing over an item, etc.).

German word vector generation and potential applications
A "word vector" is a distributed representation of a word. Word vectors are derived from Deep Learning techniques and have recently become popular in various Natural Language Processing (NLP) applications. So far, most of the successes with word vectors have been achieved on English. In this seminar topic, we aim to train our own German word vectors, evaluate their quality and attempt to apply them in real applications.
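One simple way to inspect the quality of word vectors is to check whether the nearest neighbours of a word are semantically related to it. The sketch below illustrates this with cosine similarity over a tiny hand-made vocabulary. The 3-dimensional vectors and the helper names are invented for illustration; real vectors would be trained on a large German corpus and have far more dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word, vectors, k=2):
    """Return the k words whose vectors are most similar to that of `word`."""
    query = vectors[word]
    scored = [
        (cosine(query, vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]


# Toy vectors, made up for this example (not trained on any corpus).
toy_vectors = {
    "Hund": [0.9, 0.8, 0.1],
    "Katze": [0.8, 0.9, 0.2],
    "Auto": [0.1, 0.2, 0.9],
    "Wagen": [0.2, 0.1, 0.8],
}

print(nearest("Hund", toy_vectors, k=1))
print(nearest("Auto", toy_vectors, k=1))
```

With trained vectors, the same check could be run on a list of probe words and the neighbours inspected by hand or compared against a similarity benchmark.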

DRAW: Deep network for image generation
Deep Learning approaches are "data-driven" machine learning approaches that need huge amounts of data to be trained successfully on a specific task. It is often said that a deep neural network needs at least 1000 labelled samples per class to achieve acceptable performance, and around 1 million labelled samples to outperform humans on the task in question. Obtaining enough training data is a very challenging problem, as it is not feasible to manually label millions of real-world samples. One solution is to use artificially generated samples that are indistinguishable from real-world samples. We want to look at the so-called "DRAW" network, which is capable of generating samples containing text. We want to implement this architecture and evaluate whether it can be used to generate labelled examples for the task of scene text recognition.