LSTM for Uniform Credit Assignment to Deep Networks

Term: 10/2016 - 8/2019 (36 months)

Abstract:
"Long Short-Term Memory" (LSTM) networks have recently emerged as the best-performing technique in speech and language processing. LSTM-based approaches have dominated recent conferences in these fields, including the flagship conference ICASSP, and recent benchmark records were achieved with LSTM, often by major IT companies such as Google, IBM, Microsoft, and Baidu.

The success of LSTM networks stems from their memory cells, which avoid vanishing gradients. The key advantage of LSTM in speech and language processing is not necessarily the extraction of long-term dependencies, but rather its capability to perform "uniform credit assignment" to inputs, that is, the attribution of similar error signals to all input signals. Hence, LSTM allows all inputs to be treated on the same level. For example, when a sentence is processed, the first word may be as important for learning as the last word. Uniform credit assignment weights all input information equally, no matter where it is located in the input sequence. If learning is biased toward the most recent inputs, sub-optimal solutions are often obtained.
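
As a rough illustration of this difference, the sketch below measures how much the final state of a recurrent unit changes when each input position is perturbed. It is a deliberately simplified scalar, linear model and not the LSTM formulation used in the project: the names `simple_recurrent_step` and `additive_cell_step`, the weight `w=0.5`, and the fixed `input_gate=1.0` are illustrative assumptions, and real LSTM cells add nonlinearities and learned gates.

```python
def final_state(step, inputs):
    """Run a scalar recurrence over the inputs and return the last state."""
    state = 0.0
    for x in inputs:
        state = step(state, x)
    return state


def credit_per_position(step, seq_len, eps=1e-6):
    """Estimate d(final state) / d(input at position t) by perturbing each input."""
    base = [0.0] * seq_len
    reference = final_state(step, base)
    credits = []
    for t in range(seq_len):
        bumped = list(base)
        bumped[t] += eps
        credits.append((final_state(step, bumped) - reference) / eps)
    return credits


def simple_recurrent_step(state, x, w=0.5):
    # Assumed toy recurrence: the old state is scaled by w < 1 at every step,
    # so the influence of early inputs shrinks geometrically.
    return w * state + x


def additive_cell_step(state, x, input_gate=1.0):
    # Assumed toy memory cell: the old state is carried over unchanged
    # (self-connection of weight 1), so every input keeps the same influence.
    return state + input_gate * x


if __name__ == "__main__":
    print("simple recurrence:", credit_per_position(simple_recurrent_step, 5))
    print("additive cell:    ", credit_per_position(additive_cell_step, 5))
```

Under these assumptions, the credit for the simple recurrence decays geometrically toward the earlier positions, whereas the additive cell assigns the same credit to every position, which is the behavior referred to above as uniform credit assignment.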

In this project, we want to go beyond uniform credit assignment to simple inputs like words. We aim at using LSTM networks for uniform credit assignment to deep networks which process complex inputs such as images, speech, or chemical compounds. Such networks can be applied to the classification of actions in videos, where single frames may not convey sufficient information. The same approach applies to photo series that show the same object from different angles, where the aim is to extract features that are not visible in any single image. High-content imaging of cells in drug design is another application, in which a high-resolution image is split into multiple sub-images that are presented sequentially to the classification system. A further application is predicting the toxicity of a mixture of chemical compounds (e.g. a soil sample), where an unknown number of chemical structures are presented sequentially to the network.
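
The general shape of such a system can be sketched as follows: each element of the sequence (a frame, a sub-image, a compound) is first mapped to a feature vector by an encoder network, and the features are then fed one by one into an LSTM whose final state is classified. This is only a forward-pass sketch under stated assumptions, not the architecture developed in the project: the class name `EncoderLSTMClassifier`, all dimensions, the random weights, and the single ReLU layer standing in for a deep network are hypothetical, and the common LSTM variant with a forget gate is assumed. Training is not shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class EncoderLSTMClassifier:
    """Hypothetical sketch: per-element encoder -> LSTM -> class scores."""

    def __init__(self, input_dim, feature_dim, hidden_dim, num_classes, seed=0):
        rng = np.random.default_rng(seed)
        scale = 0.1
        # Stand-in for a deep network: one ReLU layer applied to every element.
        self.W_enc = rng.normal(0.0, scale, (feature_dim, input_dim))
        # Weights for the input, forget, and output gates and the cell candidate,
        # stacked row-wise and acting on [features, previous hidden state].
        self.W_lstm = rng.normal(0.0, scale, (4 * hidden_dim, feature_dim + hidden_dim))
        self.b_lstm = np.zeros(4 * hidden_dim)
        self.W_out = rng.normal(0.0, scale, (num_classes, hidden_dim))
        self.hidden_dim = hidden_dim

    def encode(self, element):
        """Map one sequence element to a feature vector."""
        return np.maximum(0.0, self.W_enc @ element)

    def forward(self, elements):
        """Process a sequence of any length and return unnormalized class scores."""
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        for element in elements:
            z = self.W_lstm @ np.concatenate([self.encode(element), h]) + self.b_lstm
            i, f, o, g = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # memory cell update
            h = sigmoid(o) * np.tanh(c)                   # gated cell output
        return self.W_out @ h


if __name__ == "__main__":
    model = EncoderLSTMClassifier(input_dim=8, feature_dim=6, hidden_dim=5, num_classes=3)
    mixture = [np.ones(8) for _ in range(4)]  # e.g. four components, count not fixed in advance
    print(model.forward(mixture))
```

Because the LSTM consumes the encoded elements one at a time, the same model accepts sequences of different lengths, which is what the mixture example with an unknown number of components requires.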

The new architectures and approaches to LSTM-based uniform credit assignment to deep networks are benchmarked and tested on data sets for the following tasks:
(i) video activity recognition and video description;
(ii) classification of large images which are split into sub-images;
(iii) classification of mixtures of compounds with an unknown number of components, where the components are sequentially presented to the LSTM network.