Abstract

Video and audio data are spatio-temporal or spectro-temporal (sound/frequency over time) in nature, and processing such complex Spatio/Spectro Temporal Data (SSTD) is a challenging task in machine learning. SSTD contains both a spatial (or spectral) component and a temporal component, and the two are most often highly correlated.
Because of the high correlation between these two components, it is essential to process them together. However, many existing computational methods either process the spatial and temporal components separately, or process them together without taking into account the significant correlation information present in the SSTD. By comparison, the brain performs such tasks quickly and robustly. Inspired by the innate cognitive functions of the brain, this study investigates how biological and cognitive aspects such as learning, evolution and neural information processing can be applied to a computational model. We show that this enables efficient data acquisition, processing and learning of complex video and audio patterns, resulting in improved classification performance.
This thesis proposes novel frameworks and classification methods that employ a class of evolving spiking neural networks (eSNN) called dynamic evolving spiking neural networks (deSNN), together with reservoir computing. We show that the proposed frameworks result in (1) better classification performance than standalone spiking neural network classifiers such as eSNN, (2) a better understanding of the data and the problem being solved, and (3) faster SSTD processing owing to the online, one-pass, spike-based computational approach.
All the frameworks and methods proposed in this thesis have been evaluated on synthetic and real-world problems. To evaluate the efficacy of the new methodology, a pilot experiment was first performed as a benchmark test using a synthetic video dataset, followed by experiments on real-world problems relating to motion and sound, such as human action recognition and heart sound recognition.