PhD Seminar

Signal attributes such as magnitude, time, and frequency are often used to extract important time events. However, representations based on the change in phase have not been studied as thoroughly. We analyze the relevance of a well-known phase-based feature, the group delay (the negative derivative of the Fourier phase with respect to frequency), for extracting information from signals. We further discuss the benefits of task-dependent feature learning with neural networks as an alternative to existing information-extraction methods. The talk covers time-event detection (TED) in speech, music, and neuronal signals, as well as source separation and signal extraction from audio.
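As a minimal illustration of the feature itself (not the processing pipeline used in the talk), the group delay τ(ω) = -dφ(ω)/dω of a finite sequence x[n] can be computed without explicit phase unwrapping through the identity τ(ω) = Re{Y(ω)/X(ω)}, where Y is the Fourier transform of n·x[n]. The sketch below assumes NumPy; the helper name `group_delay` and the FFT length are illustrative choices, not taken from the source.

```python
import numpy as np

def group_delay(x, nfft=4096):
    """Group delay tau(w) = -d(phase)/dw of a finite sequence x[n], in samples.

    Uses tau(w) = Re{Y(w) / X(w)}, where Y is the DFT of n * x[n],
    so no phase unwrapping is needed. Undefined where X(w) = 0.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    X = np.fft.fft(x, nfft)
    Y = np.fft.fft(n * x, nfft)
    # Re{Y X*} / |X|^2 is the same quantity as Re{Y / X}
    return np.real(Y * np.conj(X)) / np.abs(X) ** 2

# A pure delay of 5 samples has a constant group delay of 5 samples
impulse = np.zeros(32)
impulse[5] = 1.0
tau_delay = group_delay(impulse)

# Cross-check against the numerical derivative of the unwrapped phase
nfft = 4096
x = 0.8 ** np.arange(16)
tau = group_delay(x, nfft)
phase = np.unwrap(np.angle(np.fft.fft(x, nfft)))
tau_numeric = -np.gradient(phase, 2 * np.pi / nfft)
print("max deviation:", np.max(np.abs(tau - tau_numeric)))
```

The closed form follows from dX/dω = -jY(ω), so d(log X)/dω = -jY/X, whose imaginary part is the phase derivative.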

We first analyze the properties of the group delay (GD) function that are relevant to time-event detection. This includes an analysis of the high-resolution property of the GD function for single-pole and multi-pole systems. In particular, we discuss the observation that two closely spaced poles are better resolved in the group delay domain than in the spectral magnitude domain. This finding is then corroborated on TED tasks involving speech, music, and neuronal signals. We consider pitch estimation from speech and percussive onset detection from music. The applicability of the GD function beyond the audio domain is then demonstrated on spike estimation, where the temporal positions of spikes are estimated from the calcium fluorescence signal by representing the input in the GD domain. Next, we discuss the use of GD as a feature for efficiently learning a time-frequency mask specific to the target source, applied to singing voice separation and to separating the melodic components of a musical mixture. The separation framework for these tasks is based on a recurrent neural network (RNN). This network is also employed as a percussive separation stage for onset detection in musical mixtures and as a speech enhancement stage for automatic gender recognition (AGR) under noisy conditions. Finally, we examine the feasibility of end-to-end learning for TED and signal extraction, focusing on the spike estimation task. This framework has the potential to be a powerful alternative to neural network models that rely on hand-engineered features.
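The high-resolution property can be checked numerically. The sketch below assumes NumPy and an arbitrarily chosen all-pole system H(z) = 1/A(z) with two resonances at 0.3π ± 0.04 rad (plus their complex conjugates) and pole radius 0.95. It evaluates the magnitude response and the group delay on a frequency grid around the resonances and counts the peaks in each. The parameter values and the helpers `allpole_response` and `count_peaks` are hypothetical, chosen only to illustrate the effect.

```python
import numpy as np

def allpole_response(poles, w):
    """Magnitude and group delay (samples) of H(z) = 1 / A(z) on grid w (rad/sample)."""
    a = np.real(np.poly(poles))           # A(z) = sum_k a[k] z^{-k}, real for conjugate pairs
    k = np.arange(len(a))
    E = np.exp(-1j * np.outer(w, k))      # e^{-j w k} for every (w, k)
    A = E @ a                             # A(e^{jw})
    An = E @ (k * a)                      # DTFT of k * a[k]
    magnitude = 1.0 / np.abs(A)
    # The group delay of 1/A is the negative of the group delay of A, Re{An / A}
    group_delay = -np.real(An / A)
    return magnitude, group_delay

def count_peaks(y):
    """Number of strict interior local maxima of a sampled curve."""
    return int(np.sum((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])))

r, centre, half_gap = 0.95, 0.3 * np.pi, 0.04
thetas = [centre - half_gap, centre + half_gap]
# Each resonance is a conjugate pole pair so that A(z) has real coefficients
poles = [r * np.exp(s * 1j * t) for t in thetas for s in (1, -1)]

w = np.linspace(centre - 0.3, centre + 0.3, 2001)
mag, gd = allpole_response(poles, w)
print("peaks in |H|:", count_peaks(mag))
print("peaks in GD :", count_peaks(gd))
```

For these values the magnitude response shows a single merged peak, whereas the group delay shows two. One way to see why: the group delay of a single pole at radius r equals -1/2 + (1 - r²)/2 · |H(ω)|², i.e., it follows the squared magnitude response, so the group delays of several poles add up as sharp, narrow peaks, while their magnitude responses multiply as broader ones.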