Speech Enhancement Using Deep Learning

Identifying and enhancing the main voice within noisy signals is a major problem in signal processing. Several speech-enhancement methods based on first-order statistics in the spectral domain have been applied to noisy signals to isolate and clean the main voice. Unfortunately, these methods fail to clean noisy signals efficiently, especially complex ones. Recently, several research groups have suggested that speech enhancement of noisy speech should instead be performed with deep learning in the time domain. In their article, Santiago Pascual et al. trained a GAN (Generative Adversarial Network) built from deep CNNs (convolutional neural networks) on a small set of noisy recordings, a method that improved speech recognition and clarity. Here, we aim to (a) improve on Pascual et al.'s work by using a WGAN (Wasserstein Generative Adversarial Network) model, which may be more stable to train than a standard GAN, and (b) explore the trade-off between the L1 loss and the WGAN loss. Interestingly, on the data set we used, we found that (a) the WGAN objective does not improve the CNN architecture for speech enhancement and (b) the L1 loss alone gives satisfying results.
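The trade-off studied here can be sketched as a generator objective that mixes a Wasserstein adversarial term with an L1 reconstruction term. The following is a minimal NumPy sketch, not the authors' implementation; the function name `generator_loss` and the weight `l1_weight` are illustrative assumptions (setting `l1_weight` to zero recovers the pure WGAN loss, and dropping the adversarial term recovers the pure L1 loss).

```python
import numpy as np

def generator_loss(critic_scores, enhanced, clean, l1_weight=100.0):
    """Hypothetical combined generator objective:
    -E[D(G(x))]  (Wasserstein adversarial term, to be minimized)
    + l1_weight * mean(|G(x) - y|)  (L1 reconstruction term).
    `l1_weight` is an assumed hyperparameter controlling the trade-off."""
    adversarial = -np.mean(critic_scores)            # Wasserstein term
    reconstruction = np.mean(np.abs(enhanced - clean))  # L1 term
    return adversarial + l1_weight * reconstruction

# Example: perfect reconstruction, so only the adversarial term remains.
scores = np.array([1.0, 3.0])      # critic outputs on enhanced samples
enhanced = np.zeros(4)
clean = np.zeros(4)
loss = generator_loss(scores, enhanced, clean)  # -mean([1, 3]) = -2.0
```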