Speaker Verification Using Deep Neural Networks: A Review

Amna Irum and Ahmad Salman

Abstract—Speaker verification involves examining the
speech signal to authenticate the claim of a speaker as true or
false. Deep neural networks are among the most successful
implementations of complex non-linear models for learning unique
and invariant features of data. They have been employed in
speech recognition tasks and have shown potential for
speaker recognition as well. In this study, we investigate
and review Deep Neural Network (DNN) techniques used in
speaker verification systems. DNNs are used for tasks ranging from
feature extraction to complete end-to-end speaker verification systems.
They are generally used to extract speaker-specific
representations, for which the network is trained on speaker
data during the training phase. The speaker representation depends
on the type of model, the representation level, and the
training loss. Deep learning has largely been the focus of attention
in the computer vision community, and we believe
that a comprehensive review of the current state of the art in deep
learning for speaker verification will summarize the use of
these approaches for readers in the speech processing community.
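
As a minimal illustration of the accept/reject decision described above, the following sketch scores a test utterance against a claimed speaker using cosine similarity between fixed-length speaker representations. The function names, the 64-dimensional vectors, and the threshold of 0.5 are illustrative assumptions, not values taken from the reviewed systems; in practice the representations would come from a trained DNN and the threshold would be tuned on held-out data.

```python
import numpy as np


def cosine_score(enrolled, test):
    """Cosine similarity between two speaker representations."""
    enrolled = np.asarray(enrolled, dtype=float)
    test = np.asarray(test, dtype=float)
    return float(enrolled @ test / (np.linalg.norm(enrolled) * np.linalg.norm(test)))


def verify(enrolled, test, threshold=0.5):
    """Accept the identity claim if the score reaches the threshold.

    The threshold value is a hypothetical placeholder.
    """
    return cosine_score(enrolled, test) >= threshold


if __name__ == "__main__":
    # Random vectors stand in for DNN-derived speaker representations (assumption).
    rng = np.random.default_rng(0)
    claimed = rng.normal(size=64)
    same_speaker = claimed + 0.1 * rng.normal(size=64)  # small within-speaker variation
    other_speaker = rng.normal(size=64)                 # unrelated representation

    print("same speaker score:", cosine_score(claimed, same_speaker))
    print("other speaker score:", cosine_score(claimed, other_speaker))
```

Raising the threshold reduces false acceptances at the cost of more false rejections, which is the trade-off a verification system must balance.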

The authors are with the School of Electrical Engineering and Computer
Sciences (SEECS), National University of Sciences and Technology
(NUST), Islamabad, Pakistan (e-mail: amna.irum@seecs.edu.pk,
ahmad.salman@seecs.edu.pk).