This article is about deep neural networks and hidden Markov models in i-vector-based text-dependent speaker verification.

Abstract

Techniques making use of deep neural networks (DNNs) have recently been shown to bring large improvements in text-independent speaker recognition. In this paper, we verify that DNN-based methods also yield excellent performance in the context of text-dependent speaker verification. We build our system on the previously introduced HMM-based i-vector approach, where phone models are used to obtain a frame-level alignment in order to collect sufficient statistics for i-vector extraction. For comparison, we experiment with an alternative alignment obtained directly from the output of a DNN trained for phone classification. We also experiment with DNN-based bottleneck features and their combinations with standard cepstral features. Although the i-vector approach is generally considered unsuitable for text-dependent speaker verification, we show that our HMM-based approach combined with bottleneck features provides state-of-the-art performance on the RSR2015 data.
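The statistics collection described above can be illustrated with a minimal sketch. It assumes that every frame comes with a vector of occupation posteriors over alignment components. For a hard alignment to HMM states (for example, from Viterbi forced alignment), each row is one-hot. For the alternative alignment, each row would be the softmax output of the phone-classification DNN. The sketch also assumes that bottleneck and cepstral features are combined by simple frame-wise concatenation. The function names (`concat_features`, `one_hot_alignment`, `sufficient_stats`) and the toy numbers are hypothetical illustrations, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def concat_features(cepstral: Matrix, bottleneck: Matrix) -> Matrix:
    """Assumed combination scheme: append each frame's bottleneck vector
    to its cepstral vector (frame-wise concatenation)."""
    if len(cepstral) != len(bottleneck):
        raise ValueError("both feature streams must have the same number of frames")
    return [list(c) + list(b) for c, b in zip(cepstral, bottleneck)]


def one_hot_alignment(states: Sequence[int], num_states: int) -> Matrix:
    """Turn a hard frame-to-state alignment into per-frame posteriors
    (probability 1.0 for the aligned state, 0.0 elsewhere)."""
    post = []
    for s in states:
        row = [0.0] * num_states
        row[s] = 1.0
        post.append(row)
    return post


def sufficient_stats(features: Matrix, posteriors: Matrix,
                     means: Matrix) -> Tuple[List[float], Matrix]:
    """Zeroth-order N_c = sum_t gamma_t(c) and centered first-order
    F_c = sum_t gamma_t(c) * (x_t - m_c) statistics for each component c."""
    if len(features) != len(posteriors):
        raise ValueError("need one posterior row per feature frame")
    num_comp, dim = len(means), len(means[0])
    n = [0.0] * num_comp
    f = [[0.0] * dim for _ in range(num_comp)]
    for x, gamma in zip(features, posteriors):
        for c in range(num_comp):
            g = gamma[c]
            if g == 0.0:
                continue  # frame does not contribute to this component
            n[c] += g
            for d in range(dim):
                f[c][d] += g * x[d]
    # Centering: sum_t g * (x_t - m_c) = (sum_t g * x_t) - N_c * m_c
    for c in range(num_comp):
        for d in range(dim):
            f[c][d] -= n[c] * means[c][d]
    return n, f


if __name__ == "__main__":
    ceps = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # toy cepstral frames
    bn = [[0.5], [0.5], [1.5]]                    # toy bottleneck frames
    feats = concat_features(ceps, bn)             # 3 frames x 3 dims
    post = one_hot_alignment([0, 0, 1], num_states=2)
    means = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]    # hypothetical component means
    n, f = sufficient_stats(feats, post, means)
    print(n)  # [2.0, 1.0]
    print(f)  # [[4.0, 6.0, 1.0], [4.0, 5.0, 0.5]]
```

In this sketch the statistics depend only on the per-frame posteriors. Switching from the HMM alignment to a DNN-based one therefore amounts to passing the DNN's softmax rows in place of the output of `one_hot_alignment`, while the i-vector extraction that consumes `n` and `f` stays the same.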