A fast discriminative training approach for i-vector-based speaker verification has been presented. On NIST telephone evaluation data, the resulting models outperform the generative ones, including heavy-tailed models, without the need for normalization techniques.

Abstract

This work presents a new approach to discriminative speaker verification. Rather than estimating speaker models, or a model that discriminates between a speaker class and the class of all other speakers, we directly solve the problem of classifying pairs of utterances as belonging to the same speaker or not. The paper shows how a suitable Support Vector Machine kernel can be derived from a state-of-the-art generative formulation, and proposes an efficient approach to train the resulting discriminative models. On the tel-tel extended core condition of the NIST 2010 Speaker Recognition Evaluation, the results, measured by normalized Decision Cost Function and Equal Error Rate, are comparable to or better than those of the more expensive generative models.
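The pairwise idea can be illustrated with a small sketch. It assumes that the generative formulation is a PLDA-style model whose log-likelihood ratio for an i-vector pair is a symmetric quadratic form. Under that assumption, expanding each pair into the terms of the form turns same/different-speaker classification into a linear SVM on pair features. All function names (`pair_features`, `train_pairwise_svm`, `toy_ivectors`, ...), the synthetic data, and the plain stochastic subgradient solver are hypothetical illustrations. They are not the paper's kernel derivation or its efficient training method.

```python
import random


def pair_features(x1, x2):
    """Hypothetical helper: map an i-vector pair to the terms of a symmetric quadratic score.

    Assumes a PLDA-style log-likelihood ratio of the form
        x1'L x2 + x2'L x1 + x1'G x1 + x2'G x2 + c'(x1 + x2) + k,
    so a linear classifier on these features is a quadratic scorer on the pair.
    Every term is unchanged when x1 and x2 are swapped.
    """
    d = len(x1)
    cross = [x1[i] * x2[j] + x2[i] * x1[j] for i in range(d) for j in range(d)]
    self_terms = [x1[i] * x1[j] + x2[i] * x2[j] for i in range(d) for j in range(d)]
    linear = [x1[i] + x2[i] for i in range(d)]
    return cross + self_terms + linear + [1.0]  # trailing 1.0 carries the offset k


def score(w, x1, x2):
    """Verification score of a trial: positive leans 'same speaker'."""
    return sum(wi * fi for wi, fi in zip(w, pair_features(x1, x2)))


def make_pairs(utterances):
    """All unordered pairs (x1, x2, y) with y = +1 for same speaker, -1 otherwise."""
    pairs = []
    for a in range(len(utterances)):
        for b in range(a + 1, len(utterances)):
            (spk_a, xa), (spk_b, xb) = utterances[a], utterances[b]
            pairs.append((xa, xb, 1.0 if spk_a == spk_b else -1.0))
    return pairs


def train_pairwise_svm(pairs, lam=1e-3, lr=0.05, epochs=30, seed=0):
    """Linear SVM on pair features, trained by stochastic subgradient descent on the
    L2-regularized hinge loss (a simple stand-in solver; the offset is regularized too)."""
    rng = random.Random(seed)
    feats = [(pair_features(x1, x2), y) for x1, x2, y in pairs]
    w = [0.0] * len(feats[0][0])
    for _ in range(epochs):
        rng.shuffle(feats)
        for f, y in feats:
            margin = y * sum(wi * fi for wi, fi in zip(w, f))
            # Shrink for the regularizer; add lr * y * f only when the hinge is active.
            w = [(1.0 - lr * lam) * wi + (lr * y * fi if margin < 1.0 else 0.0)
                 for wi, fi in zip(w, f)]
    return w


def toy_ivectors(n_speakers, n_utts, dim, rng):
    """Synthetic stand-ins for i-vectors: a per-speaker mean plus small per-utterance noise."""
    data = []
    for spk in range(n_speakers):
        mean = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(n_utts):
            data.append((spk, [m + rng.gauss(0.0, 0.1) for m in mean]))
    return data


# Usage: train on one set of speakers, then score trials from unseen speakers.
rng = random.Random(42)
train = toy_ivectors(n_speakers=10, n_utts=4, dim=4, rng=rng)
w = train_pairwise_svm(make_pairs(train))

held_out = make_pairs(toy_ivectors(n_speakers=5, n_utts=4, dim=4, rng=rng))
target = [score(w, a, b) for a, b, y in held_out if y > 0]
nontarget = [score(w, a, b) for a, b, y in held_out if y < 0]
print(sum(target) / len(target), sum(nontarget) / len(nontarget))
```

Because each trial is reduced to one explicit feature vector, the dot product of two such vectors acts as a kernel between utterance pairs. Because the model scores the pair itself, it can be applied to trials from speakers never seen in training.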