Text-independent speaker verification (SV) is currently in the
process of embracing DNN modeling in every stage of the SV system.
DNN-based approaches such as end-to-end modeling and systems
based on DNN embeddings are gradually becoming
competitive even in the challenging and diverse channel conditions
of recent NIST SREs. Domain adaptation and the need for a large
amount of training data are still a challenge for current
discriminative systems and, unlike with generative models, we
see significant gains from data augmentation, simulation, and
other techniques designed to overcome the lack of training data.
We present an analysis of an SV system based on DNN embeddings
(x-vectors) and focus on robustness across diverse data
domains such as standard telephone and microphone conversations,
in clean, noisy, and reverberant environments. We
also evaluate the system on challenging far-field data created
by re-transmitting a subset of NIST SRE 2008 and 2010 microphone
interviews. We compare our results with the state-of-the-art
i-vector system. In general, we were able to achieve
better performance with the DNN-based systems, but most importantly,
we have confirmed the robustness of such systems
across multiple data domains.