In recent years, the i-vector representation of speech has been widely used in state-of-the-art speaker recognition systems. This representation maps speech segments of arbitrary duration to a fixed-length, low-dimensional vector. The main challenge of the i-vector representation is the variability among different i-vectors of the same speaker. This variability is mainly due to differences in handsets, environmental noise, speaker health and emotion, or segment duration. Inter-session variability compensation techniques have therefore been developed to remove unwanted variability directly from i-vectors. We propose a method based on Linearly Constrained Minimum Variance (LCMV), a popular robust beamforming technique originally introduced for signal enhancement.

We show that LCMV can improve performance on the NIST 2014 speaker recognition challenge by building a model from the different i-vectors of a given speaker so as to cancel inter-session variability and increase inter-speaker variability.
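
As a rough illustration of the LCMV idea, the sketch below computes the standard closed-form LCMV solution, w = R^{-1} C (C^T R^{-1} C)^{-1} f, which minimizes w^T R w subject to C^T w = f. Treating the columns of C as the i-vectors of one speaker, R as an inter-session covariance estimate, and f as a vector of ones are assumptions made here for illustration only; the function name, dimensions, and random data are hypothetical and do not reproduce the proposed system.

```python
import numpy as np

def lcmv_weights(R, C, f):
    """Closed-form LCMV: minimize w^T R w subject to C^T w = f."""
    R_inv_C = np.linalg.solve(R, C)      # R^{-1} C without forming the inverse
    gram = C.T @ R_inv_C                 # C^T R^{-1} C
    return R_inv_C @ np.linalg.solve(gram, f)

# Hypothetical toy data (assumption): 8-dimensional vectors, 3 sessions.
rng = np.random.default_rng(0)
dim, n_sessions = 8, 3
A = rng.standard_normal((dim, dim))
R = A @ A.T + dim * np.eye(dim)          # symmetric positive definite covariance
C = rng.standard_normal((dim, n_sessions))  # columns stand in for one speaker's i-vectors
f = np.ones(n_sessions)                  # unit response to each constraint vector

w = lcmv_weights(R, C, f)
constraint_error = np.max(np.abs(C.T @ w - f))
```

In this toy setting, `w` responds with unit gain to each of the speaker's vectors while keeping the variance term `w @ R @ w` as small as the constraints allow.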

The amount of available audio data, such as broadcast news archives, radio recordings, music and song collections, podcasts, and various internet media, is constantly increasing. Many audio indexing techniques have been developed, but they work only for a specific type of audio content (music, commercials, jingles, speech, laughter, etc.).

In this work we report our recent efforts to extend the ALISP (Automatic Language Independent Speech Processing) approach, originally developed for speech, into a generic method for audio indexing, retrieval and recognition. The proposed system consists of three steps. First, unsupervised training is performed to acquire the ALISP HMM models. Then, the acquired models are used to transform the audio data into a sequence of symbols. Finally, a comparison and decision module inspired by the Basic Local Alignment Search Tool (BLAST) is used to search the reference database, which contains different audio items, for the sequence of ALISP symbols of the unknown audio data.
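
The third step can be pictured with a minimal seed-and-vote sketch over symbol sequences. This is a simplified stand-in in the spirit of BLAST-style seeding, not the comparison and decision module itself: the function names, the k-gram length, the symbol labels, and the toy reference items are all hypothetical, and the sequences are assumed to have already been produced by the first two steps.

```python
from collections import Counter, defaultdict

def build_index(references, k=3):
    """Map each k-gram of symbols to the (item, position) pairs where it occurs."""
    index = defaultdict(list)
    for item, symbols in references.items():
        for pos in range(len(symbols) - k + 1):
            index[tuple(symbols[pos:pos + k])].append((item, pos))
    return index

def best_match(query, index, k=3):
    """Return (item, offset, hits) for the alignment sharing the most k-grams."""
    votes = Counter()
    for qpos in range(len(query) - k + 1):
        for item, rpos in index.get(tuple(query[qpos:qpos + k]), []):
            votes[(item, rpos - qpos)] += 1   # same offset = same alignment diagonal
    if not votes:
        return None
    (item, offset), hits = votes.most_common(1)[0]
    return item, offset, hits

# Hypothetical symbol sequences standing in for transcribed audio items.
references = {
    "ad_1": ["HA", "HB", "HC", "HD", "HE", "HF"],
    "song_1": ["HC", "HA", "HF", "HB", "HD", "HA"],
}
query = ["HC", "HD", "HE", "HF"]

index = build_index(references)
result = best_match(query, index)
```

Here the query shares two consecutive 3-grams with `ad_1` at offset 2 and none with `song_1`, so `result` points to `ad_1`. A fuller system would also need approximate matching to tolerate symbol errors, which this sketch omits.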

The proposed system is evaluated on the YACAST corpus and other publicly available corpora for several audio indexing tasks. The experimental results show excellent performance in audio identification (for advertisements and songs), audio motif discovery (for advertisements and songs), speaker diarization and laughter detection.