Saturday, July 27, 2013

2. Introduction
Speaker Recognition System: the process of automatically recognizing who is speaking, based on the unique characteristics contained in speech waves. Speaker recognition systems involve two phases: 1. Training 2. Testing. Training is the process of familiarizing the system with the voice characteristics of the speakers being registered. Testing is the actual recognition task.

8. MFCC Frame Blocking
In frame blocking, the continuous speech signal is blocked into frames of N samples, with the starts of adjacent frames separated by M samples (M < N). The first frame consists of the first N samples. The second frame begins M samples after the first frame and overlaps it by N - M samples. Similarly, the third frame begins 2M samples after the first frame (or M samples after the second frame) and overlaps the first frame by N - 2M samples.

Typical values for N and M are N = 256 and M = 100.
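
The frame blocking step can be sketched in a few lines of Python. This is only an illustrative sketch: the function name frame_blocking and the choice to drop trailing samples that do not fill a complete frame are assumptions, not something the slides specify.

```python
def frame_blocking(signal, n=256, m=100):
    """Split `signal` into frames of n samples whose starts are m samples apart."""
    if not 0 < m < n:
        raise ValueError("expected 0 < m < n")
    frames = []
    start = 0
    # Assumption: only complete frames are kept; leftover samples are discarded.
    while start + n <= len(signal):
        frames.append(signal[start:start + n])
        start += m
    return frames


# A dummy "signal" whose sample values equal their indices makes overlaps easy to see.
samples = list(range(1000))
frames = frame_blocking(samples)
```

With N = 256 and M = 100, each frame shares N - M = 156 samples with the frame before it, and the third frame shares N - 2M = 56 samples with the first.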

9. Windowing
The next step in the processing is to window each individual frame so as to minimize the signal discontinuities at the beginning and end of each frame. The idea is to minimize the spectral distortion by using the window to taper the signal to zero at the beginning and end of each frame.
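
As a rough illustration of windowing, the sketch below multiplies a frame by a Hamming window. The slides do not name a specific window, so the Hamming window is an assumption here (it is a common choice in MFCC front ends), and the function names are hypothetical. Note that a Hamming window tapers the frame edges to 0.08 rather than exactly to zero; a Hann window would reach zero.

```python
import math


def hamming_window(n):
    """Return the n-point Hamming window w[i] = 0.54 - 0.46*cos(2*pi*i/(n-1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def apply_window(frame, window):
    """Multiply a frame by a window of the same length, sample by sample."""
    if len(frame) != len(window):
        raise ValueError("frame and window must have the same length")
    return [x * w for x, w in zip(frame, window)]


window = hamming_window(256)
# Windowing a constant frame of ones simply reproduces the window shape.
windowed = apply_window([1.0] * 256, window)
```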

10. Fast Fourier Transform
The next processing step is the Fast Fourier Transform (FFT), which converts each frame of N samples from the time domain into the frequency domain. The FFT is a fast algorithm for computing the Discrete Fourier Transform (DFT), which is defined on the set of N samples {x_n} as follows:

$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \ldots, N-1$$
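
To make the transform concrete, the sketch below evaluates the DFT sum directly for one frame, using the common convention with a negative exponent. It is a naive O(N^2) illustration, not an FFT; the function name dft is hypothetical, and a real system would call an optimized FFT routine that produces the same values faster.

```python
import cmath
import math


def dft(frame):
    """Directly evaluate X[k] = sum over i of frame[i] * exp(-2j*pi*k*i/n) for every k."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(frame))
        for k in range(n)
    ]


# One full cosine cycle across 8 samples: its energy lands in bins 1 and 7.
cosine = [math.cos(2 * math.pi * i / 8) for i in range(8)]
spectrum = dft(cosine)
magnitudes = [abs(value) for value in spectrum]
```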