3 Speaker Identification
- Determines the speaker from a set of registered speakers
  - This is called a “closed” set identification
  - Result is the best speaker matched
- What if the speaker is not in the database?
  - This is called an “open” set identification
  - Result can be a speaker or a no-match result
March 16, 2009, Scott Settembre
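
The difference between closed and open set identification can be sketched in a few lines. This is only an illustration: the two-dimensional templates, the cosine similarity measure, and the 0.9 threshold are assumptions made for the example, not details from the slides.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify_closed_set(sample, registered):
    """Closed set: always return the best-matching registered speaker."""
    return max(registered, key=lambda name: cosine_similarity(sample, registered[name]))

def identify_open_set(sample, registered, threshold=0.9):
    """Open set: return the best match, or None when nobody is similar enough."""
    best = identify_closed_set(sample, registered)
    if cosine_similarity(sample, registered[best]) >= threshold:
        return best
    return None

# Hypothetical voice templates, for illustration only.
registered = {"alice": [1.0, 0.1], "bob": [0.1, 1.0]}
known = [0.9, 0.2]     # close to alice's template
unknown = [0.8, 0.6]   # not close to any template

print(identify_closed_set(unknown, registered))  # still names a speaker
print(identify_open_set(unknown, registered))    # reports a no-match
```

The closed set function has no way to say "nobody", which is why an unregistered voice is still assigned to the nearest registered speaker.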

6 Speaker Validation
- Also called “Verification” or “Authentication”
- Determines if the voice matches a particular registered speaker
  - Result is the probability of a match or a similarity measure
- Similarity must exceed a particular threshold
  - Higher threshold produces more false negatives
  - Lower threshold produces more false positives
  - Voice variability and security issues make this a difficult threshold value to determine (more later)
Scott Settembre [ss424@cse.buffalo.edu]
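
The threshold trade-off can be made concrete with a small sketch. The similarity scores below are made up for illustration, and `error_rates` is a hypothetical helper, not part of any speaker recognition library.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false_negative_rate, false_positive_rate) at a threshold.

    A claim is accepted when its similarity score is >= threshold.
    """
    false_negatives = sum(1 for s in genuine_scores if s < threshold)
    false_positives = sum(1 for s in impostor_scores if s >= threshold)
    return (false_negatives / len(genuine_scores),
            false_positives / len(impostor_scores))

# Made-up similarity scores, for illustration only.
genuine = [0.95, 0.88, 0.80, 0.72]    # attempts by the true speaker
impostor = [0.75, 0.60, 0.45, 0.30]   # attempts by other speakers

for threshold in (0.5, 0.7, 0.9):
    fnr, fpr = error_rates(genuine, impostor, threshold)
    print(f"threshold={threshold:.1f}  false negatives={fnr:.2f}  false positives={fpr:.2f}")
```

With these scores, raising the threshold from 0.5 to 0.9 drives false positives from 0.50 down to 0.00 while false negatives rise from 0.00 to 0.75.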

9 Recognition Methods
- Text Dependent
  - Requires user to speak text spoken at enrollment
  - Usually a name, password, or phrase
- Text Prompting is used to combat deception
  - The system requires the user to repeat back a random phrase or list of numbers
- Video example from “CSAIL” - Spoken Language Systems group at MIT.
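
A text-prompting challenge might look like the sketch below. It assumes a separate speech recognizer has already produced `recognized_text` (a hypothetical input), and it only checks what was said; whether the voice matches the registered speaker is a separate check.

```python
import secrets

def make_challenge(num_digits=6):
    """Generate an unpredictable digit string for the user to read aloud."""
    return "".join(secrets.choice("0123456789") for _ in range(num_digits))

def transcript_matches(challenge, recognized_text):
    """Check that the recognized speech contains exactly the prompted digits, in order."""
    spoken_digits = "".join(ch for ch in recognized_text if ch.isdigit())
    return spoken_digits == challenge

challenge = make_challenge()
print("Please say:", " ".join(challenge))
```

Because the prompt changes on every attempt, a recording of an earlier session is unlikely to contain the requested digits.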

16 Step 2. Normalize Captured Speech
- Intersession variability and variability over time cause speech features to fluctuate
  - Use of “filter bank” is common
- Normalization helps remove these variations, but at a price
  - Parameter-Domain normalization
  - Distance/Similarity-Domain normalization

17 Step 2.a. Normalization Techniques
- Parameter-Domain normalization
  - Spectral equalization (i.e. signal processing)
  - Dampens large variations in features by averaging over time, useful for long utterances
  - Removes some speaker specific features
- Distance/Similarity-Domain normalization
  - Various techniques that use probabilities of known speakers that have already been enrolled
  - Useful if you are doing validation
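
Both families of normalization can be sketched briefly. The first function subtracts each feature's average over the utterance, one common form of parameter-domain equalization; the second scores a claim relative to other enrolled speakers, a simplified stand-in for the similarity-domain techniques. Function names and data are assumptions made for this example.

```python
def mean_normalize(frames):
    """Parameter-domain sketch: subtract each feature's average over the utterance."""
    num_frames = len(frames)
    means = [sum(column) / num_frames for column in zip(*frames)]
    return [[value - mean for value, mean in zip(frame, means)] for frame in frames]

def cohort_normalize(claimed_score, cohort_scores):
    """Similarity-domain sketch: score of the claimed speaker relative to
    the average score of other enrolled speakers on the same utterance."""
    return claimed_score - sum(cohort_scores) / len(cohort_scores)

# Made-up feature frames (one list per time step), for illustration only.
frames = [[1.0, 4.0], [2.0, 6.0], [3.0, 8.0]]
print(mean_normalize(frames))

# Made-up scores: the claimed speaker scores 0.9, two cohort speakers 0.5 and 0.7.
print(cohort_normalize(0.9, [0.5, 0.7]))
```

The mean subtraction removes any constant offset shared by all frames, which is how it suppresses session effects, but it also discards a speaker's own long-term average, which is the price the slide refers to.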

18 Step 3. Feature Extraction
- The input utterance is converted to a set of feature vectors
- Time alignment may need to be done
- Calculate similarity between each captured vector and the registered speaker template or model
[Figure: example for the utterance “Hello”, with stretched frame sequences such as “hheeellloo”; labels: .90 similarity, “he” vs. “he” .60 similarity, .75 overall]
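
Time alignment is often done with dynamic time warping (DTW). The slide does not name a specific method, so the sketch below is an assumption; it uses one-dimensional features and absolute difference as the local distance to keep the example short.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two sequences of scalar features.

    Each frame may be matched to one or more frames of the other sequence,
    so the same sounds spoken at different speeds can still line up.
    """
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = best cumulative cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat a frame of seq_b
                                     cost[i][j - 1],      # repeat a frame of seq_a
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]

# Made-up feature values, for illustration only.
template = [1, 2, 3]
slow_utterance = [1, 1, 2, 2, 2, 3]   # same shape, spoken more slowly
other_utterance = [3, 2, 1]

print(dtw_distance(template, slow_utterance))
print(dtw_distance(template, other_utterance))
```

A smaller distance means a closer match; a real system would convert this into a similarity score and compare multidimensional feature vectors rather than single numbers.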