Abstract

Visual information, used alongside audio, is important for human-machine interfaces. It not only increases the accuracy of an audio speech recognition (ASR) system but also improves its robustness. This paper presents an overview of the different approaches used for viseme recognition and reports new results for Hindi viseme recognition. The visemes were extracted from a database of continuous sentences uttered by 5 native Hindi speakers. Mel-frequency cepstral coefficients (MFCCs) were used as audio features, while a discrete wavelet transform (DWT) followed by a discrete cosine transform (DCT) was used for visual feature extraction. The extracted features were then passed to a discriminant-function-based classifier. The maximum improvement in recognition performance, 10.72%, is achieved at a signal-to-noise ratio (SNR) of -5 dB.
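
The visual feature step (a DWT followed by a DCT) can be illustrated with a minimal sketch. The abstract does not state the wavelet family, the number of decomposition levels, the size of the mouth region, or how many coefficients are kept, so everything below is an assumption made for illustration: a single-level Haar wavelet, an 8 x 8 grayscale region, and the top-left 4 x 4 block of DCT coefficients. The function names `haar_dwt2_approx`, `dct_matrix` and `visual_features` are hypothetical and are not taken from the paper.

```python
import numpy as np


def haar_dwt2_approx(image):
    """One level of a 2-D Haar DWT; returns only the approximation (LL) sub-band.

    Assumption: Haar wavelet, single level. The paper may use another wavelet.
    """
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    img = img[: rows - rows % 2, : cols - cols % 2]  # trim to even dimensions
    # Low-pass filter and downsample horizontally (pairs of adjacent columns).
    low = (img[:, 0::2] + img[:, 1::2]) / np.sqrt(2.0)
    # Low-pass filter and downsample vertically (pairs of adjacent rows).
    return (low[0::2, :] + low[1::2, :]) / np.sqrt(2.0)


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index
    i = np.arange(n).reshape(1, -1)  # sample index
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale factor
    return mat


def visual_features(roi, block=4):
    """DWT approximation, then 2-D DCT, then the low-frequency block as a vector.

    Assumption: keeping the top-left `block` x `block` coefficients.
    """
    ll = haar_dwt2_approx(roi)
    coeffs = dct_matrix(ll.shape[0]) @ ll @ dct_matrix(ll.shape[1]).T
    return coeffs[:block, :block].ravel()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mouth_roi = rng.random((8, 8))  # stand-in for a cropped grayscale mouth region
    print(visual_features(mouth_roi).shape)
```

The DWT approximation sub-band halves each image dimension while preserving coarse lip shape, and the DCT then concentrates most of the remaining energy in a few low-frequency coefficients, which is why a small block can serve as a compact feature vector for a classifier.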
