Tools

In the speech technology research community there is an increasing trend to use open source solutions. We present a new tool in that spirit, WaveSurfer, which has been developed at the Centre for Speech Technology at KTH. It has been designed for tasks such as viewing, editing, and labeling audio data. WaveSurfer is built around a small core to which most functionality is added in the form of plug-ins. The tool has been designed to work on most common platforms and to be easy to configure and extend. WaveSurfer is provided as open source under the GPL, with the explicit goal that the speech community will jointly improve and expand its scope and capabilities.

Statistical techniques based on hidden Markov models (HMMs) with Gaussian emission densities have dominated the signal processing and pattern recognition literature for the past 20 years. However, HMMs trained using maximum likelihood techniques suffer from an inability to learn discriminative information and are prone to overfitting and over-parameterization. Recent work in machine learning has focused on models, such as the support vector machine (SVM), that automatically control generalization and parameterization as part of the overall optimization process. In this paper, we show that SVMs provide a significant improvement in performance on a static pattern classification task based on the Deterding vowel data. We also describe an application of SVMs to large vocabulary speech recognition, and demonstrate an improvement in error rate on a continuous alphadigit task (OGI Alphadigits) and a large vocabulary conversational speech task (Switchboard). Issues related to the development and optimization of an SVM/HMM hybrid system are discussed.

Sphinx-4 is a flexible, modular and pluggable framework to help foster new innovations in the core research of hidden Markov model (HMM) speech recognition systems. The design of Sphinx-4 is based on patterns that have emerged from the design of past systems as well as new requirements based on areas that researchers currently want to explore. To exercise this framework, and to provide researchers with a “research-ready” system, Sphinx-4 also includes several implementations of both simple and state-of-the-art techniques. The framework and the implementations are all freely available via open source.

The prominent modeling technique for speech recognition today is the hidden Markov model with Gaussian emission densities. However, these models suffer from an inability to learn discriminative information. Artificial neural networks (ANNs) have been proposed as a replacement for the Gaussian emission probabilities under the belief that the ANN models provide better discrimination capabilities. However, the use of ANNs often results in over-parameterized models which are prone to overfitting. Techniques such as cross-validation have been suggested as remedies to the overfitting problem, but employing these is wasteful of both resources and computation. Further, cross-validation does not address the issue of model structure and over-parameterization. Recent work on machine learning has moved toward automatic methods for controlling generalization and parameterization. A model that has gained much popularity recently is the support vector machine (SVM). SVMs use the principle of structural risk

Developing an accessible curriculum of laboratory courses for undergraduate students is vital to progress in human language technology. In this article, we describe means to provide students with access to leading-edge language technologies, and tools to combine these technologies in spoken dialogue systems of their own design. The tools and technologies used in the proposed laboratory courses will enable students to build interactive dialogue systems for new and exciting applications and to research and perhaps improve the core language technologies. By making them freely available (via the Internet or CD-ROM) with documentation and support, our community can remove some of the main entry barriers to developing new programs in human language technology in our colleges and universities. In this way, students are not only exposed to new technology; they become involved in the process of creating it.

Modern real-time applications with increasing design complexity have revolutionized the embedded design procedure. Energy budget constraints and shortening time to market have led designers to consider cooperative design of hardware and software modules for a given embedded application. In hardware-software codesign, the trade-offs in both domains are carefully analyzed, and the processor-intensive tasks are off-loaded to the hardware to meet the performance criteria while the rest is implemented in software to provide the required features and flexibility. Speech recognition systems used in real-time applications involve complex algorithms for faithful recognition. The nature of these tasks restricts implementations to large platforms, and meeting the performance constraints is not feasible on smaller embedded mobile systems and battery-operated devices. This thesis proposes an idea for hardware-software codesign of a hidden Markov model (HMM) based large vocabulary continuous speech recognition system. The entire procedure can be divided into three phases: the initial phase deals

Rapid advances in speech recognition theory, as well as computing hardware, have led to the development of machines that can take human speech as input, decode the information content of the speech, and respond accordingly. Real-time performance of such systems is often dominated by the evaluation of likelihoods in the statistical modeling component of the system. Statistical models are typically implemented using Gaussian mixture distributions. The primary objective of this thesis was to develop an extension of the Bucket Box Intersection (BBI) algorithm in which the dimension with the optimal number of splits can be selected when multiple minima are present. The effects of normalization of mixture weights and Gaussian clipping have also been investigated. We show that the Extended BBI algorithm (EBBI) reduces run-time by 21% without introducing any approximation error. EBBI also produced a 12% lower word error rate than Gaussian clipping for the same computational complexity. These approaches were evaluated on a wide variety of tasks including conversational speech.
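
To make concrete why likelihood evaluation dominates run-time, here is a minimal sketch of scoring one feature vector against a diagonal-covariance Gaussian mixture. It is not the BBI or EBBI algorithm from the thesis. The function names, the `top_n` parameter, and the toy model values are all hypothetical, chosen only for illustration. The `top_n` option keeps only the best-scoring components, which shows the kind of approximation error that pruning schemes trade against speed.

```python
import math


def diag_gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_loglik(x, weights, means, variances, top_n=None):
    """Log-likelihood of x under a Gaussian mixture.

    With top_n set, only the top_n highest-scoring components contribute.
    This sketch still scores every component first, so it illustrates the
    approximation error of pruning, not the speed-up.
    """
    scores = [
        math.log(w) + diag_gaussian_logpdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    if top_n is not None:
        scores = sorted(scores, reverse=True)[:top_n]
    peak = max(scores)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(s - peak) for s in scores))


# Toy two-component model (illustrative values only).
weights = [0.6, 0.4]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
x = [0.1, -0.2]

exact = gmm_loglik(x, weights, means, variances)
pruned = gmm_loglik(x, weights, means, variances, top_n=1)
print(exact, pruned, exact - pruned)
```

The cost of the exact computation grows with the number of components times the feature dimension, for every frame and every active state. A practical system would select candidate Gaussians before scoring them rather than after, which is the part this sketch leaves out.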

Abstract. This tutorial presents an overview of automatic speech recognition systems. First, a mathematical formulation and related aspects are described. Then, some background on speech production and perception is presented, followed by a historical review of the efforts in developing automatic recognition systems. The main algorithms of each component of a speech recognizer and current techniques for improving speech recognition performance are explained. The current development of speech recognizers for the Portuguese and English languages is discussed. Some campaigns to evaluate and assess speech recognition systems are described. Finally, this tutorial concludes by discussing some research trends in automatic speech recognition.
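
The abstract does not reproduce the tutorial's own equations. As a point of reference, the statistical formulation of speech recognition is commonly written as below. The symbols are assumptions introduced here: X is the sequence of acoustic observations and W is a candidate word sequence.

```latex
\hat{W} = \arg\max_{W} P(W \mid X)
        = \arg\max_{W} \frac{P(X \mid W)\, P(W)}{P(X)}
        = \arg\max_{W} P(X \mid W)\, P(W)
```

The last step holds because P(X) does not depend on W. In this reading, P(X | W) is the acoustic model and P(W) is the language model.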