Abstract

The success of an automatic speech recognition system is critically dependent on the quality of its signal processing front-end. This is especially true in continuous speech where co-articulation effects may be severe. This work describes a new signal conditioning step which we have found useful for the DARPA Resource Management recognition task.

Country

United States

Language

English (United States)

This text was extracted from an ASCII text file.

This is the abbreviated version, containing approximately 52% of the total text.

New Signal Conditioning Step for Continuous Speech Recognition

The success of an automatic speech recognition system is critically dependent on the quality of its signal processing front-end. This is especially true in continuous speech where co-articulation effects may be severe. This work describes a new signal conditioning step which we have found useful for the DARPA Resource Management recognition task.

The so-called "front-end" of an automatic recognition system usually refers to the mapping between the acoustic signal received through the microphone and a multi-dimensional vector space suitably encompassing the salient features of this signal. Since these features are subsequently used for recognition, the quality of the signal processing is critical to the ultimate performance of the recognizer. In the IBM speech recognition system, the basic signal processing comprises A/D conversion, short term power spectrum computation, critical band filtering, compressive loudness scaling, and ear model adaptation [1]. This particular approach has been shown to favorably mimic the human auditory system [1].

On the other hand, such a front-end turns out to be quite sensitive to the dynamic range exhibited by the speech utterances. This is because the long term component of the adaptation maps the effective dynamic range of each band into a fixed dynamic range, approximately (30,80) dB. This causes no problem for the usual IBM office correspondence dictation task, where the typical office environment exhibits a dynamic range varying between 25 and 45 dB. However, this has the potential to produce deleterious effects in a quieter environment, where the floor might be very low. In particular, this is the case for the DARPA Resource Management (RM) data [2]. This task exemplifies an exceptionally noise-free environment, since the utterances were digitally recorded in a sound-isolated recording booth using a headset noise-cancelling microphone [2].
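
The sensitivity described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the long term adaptation behaves like a linear map from the observed range of a band onto the fixed (30,80) dB range; the source does not state the actual form of the mapping. The function names and all input levels are hypothetical.

```python
def map_to_fixed_range(level_db, floor_db, peak_db, lo_db=30.0, hi_db=80.0):
    """Linearly map level_db from [floor_db, peak_db] onto [lo_db, hi_db].

    Assumption: a linear map stands in for the long term adaptation,
    whose real form is not given in the text.
    """
    if peak_db <= floor_db:
        raise ValueError("peak_db must be greater than floor_db")
    scale = (hi_db - lo_db) / (peak_db - floor_db)
    return lo_db + (level_db - floor_db) * scale


def mapped_span(event_lo_db, event_hi_db, floor_db, peak_db):
    """Width, in output dB, occupied by events spanning the given input levels."""
    return (map_to_fixed_range(event_hi_db, floor_db, peak_db)
            - map_to_fixed_range(event_lo_db, floor_db, peak_db))


# Hypothetical levels: speech events lie between 50 and 80 dB in both cases.
# Office-like recording: the noise floor sits at 40 dB (40 dB overall range).
office_span = mapped_span(50.0, 80.0, floor_db=40.0, peak_db=80.0)
# Booth-like recording: the floor drops to 0 dB (80 dB overall range).
booth_span = mapped_span(50.0, 80.0, floor_db=0.0, peak_db=80.0)

print(office_span, booth_span)
```

With these made-up levels, the same 30 dB of speech occupies 37.5 dB of the output range in the office-like case but only 18.75 dB in the booth-like case, because much of the fixed range is spent on the very low floor.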

Because of the unusual cleanliness of the DARPA recordings, our standard signal processing may potentially reduce the effective dynamic range of speech-related events in a (somewhat unnecessary) effort to characterize instances of perfect silence. This reduction may in turn result in a loss of acoustic information and thereby a drop in the recognition accuracy of our system.

To cope with this situation, we introduce a signal conditioning step between critical band filtering and long term adaptation.

Recall from, e.g., [1], that after critical band filtering a frame of speech is represented by an N-dimensional vector X containing the power spectrum amplitudes in each of N critical bands, where typically 17 < N < 20. This vector is then converted to a log domain representation before adaptation processing.
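
As a rough sketch of the representation just described, the snippet below converts a frame vector X of band powers to a log domain. The decibel convention (10 log10) and the small clamping constant are assumptions made for illustration; the text only says "log domain". The function name and the sample values are hypothetical.

```python
import math


def to_log_domain(x, eps=1e-10):
    """Convert band powers to decibels, clamping at eps to avoid log(0).

    Assumption: 10*log10 is used as the log domain; the source does not
    specify the base or scaling.
    """
    return [10.0 * math.log10(max(v, eps)) for v in x]


# Hypothetical frame with N = 18 critical bands (within the stated 17 < N < 20).
N = 18
X = [10.0 ** (k / 6.0) for k in range(N)]  # made-up band powers
X_log = to_log_domain(X)

print(len(X_log), X_log[0], X_log[6])
```

The clamp matters most for very clean recordings, where some band powers can be close to zero and the logarithm would otherwise be undefined or extremely negative.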

The above procedure is modified by applying to X a conditioning transformatio...