> I question whether this is a more basic problem. Isn't it possible
> and even likely that getting the words right would be made possible
> more frequently with the ability to detect sentence boundaries and
> emotional content more accurately?
> Liz Coppock
Yes and no.
There is certainly evidence that prosodic features can help. But if
you are working with a system on a task where the word error rate is
70%, prosodic features are not going to reduce it to 30%.
-
Richard Sproat
http://www.research.att.com/~rws/

Re: Linguist 13.2046, Disc: Accuracy in Speech Recognition: Priorities
>
> Date: Wed, 7 Aug 2002 23:58:39 -0500
> From: Elizabeth Coppock <e-coppocknorthwestern.edu>
> Subject: Re: 13.2044, Disc: New: Accuracy in Speech Recognition: Priorities
>
> Richard Sproat wrote:
> > How about the more basic problem of getting
> > most of the words right?
>
> I question whether this is a more basic problem. Isn't it possible
> and even likely that getting the words right would be made possible
> more frequently with the ability to detect sentence boundaries and
> emotional content more accurately?
Most people in the speech community are not working on multi-sentential
utterances or emotion detection, though some are. For the most part, we
are working on single utterances, which correspond (more or less) to
sentences, phrases, or even single words, depending on the state of your
application.
I agree with Richard about mainly getting the words right, though I
would modify that slightly. People like to talk about Word Error
Rates, but I take those numbers with a grain of salt. While it's
obviously good to get all the words right, you usually don't need to
do that. You do need to get all of the semantic
assignments/slots/frames/keys right. If my recognizer hears my "What
is the temperature of the port engine?" as "What an temperature for
port engine?", it doesn't really matter so much as long as my parser
can figure it out (which it can).
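To make the contrast between word error rate and slot accuracy concrete,
here is a minimal Python sketch using the port-engine example above. The
slot names and the tiny vocabularies (QUANTITIES, SIDES, COMPONENTS) are
hypothetical and chosen only for illustration; they are not taken from
any real system, and a real parser would be far more capable than this
keyword spotter.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())

def word_error_rate(reference, hypothesis):
    """Word-level edit distance (sub/ins/del) divided by reference length."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

# Hypothetical slot vocabularies, assumed for this example only.
QUANTITIES = {"temperature", "pressure"}
SIDES = {"port", "starboard"}
COMPONENTS = {"engine", "pump"}

def extract_slots(text):
    """Very naive keyword spotter: fill each slot from the words present."""
    slots = {}
    for word in tokenize(text):
        if word in QUANTITIES:
            slots["quantity"] = word
        elif word in SIDES:
            slots["side"] = word
        elif word in COMPONENTS:
            slots["component"] = word
    return slots

spoken = "What is the temperature of the port engine?"
heard = "What an temperature for port engine?"

print(word_error_rate(spoken, heard))                 # 0.5
print(extract_slots(spoken) == extract_slots(heard))  # True
```

Half of the reference words are wrong or missing in the recognizer
output, yet both strings yield the same three slots, which is the sense
in which the raw word error rate can overstate the damage.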
And for most speech apps, for a given state you can write a grammar
that looks for a word, phrase, sentence or even multiple sentences.
If you "get the words right" (or nearly right) and you've written your
grammar properly, sentence boundaries won't matter so much.
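As a rough illustration of a per-state grammar, the Python sketch below
keys one regular expression to each dialogue state. The state names
("confirm", "query"), the patterns, and the parse() helper are all
assumptions made up for this example; production systems typically use
dedicated grammar formalisms rather than regular expressions.

```python
import re

# One hypothetical grammar per dialogue state. Each pattern looks only for
# the words that matter in that state and ignores everything around them.
GRAMMARS = {
    "confirm": re.compile(r"\b(?P<answer>yes|no)\b"),
    "query": re.compile(
        r"\b(?P<quantity>temperature|pressure)\b.*?"
        r"\b(?P<side>port|starboard)\b\s+(?P<component>engine|pump)\b"
    ),
}

def parse(state, utterance):
    """Return the slots found by the grammar for this state, or None."""
    match = GRAMMARS[state].search(utterance.lower())
    return match.groupdict() if match else None

# The recognizer output may span several sentences; the grammar does not care.
print(parse("query", "Okay. What an temperature for port engine? Thanks."))
print(parse("confirm", "Yes. Please go ahead."))
print(parse("confirm", "Maybe later."))
```

Because each grammar searches for its target phrase anywhere in the
recognized string, it returns the same slots whether or not the
recognizer has marked where one sentence ends and the next begins.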
Now the intel community may have a different perspective, of
course.... :-)
-
Kurt Godden, Ph.D.
Principal Member of the Engineering Staff
Advanced Technology Labs
Lockheed Martin Corporation

August 8, 2002
Re Linguist 13.2044
> The NYT article that Karen S. Chung pointed us to is a pretty good
> example of the kind of reporting that anyone who works on speech
> technology (or at least anyone who is honest) should cringe at.
For me, the most important thing that could develop from speech
recognition technology would be a method for identifying _hostility_
in spoken English in a fashion that would be objective and
reliable. Standard practice in American English is for people to use
hostile language with an associated hostile intonation -- and then
deny their action by saying, "But all I _said_ was..." followed by the
words they spoke, but with a nonhostile intonation. Because no
mechanism for objective identification of hostility in spoken English
exists at the moment, establishing verbal abuse as a criminal act is
still impossible even when there is a taped record of the speech used
and even when substantial harm has been done; abusive language can
only be introduced in court within the context of some other
"recognized" criminal act.
If someone on the list is doing work on this topic that I'm unaware
of, I'd like very much to know about it.
Suzette Haden Elgin