Can't find a way to speed up Speech Detection with managed APIs

Question

We have a telephony-based C# application using the managed Speech APIs: Microsoft.Speech.*

Our first issue was that we could not find a suitable API for setting the SpeechRecognitionEngine's input to a real-time audio stream. When SetInputToAudioStream() or SetInputToWaveStream() is called, the engine seems to expect a non-real-time Stream: before the call returns, it requires the Stream to be seekable, requests the total length, seeks to the end of the stream, seeks back to the beginning, and reads a WAV header plus enough initial audio to total 4096 bytes. We worked around this by designing a custom Stream object that "fakes" some of this behavior, and it has been working pretty well for us: the engine raises SpeechDetected events before it has finished reading the entire stream, which is the kind of real-time behavior we need.
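
To make the workaround concrete, here is a minimal sketch of that kind of Stream. It is not our production class: the names LiveAudioStream, Enqueue and Complete are made up for illustration, the int.MaxValue length is an arbitrary placeholder, and the assumption that the engine tolerates seeks that do not actually move the read cursor comes only from what we observed, not from documented behavior. It also does not synthesize a WAV header, so with SetInputToWaveStream() a header would have to be enqueued first.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

// Hypothetical sketch: a Stream that claims to be seekable but is fed in real time.
public sealed class LiveAudioStream : Stream
{
    private readonly BlockingCollection<byte[]> _chunks = new BlockingCollection<byte[]>();
    private byte[] _current = new byte[0];
    private int _offset;      // read offset inside _current
    private long _delivered;  // total bytes handed to the reader so far

    // Called by the telephony side whenever a block of audio arrives.
    public void Enqueue(byte[] audio) { _chunks.Add(audio); }

    // Called when the call ends; pending reads then drain and return 0.
    public void Complete() { _chunks.CompleteAdding(); }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return true; } }   // "fake" seekability
    public override bool CanWrite { get { return false; } }

    // Placeholder length (assumption): the real total is unknown for live audio.
    public override long Length { get { return int.MaxValue; } }

    public override long Position
    {
        get { return _delivered; }
        set { /* ignored: live audio cannot be rewound */ }
    }

    // Report the requested target, but leave the read cursor where it is.
    public override long Seek(long offset, SeekOrigin origin)
    {
        switch (origin)
        {
            case SeekOrigin.Begin: return offset;
            case SeekOrigin.End: return Length + offset;
            default: return _delivered + offset;
        }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // Block until a non-empty chunk is available or the producer completes.
        while (_offset == _current.Length)
        {
            byte[] next;
            if (!_chunks.TryTake(out next, Timeout.Infinite))
                return 0; // completed and empty: end of stream
            _current = next;
            _offset = 0;
        }
        int n = Math.Min(count, _current.Length - _offset);
        Buffer.BlockCopy(_current, _offset, buffer, offset, n);
        _offset += n;
        _delivered += n;
        return n;
    }

    public override void Flush() { }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _chunks.Dispose();
        base.Dispose(disposing);
    }
}
```

An instance of this would be passed to SetInputToAudioStream() together with a SpeechAudioFormatInfo describing the telephony audio, while another thread calls Enqueue() as packets arrive. Read() blocks until audio is available, so the engine consumes the stream at the pace of the call.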

However, the engine does not fire the SpeechDetected event until it has received about 500-750ms of audio beyond the start-of-speech point. This is true for short utterances (less than 500ms, in which case SpeechDetected does not fire until after the utterance has ended) as well as for long utterances (750ms or more, in which case SpeechDetected fires mid-utterance, but still about 500-750ms after start-of-speech). We're not sure whether this is a result of the underlying SAPI wrapper implementation (buffering?) or of the engine configuration.

In any case, we'd like to receive SpeechDetected events sooner than 500-750ms after start-of-speech, preferably using the managed APIs. Is this configurable? We have tried looking at white papers on SAPI properties, playing with the token's ini file settings, etc., with no success.