Introduction

The purpose of this article is to give you a small insight into the capabilities of the System.Speech assembly,
in particular the usage of the SpeechRecognitionEngine class. The MSDN documentation of the class
can be found here.

Background

I read several articles about how to use Text to Speech, but when I wanted to find out how to do it the other way around,
I realized that there is a lack of easily understandable articles
covering this topic, so I decided to write a very basic one myself and share my experiences with you.

The Solution

So now let's start. First of all, you need to reference the System.Speech assembly, which is located in the GAC, in your application.

This is the only reference needed; it contains the following namespaces and their classes. The System.Speech.Recognition namespace
contains the Windows Desktop Speech technology types for implementing speech recognition.

System.Speech.AudioFormat

System.Speech.Recognition

System.Speech.Recognition.SrgsGrammar

System.Speech.Synthesis

System.Speech.Synthesis.TtsEngine

Before you can use the SpeechRecognitionEngine, you have to set several properties and invoke some methods. In this case,
I guess, code sometimes says more than words ...

// the recognition engine
SpeechRecognitionEngine speechRecognitionEngine = null;

// create the engine with a custom method (I will describe that later)
speechRecognitionEngine = createSpeechEngine("de-DE");

// hook up the needed events
speechRecognitionEngine.AudioLevelUpdated +=
    new EventHandler<AudioLevelUpdatedEventArgs>(engine_AudioLevelUpdated);
speechRecognitionEngine.SpeechRecognized +=
    new EventHandler<SpeechRecognizedEventArgs>(engine_SpeechRecognized);

// load a custom grammar, also described later
loadGrammarAndCommands();

// use the system's default microphone; you can also dynamically
// select audio input from devices, files, or streams
speechRecognitionEngine.SetInputToDefaultAudioDevice();

// start listening in RecognizeMode.Multiple, which specifies
// that recognition does not terminate after completion
speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);

Now, in detail: the function createSpeechEngine(string preferredCulture). The standard constructor and its overloads are the following:

SpeechRecognitionEngine(): Initializes a new instance using the default speech recognizer for the system.

SpeechRecognitionEngine(CultureInfo): Initializes a new instance using the default speech recognizer for a specified locale.

SpeechRecognitionEngine(RecognizerInfo): Initializes a new instance using the information in a RecognizerInfo object to specify the recognizer to use.

SpeechRecognitionEngine(String): Initializes a new instance of the class with a string parameter that specifies the name of the recognizer to use.

The reason why I created a custom function for instantiating the class is that
I wanted to add the possibility of choosing the language
that the engine uses. If the desired language is not installed, the default language (the Windows Desktop Language) is used instead,
which prevents an exception when a not-installed language pack is chosen. Hint: you can install further language packs to choose a different
CultureInfo for the SpeechRecognitionEngine, but as far as I know, this is only supported on Windows 7 Ultimate/Enterprise.
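A minimal sketch of what createSpeechEngine could look like, based on the behavior described above (the original listing may differ in detail): it iterates over the installed recognizers and falls back to the default constructor if the preferred culture is not found.

```
// Sketch: pick the installed recognizer matching the preferred culture,
// or fall back to the system default if none matches.
private SpeechRecognitionEngine createSpeechEngine(string preferredCulture)
{
    foreach (RecognizerInfo config in SpeechRecognitionEngine.InstalledRecognizers())
    {
        if (config.Culture.ToString() == preferredCulture)
        {
            // found a recognizer for the desired culture
            return new SpeechRecognitionEngine(config);
        }
    }
    // desired culture not installed: use the default recognizer
    return new SpeechRecognitionEngine();
}
```

This uses the SpeechRecognitionEngine(RecognizerInfo) overload listed above, which is the natural fit when selecting among installed recognizers.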

The next step is to set up the Grammar that is loaded by the SpeechRecognitionEngine.
In our case, we create a custom text file that contains key-value pairs of text, wrapped in the custom class SpeechToText.Word, because I wanted to extend the
usability of the program and give you a little showcase of what is possible with SAPI. This is interesting because, in doing so,
we are able to associate text or even commands with a recognized word. Here is the wrapper class SpeechToText.Word.
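Since the original listing is not reproduced here, this is a plausible sketch of the wrapper; the property names are my assumptions, derived from the behavior described in this article (a recognized word maps to either a text or a shell command).

```
// Hypothetical sketch of SpeechToText.Word; property names are assumed.
public class Word
{
    public string Text { get; set; }         // the word the engine listens for
    public string AttachedText { get; set; } // the text or command associated with it
    public bool IsShellCommand { get; set; } // if true, AttachedText is executed as a command
}
```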

Here is the method to set up the Choices used by the Grammar. In the
foreach loop, we
create and insert the Word classes and store them for later usage in a lookup List<Word>. Afterwards, we insert the parsed words
into the Choices class and finally build the Grammar
using a GrammarBuilder and load it synchronously
with the SpeechRecognitionEngine. You could also simply add
strings to the Choices class by hand or load a predefined XML file.
Now our engine is ready to recognize the predefined words.
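The steps just described might look roughly like the following sketch. The pipe-separated file format, the field order, the file name, and the words list field are my assumptions (and it assumes using System.IO and System.Speech.Recognition); the article's download may differ.

```
// lookup list for later usage in the SpeechRecognized handler
private List<Word> words = new List<Word>();

// Sketch: parse the text file, fill the Choices, build and load the Grammar.
private void loadGrammarAndCommands()
{
    Choices texts = new Choices();
    string[] lines = File.ReadAllLines(Environment.CurrentDirectory + "\\example.txt");
    foreach (string line in lines)
    {
        // assumed format: spokenWord|attachedText|isShellCommand
        string[] parts = line.Split('|');
        Word word = new Word
        {
            Text = parts[0],
            AttachedText = parts[1],
            IsShellCommand = (parts[2].Trim() == "true")
        };
        // store the Word for later lookup
        words.Add(word);
        // add the spoken text to the choices
        texts.Add(parts[0]);
    }
    // build the grammar and load it synchronously
    GrammarBuilder grammarBuilder = new GrammarBuilder(texts);
    Grammar grammar = new Grammar(grammarBuilder);
    speechRecognitionEngine.LoadGrammar(grammar);
}
```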

To start the SpeechRecognitionEngine, we call SpeechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple).
This means that the recognizer continues performing asynchronous recognition operations until the RecognizeAsyncCancel()
or RecognizeAsyncStop() method is called. To retrieve the result of an asynchronous recognition operation, attach an event handler to the recognizer's
SpeechRecognized event.
The recognizer raises this event whenever it successfully completes a synchronous or asynchronous recognition operation.

And here comes the gimmick of this application: when the engine recognizes one of our predefined words, we decide whether to return the associated text or to
execute a shell command. This is done in the following function:
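A hedged sketch of that handler, assuming the Word properties introduced earlier and using Process.Start (from System.Diagnostics) for the shell command case; the original listing may handle output differently.

```
// Sketch: look up the recognized word and either print the attached
// text or execute it as a shell command.
private void engine_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    Word result = words.Find(w => w.Text == e.Result.Text);
    if (result == null)
        return;

    if (result.IsShellCommand)
    {
        // execute the attached shell command
        Process.Start(new ProcessStartInfo(result.AttachedText));
    }
    else
    {
        // otherwise output the associated text
        Console.WriteLine(result.AttachedText);
    }
}
```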
