Using Speech APIs in Windows Phone 8

When the app uses different PhraseList elements for the voice commands, the speech recognition is pretty accurate. You can also use lists to constrain the text against which the speech recognizer must match. This significantly improves accuracy. For example, each recipe requires a specific skill level and you don't want the user to have to select the skill level from a dropdown list. You can provide the list to the speech recognizer in a similar way to how you defined the elements for the PhraseList.

The following lines constrain the possible results of speech recognition to the values included in the skillLevels List<string> by calling the skillLevelRecognizer.Recognizer.Grammars.AddGrammarFromList method:
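A minimal sketch of what such a listing can look like, under stated assumptions: the class name, the method name, the grammar key, and the three skill-level values are hypothetical and do not come from the article. Only the SpeechRecognizerUI members (Recognizer.Grammars.AddGrammarFromList, Settings, and RecognizeWithUIAsync) are part of the Windows Phone 8 speech API.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Windows.Phone.Speech.Recognition;

public static class SkillLevelPrompt
{
    // Hypothetical values; the article does not list the actual skill levels.
    private static readonly List<string> skillLevels =
        new List<string> { "Beginner", "Intermediate", "Advanced" };

    // Returns the recognized skill level, or null if recognition did not succeed.
    public static async Task<string> AskSkillLevelAsync()
    {
        var skillLevelRecognizer = new SpeechRecognizerUI();

        // Constrain recognition to the phrases in the list.
        // "skillLevels" is an arbitrary key that identifies this grammar.
        skillLevelRecognizer.Recognizer.Grammars.AddGrammarFromList(
            "skillLevels", skillLevels);

        // Prompt text as quoted in the article; the example text tells
        // the user which phrases are valid.
        skillLevelRecognizer.Settings.ListenText = "Which is the main element?";
        skillLevelRecognizer.Settings.ExampleText = string.Join(", ", skillLevels);

        // Show the recognized text and have the phone read it back.
        skillLevelRecognizer.Settings.ShowConfirmation = true;
        skillLevelRecognizer.Settings.ReadoutEnabled = true;

        SpeechRecognitionUIResult result =
            await skillLevelRecognizer.RecognizeWithUIAsync();

        return result.ResultStatus == SpeechRecognitionUIStatus.Succeeded
            ? result.RecognitionResult.Text
            : null;
    }
}
```

A page could call this from an event handler with `string level = await SkillLevelPrompt.AskSkillLevelAsync();`. The sketch assumes the app manifest already declares the speech recognition and microphone capabilities.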

The code creates a new instance of Windows.Phone.Speech.Recognition.SpeechRecognizerUI, and initializes the settings to display "Which is the main element?" and provide sample text by joining the strings in the list (Figure 9). This way, the user knows which phrases are valid.

Figure 9: A speech recognition session asking the user which skill level the recipe requires.

If cloud-based speech recognition can hear what you said, it displays the results of the recognition and the phone's voice will tell you what you said (Figure 10). You will notice that the recognition has really improved its accuracy with the use of the list.

You can also add your own grammar definitions to the speech recognizer by using an XML file that conforms to the Speech Recognition Grammar Specification (SRGS) W3C standard. With SRGS, you can improve the accuracy of speech recognition in complex scenarios. If you want to dive deeper into SRGS, consult the SRGS 1.0 specification.
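As a rough sketch of how an SRGS file can be loaded, the recognizer's grammar set exposes an AddGrammarFromUri method next to AddGrammarFromList. The class name, the grammar key, and the file name RecipeGrammar.grxml are hypothetical; the sketch assumes the grammar file ships in the root of the app package.

```csharp
using System;
using Windows.Phone.Speech.Recognition;

public static class SrgsGrammarLoader
{
    // Creates a recognizer UI whose recognition is constrained by an SRGS file.
    public static SpeechRecognizerUI CreateRecognizer()
    {
        var recognizerUI = new SpeechRecognizerUI();

        // Hypothetical file name; ms-appx:/// addresses files inside the app package.
        var grammarUri = new Uri("ms-appx:///RecipeGrammar.grxml", UriKind.Absolute);

        // "recipeGrammar" is an arbitrary key that identifies this grammar.
        recognizerUI.Recognizer.Grammars.AddGrammarFromUri("recipeGrammar", grammarUri);

        return recognizerUI;
    }
}
```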

Providing a Voice Response with Text-to-Speech

If you want to have an app that provides a voice-driven UX, you must use Text-to-Speech, also known as TTS, in order to turn text into spoken words. If the user is speaking to the phone, he won't want to read the output on the screen. Instead, he will expect the phone to provide voice feedback for each interaction.

The basic use of TTS is pretty simple. Add the following using statement to your code:

using Windows.Phone.Speech.Synthesis;

Now, you need only create a new instance of Windows.Phone.Speech.Synthesis.SpeechSynthesizer and call its SpeakTextAsync method asynchronously, passing the text that the phone's voice must read back to the user. The following lines show an example of TTS informing the user that the recipe with a specific main element has been added to the wish list:
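A minimal sketch of such a call follows. The class name, the method name, the mainElement parameter, and the exact sentence are hypothetical; SpeechSynthesizer and SpeakTextAsync are the API members the article names.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Phone.Speech.Synthesis;

public static class WishListAnnouncer
{
    // Reads a confirmation aloud. mainElement is a hypothetical parameter
    // holding the recipe's main element, for example "chicken".
    public static async Task AnnounceAddedAsync(string mainElement)
    {
        var synthesizer = new SpeechSynthesizer();

        // Awaiting the call resumes execution after the phone finishes speaking.
        await synthesizer.SpeakTextAsync(
            string.Format(
                "The recipe with {0} has been added to your wish list.",
                mainElement));
    }
}
```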

The SpeakTextAsync method is useful when you want the phone's voice to read one sentence. However, if you want the phone to read all the necessary steps for a recipe, you probably want to introduce pauses between the steps. The speech synthesizer supports the W3C Speech Synthesis Markup Language (SSML) standard with minor differences. You can use SSML to provide hints to the synthesizer on how to read the text.

The following lines show a simple example of three recipe steps that the code uses to generate an SSML string, which the synthesizer will read:
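One possible way to write this is sketched below. The three step texts, the class name, and the method names are hypothetical. The sketch builds the markup with System.Xml.Linq so that any special characters in a step are escaped, which is a design choice of this sketch and not necessarily what the article's listing does.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Xml.Linq;
using Windows.Phone.Speech.Synthesis;

public static class RecipeReader
{
    // The SSML 1.0 namespace defined by the W3C specification.
    private static readonly XNamespace Ssml = "http://www.w3.org/2001/10/synthesis";

    // Builds an SSML document with a one-second break after each step.
    public static string BuildSsml(IEnumerable<string> steps)
    {
        var speak = new XElement(Ssml + "speak",
            new XAttribute("version", "1.0"),
            new XAttribute(XNamespace.Xml + "lang", "en-US"));

        foreach (string step in steps)
        {
            speak.Add(new XText(step));
            speak.Add(new XElement(Ssml + "break", new XAttribute("time", "1s")));
        }

        return speak.ToString(SaveOptions.DisableFormatting);
    }

    public static async Task ReadStepsAsync()
    {
        // Hypothetical recipe steps, used only for illustration.
        var steps = new List<string>
        {
            "Chop the onions.",
            "Fry the onions until golden.",
            "Add the tomatoes and simmer for ten minutes."
        };

        var synthesizer = new SpeechSynthesizer();
        await synthesizer.SpeakSsmlAsync(BuildSsml(steps));
    }
}
```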

This way, the speech synthesizer pauses for one second after reading each recipe step. Once the SSML string is built, the code creates a new instance of Windows.Phone.Speech.Synthesis.SpeechSynthesizer and calls its SpeakSsmlAsync method asynchronously, passing the SSML string. SSML allows you to further customize the speech output. If you want to dive deeper into SSML, consult the SSML 1.0 specification.

By using voice commands, speech recognition, and TTS capabilities, you can provide a complete speech-driven UX in Windows Phone 8 apps. Because many Windows Phone 8 apps take advantage of the speech features by default, users are expecting more apps that provide similar experiences. Give 'em what they want, and they'll be happy customers!
