This whitepaper demonstrates how to use the Microsoft .NET Speech SDK to build a complete e-commerce starter application. Use these detailed techniques to build your own commerce system that will have your customers browsing, shopping, and making purchases using nothing but the sounds of their voices.

by Paul Osburn

Matt Hempey

Chris Idzerda

Apr 23, 2003

Page 13 of 13

Lessons Learned
We learned a great deal about building voice-only applications through the process of building these samples. Here we note some of the major points in the areas of user testing, design, and development.

Testing: The testing and tuning phase is important in any application, but in terms of design, it is especially important in voice applications. We found that tuning our prompts, accept thresholds, and timeouts were key to making the application useful. Here are a few suggestions on how to conduct effective testing and tuning for voice-only systems.

Properly Configure Testing Equipment First: Many of our early user tests generated numerous usability problems that were due to improper configuration of the microphone. The microphone was too sensitive, picking up background noise, feedback from the speaker output, and slight utterances as user input. Users became increasingly frustrated as they found it difficult to hear a prompt in its entirety. This affected test results significantly.

Select Testers Carefully:We found that testing subjects brought a variety of expectations to the testing process. Developers whom we used as subjects often made assumptions about the way the system was working and became confused with ambiguous prompts like, "Would you like to start shopping or review your previous orders?" They preferred more explicit choices: "Say start shopping to start shopping or review orders to review your account history." Testers with a less technical background preferred less structured prompting; they felt they were speaking with a more friendly system.

To conduct effective tests, make sure the user group you are testing matches the target user group for your application.

Design
The most important lesson designing the application was the importance of tuning the prompt design throughout development. From the first stages of implementation through user testing of the completed system, we made changes to prompts to achieve a more fluid program flow. Our experience speaking with other teams who have attempted similar projects is that this is a fundamental part of voice-only application development.

With that in mind, here are a few points that will make the tuning process much more efficient:

Long Prompts Don't Equal Helpful Prompts: At the outset, our design team approached the goal of a friendly interface by writing friendly text. Testing quickly revealed that verbose prompts were a serious impediment to usability. By keeping prompts short, users understood better what to do.

Express Sentiment with Tone/Inflection: We found that helpfulness is best expressed through intonation and inflection, rather than extra words. A prompt like, "I'm sorry. I still didn't understand you. My fault again," expresses an apologetic sentiment on paper quite well, but spoken, it becomes excessive. This prompt became, "I'm sorry. I still didn't understand you," and we let the inflection of the speaker express the emotion. A good rule of thumb: speak prompts first before writing them down.

Build Cases For Invalid (but likely) Responses: Our tests surprised us when a majority of users answered, "Yes," to the question, "Would you like to start shopping or review your previous orders?" We realized that part of the problem was the way in which the question was asked, but still, we built in a command to accept that response and provide a helpful response.

Keep the Number of Options Small: We found that listing more than three or four choices in a prompt dramatically reduced usability. Users would get confused and would not remember their choices. We made every effort to reduce the number of options offered to a user in any given prompt.

Maintain a Prompt Style Guide: Design teams are used to maintaining style guides for their designs, and voice-only applications should be no exception. Having a consistent set of prompt styles and standard phrasings is paramount to creating a sense of familiarity for the user. Our team recommends an iterative process: modify the guide liberally in the early stages of a project as new cases arise. Then, toward the later stages, tweak new cases to fit the existing rules. This process should lead to a consistent user experience throughout your system.

Development
We needed to make several changes to our development strategy worth noting here.

Necessary Modifications to the Business and Data Layers: The concept of building a voice-only presentation layer as a replacement for a GUI necessitates a few changes to the database and business logic layers we didn't foresee. These changes both relate to the types of data required by the particular constraints of the voice medium:

Pluralized Names: In a GUI context, quantities of items are usually expressed in some sort of table format that very closely resembles a table in a database:

Product
Name

Quantity

Counterfeit
Creation Wallet

2

Contact
Lenses

4

In a voice-only context, while it is possible to read this information as, "Product Name: Counterfeit Creation Wallet, Quantity: 2, Product Name: Contact Lenses, Quantity: 4" it is preferable to read it as, "Two Counterfeit Creation Wallets, and Four Contact Lenses." We added a productNamePlural field to our Products table to enable this change.

Different Login Information: The Web version of the store accepts an email address and password as its login information. Both of these pieces of information are not easily expressed in a voice context. We replaced these fields with Account Number and PIN fields, which also necessitated database changes.

It becomes evident that these changes are changes to user-interface elements stored in the data layer. In essence, a product name is really a GUI identifier for the product row. In the voice-only context, these identifiers may not always apply and so may require changes to the database layer.

Paul Osburn is an Engineering Practice Manager at Vertigo Software, Inc., a San Francisco Bay Area software construction company. For the past 12 years Paul has created enterprise applications for a diverse set of industries, including the robotic, healthcare, and marketing research
industries. Recently Paul led the development of the ASP.NET Starter Kits and the .NET Speech SDK Sample applications. You can contact him
.