How Building for Voice Differs from Building for the Screen: Make Your Voice-First Interactions Accessible

In the last installment of our series on How Building for Voice Differs from Building for the Screen, we covered personalization and how you can individualize your voice-first interaction to create a personal experience for your customers. Here we cover another key voice design principle: making your interactions accessible for voice.

When designing customer experiences for the screen, it’s important to determine what's surfaced at the top level and what gets organized into menus or a similar information architecture. This is because there are only so many pixels and visual concepts we can surface at once.

A graphical user interface (GUI) information architecture effectively sets the customer’s path. The menus set the hierarchy, surfacing the most important buckets as top-level navigation and nesting the less important items within those menus. On the screen, pixels are the scarce resource, and this constraint makes hierarchical menus mandatory; not every item can live at the top level. More importantly, presenting too many options at once adds cognitive load for the user.

With voice user interfaces (VUIs), the opposite is true. Hierarchical menus don’t help the user; they constrain what users can do in the moment, and they add cognitive load as users work out how the experience is organized. To enable conversational, freeform interactions, voice-first UIs require a different arrangement of their features, with all available options presented at the top level.

Be Accessible: Collapse Your Menus; Make All Options Top-Level

Imagine a banking app that lets users look up their routing number. This is an important piece of information but something users will only rarely need. In a screen-based app, the best way to present this information would be via a menu system. In voice, the opposite is true. You'd never want the user to have to say, “Ask Bank of Redmond for: menu, menu, menu, routing number.” Instead, you’d want to enable the user to simply say, “Ask Bank of Redmond for my routing number.” In other words, the routing number intent is now presented at the top level. This doesn't add cognitive load because users aren’t presented with every option at once. In fact, it allows for more serendipitous discovery, because they can simply try things that are reasonable to expect from an experience like this.
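To make the contrast concrete, here is a minimal Python sketch. The menu structure, intent names, and utterances are hypothetical, not a real skill's interaction model: the GUI version nests the routing number several menus deep, while the voice version maps each utterance directly to a top-level intent.

```python
# Hypothetical sketch -- "Bank of Redmond", the menu labels, and the intent
# names below are illustrative, not taken from a real banking app or skill.

# GUI-style information architecture: the routing number lives several
# menus deep, so reaching it means walking the hierarchy.
GUI_MENUS = {
    "Accounts": {
        "Checking": {
            "Details": ["Balance", "Routing Number"],
        },
    },
}

# Voice-first arrangement: every feature is a top-level intent, so a single
# utterance like "Ask Bank of Redmond for my routing number" resolves directly.
VOICE_INTENTS = {
    "my balance": "GetBalanceIntent",
    "my routing number": "GetRoutingNumberIntent",
    "my recent transactions": "GetTransactionsIntent",
}

def resolve_voice_request(utterance: str) -> str:
    """Map a spoken request straight to an intent -- no menu traversal."""
    for phrase, intent in VOICE_INTENTS.items():
        if phrase in utterance.lower():
            return intent
    return "FallbackIntent"  # unrecognized requests fall through to a reprompt
```

The flat dictionary is the whole point: adding a rarely used feature like the routing number costs the user nothing, because nobody has to scan past it to reach anything else.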

While menus add depth to GUIs, they introduce friction to voice-first UIs. Voice interactions should instead offer their entire experience at the top level, without requiring users to learn an information architecture.