Carbon Five + Cooper: Exploring Alexa & the Future of Voice UIs

Recently, designers and technologists from Cooper & Carbon Five sat down to brainstorm about the future of voice-driven user experiences, focusing initially on Alexa. It was a fun kickoff for what we hope turns into a series of prototypes and experiments exploring (and pushing) the boundaries of this exciting emerging technology. Here’s what we’ve discovered so far:

The Frontier Has Some Hard, Rough Edges

One thing that often comes up when framing a brainstorm is how ‘blue sky’ the team should be when ideating. This is especially true with new technologies, when what can and can’t be done is often well-defined, but likely to change in the very near future.

How you frame a brainstorm in this context is probably worth a post on its own, but for our workshop, technical designers from Cooper & Carbon Five each presented their take on the Alexa platform, and its inherent opportunities and challenges. We also demoed some of our favorite (or at least popular) Skills and a couple of early proofs of concept.

Clearly Creepy, Clearly the Future

Always-on/always-listening devices like Alexa devices and Google Home pose huge privacy and security issues. Consumers who use these products place their trust in both the parent corporation and ultimately in the State. That said, even if the average developer wouldn’t own one, if the behavior of very young children getting a daily knock-knock joke is any indicator, these kinds of devices will become a bigger part of our lives over the next decade.

What’s Missing, What’s Coming

Despite the apparent inevitability of the platform, it’s still early days. And while the Alexa platform is the more mature option and will continue to evolve, here are a few technical challenges to note if you’re interested in making a Skill in March of 2017:

Triggering code via push notification or timer

One of the first things you want to do with an Alexa Skill is get the device to interact with you proactively. Imagine Alexa being able to tell you when someone is at the door or reminding you of your next meeting.

One of the big gotchas of the platform is that you are limited to interactions that the user initiates by invoking the Skill and its Intents by voice. This is probably at least in part due to application sandboxing, and we can assume that, like the evolution of the iPhone SDK, both background processes and push notifications that can activate a Skill will come at some point.

Still, this limitation, combined with the way Alexa aggressively tries to parse phrases — even a short pause is interpreted as the end of a command — means that the experiences are by necessity relatively short-lived.
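To make the user-initiated constraint concrete, here is a minimal sketch of a Lambda entry point for a custom Skill. The request types and the response envelope follow the Alexa Skills Kit JSON interface; the intent name `PingIntent` and the spoken phrases are our own hypothetical placeholders. Notice that every branch answers something the user just said, and nothing in the handler lets the Skill speak first.

```python
def build_response(text, end_session=True):
    """Wrap plain text in the Alexa response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event, context=None):
    """Entry point: each event is the result of the user speaking to the device."""
    request_type = event["request"]["type"]

    if request_type == "LaunchRequest":
        # "Alexa, open <skill>": keep the session open and prompt briefly.
        return build_response("Who would you like to ping?", end_session=False)

    if request_type == "IntentRequest":
        intent_name = event["request"]["intent"]["name"]
        if intent_name == "PingIntent":  # hypothetical intent name
            return build_response("Okay, pinging them now.")
        return build_response("Sorry, I didn't catch that.")

    # SessionEndedRequest: the session is already over, so there is nothing to say.
    return {"version": "1.0", "response": {}}
```

Keeping each reply short and ending the session quickly is a design choice that follows from the aggressive end-of-speech detection described above.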

Parsing free text

The second thing we tried to do with Alexa was to take something you said and pass it along to a third-party service. Specifically, we wanted to be able to ping another employee by name via Slack. And while the Alexa platform is great at recognizing things in the Amazon universe (e.g. book titles and actors), it is pretty poor at processing long blocks of spoken text. Other stumpers include unusual first names or even first names that sound similar (e.g. “Dan” and “Don”).

And, while there is an almost-deprecated LITERAL Slot type, any Skill that needs to parse free text will require a fairly robust search service on the backend and a well-executed conversation tree on the front end to catch the inevitable edge cases. Even more frustrating, the built-in Simon Says Skill has access to internal private libraries that clearly parse free text better than the publicly available API. Bottom line: don’t assume that you can rely on free text without trying it out in code.
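As a rough illustration of the backend search plus conversation tree idea, the sketch below matches a heard first name against a small roster and asks a follow-up question when the result is ambiguous. It uses `difflib` from the Python standard library, which compares spelling rather than sound, so treat it as a stand-in for a real phonetic or search service. The function names, the roster, and the cutoff value are all our own assumptions.

```python
import difflib


def resolve_name(heard, roster, cutoff=0.6):
    """Return roster names that are close to what the device heard, best match first."""
    by_lower = {name.lower(): name for name in roster}
    close = difflib.get_close_matches(
        heard.lower(), list(by_lower), n=3, cutoff=cutoff
    )
    return [by_lower[key] for key in close]


def prompt_for(heard, roster):
    """Pick the next line of dialogue based on how many candidates matched."""
    candidates = resolve_name(heard, roster)
    if not candidates:
        return "I couldn't find anyone called {}. Who did you mean?".format(heard)
    if len(candidates) == 1:
        return "Pinging {}.".format(candidates[0])
    # Several plausible names: hand the decision back to the user.
    return "Did you mean {}?".format(" or ".join(candidates))


if __name__ == "__main__":
    roster = ["Dan", "Don", "Priya"]  # hypothetical employee list
    print(prompt_for("Dan", roster))    # Did you mean Dan or Don?
    print(prompt_for("Priya", roster))  # Pinging Priya.
```

Even an exact match on “Dan” triggers a confirmation here, because the transcription itself may be wrong when a similar name is on the roster.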

Knowing the location of the device

Given how far Alexa pushes the limits of privacy, it’s a little surprising that Skills don’t immediately have access to the user’s location. Instead, you will need to capture and store a postal code or address. This would normally be done as part of the setup flow for your Skill, but it seems like something that could be easily streamlined and shared across Skills. Hopefully in the near future (note from the future: this feature now exists!) users will be able to automatically provide their Lat/Long, Zip code, or primary Amazon shipping address to a Skill upon request.

Use by multiple users or strangers

From an interaction-design perspective, Alexa is optimized for in-home use by someone who is logged into their own Amazon account and knows how to use an Alexa device, which Skills are installed on the device, and how to use those Skills. This means that operating an Echo or Dot at a stranger’s house or in an office setting comes with additional challenges (like, how do you even Alexa?). One of our first ideas was a “virtual receptionist,” and this problem, along with the unreliable parsing of first names, made that concept seem particularly challenging to pull off. That said, differentiating between users that have created a custom profile for a particular Alexa device is on the horizon, which ought to help with some use cases.

Differentiating your brand

One of the Cooper designers brought up a particularly interesting challenge. When we think of the expression of a brand, one way we talk about it is in terms of ‘tone’ and ‘voice.’ Since Alexa has a specific voice out-of-the-box, the question becomes: how do you communicate and differentiate your brand through a voice UI?

Alexa provides a few ways to tweak this. First, you can provide a phonetic guide (using markup called SSML) for pronouncing key words or phrases (useful for pronouncing a brand or product name correctly). Additionally, you can provide short audio files in your response, which seem to be (at least in part) intended for sonic branding.
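Both tweaks can be sketched in a few lines. The `phoneme` and `audio` tags and the `SSML` output speech type are part of Alexa's SSML support; the brand name, its IPA pronunciation, the jingle URL, and the helper names are hypothetical. Audio files also have to meet Amazon's hosting and encoding requirements, which this sketch does not check.

```python
from xml.sax.saxutils import escape, quoteattr


def branded_ssml(greeting, brand, brand_ipa, jingle_url):
    """Build SSML that plays a short jingle, then speaks the brand name phonetically."""
    audio = "<audio src={}/>".format(quoteattr(jingle_url))
    phoneme = '<phoneme alphabet="ipa" ph={}>{}</phoneme>'.format(
        quoteattr(brand_ipa), escape(brand)
    )
    return "<speak>{} {} {}</speak>".format(audio, escape(greeting), phoneme)


def ssml_response(ssml, end_session=True):
    """Wrap SSML in the Alexa response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": ssml},
            "shouldEndSession": end_session,
        },
    }


if __name__ == "__main__":
    # Hypothetical brand, pronunciation, and audio file.
    ssml = branded_ssml(
        "Welcome to", "Acme", "ˈækmi", "https://example.com/jingle.mp3"
    )
    print(ssml_response(ssml))
```

Escaping the text and attribute values matters here, since a stray ampersand in a product name would otherwise produce invalid markup.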

Stateful conversations

When we’re designing and building bots of any decent complexity, we almost immediately need to preserve state. Alexa gives you a session to work with, but designing flows and implementing a state machine are still decent challenges. Preserving this state across multiple sessions means getting into OAuth and a more interesting application on the backend.
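Within a single session, the state machine can live in the session attributes that Alexa echoes back on every turn. The sketch below is one way to do that, assuming a hypothetical `PingIntent` with a `Name` slot and two states of our own invention; `AMAZON.YesIntent` is one of the built-in intents. Nothing here survives the end of the session, which is where account linking and a real backend come in.

```python
def respond(text, attributes, end_session=False):
    """Build a response that carries the conversation state forward."""
    return {
        "version": "1.0",
        "sessionAttributes": attributes,
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def intent_of(event):
    """Return the intent dict, or an empty dict for non-intent requests."""
    return event["request"].get("intent") or {}


def handle_turn(event):
    """Advance a two-state conversation: ASK_NAME, then CONFIRM."""
    # Attributes can be missing or null at the start of a session.
    attributes = dict(event["session"].get("attributes") or {})
    state = attributes.get("state", "ASK_NAME")
    intent = intent_of(event)

    if state == "ASK_NAME":
        name = (intent.get("slots") or {}).get("Name", {}).get("value")
        if name is None:
            return respond("Who should I ping?", {"state": "ASK_NAME"})
        return respond("Ping {}?".format(name), {"state": "CONFIRM", "name": name})

    if state == "CONFIRM":
        if intent.get("name") == "AMAZON.YesIntent":
            done = "Pinging {}.".format(attributes.get("name", "them"))
            return respond(done, {}, end_session=True)
        return respond("Okay, who should I ping instead?", {"state": "ASK_NAME"})

    # Unknown state: reset rather than guess.
    return respond("Let's start over. Who should I ping?", {"state": "ASK_NAME"})
```

Keeping the whole state in one small dictionary makes each turn easy to test in isolation, since a turn is just a function from an event to a response.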

Given the lack of both a visual UI and a reliable input device (like a keyboard), combined with the relatively short snippets of voice that you need to use to control the Skill, complex interactions that require much thinking (say, scheduling an event with multiple participants) aren’t recommended. In fact, the overall Alexa experience can be in turns both frustrating and delightful.

But there’s lots of easy stuff, too

While there are still a lot of rough edges, the current state of the Alexa ecosystem is still a lot of fun to work with. For instance, developing locally is a breeze. Unlike getting a ‘hello world’ application running locally on your personal iPhone, your first Alexa Skill will probably take you less than a half hour, and that’s including setting up an Alexa-ready Lambda service and prepping the app for submission to the Skill store. And unlike a lot of AWS documentation, the Alexa docs are pretty thorough and feel like they were mostly written in one go.

So, if you already have a Facebook Messenger, Slack, or other bot, you might as well consider whether an Alexa Skill would be useful to your customers. That said, the user’s context for a voice UI is quite different, so the best applications will take this into account. Also, we feel like some of the most interesting experiences will happen when people can move those interactions and conversations easily between devices, input methods, and contexts, throughout their day.

Next Steps

We’re excited about this new class of devices, and Alexa feels like a good place to start. And we’re especially excited to be collaborating with our friends from Cooper. While we tend to stay grounded in how we might use technology today to solve problems, they bring energy and excitement about the future, which makes us a great team. So stay tuned as we combine forces throughout this year and continue to explore voice UIs, and hopefully create some interesting experiences to share along the way.

If you’d like to talk further about using technologies like these on your next project, please drop us a line. We’d love to chat.

Carbon Five is a full service software consultancy that helps startups and established organizations design, build, and ship awesome products. If you have a project you’d like us to take a look at, or are interested in joining our team, please let us know.