The Road to Here

In our fast-paced society, we consistently push the limits of technology and human-computer interaction, and the pace only quickens in the mad rush of innovation. Today it is probably safe to assume that your company’s employees and customers expect you to keep up.

First came the days when you needed a website to be current. It wasn’t long before static websites gave way to dynamic content and web apps began to mature. Then we gradually transitioned to the Golden Age of the App: if you had an app, your IT staff could check that box.

With the app ecosystem becoming saturated, you need more innovation and personalization to differentiate yourself and to deliver the ease of use that users demand from top-notch apps. Enter Speech Recognition.

Our vision of speech recognition’s future goes back quite a way, too.

Hollywood has used Speech Recognition to thrill and excite us in memorable scenes, including:

Iron Man – JARVIS

2001: A Space Odyssey – HAL 9000

Star Trek: The Next Generation – The ship’s computer

Speech Recognition has actually been around for quite some time, but it was quite limited in scope. The proliferation of mobile phones and the maturation of Speech Recognition software and neural networks have made it a completely different ball game. There is speculation that 2017 is the year of Voice Recognition: the error rate has dropped from 43% in 1995 to only 6.3% this year, roughly on par with humans.

Voice Search: Usage Increasing Quickly

Ways to Interact With Voice

There are a handful of ways you can use voice interactions to build your user experience. Which methods you choose depends largely on your existing assets and infrastructure, and on what you want to accomplish.

Voice Assistants: Siri, Google Now, Cortana

Siri / Google Now Integration

Users are familiar with this method of interaction

Some limitations exist

Alexa / Google Home

Rapid increase in sales of voice recognition hardware

Requires voice-only interactions

Custom In-App Voice

Engage your app users while they are using the app

Must handle Natural Language Processing yourself

Web-Based Voice Recognition

Enable voice commands for repetitive tasks

Voice Assistants: Siri, Google Now, Cortana

The voice assistants of yesteryear have grown up, and a latecomer has joined the party. They provide some cool and genuinely useful tools and integrations, but their use doesn’t stop there. Siri and Google’s assistants have opened up their platforms a bit, and Cortana is getting ready to. There are plenty of good options for integrating with these assistants.

Siri

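SiriKit enables your iOS 10 apps to work with Siri, so users can get things done with your content and services using just their voice. Currently, SiriKit only supports interactions in the following domains of “intents” or capabilities:

VoIP calling

Messaging

Payments

Photo search

Workouts

Ride booking

CarPlay (automotive vendors only)

Restaurant reservations (requires additional support from Apple)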

OK Google / Google Now / Google Assistant

System Actions include the following intents that you can integrate with:

Alarm

Communication

Fitness

Local

Media

Open

Productivity

Search

There are a lot of things that Google Voice Actions already recognize. This website is a great way to discover what’s possible.

You can define Custom Actions to support additional use cases.

Currently, custom actions are only available on Google Home and Pixel. Other devices will follow soon.

Cortana

From basic mobile deep links to full integration of your bots and services, the skills kit provides all the tools and docs you need to promote your services and engage users through the Cortana experience.
Once created, your skill works wherever your code runs. By registering your bots, services, mobile apps, and websites as Cortana skills, over 145 million active monthly users will be connected to these capabilities.
People can interact with your skills in various ways. Cortana can offer a skill based on a natural language request during a conversation, or proactively present a skill based on a user’s preferences and context.

Alexa / Google Home

The New Kids on the Block

Google Home and Amazon Echo (Alexa) are one more outlet for interacting digitally with your customers. They also extend your digital brand beyond the app, enhancing and simplifying your customers’ lives while connecting with them through digital means.

The Echo and Home are more than just speakers – they are built to help users at home, the location where the shopping experience begins. Both Alexa and Home can integrate with backend services allowing you to extend your brand. Although the market is still young, integrating with these devices can prove to be very beneficial.

Pros

Users are already familiar with voice control

They are invested in the platform

Development platform capabilities are strong

Cons

Voice-only interaction, known as a Voice User Interface (VUI)

Alexa Voice Services (Amazon Echo)

Offers the most robust development tools

Strongly positioned in the market

Has shipped 5 million units and expects to double that in 2017

Currently the best external voice-controlled device

Alexa Voice Services: Under the hood

User Flow

Alexa Skills Kit Architecture

Alexa Skills
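Under the hood, a custom skill is a request/response exchange: Alexa performs the speech recognition and language understanding, sends your backend a JSON request naming the matched intent and any slot values, and your backend answers with JSON describing what Alexa should say. The Python sketch below handles a simplified subset of that format. The intent `OrderStatusIntent`, the slot `OrderId`, and all spoken text are hypothetical; a real skill declares its own intents and slots in its interaction model, and real requests carry more fields than the ones read here. `lambda_handler(event, context)` follows the AWS Lambda Python handler convention, one common place to host a skill.

```python
from typing import Any, Dict

# Hypothetical names used only for illustration; a real skill declares its
# own intents and slots in its interaction model.
ORDER_STATUS_INTENT = "OrderStatusIntent"
ORDER_ID_SLOT = "OrderId"


def build_response(text: str, end_session: bool = True) -> Dict[str, Any]:
    """Wrap plain text in a minimal Alexa custom-skill response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            # False keeps the session open so the user can reply.
            "shouldEndSession": end_session,
        },
    }


def handle_request(event: Dict[str, Any]) -> Dict[str, Any]:
    """Route a simplified Alexa request to a spoken reply."""
    request = event["request"]
    request_type = request["type"]

    if request_type == "LaunchRequest":
        # The user opened the skill without asking for anything specific.
        return build_response("Welcome. You can ask about an order.", end_session=False)

    if request_type == "IntentRequest":
        intent = request["intent"]
        if intent["name"] == ORDER_STATUS_INTENT:
            # Alexa has already turned the speech into an intent and slot values.
            slots = intent.get("slots") or {}
            order_id = slots.get(ORDER_ID_SLOT, {}).get("value")
            if order_id:
                # A real skill would call its order backend here.
                return build_response(f"Looking up order {order_id}.")
            # Slot not filled: ask for it instead of failing.
            return build_response("Which order number?", end_session=False)

    if request_type == "SessionEndedRequest":
        # No speech is sent back when the session ends.
        return {"version": "1.0", "response": {}}

    # Unrecognized intent: steer the user back to what the skill can do.
    return build_response("Sorry, I can only help with order status.", end_session=False)


def lambda_handler(event, context):
    """Entry point when the skill is hosted on AWS Lambda."""
    return handle_request(event)
```

Keeping the session open after a greeting or a question is what lets the exchange continue for another turn; closing it after a complete answer hands control back to the user.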

Google Home

Google Home is a Wi-Fi speaker that also works as a smart home control center and an assistant for the whole family. You can use it to play back entertainment throughout your entire house, effortlessly manage everyday tasks, and ask Google what you want to know.

In-App Speech Recognition

Bring Your Own Voice (BYOV)
There are a variety of voice interaction points between the user and the app. Triggering voice interactions from within the app offers a unique way to engage your users.

Pros

Enhanced capabilities, fewer limitations

Continue the voice conversation inside of the app

Cons

Rolling your own solution takes expertise in several areas. If you want smart features that resemble a voice assistant, you will have to handle Natural Language Processing yourself.
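Because in-app voice leaves language understanding to you, even the simplest version needs a step that turns a transcript into an intent plus parameters. The Python sketch below assumes the speech has already been transcribed to text by whatever recognizer your app uses, and maps that text to an intent with regular expressions. The intents `AddToCart` and `TrackOrder`, their patterns, and the `IntentMatch` type are hypothetical. Keyword matching like this is a stand-in for real Natural Language Processing, not a replacement for it: it breaks as soon as users phrase things in ways the patterns don't anticipate.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class IntentMatch:
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)


# Hypothetical intents for an imaginary shopping app. Patterns are tried in
# order; named groups in each regular expression become slot values.
INTENT_PATTERNS = [
    ("AddToCart",
     re.compile(r"\b(?:add|put)\s+(?P<item>[\w\s]+?)\s+(?:to|in)\s+(?:my\s+)?cart\b")),
    ("TrackOrder",
     re.compile(r"\b(?:where is|track)\s+(?:my\s+)?order\s*(?P<order_id>\d+)?")),
]


def parse_utterance(transcript: str) -> Optional[IntentMatch]:
    """Map a speech-to-text transcript to an intent, or None if nothing fits."""
    text = transcript.lower().strip()
    for intent, pattern in INTENT_PATTERNS:
        match = pattern.search(text)
        if match:
            # Drop optional groups that did not match (their value is None).
            slots = {name: value.strip()
                     for name, value in match.groupdict().items() if value}
            return IntentMatch(intent, slots)
    return None
```

A `None` result is where conversation design matters: rather than reporting an error, the app should steer the user toward something it can handle.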

Web-Based Voice Recognition

Circling back to where we began, we can’t leave web-based voice recognition out of the equation. If you are using Chrome or Firefox, you may have noticed that this page supports Speech Recognition. This capability comes from the Web Speech API, which notably also handles Speech Synthesis.

This has been possible for several years now, but it hasn’t been put to much good use. Web-based voice recognition has a lot in common with in-app voice recognition, in that you have to handle everything yourself.

Voice User Interface (VUI)

A body of research has shown that people infer personality traits from even the briefest voice interactions. Voice is a form of Human Computer Interaction (HCI) that lives up to the first word of its name: it humanizes the interaction. Because of this, you should give special consideration to how you communicate with the user.

Although much good advice for Graphical User Interfaces (GUIs) may apply, don’t try to simply convert your GUI into a VUI. There’s a lot more to think about.

Here are some of Google’s tips for designing conversations for the Google Assistant, taken from one of its videos:

Create a persona: the “face” of the company.

Leverage your brand.

List brand core attributes that can be conveyed in voice

Write a bio-sketch of this persona, and perhaps give it a name

Serves as a grounding mechanism to fall back on for consistency

Define yourself as separate from the Google Assistant

Greet the user

Think outside the box

Don’t start with code

Write out core experiences like you would a screenplay

Keep it simple

Context matters

Who is the user?

Where are they?

What are they doing?

What type of device are they acting on?

How is the experience influenced over time?

Cater to the user’s intent, not a feature

In conversation, there are no errors

There are limitations, but recognize them for what they are

Take voice input “errors” and make them into a meaningful conversation

Look at the interaction from the user’s perspective
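One common way to put this into practice is escalating reprompts: instead of answering a misunderstood request with an error, the assistant gives progressively more helpful guidance each consecutive time it fails to understand, and starts over once a turn succeeds. The Python sketch below shows that pattern. The `Conversation` class and the reprompt wording are hypothetical examples, not part of any assistant platform's API.

```python
from typing import Sequence

# Hypothetical reprompts, escalating from a light nudge to concrete examples.
REPROMPTS = (
    "Sorry, what was that?",
    "I can track an order or add something to your cart. Which would you like?",
    "For example, you can say 'track my order' or 'add milk to my cart'.",
)


class Conversation:
    """Tracks consecutive misunderstandings within one conversation."""

    def __init__(self, reprompts: Sequence[str] = REPROMPTS) -> None:
        self.reprompts = list(reprompts)
        self.misses = 0  # consecutive turns we could not understand

    def on_no_match(self) -> str:
        """Return the next reprompt instead of an error message."""
        # Escalate with each consecutive miss, then stay on the most explicit one.
        reply = self.reprompts[min(self.misses, len(self.reprompts) - 1)]
        self.misses += 1
        return reply

    def on_match(self) -> None:
        """A successfully understood turn resets the escalation."""
        self.misses = 0
```

Staying on the most explicit reprompt, rather than cycling back to the vague one, keeps a struggling user moving toward a request the system can actually handle.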

Think bigger

Starting simple is good but…

Don’t limit yourself here.

Help somebody gain access to information that they didn’t have before

Communication is Key

If you can communicate well, you will engage and even entertain. But it isn’t smooth sailing from here on out, because a lot is going on when you deal with voice interactions.