Editor’s Note: In my initial post, I mentioned that along with the long-form assessments I’ve been publishing, I’d also be doing short, topical updates. This is the first of those updates.

In the first week of 2018, we saw a handful of significant updates that pertain to various trends converging around ears. Here’s a rundown of what you need to know:

Amazon Introduces the Alexa Mobile Accessory Kit (AMAK)

As Voicebot.ai reported from an Amazon blog post, Amazon’s new Alexa Mobile Accessory Kit will allow for much easier (and cheaper) Alexa integration into OEMs’ devices, such as hearables. It has been possible in the past to integrate Alexa into third-party devices, but this kit greatly simplifies the process of turning Bluetooth audio-capable hardware into Alexa-integrated hardware. This is great news for this new use case, as it will surely put Alexa in more and more of our ear-worn devices.

Per Amazon’s senior product manager, Gagan Luthara:

“With the Alexa Mobile Accessory Kit, OEM development teams no longer need to perform the bulk of the coding for their Alexa integration. Bluetooth audio-capable devices built with this new kit can connect directly to the Alexa Voice Service (AVS) via the Amazon Alexa App (for Android and iOS) on the customer’s mobile device.”

Starkey Announces Exciting Additions to Next Generation Hearing Aids

There were a number of exciting revelations at Starkey’s Biennial Expo, but among all the announcements, two really intrigued me. The first was the inclusion of “fall detection” sensors in Starkey’s next generation of hearing aids. This will be the first hearing aid with inertial sensors.

On the surface, this is really great, as every 11 seconds an older adult is treated in the emergency room for a serious fall. The purpose of these sensors is to detect those types of falls, so that the user can get immediate help. What’s even more intriguing is the fact that we’re now beginning to see advanced sensors being built into this new wave of hearing aids. As I will write about soon, the preventative health benefits, combined with smart assistants, offer some very exciting possibilities and another promising use case for our ear-worn devices.

The second announcement was the upcoming live-language translation feature to be added to this same next generation of Starkey hearing aids. This stems from Starkey’s partnership with hearable manufacturer Bragi, which has this feature available with its Bragi Dash Pro. The live-language translation is not Bragi’s proprietary software, as Bragi currently uses the third-party application iTranslate to power this feature for its device. Although it has not been announced formally, I expect that Starkey’s live-language translation feature will also be powered by iTranslate. Expect features like this to become more widespread across our connected devices over time as more manufacturers support this type of integration.

As we move into week two of 2018, expect another wave of exciting announcements from CES. Check back here next week, as I will be doing a rundown of the most important takeaways coming out of Vegas this week.

The Road Starts Here

If you examine the past 50 years of user interfaces in computing, what you’ll see is that a new one surfaces every 10 years or so. Each of these new interfaces has been an incremental step away from hardware-based interfaces toward ones that are more software-based. From the 1970s through the early 1980s, in order to “communicate” with a computer and issue your intended command, you’d need to use punch cards and command lines.

PCs were introduced in the 1980s, and as computers began to migrate from the military, government and academia into our homes, the graphical user interface (GUI) spread along with them, as it was far more user-friendly for casual computer users than command lines. This was the preferred user interface until the mid-1990s, when the Internet began to really take off.

As the Internet opened the door to an endless number of new uses and functions for computers, the Hypertext interface (HTML) bloomed, as we needed an interface that was more conducive to web-based functionality, such as hyperlinking and connecting parts of the web together.

Then in 2007, Steve Jobs famously ushered in the mobile computing era with the unveiling of the iPhone. Along with the introduction of our pocket-sized supercomputers, we were also presented with the Multi-Touch interface, which has gone on to become the most widely preferred interface globally.

So, 10 years after the iPhone debuted, and based on the history of new user interfaces surfacing every 10 years or so, it raises the question, “what’s next?” Since this is FuturEar after all, you better believe it will largely center around our ears, voices and how we naturally communicate.

Reducing Friction

There are two underlying factors to consider when looking at why we gravitate toward each evolution in user interfaces. The first is the tendency for users to prefer as little friction as possible. Friction essentially represents the clerical, tedious work that you’re required to do in order to fully execute your command. Let’s use maps as an example and the idea of trying to get from point A to B in an unknown area.

In the past, prior to the PC and internet, you were limited to good, old-fashioned maps or asking for directions. Then, technology enabled you to use the likes of MapQuest which allowed you to print off turn-by-turn directions. Today, in the mobile era, you can simply pull up your favorite map app, punch in your destination, and let your phone guide you. Each progression reduced friction for the user, requiring less time and energy to do what you were trying to do: get from point A to point B.

The second factor to look at is the type of computers being used in conjunction with the user interfaces. When we shrank our computers down to the size of a phone, it wasn’t feasible to use a mouse and keyboard, so we shifted to just using our fingers on the screen. Nor was HTML necessary prior to the internet. The interface adapts as the computers we’re using evolve.

Which brings us to our über-connected world where we’re bringing everything we possibly can online. Gartner estimates that in this age of the Internet of Things (IoT), we’ve brought 8.4 billion devices online and that figure will climb to 20.4 billion devices by 2020. So, how then do we control all of these connected devices while continuing to reduce friction?

Abra Kadabra

The answer lies in what tech pioneer Brian Roemmele has coined the “Voice First” interface. He hypothesizes that as we move into this next decade, we’ll increasingly shift from issuing commands with our fingers, to issuing them with our voice. Which is great, because speech and language are humans’ most natural form of communicating, meaning there’s no learning curve in adopting this habit. This is an interface that is truly for all ages and levels of sophistication. It’s built to be as simple as conversing with the people around us.

So, what are we actually conversing with? That would be our smart assistants, which are currently housed primarily in our smart speakers and phones. Amazon took an early lead in the smart speaker market, but it didn’t take long for Google to introduce its own line of “OK Google” speakers, resulting in sales of 20 million Alexa speakers and 7 million Google speakers thus far. These numbers will grow significantly before year’s end, as it’s estimated that 20% of US households will be purchasing a smart speaker for the holidays.

You might be asking, “but wait, we’ve had Siri in our iPhones since 2011, how is this different?” You’re right, but it wasn’t until recent machine learning breakthroughs that speech recognition became accurate enough to reliably understand us. Hence the recent popularity of these smart speakers and our voice assistants. There are far fewer “I’m sorry, I didn’t understand that” responses, and the assistants serve an increasingly important role in relaying our commands to the billions of connected IoT devices we keep bringing online.

So, let’s look at the two criteria that we need to check off in order for this interface to be mass-adopted. We need to ensure the interface is conducive to the computers we’re using, and that it reduces friction beyond how we’re interacting with them today. Voice gives us the ability to quickly control all of our IoT devices with simple voice commands, trumping the finger tapping and app toggling that multi-touch requires. When it works properly, speaking to our assistants should feel like talking to a genie: “Abra Kadabra, your wish is my command.”

Groceries – “Alexa order me all the ingredients for Dave’s Famous Souffle recipe”

Heading Home

I believe that over the course of the next decade the Voice interface will continue to become more powerful and pervasive in all of our lives. Although we’re in the infancy of this new interface, we’ve quickly begun adopting it. Google confirmed 20% of its mobile searches are already conducted via voice, Pew Research found that 46% of Americans currently use a voice assistant, and Gartner projects that 75% of US households will own at least one smart speaker by 2020.

We’re also seeing smart speakers and voice assistants begin wading into new waters, such as the workplace, cars, and hotel rooms. This will likely open up brand new use cases, continue to increase the public’s exposure to smart assistants, and expand our understanding of how to better utilize this new technology. We’re already seeing an explosion of skills and applications, and as each assistant’s user network grows, so too do the network effects for each assistant’s platform (and the interface as a whole), as developers become increasingly incentivized to build out the functionality.

Just as we offloaded our various tasks from PCs to mobile phones and apps, so too will we offload more and more of what we currently depend on our phones for to our smart assistants. This shift from typing to talking implies that as we increase our dependency on our smart assistants, so too will we increase our demand for an always-available assistant (or assistants).

What better place to house an always-available assistant than our connected audio devices? This isn’t some new, novel idea, as 66% of all hearables already include smart assistant integration (this figure is almost entirely driven by Apple’s AirPods). In addition to AirPods, we saw Bose team up with Google to embed OK Google in Bose’s next line of headphones, and Bragi integrate Alexa in the Dash Pro’s most recent update. Rather than placing smart speakers in every space we occupy, why not just consolidate all of that (or a portion) into an ear-worn device that grants you access whenever you want?

I originally surmised that our connected audio devices will give way to a multitude of new uses that extend way beyond streaming audio. Smart assistants provide one of the first, very visible use cases beginning to emerge. I believe that smart assistant integration will become standard in any connected audio device in the near future – be it ear-buds, over-the-ear headphones or hearing aids. This will provide a level of control over our environments that we have not seen before, as we simply need to whisper our commands for them to be executed.

Our own little personal genie, not in the bottle but in the ear… what better way to reduce friction than that?