Meet your new assistant

Smart speakers and other voice-enabled AI tools are coming to the workplace

By Jeffrey Davis

“Hey Stacy, can you open our sales forecast and get me last year’s
retention KPIs?” Today Stacy is a flesh‑and‑blood human like you. Soon
enough you could be addressing a smart speaker, a voice‑enabled
chatbot, or some other app that allows you to talk your way through a
process faster than you could using fingers and keyboard.

Our phones already feature voice‑enabled assistants like Siri,
Google Assistant and Bixby. Roughly one in six consumers now owns a
smart speaker, such as the Amazon Echo or Google Home, according to a
recent survey by NPR, and sales are growing as fast as smartphone
sales did a decade ago.

Yet at work, the voice revolution can still seem far off. One
deterrent is the trend towards open offices: nobody wants to be that
noisy jerk who can’t stop yelling at his virtual assistant. A global
survey on AI adoption found that while 84% of respondents would freely
converse with Alexa or Siri at home, just 27% would do so
at the office. Another hurdle is that most enterprise‑level software
involves complex interactions with objects and text, which still call
for a mouse and keyboard.

But just as smartphones and web‑based software made their way into
the enterprise, so too will the conversational UI. Advances in voice
recognition and synthesis have finally intersected with AI, resulting
in fertile conditions for computers that can listen and talk back
while handling more complex functions and tasks.

“Alexa is barely four years old, and habituating users to a voice
interface is still in the early stages,” says Joe Buzzanga, chief
analyst of New Jersey‑based Fivesight Research. “It’s important that
consumers are experiencing voice now so they can more naturally adopt
it in the office, and there are applications where voice will be the
best interface.”

Chart: Employees who say they’re comfortable using voice recognition
at home versus at work

Computers find a voice

Today’s voice assistants can already tackle basic administrative
chores, such as transcribing calls or scheduling meetings, and even
some higher‑level tasks, such as monitoring phone calls to identify
high‑potential sales leads. Reaching even this basic level of accuracy
and ability has taken decades of research. In part that’s because
computers have historically struggled to parse human speech, which is
freeform, creative, and full of idiosyncrasies.

Progress in recent years has come from machine learning, which
involves feeding machines enormous amounts of speech data and teaching
them to recognize patterns on their own. In 2017, Google CEO Sundar
Pichai announced that the company’s voice recognition technology had
reached 95% accuracy—a 20% improvement since 2013.

Andrew Ng, the former chief scientist at Chinese tech giant Baidu,
has predicted that voice assistants will become ubiquitous in the
workplace once they reach 99% accuracy. That
last mile will be challenging. Today’s voice assistants often struggle
to identify names from unfamiliar ethnic groups, or even pop songs
with “foreign” titles.

Currently, you can string together at most two commands for Google
Home (for example, “Play Spotify and set volume to 10”). Google’s AI
still fails at traffic updates and other combined commands. And
computers still don’t speak entirely naturally: you probably won’t
mistake Alexa for your friend or coworker.

“The technology is moving very fast,” says Joshua Montgomery, CEO of
Mycroft, a startup that is creating an
open‑source equivalent to Amazon’s Alexa. That pace reflects massive
investment in smart speakers, improved voice functions for phones and
cars, more advanced chatbots, and so on. Mycroft has raised about $3
million in venture capital and another $800,000 in preorders on Kickstarter and Indiegogo to get its
voice assistant off the ground.

At the other end of the market, Amazon and Microsoft have formed an
intriguing alliance aimed at the workplace. Alexa and Cortana
(Microsoft’s digital assistant) already share reciprocal features;
each can be used to access the other. Both can perform
basic tasks like scheduling meetings, managing appointments, and sending
emails. And both work with Office 365, Microsoft’s suite of
productivity apps.

Integrations are still fairly basic, but it’s easy to imagine a
future where Cortana could, for instance, tap into the automated
“Insights” functions of Excel so users can take a quick hit of data
analysis without opening a spreadsheet. Other advances will likely
come from overseas. Last year, Chinese web giant Baidu announced
DuerOS, a proprietary conversational platform that counts more than
100 partner brands, among them HTC and Nvidia.

“We’re seeing a virtuous cycle where the technology is accelerating
because so many people are working on it,” says Buzzanga. “Five years
from now, will we still have today’s Microsoft and Google applications
with voice bolted on? I don’t know, but it’s not what I’m looking for.
I think it will be something more radical.”

An assistant worth talking to

The most important ability of any voice assistant is what it can do
with the voice commands that it receives. By this measure, digital
assistants are becoming more capable every year.

The AI‑driven scheduling startup X.ai, for instance,
has created an intelligent agent, named either Andrew or Amy, that focuses solely
on calendar tasks like scheduling a meeting. While that’s still an
admin chore, Andrew/Amy is capable of working with much less
information than past applications. If you tell Amy to book time with
a potential client on Wednesday, she understands the request and how
to perform it.

Because you’re not monitoring the process (opening the calendar or
jumping into the email thread with the client, for instance), she’s
also making more logical jumps than computers have in the past. Even
more than consumers, enterprise users will demand that this process be
error‑free. “There’s a bar even higher than you’d expect,” says Dennis
Mortensen, the founder of X.ai.

Advances in AI have brought this level of quality within reach. For
example, Andrew helped set up the interview for this article. Asking a
machine to spin up that sales forecast is also possible: Montgomery’s
team at Mycroft is building an assistant that could give a voice reply
to queries about the number of backers, total funding and time left in
the crowdfunding round.

“Many companies don’t realize they need a voice strategy,” Montgomery
says. With AI and speech recognition both moving so fast, he warns
that companies without one will fall behind.

Some companies may avoid voice tech because spoken requests are sent to
servers in the cloud, which can violate corporate security policies.
Startups like Mycroft, which allows companies to control their own
data, may help address this concern. Other companies will address
security needs by building their own voice apps.

AI developers whose apps don’t require voice are just as excited
about its possibilities as voice specialists. “There will be a point
where it’s normal, if not expected, that you talk to your computer,”
says Mortensen.

A platform designed for growth

The strongest driver of the voice‑UI trend is perhaps the simplest:
Voice remains one of the most efficient, cost‑effective ways to communicate.

“My company isn’t paying me to do email ping pong, organize my
receipts, plan travels or do any number of things that are just
chores,” says Mortensen. “The future is where we’re all managers. Even
as a junior employee, you’ll need to figure out: what agents do I need
to do my job?”

Directing your computer to organize and submit your receipts is
surely cheaper than doing it yourself, and faster still if you can
simply say it rather than open a program and navigate through it.

Voice may also prove to be a simpler way to learn new job skills.
After all, it’s an instinctive and natural way of communicating. So,
while call centers may be the first places where we’ll see the
conversational UI take root, the foundation is in place for it to
spread quickly through the enterprise.

Jeffrey Davis, a founding editor of Business 2.0 magazine and
former executive editor at CBS Interactive, writes frequently about
technology and business.