We have a long way to go before voice technology reaches the human level of interaction. Ken Arakelian kicks off this four-part blog series by talking about where we are today in voice, where we’re going and where Nuance is leading the way.

Alexa: “Right now the temperature is 56 degrees with cloudy skies. Today you can expect clouds and showers with a high of 60 degrees and a low of 44 degrees.”

Me: “What about tomorrow?”

Alexa: [blank stare]

Me: “Ugh – Alexa – what will the temperature be tomorrow?”

Voice as a computer interface has come a long way, but it’s still clunky and nothing like talking to another person. Our amazement with how far the technology has come since voice recognition in IVRs came on the scene in the 1980s can make us forget the remaining problems we have to tackle to get to human-level interactions. In this blog series, I’m going to take each remaining hurdle and talk about where we are today, where we’re going and how Nuance is leading the way.
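The Alexa exchange above fails because the assistant drops the conversational context between turns. Here is a minimal sketch, in Python, of what carrying that context forward looks like; the `DialogState` class, intent names, and slot handling are all invented for illustration and don’t reflect how any real assistant is built:

```python
# Toy dialog manager: "What about tomorrow?" only makes sense if the
# system remembers the previous intent ("weather") as dialog state.

class DialogState:
    def __init__(self):
        self.intent = None   # e.g. "weather"
        self.slots = {}      # e.g. {"day": "today"}

def handle(utterance, state):
    text = utterance.lower()
    if "temperature" in text or "weather" in text:
        # A full utterance establishes the intent and its slots.
        state.intent = "weather"
        state.slots["day"] = "tomorrow" if "tomorrow" in text else "today"
        return f"weather report for {state.slots['day']}"
    if text.startswith("what about") and state.intent == "weather":
        # A bare follow-up: reuse the prior intent, update only the slot.
        if "tomorrow" in text:
            state.slots["day"] = "tomorrow"
        return f"weather report for {state.slots['day']}"
    return "blank stare"  # no context to interpret the fragment

state = DialogState()
print(handle("What's the temperature?", state))       # weather report for today
print(handle("What about tomorrow?", state))          # weather report for tomorrow
print(handle("What about tomorrow?", DialogState()))  # blank stare
```

The last line is the Alexa failure mode: without the earlier turn in memory, the fragment “What about tomorrow?” is uninterpretable.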

Part 1: Automatically generating dialog for conversations is a complex problem to solve.

“Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo.” Believe it or not, this is a grammatically correct sentence, and it illustrates why automating natural language processing and conversation is hard. If you’re wondering what the Buffalo sentence means, you can click the link and read about it (helpful tip – take an Advil). The tl;dr (too long; didn’t read) version is that the word “buffalo” can be a proper noun, a noun, or a verb, so the sentence translates to something about how buffalo from Buffalo bully (aka buffalo) buffalo, etc…

This is obviously an extreme example, but it just goes to show that there is plenty of meaning and “nuance” hidden in the words people choose that computers haven’t been “taught” to understand yet.
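A bit of toy arithmetic (purely illustrative, not any real tagger) shows how fast the ambiguity compounds: if each of the eight “buffalo” tokens could be a proper noun, a common noun, or a verb, a naive part-of-speech tagger faces 3⁸ candidate tag sequences for that one sentence:

```python
# Each of the 8 tokens has 3 possible part-of-speech tags, so a naive
# tagger must consider 3**8 candidate tag sequences for the sentence.
sentence = "Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo".split()
pos_options = ["PROPN", "NOUN", "VERB"]
candidates = len(pos_options) ** len(sentence)
print(candidates)  # 6561
```

Only one of those 6,561 assignments yields the grammatical reading, which is why context, not word-by-word lookup, is what resolves meaning.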

Here’s an example that may resonate more with English speakers:

SHE never told him that she loved him. (but someone else did)

She NEVER told him that she loved him. (zero times in their entire relationship)

She never TOLD him that she loved him. (she showed it but never said it out loud)

She never told HIM that she loved him. (but told everybody else)

She never told him that SHE loved him. (she told him that someone else did)

She never told him that she LOVED him. (only that she liked him and thought he was funny)

She never told him that she loved HIM. (she said she loved someone else)

As a live, English-speaking human, you would catch these subtle changes in meaning just from the inflection placed on different words. Artificial intelligence, however, would have to be taught that kind of nuance.

Another great illustration of the complexity of language can be seen in a video of physicist Richard Feynman, apparently being condescending to his interviewer: Richard Feynman Magnets – YouTube. The interviewer simply asks Dr. Feynman to explain magnetism, and Dr. Feynman refuses and dismisses the question, saying that the interviewer won’t understand. The point of the video is that Dr. Feynman can’t explain magnetism in a meaningful way without a shared frame of reference – and he and the interviewer don’t share one. The interviewer doesn’t have the degrees that Dr. Feynman has, so Dr. Feynman equates the problem to explaining to an alien why his wife is in the hospital with a broken leg. Well, she slipped and fell. Why did she slip and fall? Well, she was walking on ice. Why is ice slippery? …etc., on down into deeper and deeper levels of complexity – for seven minutes – without ever answering the magnetism question. (One viewer posted, “This is why no one talks to you at parties.”)

This complexity is at the core of the problem we need to solve for computers to “learn” how to converse with humans. Nuance is making great advances in automating conversation. Currently, the state of the art in this area is still Simple Question Answering (essentially enterprise search front-ended with natural language understanding). See Paul Tepper’s post on advances in automating conversation. Nuance is working internally and with research partners on encoding the general knowledge that computers need in order to decipher the buffalo sentence and to have a frame of reference to converse with humans.
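To make “enterprise search front-ended with natural language understanding” concrete, here is a minimal sketch of Simple Question Answering. The FAQ entries and the crude word-overlap scoring are invented for illustration; a production system would use far richer language understanding and retrieval:

```python
# Toy Simple Question Answering: an index of known questions, matched
# to a user's natural-language question by word overlap.
faq = {
    "what are your hours": "We are open 9am to 5pm, Monday through Friday.",
    "how do i reset my password": "Click 'Forgot password' on the login page.",
    "where is my order": "Check the tracking link in your confirmation email.",
}

def answer(question):
    q_words = set(question.lower().replace("?", "").split())
    # Score each indexed question by how many words it shares with the query.
    best = max(faq, key=lambda k: len(q_words & set(k.split())))
    if not q_words & set(best.split()):
        return "Sorry, I don't know."
    return faq[best]

print(answer("How can I reset my password?"))
# -> Click 'Forgot password' on the login page.
```

Note what this approach cannot do: it answers one self-contained question at a time, with no memory across turns and no frame of reference – exactly the limitations the rest of this post is about.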

So, just in case you didn’t have a frame of reference when reading this blog post, go back and read the Wikipedia entry on the buffalo sentence and watch the Dr. Feynman video. Then you’ll understand the monstrous task we have in bringing voice technology up to human-level interactions.

About Ken Arakelian

Ken Arakelian is the director of product management for Nuance's On Demand business. With more than 15 years of experience, Ken has worked in the contact center industry as a consultant, account manager and product manager.