Microsoft Cortana, and why the future of AI is contextual

To a lot of people watching the technology world and digital assistants, Microsoft's Cortana often ranks low on the list of options. A lot has happened since 2013, when Microsoft began putting Cortana into Windows Phones, including the death of that platform. But in 2019, the voice assistant is getting some significant revisions, including context-based conversational abilities, as announced at the Build developer conference this week in Seattle. To find out why contextually aware artificial intelligence (AI) is a considerable breakthrough, and what's coming next for Cortana, we sat down with Andrew Shuman, Corporate Vice President of Cortana Engineering, and Daniel Klein, Technical Fellow at Microsoft Semantic Machines and a professor at UC Berkeley.

Years of redefining

Cortana is not dead

While our 30-minute conversation ranged from skills to natural language processing and the problem with smart speakers, the big question we had was about the perception that Cortana is a dead platform. Shuman was emphatic. "Cortana is not dead," he said. "Fundamentally it is a foundational horizontal piece … like Microsoft Account, Microsoft Store, Microsoft Search."

That may come as a surprise to many Cortana users. With the demise of mobile, Cortana has been undergoing an identity shift during the last few years, including its development moving to the Office team.

One year ago, we wrote an article detailing Microsoft's plans for Cortana. At the time we said, "Microsoft's end goal is to integrate Cortana into Windows 10 seamlessly so that users don't even know they're using it." That point still holds. It was also in May 2018 that Microsoft acquired Semantic Machines, the Berkeley, California-based AI company behind the conversational technology demonstrated through Cortana at Build this week.

Microsoft is working on a Cortana experience that looks and feels like a normal text conversation.

The chaos over Cortana, though, seems to be coming to an end. Recent significant revisions to the Cortana iOS and Android apps, and the integration of Cortana across more Microsoft endpoints (such as software and services), mean that Cortana is on its way to being less app-centric and more people-centric, basing its knowledge graph on what we do rather than on a single, siloed experience.

Regarding the fragmented Cortana experience today in Windows 10 – something we criticized in our May 2019 Update review – Shuman said Cortana on Windows 10 will get a typing experience similar to the one now found on iOS and Android, with a more refined and modern look. "That's just an interim step," Shuman said.

This feature is a big deal because the Cortana experience on Windows 10 has slowly regressed during the last few Windows 10 feature updates. It started as a thriving virtual-assistant experience complete with day overviews, upcoming meetings, latest news, and weather forecasts, but it is now an empty shell that does nothing but listen to you when you click it. The Cortana experience on Windows 10 today is something most people aren't going to want to use. It's neither fun nor informative, and it only works if you already know what you're going to say. Many people don't feel comfortable talking to their PCs, so a redesigned Cortana experience that puts typing front and center is vital.

This decision doesn't mean Microsoft is going to remove the ability to talk to Cortana on PC. It just means the user experience Microsoft is working on is going to better enable both use cases. Microsoft already does this with Cortana on Android and iOS; the user interface works equally well with voice or text, and the same can be expected for Cortana on Windows 10.

If the experience looks and feels like a normal text conversation, like an SMS or iMessage exchange with a friend, people are much more likely to interact with Cortana. That appears to be what Microsoft is working towards with the Cortana experience on Windows 10. Cortana is designed to help you stay productive, and typing is sometimes more natural than speaking.

Talk like people

Why contextual awareness matters

Turning to this week's announcement, Microsoft showed a demo of a woman interacting with Cortana on her smartphone. The flowing conversation has no fewer than 34 turns between the AI and the user. This ability to converse with AI is the crucial step that has so far been missing from Siri, Cortana, Alexa, and Google Assistant. Currently, users need to think before they speak to phrase a request in such a way that the AI understands the command. "Right now you do the work for the system," said Klein.

All of that is about to change. The technology behind Semantic Machines, which is live and working code, lets the AI understand that when talking about "John" from a meeting, the pronoun "him" refers to John. And when talking about that meeting, the user can now ask, "What will the weather be like?", and the AI will understand that the user intends to learn about the weather at the location of the meeting — even if it is 1,000 miles away.

Soon you will be able to ask Cortana to order a pizza, and the AI will come back with available stores.

This ability to understand intention through context is simple for humans but exceptionally difficult for computers. It requires more than the rule-based systems in use today, which are restricted to siloed tasks; for example, while planning a meeting, you can't also ask about the weather.
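To make the idea concrete, here is a toy sketch (in Python, and purely illustrative — this is not Semantic Machines' actual system) of how a dialogue context lets later turns resolve pronouns and implicit references to things mentioned earlier:

```python
class DialogueContext:
    """Toy dialogue state: remembers entities from earlier turns."""

    def __init__(self):
        self.entities = {}  # role -> value, e.g. "person": "John"

    def remember(self, role, value):
        self.entities[role] = value

    def resolve(self, reference):
        # Map pronouns / implicit references back to remembered entities;
        # anything unrecognized is returned unchanged.
        pronouns = {"him": "person", "her": "person", "there": "location"}
        role = pronouns.get(reference)
        return self.entities.get(role, reference)


ctx = DialogueContext()
# Turn 1: "Schedule a meeting with John in Seattle."
ctx.remember("person", "John")
ctx.remember("location", "Seattle")
# Turn 2: "What's the weather like there?" -- no explicit location,
# so the assistant falls back to the meeting's location from context.
print(ctx.resolve("there"))  # Seattle
# Turn 3: "Email him the agenda." -- "him" resolves to John.
print(ctx.resolve("him"))    # John
```

A real system does this with learned models over full conversation history rather than a lookup table, but the principle is the same: later turns borrow meaning from earlier ones instead of forcing the user to restate everything.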

The ability to parse language with context also gets around the "skills problem." While having a dozen skills – services connected to an AI, like Spotify – is nice, having thousands is overwhelming. Humans don't work that way: you must know in advance that a skill exists, confirm it is enabled, and then remember its exact command. None of that is smart. Contextually aware systems built on natural language processing can get around all of it. Soon you will be able to ask Cortana to order a pizza, and the AI will come back with available stores, Microsoft said. From there, the user and AI can have a conversation about location, delivery or pickup, toppings, and more, just as you would with another human.

As to when we'll see the technology within Cortana, Microsoft is being tight-lipped. However, it said that what was shown is not some idealized, theoretical goal; the code is live and working today. Microsoft sees Cortana as a single service that can be enabled on all devices, meaning what was shown this week will work on Windows 10, your smartphone of choice, or a Harman Kardon Invoke speaker, for example. However, if you were hoping for a new consumer-focused Cortana home speaker, Shuman has some bad news. "We're not going after more of that at this point," he said.

AI to make life easier

Cortana is about managing time

In talking with Shuman and Klein, it's clear Microsoft is making another big bet with Cortana and, more generally, AI. Shuman was quite bullish on the contribution of Semantic Machines, noting its tech is like a "brain transplant" for Cortana. However, the popular notion of Cortana as a single-app experience trying to do a "fast follow" of Alexa, Siri, or Google is incorrect. Microsoft's vision of AI is woven around its core apps, services, and the Windows OS. With the success of Microsoft 365 and Office 365 for productivity, email, Edge, Teams, and more, many companies are already heavily steeped in Microsoft. AI that leverages all that data is something Amazon, Google, and Apple simply cannot replicate.

AI is still very far from realizing its true potential.

Nonetheless, both Klein and Shuman pointed out that even with this breakthrough, AI is still very far from realizing its true potential. Klein said AI is "inches away from the starting line," while Shuman said there's still "a giant gap between what has been achieved and what can be achieved."

Microsoft's grand goal with Cortana is simple: free up more time in your life. With smart AI built around your work and personal lives, Cortana will someday soon be able to navigate and negotiate the complexities of the modern world. Contextually-aware conversations are the first big leap towards that achievement.

Daniel Rubino

Daniel Rubino is executive editor of Windows Central. He has been covering Microsoft since 2009 back when this site was called WMExperts (and later Windows Phone Central). His interests include Windows, Surface, HoloLens, Xbox, and future computing visions. Follow him on Twitter: @daniel_rubino.