How Microsoft could beat Siri and Google Now: A modern Microsoft Bob

If Google Now is the chirpy personal assistant, always volunteering information before being asked, Microsoft’s future in intelligent virtual assistants will be more in tune with a butler, quietly hovering and making suggestions where necessary.

In fact, don’t expect Microsoft to develop a competitor to either Google Now or Apple’s Siri, Microsoft’s director of Bing search, Stefan Weitz, told PCWorld in an interview. Instead, individual products within Microsoft will be able to tap into the vast collection of data that Microsoft has amassed through its partnerships with Facebook, Foursquare, LinkedIn, Yahoo, and many more—far, far beyond what Google, which has favored an independent approach, can achieve, Weitz said.

“I think [Microsoft Bob] will reemerge, but with a deep bit more sophistication,” Gates said. “We were just ahead of our time, like most of our mistakes.”

No, Microsoft Bob isn’t coming back with snazzy new hipster glasses updated for 2013. But a Big Data version of the Bob concept—personal assistance precisely when you need it—is definitely coming to Microsoft’s repertoire.

At the center of it all is Bing, the company’s public-facing search engine. Behind the scenes, Microsoft has been busy developing what it calls its Satori engine, named for the first step on the Buddhist path to enlightenment. Satori’s goal, Microsoft says, is to build the “world’s largest repository of knowledge” for Bing to tap into, always there to provide assistance when asked. Satori will collect and collate information from the Web, organize it within Bing, and use individual applications as the portals for delivery to users.

Bing’s knowledge engine stretches across the company, or it will. (Image: Microsoft)

There is a sea of information that surrounds us every day—our location, the businesses and landmarks around us, the proximity of our friends, and other data points. In general, that information will remain hidden from the user until he or she either queries Bing, or performs an action that triggers an app. According to Weitz, Microsoft's approach will have two advantages: the breadth of data it can tap into, and the passive, reactive approach of the Satori engine. Google Now is simply too pushy in comparison, Weitz would argue.

So how does Satori actually manifest itself? Weitz described a Microsoft demo where Bing monitors a chat session between two users, and quietly steps in when appropriate. “As you’re talking in IM, it’s analyzing the utterances,” Weitz said. “For something like ‘Hey, do you want to see a movie?’, it takes that utterance and automatically does the query for you.” Weitz said that piece of technology could be rolled into products within two years.
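The demo Weitz describes, in which a chat utterance is turned into a search query without the user asking, can be illustrated with a very small sketch. This is not Microsoft's implementation, and nothing here reflects how Satori or Bing actually works; the pattern list, the query strings, and the function name `suggest_query` are all hypothetical, and a real system would presumably rely on machine-learned language understanding rather than hand-written patterns.

```python
import re
from typing import Optional

# Hypothetical trigger patterns (assumptions for illustration only):
# each pairs a phrase that might appear in a chat message with a search query.
INTENT_PATTERNS = [
    (re.compile(r"\bsee a movie\b", re.IGNORECASE), "movie showtimes near me"),
    (re.compile(r"\b(grab|get) (dinner|lunch)\b", re.IGNORECASE), "restaurants near me"),
]


def suggest_query(utterance: str) -> Optional[str]:
    """Return a search query if the utterance matches a known pattern, else None."""
    for pattern, query in INTENT_PATTERNS:
        if pattern.search(utterance):
            return query
    # Stay quiet when nothing matches, in keeping with the "butler" approach.
    return None


if __name__ == "__main__":
    for line in ["Hey, do you want to see a movie?", "How was your day?"]:
        print(line, "->", suggest_query(line))
```

The point of the sketch is the shape of the interaction: the assistant stays silent unless something in the conversation suggests a query worth running.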

Weitz said this is all in the very early stages of a process of “infusing” Microsoft’s products with Bing’s knowledge. “We want to make it natural,” Weitz said. “So that’s why we’re making this [Bing technology] available to individual product teams so they can choose to include it and make it natural for users.”

The same capabilities are being made available to third-party apps as well. At Build 2013, Microsoft said it would begin making its Bing knowledge accessible to third-party developers later this year. “I believe that apps will have eyes, they will have ears, they will have mouths,” said Gurdeep Singh Pall, Microsoft’s corporate vice president for Bing.

And better brains, too.

Look for all these technologies to gain intelligence through Bing. (Image: Microsoft)

Blessings from Ballmer, Gates

Make no mistake: Weitz and Pall aren't rogue agents, waxing hypothetical about pet projects. This is the direction in which Microsoft is heading.

In recent weeks, the two public faces of Microsoft, chief executive Steve Ballmer and chairman Bill Gates, have both indicated that a significant chunk of Microsoft’s future will lie in collecting, collating, and analyzing the personal and professional data it collects from its users. Yes, Microsoft will be chasing Apple and Google. Yes, there will be privacy implications. And to see the full benefits, users will have to buy into the Microsoft ecosystem and its collection of software and services. If you’ve shared information with another service on the Web, however, chances are that Microsoft has already built a profile around you.

“Our machine learning infrastructure will understand people’s needs and what is available in the world, and will provide information and assistance,” Ballmer wrote. “We will be great at anticipating needs in people’s daily routines and providing insight and assistance when they need it. When it comes to life’s most important tasks and events, we will pay extra attention. The research done, the data collected and analyzed, the meetings and discussions had, and the money spent are all amplified for people during life’s big moments.”

That wasn’t all. Last week, chairman Bill Gates hosted Microsoft’s Research Faculty Summit 2013, where he, too, pointed the way toward a future of intelligent assistance.

“As everyone gets essentially what we’d call the personal agent (it’s been talked about for decades and now really is possible), we see where you’re going, we see your calendar, we see your various communications, some of those communications we can actually look at the tags, look at the speech, try to be helpful to you in your activities,” Gates said. “I think that we will be more connected, so that when somebody wants to find a gift of a certain type, or take a trip in a certain way, that there will be a closer match.”

Is Microsoft Bob making a comeback?

Unfortunately, Microsoft has a rather checkered past when it comes to intelligent agents. Adam Cheyer, the co-founder of Siri Inc. (later acquired by Apple, whose technology forms the foundation of Apple’s Siri service) shared with me a few swings and misses: the Microsoft Agent technology that first appeared in Windows Vista with its animated assistant, Merlin; the Office Assistant, popularly known as Clippy, which appeared in Microsoft Office 97 for Windows; Microsoft’s Comic Chat, a graphical IRC program that assigned “emotions” to characters based on the content of their speech; and Microsoft’s Personal Assistant for Scheduling System, a “virtual secretary.”

“Most of their commercial systems haven’t had much success yet, and I haven’t seen technology from either of these two acquisitions emerge as products for the consumer,” Cheyer said in an email. “I last tried Bing Mobile’s speech interface about three years ago and I found it quite bad at the time (perhaps it’s improved since), compared to Siri, Google, Nuance or (at the time an independent company) Vlingo.”

And then, of course, there was Microsoft Bob, the metaphorical interface to existing Microsoft technologies (such as a calendar) that Microsoft released in 1995. Less of an intelligent agent and more of a user-friendly guide to the PC, Bob was championed by Melinda French, who later married Bill Gates. (A spokesman for the Bill and Melinda Gates Foundation turned down our request for an interview.)

Gates inflamed geek imaginations when he suggested at the Faculty Summit that Microsoft Bob could rise from the grave. “I think it will reemerge, but with a deep bit more sophistication,” Gates said. “We were just ahead of our time, like most of our mistakes.”

With the hindsight of nearly two decades, it’s difficult to see how Bob would have succeeded. “There was no Internet, there was no way for it to get smarter,” Weitz said. “It was agent-based, with a defined ruleset. Today, if you did that, you could make a much more interesting product. But no, I can confirm, Microsoft is officially not bringing back Microsoft Bob.”

A lack of cohesion?

The problem, if there is one, is that Microsoft’s approach might be too piecemeal, with no single, go-to application that users can ask for assistance. This type of user experience is especially important in the mobile space, says Mohammed Abdoolcarim, who worked on the Siri product teams within Siri Inc. and Apple.

If Microsoft wants to become a serious player in the mobile space, it needs to develop a single product like Siri to compete with rivals, Abdoolcarim, now a product manager for activity tracker developer Misfit Wearables, said in an email from Vietnam.

“What we have seen is that search engines have become the de facto starting place for navigating through the massive amounts of data on the Web. And that made sense,” Abdoolcarim wrote. “On the Web, users wanted to find web content that was most relevant to their query.

“On mobile, the problem is different,” Abdoolcarim said. “Now users are now in the state of decision making. Where can I find the best burrito? When should I leave home to get to the movies on time? Who else is coming to the party tonight? In this state, users are on the move and need to get things done. And they need to get it done fast. Which means typing out a query on your mobile device is out of the question. By the time you finish typing your query, the moment has passed and the friends you’re hanging out with have already made a decision on which movie to go watch. This is where a voice-based natural language interface (in the case of Siri) and contextual information (in the case of Google Now) come in to make the decision-making process frictionless.”

To be fair, there’s nothing stopping Microsoft from making the Bing search application the focal point of the mobile experience. And Weitz defended Microsoft’s stance toward allowing users to dictate the terms of the search, rather than pushing information at the user like an “annoying uncle,” he said.

According to Microsoft’s own tests, a technology like Google Now is only accurate 88 to 89 percent of the time, and users will assign a so-called “penalty of trust” when the system gets it wrong. “But once [the user] initiates a request, it radically enhances the accuracy level of predictive technology,” Weitz said.

And, really, it’s not just Apple and Google that Microsoft needs to worry about. Nuance Communications owns the Dragon NaturallySpeaking software, and it has bought companies like semantic technology provider Cognition (whose technology was licensed to Microsoft for Bing). Perhaps not surprisingly, then, Nuance has its own plans to develop a platform-agnostic intelligent agent.

Dubbed “Wintermute,” the technology was shown off at the Consumer Electronics Show in Las Vegas this past January. Nuance envisions all sorts of connected devices, from ATMs to gas pumps to phones to TVs, powered by its cloud-based intelligent agent.

“Over time, our partners will be able to deliver more and more form factors with a common personal assistant in the cloud, increasing the personalized cross-device experience,” a Nuance spokesman said. “As a platform, unlike a consumer product, there will not be an official ‘launch.’ In fact, Nuance Swype’s Back Up & Sync capabilities are a first iteration of the Wintermute platform, where a user’s personalized language model follows them from one device to the next.”

Whoever gets intelligent agent technology right stands both to lock users into an ecosystem and to collect an enormous amount of useful data. According to Gates (who should know), developing an improved intelligent agent is a prize that would be “very large in a commercial sense.”

Computers originated as an idealized version of Frankenstein’s monster: extremely dumb and extremely docile, but when properly directed, some of the most powerful tools around. For years, they only acted under our commands. Then they began to talk to one another. We told them to remind us of important events. We began to ask them questions, in the form of search engines. They learned our intent, and the patterns of our speech. And now, we’re just beginning to teach them to be helpful.

Numerous pieces of technology have rewritten markets and altered society: the mobile phone, the DVR, digital cameras, and the MP3, among others. Intelligent agents could fall into that category. For when your phone or your PC pops up a bit of relevant information, especially without being asked, an intelligent agent is indistinguishable from magic, Weitz said.