Amazon Alexa and the Search for the One Perfect Answer

If you had visited the Cambridge University Library in the late 1990s, you might have observed a skinny young man, his face illuminated by the glow of a laptop screen, camping out in the stacks. William Tunstall-­Pedoe had wrapped up his studies in computer science several years earlier, but he still relished the musty aroma of old paper, the feeling of books pressing in from every side. The library received a copy of nearly everything published in the United Kingdom, and the sheer volume of information—5 million books and 1.2 million periodicals—inspired him.

It was around this time, of course, that another vast repository of knowledge—the internet—was taking shape. Google, with its famous mission statement “to organize the world’s information and make it universally accessible and useful,” was proudly stepping into its role as librarian to the planet. But as much as Tunstall-­Pedoe adored lingering in the stacks, he felt that computers shouldn’t require people to laboriously track down information the way that libraries did. Yes, there was great pleasure to be had in browsing through search results, stumbling upon new sources, and discovering adjacent facts. But what most users really wanted was answers, not the thrill of a hunt.

This article is adapted from Talk to Me: How Voice Computing Will Transform the Way We Live, Work, and Think, by James Vlahos, to be published in March by Houghton Mifflin Harcourt.

As tools for achieving this end, search engines were almost as cumbersome as their book-stuffed predecessors. First, you had to think of just the right keywords. From the long list of links that Google or Yahoo produced, you had to guess which one was best. Then you had to click on it, go to a web page, and hope that it contained the information you sought. Tunstall-­Pedoe thought the technology should work more like the ship’s computer on Star Trek: Ask a question in everyday language, get an “instant, perfect answer.” Search engines as helpful librarians, he believed, must eventually yield to AIs as omniscient oracles.

This was a technological fantasy on par with flying cars, but Tunstall-­Pedoe set about making it a reality. He had been earning money as a programmer since the age of 13 and had always been particularly fascinated by the quest to teach natural language to machines. As an undergraduate, he had written a piece of software called Anagram Genius, which, when supplied with names or phrases, cleverly rearranged the letters. “Margaret Hilda Thatcher,” for instance, became “A girl, the arch mad-hatter.” (Years later, author Dan Brown used Anagram Genius to generate the plot-­critical puzzles in The Da Vinci Code.) Now, sequestered in the library, Tunstall-Pedoe began building a prototype that could answer a few hundred questions.

Two decades later, with the rise of voice computing platforms such as Amazon Alexa and Google Assistant, the world’s biggest tech companies are suddenly, precipitously moving in Tunstall-­Pedoe’s direction. Voice-­enabled smart speakers have become some of the industry’s best-selling products; in 2018 alone, according to a report by NPR and Edison Research, their prevalence in American households grew by 78 percent. According to one market survey, people ask their smart speakers to answer questions more often than they do anything else with them. Tunstall-­Pedoe’s vision of computers responding to our queries in a single pass—providing one-shot answers, as they are known in the search community—has gone mainstream. The internet and the multibillion-­dollar business ecosystems it supports are changing irrevocably. So, too, is the creation, distribution, and control of information—the very nature of how we know what we know.

In 2007, having weathered the dotcom crash and its aftermath, Tunstall-­Pedoe and a few colleagues were close to launching their first product—a website called True Knowledge that would offer one-shot answers to all kinds of questions. At the time, theirs was still a heterodox goal. “There were people in Google who were completely allergic to what we were doing,” Tunstall-­Pedoe says. “The idea of a one-shot answer to a search was taboo.” He recalls arguing with one senior Google employee who rejected the notion of there even being such a thing as a single correct reply. The big search engines, despite having indexed billions of web pages, did not possess a deep understanding of user queries. Rather, they engaged in glorified guesswork: You typed a few keywords into the Google search bar, and the company’s PageRank system returned a long list of statistically backed conjectures about what you wanted to know.

To demonstrate that True Knowledge’s one-shot ambition was possible, Tunstall-­Pedoe and his small team in Cambridge had developed a digital brain consisting of three primary components. The first was a natural-language-­processing system that tried to robustly interpret questions. For instance, “How many people live in,” “What is the population of,” and “How big is” would all be represented as queries about the number of inhabitants of a place.
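
To make this first component concrete, here is a minimal Python sketch of how different phrasings might be collapsed into one canonical query. The regular-expression patterns, the `Query` type, and the `interpret` function are hypothetical illustrations, not True Knowledge's actual code; a real system would need far more robust parsing than a handful of templates.

```python
import re
from typing import NamedTuple, Optional


class Query(NamedTuple):
    """A canonical, phrasing-independent representation of a question."""
    relation: str
    subject: str


# Hypothetical paraphrase templates, all mapped to the same "population" relation.
PATTERNS = [
    (re.compile(r"^how many people live in (?P<place>.+?)\??$", re.IGNORECASE), "population"),
    (re.compile(r"^what is the population of (?P<place>.+?)\??$", re.IGNORECASE), "population"),
    (re.compile(r"^how big is (?P<place>.+?)\??$", re.IGNORECASE), "population"),
]


def interpret(question: str) -> Optional[Query]:
    """Map a natural-language question to a canonical Query, or None if unrecognized."""
    text = question.strip()
    for pattern, relation in PATTERNS:
        match = pattern.match(text)
        if match:
            # Normalize the place name so every phrasing yields an identical query.
            return Query(relation, match.group("place").strip().lower())
    return None
```

With these templates, “How many people live in London?”, “What is the population of London?” and “How big is London?” all produce the same `Query(relation='population', subject='london')`, so the rest of the system only ever has to answer one kind of question.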

The second component of the system amassed facts. Unlike a search engine, which simply pointed users toward websites, True Knowledge aspired to supply the answers itself. It needed to know that the population of London is 8.8 million, that LeBron James is 6'8", that George Washington’s last words were “ ’Tis well,” and so on. The great majority of these facts were not manually keyed into the system; that would have been too arduous. Instead, they were automatically retrieved from sources of structured data, where information is listed in a computer-­readable format.

Finally, the system had to encode how all of these facts related to one another. The programmers created a knowledge graph, which can be pictured as a giant treelike structure. At its base was the category “object,” which encompassed every single fact. Moving upward, the “object” category branched into the classes “conceptual object” (for social and mental constructs) and “physical object” (for everything else). The higher up the tree you went, the more refined the categorizations got. The “track” category, for instance, split into groupings that included “route,” “railway,” and “road.” Building the ontology was a grueling task, and it swelled to tens of thousands of categories, comprising hundreds of millions of facts. But the structure it provided allowed new information to be sorted like laundry into dresser drawers.
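
The treelike ontology can be sketched in a few lines of Python. This is a toy built on assumptions, not True Knowledge's design: the `Ontology` class and its methods are hypothetical, every class is given exactly one parent, and placing “track” under “physical object” (and filing the road “A1” under “road”) is purely an illustrative choice.

```python
from typing import Dict, List, Optional


class Ontology:
    """A minimal class hierarchy: every class except the root has exactly one parent."""

    def __init__(self, root: str) -> None:
        self.root = root
        self.parent: Dict[str, Optional[str]] = {root: None}
        self.members: Dict[str, List[str]] = {root: []}

    def add_class(self, name: str, parent: str) -> None:
        """Attach a more refined class above an existing one."""
        if parent not in self.parent:
            raise KeyError(f"unknown parent class: {parent}")
        self.parent[name] = parent
        self.members[name] = []

    def file_entity(self, entity: str, cls: str) -> None:
        """Sort a new entity into its class, like laundry into a drawer."""
        self.members[cls].append(entity)  # raises KeyError for an unknown class

    def path_to_root(self, cls: str) -> List[str]:
        """Return the chain of ever-broader classes from cls down to the root."""
        path: List[str] = []
        node: Optional[str] = cls
        while node is not None:
            path.append(node)
            node = self.parent[node]
        return path


onto = Ontology("object")
onto.add_class("conceptual object", "object")
onto.add_class("physical object", "object")
onto.add_class("track", "physical object")
for refined in ("route", "railway", "road"):
    onto.add_class(refined, "track")
```

Calling `onto.path_to_root("railway")` yields `['railway', 'track', 'physical object', 'object']`, the chain of categories that tells the system where a newly retrieved fact belongs.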

The knowledge graph encoded relationships in a taxonomic sense: A Douglas fir is a type of conifer, a conifer is a type of plant, and so on. But beyond simply expressing that there was a connection between two entities, the system also characterized the nature of each connection: Big Ben is located in England. Emmanuel Macron is the president of France. This meant that True Knowledge effectively learned some commonsense rules about the world that, while blazingly obvious to humans, typically elude computers: A landmark can exist only in a single place. France can have only one sitting president. Most exciting for Tunstall-Pedoe, True Knowledge could handle questions whose answers were not explicitly spelled out beforehand. Imagine somebody asking, “Is a bat a bird?” Because the ontology had bats sorted into a subgroup under “mammals” and birds were located elsewhere, the system could correctly reason that bats are not birds.
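
The two kinds of knowledge described here, taxonomic links and typed relations with built-in constraints, might be sketched as follows. Everything in it is a hypothetical simplification: the `IS_A` table, the two constraint sets, and the `is_a` and `add_fact` helpers are illustrative names, not True Knowledge's code.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical taxonomy: each class points to its parent class.
IS_A: Dict[str, str] = {
    "bat": "mammal",
    "mammal": "animal",
    "bird": "animal",
    "animal": "physical object",
    "douglas fir": "conifer",
    "conifer": "plant",
    "plant": "physical object",
}

# Illustrative constraints (assumptions, not a real system's rules):
# a landmark has one location; a country has one sitting president.
ONE_OBJECT_PER_SUBJECT = {"is located in"}
ONE_SUBJECT_PER_OBJECT = {"is the president of"}

facts: Set[Tuple[str, str, str]] = set()


def is_a(child: str, ancestor: str) -> bool:
    """Return True if ancestor is child itself or appears among child's parents."""
    node: Optional[str] = child
    while node is not None:
        if node == ancestor:
            return True
        node = IS_A.get(node)
    return False


def add_fact(subject: str, relation: str, obj: str) -> None:
    """Store a typed (subject, relation, object) fact unless it breaks a constraint."""
    for s, r, o in facts:
        if r != relation:
            continue
        if relation in ONE_OBJECT_PER_SUBJECT and s == subject and o != obj:
            raise ValueError(f"{subject} already {relation} {o}")
        if relation in ONE_SUBJECT_PER_OBJECT and o == obj and s != subject:
            raise ValueError(f"{s} already {relation} {obj}")
    facts.add((subject, relation, obj))
```

One design choice is hidden in `is_a`: answering “no” to “Is a bat a bird?” means treating the absence of a path as a negative answer, which is only safe when the relevant branch of the ontology is complete.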

True Knowledge was getting smart, and in pitches to investors, Tunstall-Pedoe liked to thumb his nose at the competition. For instance, he’d Google “Is Madonna single?” The search engine’s shallow understanding was obvious when it returned the link “Unreleased Madonna single slips onto Net.” True Knowledge, meanwhile, knew from the way the question was phrased that “single” was being used as an adjective, not a noun, and that it was defined as an absence of romantic connections. So, seeing that Madonna and Guy Ritchie were connected (at the time) by an “is married to” link, the system more helpfully answered that, no, Madonna was not single.
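
The Madonna example can be reduced to a toy as well. This sketch is hypothetical and deliberately narrow: it handles only questions of the form “Is X single?”, in which “single” can only be read as an adjective describing relationship status, and then looks for a romantic relation in a tiny fact store that reflects the marriage as it stood at the time. The names `FACTS`, `ROMANTIC_RELATIONS`, and `answer` are illustrative.

```python
import re
from typing import Optional, Set, Tuple

# Hypothetical fact store mirroring the article's example (true "at the time").
FACTS: Set[Tuple[str, str, str]] = {("Madonna", "is married to", "Guy Ritchie")}

# Relations assumed, for illustration, to mean a person is not single.
ROMANTIC_RELATIONS = {"is married to", "is in a relationship with"}

# In "Is X single?", "single" follows the subject as a predicate,
# so this template only ever reads it as an adjective, never as a noun.
IS_SINGLE = re.compile(r"^is (?P<person>.+?) single\??$", re.IGNORECASE)


def answer(question: str) -> Optional[str]:
    """Answer 'Is X single?' from FACTS; return None for any other question."""
    match = IS_SINGLE.match(question.strip())
    if match is None:
        return None
    person = match.group("person")
    for s, r, o in FACTS:
        # Treat romantic relations as symmetric: check both ends of each link.
        if r in ROMANTIC_RELATIONS and person.lower() in (s.lower(), o.lower()):
            partner = o if s.lower() == person.lower() else s
            return f"No, {person} is not single ({r} {partner})."
    return f"Yes, as far as these facts show, {person} is single."
```

A headline such as “Unreleased Madonna single slips onto Net” does not fit the template, so `answer` returns `None` rather than guessing.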

Liking what they saw, investors cranked open the venture capital spigot in 2008. True Knowledge expanded to around 30 employees and moved to a larger office in Cambridge. But the technology didn’t initially catch on with consumers, in part because its user interface was “an ugly baby,” Tunstall-­Pedoe says. So he relaunched True Knowledge as a cleanly designed smartphone app, one available on both iPhones and Android devices. It had a cute logo—a smiley face with one eye—and a catchy new name, Evi (pronounced EE-vee). Best of all, you could speak your questions to Evi and hear the replies.

Evi debuted in January 2012, a few months after Apple launched its Siri voice assistant, and shot to No. 1 in the company’s app store, quickly racking up more than half a million downloads. (Apple, apparently piqued by headlines such as “introducing evi: siri’s new worst enemy,” at one point threatened to pull the app.) Tunstall-­Pedoe was swamped with acquisition interest. After a series of meetings with suitors, True Knowledge agreed to be bought out. Nearly everyone would get to keep their jobs and stay in Cambridge, and Tunstall-­Pedoe would become a senior member of the product team for a not-yet-released voice computing device. When that device came out in 2014, its question-­answering abilities would be significantly powered by Evi. The buyer was Amazon, and the device was the Echo.

One-shot answers were unfashionable back when Tunstall-­Pedoe started programming at Cambridge. But that was no longer the case by the time the Echo came out. In the era of voice computing, offering a single answer is not merely a nice-to-have feature; it’s a need-to-have one. “You can’t provide 10 blue links by voice,” Tunstall-Pedoe says, echoing prevailing industry sentiment. “That’s a terrible user experience.”

As the world’s largest tech firms wised up, they began retracing many of True Knowledge’s steps. In 2010, Google acquired Metaweb, a startup that was creating an ontology called Freebase. Two years later, the company unveiled the Knowledge Graph, which boasted 3.5 billion facts. That same year, Microsoft launched what would become known as the Concept Graph, which grew to contain 5 million entities. In 2017, Facebook, Amazon, and Apple all acquired knowledge-graph-building companies. Lately, many researchers have begun designing autonomous systems that crawl the web for answers, stocking ontologies with new facts far more quickly than any human could.

The bull rush makes sense. Market analysts estimate that, by 2020, up to half of all internet searches will be spoken aloud. Lately, even the trusty old librarians of onscreen search have been quietly switching to oracle mode. Google has been steadily boosting the prevalence of featured snippets, a type of one-shot answer, in the desktop and mobile versions of its search engine. They get pride of place above the other results. Let’s say you search for “What is the rarest element in the universe?” Right there, under the query box, is the response: “The radioactive element astatine.” According to the marketing agency Stone Temple, Google served up instant answers for more than a third of all searches in July 2015. Eighteen months later, it did so more than half the time.

The move toward one-shot answers has been just slow enough to obscure its own most important consequence: killing off the internet as we know it. The conventional web, with all of its tedious pages and links, is giving way to the conversational web, in which chatty AIs reign supreme. The payoff, we are told, is increased convenience and efficiency. But for everyone who has economic interests tied to traditional web search—businesses, advertisers, authors, publishers, the tech giants—the situation is perilous. To understand why, it helps to quickly review the economics of the online world, where attention is everything.

Companies want to be found; they want their ads to be seen. So, since the earliest days of the internet, they have worked to master the mysterious art of search engine optimization, or SEO—tweaking keywords and other elements of sites to make them appear higher in the search rankings. To guarantee a prime location, companies also fork over money directly to the search services for paid discovery, purchasing small ads that run atop or beside the results.

When desktop search was the only game around, companies jockeyed to be one of the top 10 links listed; people often don’t scroll any lower than that. Since the rise of mobile, they’ve raced to get into the top five. With voice search, companies face an even more daunting challenge. They want to grab what’s known as position zero—to supply the one-shot answer that appears above all the other results. Position zero is critical because it is most often what gets read aloud. And it is often the only thing that gets read, according to Greg Hedges, a VP at the marketing agency RAIN, which advises brands on their conversational AI strategy. “If you want to be visible in a few years, you have to make sure that your website is optimized for voice search,” he says.

Suppose you run a sushi restaurant and have many competitors nearby. A user asks his voice device, “What’s a good sushi place near me?” If your restaurant isn’t the one the AI regularly chooses first, you’re in trouble. There is, of course, a verbal equivalent to scrolling down: After hearing the top option, the customer might say, “I don’t like the sound of that. What else is nearby?” But that requires work, which people avoid when they can.

Reaching position zero requires a wholly different strategy than conventional SEO. The importance of putting just the right keywords on a web page, for instance, is declining. Instead, SEO gurus try to think of the natural-language phrases that users might say—like “What are the top-rated hybrid cars?”—and incorporate them, along with concise answers, on sites. The hope is to produce the perfect bit of content that the AI will extract and read aloud.

For now, there is no paid discovery for voice search. But when it inevitably arrives, the internet’s ad economy will be turned upside down. Because voice oracles dispense answers one at a time, they offer less real estate for advertisers. “There’s going to be a battle for shelf space, and each slot should theoretically be more expensive,” Jared Belsky, the current CEO of the digital marketing agency 360i, told Adweek in 2017. “It’s the same amount of interest funneling into a smaller landscape.” This may prove especially true in retail environments such as Amazon, where a purchase-ready consumer is right on the other end of the smart speaker. With voice, the goal is to summit Everest—to get the top result—or die trying.

What if your product isn’t a hybrid car or a spicy tuna roll but knowledge itself? Publishers are already uncomfortably dependent on the big tech companies for most of their traffic, and thus much of their advertising income. According to the analytics company Parse.ly, Google searches currently account for about half of all referrals to publishers’ sites; shared links on Facebook account for a quarter. One-shot answers could seriously restrict this traffic. For instance: I am an Oregon Ducks fan. In the past, I’d go to ESPN.com the morning after a game to find out who won. Once there, I might click on another story or two, giving the site a few fractions of a cent in ad revenue. If I were feeling especially generous, I might even sign up for a monthly subscription. But now I can simply ask my phone, “Who won the Ducks game?” I get my answer, and ESPN never sees my traffic.

Maybe you care about ESPN, a major business in its own right, having its traffic siphoned off; maybe you don’t. The point is that a similar dynamic could affect a huge number of content creators, from the whales to the minnows. Consider the story of Brian Warner, who runs a website called Celebrity Net Worth. On the site, curious visitors can punch in the name of, say, Jay-Z and find out—thanks to research by Warner’s employees—that the rapper is worth an estimated $930 million. Warner claims that Google started harvesting answers from his site even after he explicitly denied the search giant’s request for access to his company’s database. Once this started, he says, the amount of traffic that actually reached Celebrity Net Worth plummeted by 80 percent, and he had to lay off half of his staff. “How many thousands of other websites and businesses has Google paved over?” he asks. (A Google spokesperson declined to comment specifically on Warner’s version of events; she noted, however, that site administrators can use the company’s developer tools to prevent their pages from appearing in featured snippets.)

When voice AIs read an extracted bit of content, they often do credit the source. They may offer a verbal attribution or, if the device in question has a screen, a visual one. But name-­dropping doesn’t pay the bills; publishers need traffic. With a typical smart speaker, the chances that a user would somehow supply that traffic are slim. Google and Amazon’s workarounds are clumsy: A user can go to the smartphone companion app for her Home or Echo, find the result of the search, and click a link to go to the content creator’s site.

A user could go to that trouble. But why bother when she already has the answer she sought? As Asher Elran, a web traffic expert and CEO of Dynamic Search, put it in a blog post back in 2013, one-shot answers rig the game in Google’s favor. “As websites, we expect to compete for those ranks by using SEO and providing interesting content,” he wrote. “What we do not expect is the answer to the questions appearing to the searcher before we get a chance to impress them with our hard work.”

When Tunstall-Pedoe began working on what would become True Knowledge, he got the impression that Google opposed providing one-shot answers. Although some employees undoubtedly felt that way at the time, statements from the company’s leaders make clear that the long-term plan was always to build an oracle. “When you use Google, do you get more than one answer?” Eric Schmidt asked in a 2005 interview, more than a decade before he stepped down as chair. “Of course you do. Well, that’s a bug … We should be able to give you the right answer just once.”

For years, technological obstacles kept Schmidt’s goal at a safe remove. This came with certain advantages. Under Section 230 of the Communications Decency Act, a 1996 law that governs freedom of expression on the internet, online intermediaries cannot be held responsible for content supplied by others. As long as Google remained a mere conduit for information, rather than a creator of that information—a neutral librarian rather than an all-knowing oracle—it could likely avoid a blizzard of legal liabilities and moral responsibilities. “Part of the reason why Google liked 10 blue links was because they weren’t determining what was true or false,” Tunstall-­Pedoe says.

But the company’s don’t-­kill-­the-­messenger positioning is much harder to accept in the voice era. Say you click on a search result and end up reading an article from the San Francisco Chronicle. Google is clearly not responsible for the content of that article. But when the company’s Assistant delivers an answer to one of your questions, the distinction becomes murkier. Even though the information may have been extracted from a third-party source, it feels as though it’s coming straight from Google. As such, the companies serving up replies to voice searches gain great power to decree what is true. They become overlords of epistemology.

Danny Sullivan, Google’s public liaison for search, touched on this hazard last year in a blog post about featured snippets. Until recently, he explained, users who asked “How did the Romans tell time at night?” had been getting an absurd one-shot answer: sundials. This was a no-­consequence mistake, and Sullivan assured the public that Google was working to prevent such gaffes in the future. But it isn’t difficult to imagine a similar blunder with bigger ramifications, particularly as more and more Americans embrace voice search and the notion of the infallible AI oracle. Past one-shot answers have falsely claimed that Barack Obama was declaring martial law, that Woodrow Wilson was a member of the Ku Klux Klan, that MSG causes brain damage, and that women are evil. Google willingly fixed these whoppers, explaining that it had not authored them—that the mistakes had been automatically extracted from shoddy websites.

Giving people a way to check sourcing offers some insulation against misinformation run amok. But it is difficult to imagine a user of Echo or Home going to the trouble of regularly logging into the companion app; the extra effort goes against the whole hands-free, no-look ethos of voice computing. And the verbal attributions, when they exist, are typically vague. A user might be told that an answer came from Yahoo or Wolfram Alpha. That’s akin to saying, “Our tech company got this information from another tech company.” It lacks the specificity of seeing the name of a reporter or media outlet; it also omits mention of the evidence used to arrive at a conclusion. When the source is a company’s own knowledge graph or other internal resource, the derivation becomes even more opaque: “Our tech company got this information from itself. Trust us.”

The strategy of delivering one-shot answers also implies that we live in a world in which facts are simple and absolute. Sure, many questions do have a single correct answer: Is Earth a sphere? What is the population of India? For other questions, though, there are multiple legitimate perspectives, which puts voice oracles in an awkward position. Recognizing this, Microsoft’s Cortana sometimes gives two competing answers to contested questions rather than just one. Google is considering doing a version of the same. Whether or not these companies wish to play the role of Fact-Checker to the World, they’re backing themselves into it.

The command that big tech companies have over the dissemination of information, particularly in the era of voice computing, raises the specter of Orwellian control of knowledge. In places such as China, where the government heavily censors the internet, this is not just an academic concern. In democratic countries, the more pressing question is whether companies are manipulating facts in ways that benefit their corporate interests or the personal agendas of their leaders. The control of knowledge is a potent power, and never have so few companies attained such dominance as the portals through which the vast majority of the world’s information flows.

The rest of us, meanwhile, may be losing the very skills that allow us to hold these gatekeepers to account. Once we become accustomed to placing our faith in the handy oracle on the kitchen counter, we may lose patience with the laborious—and curiosity-stoking, and thought-­provoking—hunt for facts, expecting them to come to us instead. Why pump water from a well if it pours effortlessly from your faucet?

Tunstall-­Pedoe, who left Amazon in 2016, acknowledges that voice oracles introduce new risks, or at least worsen existing ones. But he has the typical engineer’s view that the problems caused by technology can be solved by—you guessed it—more and better technology, such as AIs that learn to suppress factually incorrect information. If online oracles one day get good enough to make a place like the Cambridge University Library obsolete, he imagines that he would feel nostalgic. But only up to a certain point. “I might miss it,” Tunstall-­Pedoe says, “but I’m not sure that I would go back there if I didn’t need to.”