Why Tech Giants Are So Desperate to Provide Your Voice Assistant

Executive Summary

Why are the top tech companies in the world spending billions of dollars on voice assistants and smart speakers that lose them money every quarter? Because over time, users will grant voice assistants more and more agency to simply execute tasks on their behalf. The consumer will not necessarily care how a task is fulfilled, just that it gets done. If an agent is making decisions on behalf of a consumer, you can understand why all of the tech giants want to be the provider of that assistant. Google can ensure that Google Assistant users are steered to its search and YouTube video services while also making a dent in online shopping. Amazon can expand beyond commerce into search and attention while accelerating the adoption of its digital content properties. Samsung could move beyond its global lead in smartphone sales into voice-enabled online services. The big tech companies are highly attuned to their current competitive advantages and those of their rivals, and they see voice as an opportunity both to shore up their defenses and, potentially, to gain new, lucrative ground.


Why are the top tech companies in the world spending billions of dollars on voice assistants and smart speakers that lose them money every quarter? Amazon reportedly has 10,000 employees working on Alexa. Tear-down analyses suggest that the Google Home Mini is sold at a loss every time it is discounted, which is frequently. Samsung has introduced a dedicated Bixby button on its phones, which consumers regularly remap to launch Google Assistant instead. Microsoft is no longer trying to compete with Alexa or Google Assistant but is still investing in Cortana.

Each of these companies, and others, has different reasons for continuing to invest. Some are attempting to protect a dominant franchise, such as online commerce in Amazon’s case and search advertising in Google’s. Others are trying to break into spaces from which they have been excluded, such as digital content distribution, display advertising, search, and commerce. A couple have both scenarios in mind.

You can only understand the voice platform wars by first recognizing that voice assistants, specifically, represent both a platform shift and a user interface (UI) shift comparable to the web and smartphones. The key difference is that these new platforms are based neither on open standards nor on relatively open access to consumers. Voice assistants introduce a proprietary intermediary into all digital consumer interactions. This scenario both excites and frightens the leading tech companies that carved out enviable positions in the earlier web and smartphone platform wars.

Platform and UI Shifts Go Hand-in-Hand

Voice assistants represent the third key UI and technology platform shift of the past three decades, following the web in the 1990s and smartphones about 10 years ago. Each successive UI shift changed the way humans interacted with and accessed digital content. Web pages gave us “click,” where we surfed using our mouse and activated buttons and hyperlinks. Smartphones introduced “touch,” “swipe,” and “pinch” to billions of consumers and replaced web pages with apps. Both of these transitions required consumers to learn a new language for interacting with technology. The shift to voice doesn’t require any training. Users simply speak as they naturally do.


Each of these UI changes was accompanied by a new technology platform. The World Wide Web was built on the back of the internet, and PC proliferation made web pages easy to access. Mobile operating systems such as iOS and Android were important developments, but the app economy also relied on cloud computing to deliver content efficiently, along with regular feature updates and performance enhancements. Voice computing relies on artificial intelligence for speech recognition and natural language understanding, and AI is also being used to improve the user experience dynamically.

Platform Shifts, Market Share Dominance, and the Five A’s

The web and mobile introduced entirely new ways to conduct business and made old methods more efficient. Specific companies came to dominate different areas. Voicebot has introduced the five A’s framework for evaluating dominance across these value segments: Access, Acquisition, Authority, Attention, and Agency.

The first of these, Access, can be thought of as distribution for digital content. You may recall the mantra popularized by Bill Gates that “content is king” on the internet. No single company came to dominate all content on the web, largely because content ownership is distributed widely, content formats are varied, and the open standards of the internet enable anyone to become a publisher. However, YouTube did come to dominate user-generated video, and more recently, Netflix has established unrivaled market share in professional video. To a lesser extent, Spotify has succeeded in dominating the music space. Each followed a similar strategy: a two-sided marketplace that brings together the largest number of suppliers and consumers.

YouTube started on the web and successfully made the transition to mobile. Netflix and Spotify both required additional innovations around ubiquitous broadband and Wi-Fi to make the same transition. This segment, in other words, has evolved along with the platforms.

Acquisition is the second “A.” When you think about online shopping, there is little argument that Amazon came to dominate this sphere in the U.S. and in many Western nations. In China, Alibaba and its assorted properties hold this title. We can point to these two clear winners even though many other companies also sell online, some exclusively. Amazon and Alibaba also followed the marketplace model of aggregating supply and demand.

Authority is a soft asset, but a powerful one. Google came to dominate authority in both the web and mobile eras by becoming the dominant search engine. If you wanted to answer a question, you would “Google” it. At one time, the Encyclopedia Britannica, Walter Cronkite, or The New York Times might have been the leading authority of the day. The digital era gave us Google, with more than 90% of all search traffic.

Each of the first three segments is about asserting control in the market over content distribution, commerce, and information. The fourth A, Attention, is about controlling how people spend their time. Attention used to be something the top television shows, news broadcasts, or most popular newspapers could claim as an asset. Consumers would tune in or read their content, and that offered an opportunity to advertise. It is easy to conclude that the winners of the mobile era were Apple and Google, with their dominant mobile operating systems, and Apple and Samsung, as the world’s leading smartphone makers. But Facebook was assuredly another winner. Its properties came to command more user attention than those of any other company, by a large margin.

Playing Defense Against a New Intermediary

Each of these segments is again up for grabs in the new voice era. Voice assistants offer easy Access to content. They are being used to Acquire goods, and are a new source of Authority, as they answer billions of questions annually. They are also diverting Attention that previously went to smartphone interactions to new voice interaction properties.

This means that each of the winners of the previous web and mobile platform wars has existing territory it must protect. Will voice-interactive video, in which users can control playback and even choose the content sequence, eventually displace passive video consumption on YouTube? Will a voice assistant from Google or Apple funnel user purchases through Amazon.com fulfillment, or will it instead go directly to online sellers? Will voice assistants from Amazon, Apple, Microsoft, and Samsung consult only the Google Knowledge Graph to answer questions? How will consumer habits and voice assistant biases steer consumer attention away from Facebook, Instagram, and WhatsApp toward other diversions?

The change introduced by voice is even more disruptive than the shift from web to mobile. That transition involved users diverting their time to a new technology channel that still offered access to the previous channel through the mobile web. By contrast, much of the content on the web and mobile is completely inaccessible through voice assistants. In addition, activities on web and mobile platforms were user-directed: the consumer made the choices. Increasingly, voice assistants make choices for consumers. That is where the fifth “A” comes into play.

Agency is what all of the big winners of the earlier tech platforms fear most. Voice assistants reserve for themselves the choice of where answers are sourced (Authority), and they can heavily influence content sourcing (Access), such as steering people toward Prime Video or YouTube. They can also order from sources other than Amazon.com (Acquisition). And they introduce new kinds of interaction that displace the time consumers spend with other media (Attention).

Voice assistants are an intermediary. The web has no equivalent structural intermediary, and mobile app stores impose limited constraints through app certification but don’t overtly steer consumers to a particular source. Voice assistants do overtly steer users, presenting the “best result” for any given request. Users can ask for results from specific sources, but that takes extra effort and forethought, and the voice assistant may not ultimately make that content available. For example, there is no Amazon.com shopping through Google Assistant, and Alexa will favor results from Prime Video over other services.

Steering and availability are clear issues that can undermine the position of a winner of an earlier tech platform, but agency will have far greater implications. Voice assistants are designed to help simplify users’ lives. Over time, users will grant voice assistants more and more agency to simply execute tasks on their behalf. The consumer will not necessarily care how a task is fulfilled, just that it gets done. Some of this will be user-directed, such as Google Duplex making a restaurant reservation. Some of it will be proactive: a voice assistant might notice that a favorite item the user regularly purchases is discounted and simply have it ordered and shipped without any explicit instruction.

The leading technology giants of our era must play defense against voice assistants provided by other organizations in order to protect their hard-won gains in web and mobile. Failing to protect their franchises could lead to a swift demise.

Playing Offense by Controlling the Agent

If an agent is making decisions on behalf of a consumer, you can understand why all of the tech giants want to be the provider of that assistant. Google can ensure that Google Assistant users are steered to its search and YouTube video services while also making a dent in online shopping. Amazon can expand beyond commerce into search and attention while accelerating the adoption of its digital content properties. Samsung could get beyond its global lead in smartphone sales to include voice-enabled online services.

Platform shifts represent opportunities for new players to create their own segment dominance. Apple and Samsung took advantage of the shift to smartphones to create seemingly unassailable franchises at the expense of Nokia and BlackBerry. Facebook established the always-available Attention segment, which didn’t exist before we carried compact, connected computers with high-bandwidth digital access with us all the time.

Voice Assistants Have No Boundaries

Voice assistants are not confined to a single device, and that ubiquity makes them more powerful. The web was originally tethered to your PC. Smartphones are mobile, but each is still a single device, even as it has become a personalized source of information, connectivity, and amusement. Voice assistants are often most recognizable through smart speakers, but in fact they are on at least ten times as many smartphones as smart speakers and are rapidly expanding into appliances and a variety of other voice access points, such as digital media players, refrigerators, and smartwatches.

This lack of a device boundary means that voice assistants can spread more easily, can provide different types of value from those of the previous platforms, and can offer an entry point for new providers that don’t have assets from the previous eras. The biggest tech companies in the world recognize this because they exploited the changes brought about by previous platform shifts. They are also highly attuned to their current competitive advantages and those of their rivals, and they see voice as an opportunity both to shore up their defenses and, potentially, to gain new, lucrative ground.

Welcome to the voice era.

Bret Kinsella is founder, editor, and research director of Voicebot.ai, the leading source of research, news, and analysis for the intersection of voice and AI technologies. He was named Commentator of the Year for 2018 on voice technologies by the Alexa Conference board, is routinely listed as having the top-rated podcast in the industry, and is a frequent keynote speaker in the U.S. and Europe. He’s a former executive at Accenture, Sapient, and in IoT software development.