Discovering the Real Value in the Audio Chain

Article By : Anne-Françoise Pelé

The emergence of silicon-based microphones has reshaped the audio landscape. But market research firm Yole Développement is convinced that, in the coming years, artificial intelligence will lead the market’s evolution and transformation.

Conversation is natural, and that’s why it is becoming the primary interface for human-machine interaction. Voice-based personal assistants (VPAs) are growing in popularity in smartphones, smart speakers, smart watches, wireless earbuds, cars, smart TVs and their remote controls. There are even trash cans that now integrate voice recognition. Adoption is set to keep growing, and the real value resides in high audio quality and an understanding of the environment around the microphone.

For Yole Développement (Lyon, France), audio is the next segment to be invaded by artificial intelligence (AI).

How AI has found its voice

VPAs are today’s main driver of the audio industry. Built on the traditional components of audio systems, such as audio codecs, microphones, microspeakers, and audio amplifiers, they also use AI to compute and analyze voice data. Computing enables complex audio functions such as speech recognition and source localization. It is performed either in the cloud or at the edge, in consumer devices. Analyzing, which demands high processing power and access to a lot of data, is executed in the cloud.

“The added-value of AI is for the natural language processing,” said Dimitrios Damianos, Technology & Market analyst in the photonics and sensing division at Yole. “The voice is a more natural way to interact with the machine. You don’t have to use a keyboard. You don’t have to use your hands. You just use your voice.” For that, however, a lot of processing needs to be done to understand what users are speaking about, their language and what they mean. “AI is adding the value of decoding and helping our communication with our devices.”

Asked about the fast penetration of VPAs, Damianos attributed it to their convenience and efficiency. But there is another factor: “What we believe and are seeing is that big tech companies like Google, Apple, Facebook, Amazon and Microsoft (collectively known as GAFAM) try to push these VPAs because there is a real value in the data they extract.”

For users, audio is more acceptable than images. They consider audio to be “less intrusive, so it’s a good way for the GAFAM, whose main business is about data, to gather data from people,” Alexis Debray, Technology & Market analyst in the MEMS and sensors division at Yole, continued. “Some companies are making their business with data while some others are making their business with privacy and set technology that ensures privacy for the user.” Apple, for instance, preaches privacy and makes it a powerful marketing asset.

The actual value for the big tech companies is to extract as much information as possible from the environment, meaning that the VPA listens not only to the users’ voice but also to their surroundings and understands their environment, Damianos said. For example, “if you are in your kitchen, the microphone can hear the sound of a knife against the counter and immediately understand that you are in the kitchen and suggest a recipe.” That’s conversational AI.

The next step after conversational AI could well be full awareness, where the virtual assistant, whether it’s a smart speaker or a smartwatch, communicates with the user just like a human being. Full awareness is conceptual and comes with question marks, said Damianos. “We don’t know the timeline yet, but maybe it will arrive after the conversational AI, in 5 to 6 years. It would depend on the progress of AI and the companies [evolving] in that area.”

While these always-listening systems could save lives in automotive human-machine interfaces, they also raise concerns about user privacy. To prevent possible misuse, Debray stressed that data processing should be performed as quickly as possible and as close as possible to the microphone. “The closer to the microphone you do the treatment, the fewer possibilities of privacy leakage there are.”

Privacy comprises multiple dimensions, as the user may want to hide his or her gender, age, or emotions. Looking ahead, Debray said he is confident that players in the microphone, ASIC or application processor sectors will develop technologies that guarantee privacy for the user. Microphones could then remove emotions from the voice and solely render audio data.

Yole analysts expect the GAFAM to continue to dominate because, for now, they are essential for analysis. But sensor manufacturers are clearly eager to incorporate AI at the edge and shunt the audio analysis business away from the cloud. “Sensor manufacturers want to increase their revenues and take more of the audio pie,” said Damianos. “It is not a battle from the big companies’ side. It is a battle from the sensors companies’ side.”

Sensor companies are indeed pursuing strategies of diversification, “trying to move in the value chain and to be a little bit more integrated,” commented Alexis Debray.

In a recent interview, Matt Crowley, CEO of Vesper Technologies Inc., said the company was seeking to increase the intelligence of its piezoelectric MEMS microphones. “We believe that, in the future, we will have sensors paired with some artificial intelligence embedded in the sensor. It will be able to learn the way humans and animals use their senses — not just sight, hearing, taste, smell and touch, but also motion or temperature — to learn about their environment. Our long-term vision is that objects will use multiple types of bio-inspired sensors to learn about their environment and respond the best way possible.”

Infineon AG has also shifted its business model, from selling microphone dies to players such as Goertek and AAC to selling complete packaged MEMS microphones, and from being a MEMS microphone manufacturer to being an integrated player that handles manufacturing, packaging, testing and sales. “This is a change in strategy, […] it probably means that they see a move going on with VPAs, and they want to position themselves in this market.”

Similarly, Knowles, today’s leader with a 39 percent share of the MEMS microphone market, recently bought the MEMS microphone ASIC design division from Ams AG. It’s a way to bring in mixed-signal circuit design IP and to counter the ever-rising competition from Chinese companies like Goertek and AAC.

According to Yole, the global consumer market for microphones, microspeakers and audio ICs is expected to grow at a healthy CAGR of 6.6 percent, from $14.1 billion in 2018 to $20.8 billion in 2024. Cheap, small and easy to integrate, microphones are widely adopted and reach very high volumes. “We speak about 6 billion microphones,” said Damianos. The microphone market, currently worth $1.7 billion, is expected to increase at a CAGR of 3 percent to $2 billion in 2024.

The MEMS microphone market, which now represents about 70 percent of the total, will grow from $1.2 billion in 2018 to $1.6 billion in 2024. Key driving markets include smartphones, smart speakers and hearables (e.g. wireless earbuds). “In the last couple of years, the smart speakers and hearable markets have experienced an explosive growth,” said Damianos. MEMS microphone volumes in smart speakers will grow at a CAGR of 13 percent to 1.2 billion units in 2024. In wireless earbuds, they will expand at a CAGR of 29 percent, to 1.3 billion units in 2024.

In parallel, the microspeaker market, which currently amounts to $9.1 billion, is expected to grow at a CAGR of 3 percent to $10.9 billion in 2024, according to Yole.
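For readers who want to check the growth rates quoted above, the compound annual growth rate can be recomputed from the start and end values. The short Python sketch below is illustrative only and is not Yole's methodology: the function name `cagr`, the `segments` table and the six-year 2018 to 2024 span applied to every segment are assumptions made for this example (the article gives "currently" rather than an explicit base year for the microphone and microspeaker figures).

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes in billions of US dollars, as quoted in the article.
# Assumption: every segment is measured from 2018 to 2024 (six years).
segments = {
    "Consumer audio (microphones, microspeakers, audio ICs)": (14.1, 20.8),
    "Microphones": (1.7, 2.0),
    "MEMS microphones": (1.2, 1.6),
    "Microspeakers": (9.1, 10.9),
}

YEARS = 2024 - 2018

for name, (start, end) in segments.items():
    # Format the rate as a percentage with one decimal place.
    print(f"{name}: {cagr(start, end, YEARS):.1%}")
```

Under these assumptions the recomputed rates land within rounding distance of the published ones: roughly 6.7 percent for the overall consumer audio market, and about 3 percent each for microphones and microspeakers. The same formula puts the MEMS microphone segment, for which the article quotes no CAGR, at a little under 5 percent.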

“This might seem like a modest growth,” said Damianos. But, in 2018 and 2019, the smartphone market slowed down, likely because smartphones are becoming more expensive and users are waiting longer before upgrading them. “Before you would replace your phone every 1.5 year, now it is every 2.5 years, and it is increasing,” he continued. “We would expect the microphone and microspeaker markets to drop.” In fact, “the exploding growth of hearables and smart speakers offset the difference. The VPAs are driving the integration of microphones and microspeakers in all these devices.”