Machine Hearing: Audio Analysis by Emulation of Human Hearing

Date: Fri, 10/14/2011, 1:15pm - 2:30pm

Location: CCRMA Seminar Room

Event Type: Hearing Seminar

How should a machine listen to the world around us? Dick Lyon, now at Google, argues that we should learn from the human auditory system. It has had a long evolutionary history in which to work out the right features for recognizing speech and environmental sounds. And if our ears can't tell two sounds apart, maybe the difference between them is not important.

Dick has made fundamental contributions to cochlear modeling, auditory representations, and now auditory machine learning. (He also invented and built one of the first optical mice.) It's always cool to see what Dick is doing.

While many approaches to audio analysis are based on elegant mathematical models, an approach based on emulation of human hearing is becoming a strong challenger. The difference is subtle: it involves extending mathematically convenient signal-processing concepts such as linear systems, transforms, and second-order statistics to include the messier nonlinear, adaptive, and evolved aspects of hearing. Essentially, the goal is to form representations that do a good job of capturing what a signal "sounds like", so that we can build systems that react accordingly.

Some of our recent experimental systems, such as sound retrieval from text queries, melody matching, and music recommendation, employ a four-layer machine-hearing architecture that attempts to simplify and systematize some of the methods used to emulate hearing. The peripheral level uses nonlinear filter cascades to model wave propagation in the nonlinear cochlea. The second level computes one or more types of auditory image, as an abstraction of what goes on in the auditory brainstem; these images project to cortical sheets much as visual images do. The third level extracts application-dependent features from the auditory images, abstractly modeling what likely happens in auditory cortex. Finally, and most abstractly, any appropriate machine-learning system is used to address the needs of an application, with the brain-motivated neural network as a prototypical example. Each layer involves different disciplines and can leverage the experience of different fields, including hearing science, signal processing, machine vision, and machine learning.
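As a rough illustration of how the four layers compose, here is a toy sketch in plain Python. Every stage is a hypothetical simplification chosen for brevity, not the speaker's actual systems: a cascade of one-pole low-pass filters followed by half-wave rectification and square-root compression stands in for the nonlinear filter cascade, a per-channel autocorrelation (a correlogram) stands in for the auditory image, pooling over lag bands serves as the feature extractor, and a 1-nearest-neighbor classifier plays the role of the learner. All function names and parameter values are illustrative assumptions.

```python
import math


def filter_cascade(signal, num_stages=8, sample_rate=16000.0, top_hz=4000.0, ratio=0.7):
    """Layer 1 (hypothetical simplification): one-pole low-pass stages with
    decreasing cutoffs, each filtering the previous stage's output. Each tap
    is half-wave rectified and square-root compressed as a crude, static
    stand-in for cochlear nonlinearity."""
    channels = []
    x = list(signal)
    cutoff = top_hz
    for _ in range(num_stages):
        a = math.exp(-2.0 * math.pi * cutoff / sample_rate)  # one-pole coefficient
        y, prev = [], 0.0
        for s in x:
            prev = (1.0 - a) * s + a * prev
            y.append(prev)
        x = y  # the next stage filters this stage's output (the cascade)
        channels.append([math.sqrt(max(v, 0.0)) for v in y])
        cutoff *= ratio
    return channels


def auditory_image(channels, max_lag=32):
    """Layer 2 (hypothetical stand-in): per-channel autocorrelation over lags."""
    image = []
    for ch in channels:
        n = len(ch)
        image.append([sum(ch[t] * ch[t - lag] for t in range(lag, n)) / n
                      for lag in range(max_lag)])
    return image


def extract_features(image, num_bands=4):
    """Layer 3 (hypothetical): average each channel's autocorrelation over a
    few equal-width lag bands (assumes max_lag >= num_bands)."""
    features = []
    for row in image:
        width = len(row) // num_bands
        for b in range(num_bands):
            chunk = row[b * width:(b + 1) * width]
            features.append(sum(chunk) / len(chunk))
    return features


def nearest_neighbor(features, labeled_examples):
    """Layer 4 (hypothetical): any learner fits; 1-nearest-neighbor here."""
    def sq_dist(u, v):
        return sum((p - q) ** 2 for p, q in zip(u, v))
    return min(labeled_examples, key=lambda ex: sq_dist(features, ex[0]))[1]


def machine_hearing_features(signal):
    """Chain layers 1-3 to turn a waveform into a feature vector."""
    return extract_features(auditory_image(filter_cascade(signal)))


def tone(freq_hz, n=800, sample_rate=16000.0):
    """Synthetic test signal: a pure sine tone."""
    return [math.sin(2.0 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


examples = [(machine_hearing_features(tone(200.0)), "low"),
            (machine_hearing_features(tone(2000.0)), "high")]
print(nearest_neighbor(machine_hearing_features(tone(200.0)), examples))  # prints "low"
```

In this sketch each layer is a separate function, so one stage (for example, the learner in layer 4) can be swapped without touching the others.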

Richard F. Lyon received the B.S. degree in engineering and applied science from the California Institute of Technology in 1974 and the M.S. degree in electrical engineering from Stanford University in 1975. In his early career, he worked on a variety of projects involving communication and information theory, digital system design, analog and digital signal processing, VLSI design and methodologies, and sensory perception at Caltech, Bell Labs, Jet Propulsion Laboratory, Stanford Telecommunications Inc., Xerox PARC, Schlumberger Palo Alto Research, and Apple Computer. He was also a visiting associate for 15 years on the Computation and Neural Systems faculty at Caltech, where he worked on sensory modeling research and analog VLSI techniques for signal processing. Next, he was chief scientist and vice president of research at Foveon, Inc., which he co-founded in 1997 and where he led the advanced development of the Foveon X3 color image sensor technology. Dick presently works in Google Research on machine hearing; at Google, he also led the team that developed the camera systems for Street View and other applications.