Hearing Seminars

CCRMA hosts a weekly Hearing seminar. All areas related to perception are discussed, but the group emphasizes topics that will help us understand how the auditory system works. Speakers are drawn from the group and visitors to the Stanford area. Most attendees are graduate students, faculty, or local researchers interested in psychology, music, engineering, neurophysiology, and linguistics. Meetings are usually held Fridays from 11 AM to 12:30 PM (or so, depending on questions) in the CCRMA Seminar Room.

Recent Hearing Seminars

It's now time to take EEG out of the lab and into the real world. Our speaker on February 13 will talk about his efforts to build a mobile EEG recording device and validate its performance with an auditory task. Maarten De Vos received his PhD from KU Leuven (Belgium) and is visiting Stanford for a few months, before heading to a faculty position at Oxford.

His validation experiments are interesting because he describes them as a means of decoding attention. He's using P300 attention-modulated correlates to measure what people are attending to. Very interesting.

All of us are familiar with the basic goal of a hearing aid: amplify sound. But a more difficult issue is figuring out the right parameters to help a user. Getting useful feedback from a patient who doesn't understand what they are hearing is difficult. There are dozens (hundreds) of parameters in modern hearing aids. How do we take a patient's complaint that they can't hear in a restaurant and figure out what that means for their sound-processing needs? It's more than just turning up the volume.

A lot of work has been done to understand the bottom-up pathways in the brain. This summer’s work looked at top-down influences. Just how do auditory imagination and priming affect what we hear? More importantly, can we see evidence of priming or auditory imagination, either psychoacoustically or with EEG measurements? The answer is a tentative yes.

At this week’s Hearing Seminar, I want to describe several pilot experiments that were done over the summer. This work was part of the Telluride Neuromorphic Cognition workshop that is held every summer in Telluride, CO. It’s a rather scenic location, but it is totally inundated with auditory perception nerds (and others) for a three-week working workshop. Science in the mountains. Imagine that.

Animals throughout the animal kingdom excel at extracting individual sounds from competing background sounds, yet current state-of-the-art signal processing algorithms struggle to process speech in the presence of even modest background noise. Recent psychophysical experiments in humans and electrophysiological recordings in animal models suggest that the brain is adapted to process sounds within a restricted domain of spectro-temporal modulations found in natural sounds. We show how an artificial neural network trained to detect, extract and reconstruct the spectro-temporal features found in speech can significantly reduce the level of the background noise while preserving the foreground speech quality, improving speech intelligibility and automatic speech recognition along the way.
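The speakers' trained network isn't reproduced here, but the underlying idea — suppressing time-frequency bins that look noise-dominated while keeping the speech-bearing ones — can be sketched with a simple magnitude mask on a spectrogram. Everything below (window size, the median-based noise floor, the test signal) is an illustrative assumption, not the authors' system.

```python
import numpy as np

def stft(x, win=256, hop=128):
    """Short-time Fourier transform with a Hann window."""
    w = np.hanning(win)
    frames = [np.fft.rfft(w * x[i:i + win])
              for i in range(0, len(x) - win, hop)]
    return np.array(frames)

def mask_denoise(spec, noise_floor):
    """Zero out time-frequency bins below an estimated noise floor."""
    return spec * (np.abs(spec) > noise_floor)

# Toy example: a tone buried in white noise.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.3 * rng.standard_normal(len(t))

spec = stft(noisy)
# Crude noise-floor estimate: twice the median bin magnitude.
floor = 2 * np.median(np.abs(spec))
denoised = mask_denoise(spec, floor)

# Most low-energy (noise-dominated) bins are now zero; the tone survives.
kept = np.mean(np.abs(denoised) > 0)
print(f"fraction of bins kept: {kept:.2f}")
```

A real system replaces the hand-set threshold with a learned, modulation-aware decision per bin, which is exactly where the neural network comes in.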

How is binaural hearing processed and represented in the brain? We have an almost magical ability to perceive the location of sounds. We know the basic cues (interaural level differences and interaural time differences), but how does the eventual location get represented? Conventional wisdom is that it is represented along a linear axis. But could it be represented in a different way? I dare say that perceptual representations are the biggest piece of the neurological puzzle that we are missing…
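The talk is about how location is represented, not how to compute it, but the basic timing cue is easy to demonstrate: cross-correlating the two ear signals recovers the interaural time difference. The signals and the 20-sample lag below are invented for illustration.

```python
import numpy as np

# Interaural time difference (ITD) estimated by cross-correlation --
# a toy illustration of the cue, not a model of its neural representation.
fs = 44100
itd_samples = 20                       # simulated lag for the right ear
t = np.arange(2048) / fs
left = np.sin(2 * np.pi * 300 * t) * np.hanning(len(t))
right = np.roll(left, itd_samples)     # right ear hears the sound later

# Full cross-correlation; the peak lag is the ITD estimate.
xcorr = np.correlate(right, left, mode="full")
lag = np.argmax(xcorr) - (len(left) - 1)
print(f"estimated ITD: {lag} samples ({lag / fs * 1e3:.2f} ms)")
```

At 44.1 kHz, 20 samples is about 0.45 ms — within the range of real ITDs, which top out near 0.7 ms for a human head.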

We propose the product-of-filters (PoF) model, a generative model that decomposes audio spectra as sparse linear combinations of "filters" in the log-spectral domain. PoF makes similar assumptions to those used in the classic homomorphic filtering approach to signal processing, but replaces hand-designed decompositions built of basic signal processing operations with a learned decomposition based on statistical inference. When applied to speech, PoF discovers a source-filter representation of speech, despite its lack of any explicit prior knowledge about the mechanisms of vocalization. The PoF model can be used as a prior in more complicated models, permitting applications to problems such as dereverberation and bandwidth expansion.
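The learned filters and the inference procedure are the paper's contribution; the generative structure itself is simple, though: in the log-spectral domain a product of filters becomes a sum, log S = W a, with sparse nonnegative activations a. A toy version with two hand-made "filters" (stand-ins, not the learned dictionary):

```python
import numpy as np

# Product-of-filters generative idea: multiplication of filters in the
# linear domain is addition in the log-spectral domain.
freqs = np.linspace(0, 4000, 256)
source = np.cos(2 * np.pi * freqs / 200)   # comb: 200 Hz harmonics
envelope = -((freqs - 700) / 900) ** 2     # broad formant-like bump
W = np.stack([source, envelope], axis=1)   # 256 x 2 filter dictionary

a = np.array([0.5, 2.0])                   # sparse activations
log_spectrum = W @ a                       # sum of filters (log domain)
spectrum = np.exp(log_spectrum)            # product of filters (linear)
print(spectrum.shape)
```

The source/filter split the paper reports falls out of exactly this additivity: pitch structure and vocal-tract resonances occupy separable log-spectral components.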

Binaural beats, brain rhythms, and binaural hearing

Two tones with slightly different frequencies, presented to both ears, interact in the central auditory brain and induce the sensation of a beating sound. At low difference frequencies, we perceive a single sound, which is moving across the head between the left and right ears. The percept changes to loudness fluctuation, roughness, and pitch with increasing beat rate. To examine the neural representations underlying these different perceptions, we recorded neuromagnetic cortical responses while participants listened to binaural beats at continuously varying rate between 3 Hz and 60 Hz. Binaural beat responses were analyzed as neuromagnetic oscillations following the trajectory of the stimulus rate.
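A binaural-beat stimulus is easy to generate — one pure tone per ear, a few hertz apart — which is what makes the phenomenon striking: the beat exists only after the two ears' signals are combined neurally, never in the air. A minimal stimulus generator (the carrier and beat rate are arbitrary choices within the ranges discussed):

```python
import numpy as np

fs = 44100
dur = 2.0
carrier = 400.0
beat_hz = 6.0                    # within the slow "moving sound" range

t = np.arange(int(fs * dur)) / fs
left = np.sin(2 * np.pi * carrier * t)
right = np.sin(2 * np.pi * (carrier + beat_hz) * t)
stereo = np.stack([left, right], axis=1)   # shape (samples, 2)

# Summing the channels (roughly what the binaural brainstem does)
# reveals the 6 Hz envelope fluctuation.
envelope = np.abs(left + right)
print(stereo.shape)
```

Sweeping `beat_hz` from 3 Hz to 60 Hz, as in the experiment, walks the percept from motion through loudness fluctuation and roughness toward pitch.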

Potential users of audio production software, such as audio equalizers, may be discouraged by the complexity and lack of clear affordances of typical interfaces. We seek to simplify interfaces for tasks such as audio production (e.g. mastering a music album with ProTools), audio tools (e.g. equalizers), and related consumer devices (e.g. hearing aids). Our approach combines an evaluative paradigm (“I like this sound better than that sound”) with descriptive language (e.g. “Make the violin sound ‘warmer.’”). To build interfaces that use descriptive language, a system must be able to tell whether the stated goal is appropriate for the selected tool (e.g.
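As a sketch of the evaluative paradigm (not the speakers' actual system), imagine estimating what "warmer" means as an EQ curve purely from pairwise preferences. The simulated listener, band count, and hidden "true" curve below are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bands = 8
true_warm = np.linspace(3, -3, n_bands)      # hidden: boost lows, cut highs

def listener_prefers(a, b):
    """Simulated listener: prefers the EQ setting closer to the target."""
    return np.linalg.norm(a - true_warm) < np.linalg.norm(b - true_warm)

# Present pairs of random equalizer settings; keep each round's winner.
winners = []
for _ in range(500):
    a, b = rng.uniform(-6, 6, (2, n_bands))  # two random EQ settings (dB)
    winners.append(a if listener_prefers(a, b) else b)

# Averaging the winners recovers the trend of the hidden "warm" curve.
estimate = np.mean(winners, axis=0)
print(np.round(estimate, 1))
```

The appeal of this paradigm is that the listener only ever answers "this one or that one" — no knowledge of bands, gains, or filter slopes is required.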

The use of voice commands for human-computer interaction is becoming more prevalent thanks to the recent advancements of automatic speech recognition (ASR) technologies. In typical acoustic environments, audio captured by a microphone contains background noise, reverberation, and signals from interfering sources, making reliable speech capture a challenging problem. Some applications even require more than one user to interact with the system, e.g., gaming, which makes simultaneous speaker detection and localization crucial for enabling natural interactions. Distant multi-speaker speech capture often benefits from the use of microphone arrays that can provide enhanced speech signals using spatial filtering, or beamforming.
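The spatial filtering the abstract mentions can be illustrated with the simplest beamformer, delay-and-sum on a uniform linear array: delay each microphone so a wave from the look direction lines up, then average. The geometry, sample rate, and source angle below are illustrative assumptions, not the speakers' setup.

```python
import numpy as np

fs = 16000
c = 343.0                      # speed of sound, m/s
n_mics = 4
spacing = 0.05                 # 5 cm between microphones
angle = np.deg2rad(30)         # source direction from broadside

# Per-mic delays (in samples) for a far-field plane wave.
delays = np.arange(n_mics) * spacing * np.sin(angle) / c * fs

# Simulate the array capture of a 500 Hz tone from that direction.
t = np.arange(1024) / fs
mics = [np.sin(2 * np.pi * 500 * (t - d / fs)) for d in delays]

def shift(x, samples):
    """Advance x by a (possibly fractional) number of samples,
    using a frequency-domain phase shift."""
    f = np.fft.rfftfreq(len(x), d=1.0)     # cycles per sample
    return np.fft.irfft(np.fft.rfft(x) * np.exp(2j * np.pi * f * samples),
                        n=len(x))

# Steer the beam: undo each channel's delay, then average. The channels
# add coherently, so the output matches a single mic's amplitude, while
# sources from other directions add incoherently and are attenuated.
beam = sum(shift(m, d) for m, d in zip(mics, delays)) / n_mics
print(round(np.max(np.abs(beam)), 2))
```

Detecting and localizing multiple simultaneous speakers amounts to scanning `angle` over candidate directions and looking for peaks in the beamformer's output power.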