Cutting through the clamor—how the brain helps us understand spoken words in noisy settings

Humans are exquisitely skilled at perceiving spoken words, even when speakers’ voices are intermittently overwhelmed by noise, as happens in the din of construction sites or on busy urban streets. Now, in a study conducted in a group of patients preparing for brain surgery, UC San Francisco scientists have discovered an unexpected mechanism the brain uses to seamlessly compensate when speech sounds are obscured by noise.

The research team monitored neural activity during listening tasks in a group of epilepsy patients awaiting surgery, using recording devices placed directly on the surface of the brain. As reported in the December 20, 2016 issue of Nature Communications, the resulting neural recordings captured the real-time dynamics of this perceptual “filling in,” which takes just tenths of a second, and also showed that a region outside the brain’s canonical speech areas plays a critical role in this process.

The group found that the part of the brain most deeply involved in speech perception responded to missing speech sounds as if those sounds were actually present. But most intriguingly, the researchers discovered that a brain region separate from main speech-processing areas somehow “predicts” which word a listener will hear when that word is partially masked by noise, well before that noise has even begun to be processed by auditory areas.

In the new research, Matthew Leonard, PhD, assistant professor of neurological surgery and a member of the UCSF Weill Institute for Neurosciences, and colleagues worked with five patients about to undergo surgery to treat epilepsy that was not manageable with medications.

To locate the anatomical origins of these patients’ seizures for surgery, and to create surgical plans that would protect crucial brain areas, flexible panels containing 256 recording electrodes had been placed on the surface of either the right or left side of the brain. These electrode arrays provided dense coverage of a region known as the superior temporal gyrus (STG), which is crucial to speech processing. This recording arrangement has proven valuable in previous research on speech in the UCSF laboratory of neurosurgeon-scientist and senior author Edward Chang, MD, professor of neurological surgery.

It has been known since the 1970s that when critical speech sounds that distinguish one word from another, such as the “s” and “k” sounds that distinguish faster from factor, are excised and replaced by noise (such a stimulus can be represented as “fa#tor”), listeners will nonetheless report hearing a complete word, a phenomenon called “phoneme restoration.”

According to Leonard, for stimuli like fa#tor, which only two actual English words can complete, phoneme restoration is a “bistable” auditory illusion, somewhat analogous to well-known visual illusions like the “duck/rabbit” drawings that shift between two perceptual interpretations. When listeners hear fa#tor, they report hearing either faster or factor, even though neither word is truly present in the stimulus.

When the patients listened to various bistable stimuli, recordings from the STG were consistent with whichever word they reported hearing: if they perceived factor, for example, the part of the STG normally activated by “k” sounds emitted a signal, even though no “k” sound was actually present; likewise, when they perceived faster, the STG region corresponding to “s” sounds was activated.

These responses occurred less than two tenths of a second after the noise-obscured gaps in the stimuli began to be processed, the same time frame in which the difference between the actual words faster and factor was processed. That timing provides the beginnings of an answer to a perennial question in speech perception, Leonard said.

“One of the oldest debates in the field is whether there’s a ‘top-down’ signal that actually changes the listener’s perception ‘online,’ in real time, or whether this is achieved by some sort of decision-making process that rapidly arrives at an interpretation after the missing sound segment has been processed,” Leonard said. “Our data seem to support the former idea.”

Surprisingly, the patients’ word choices were unaffected when noise-masked words were embedded in sentences that would seem to strongly favor one choice over another, a technique called “semantic priming.” For example, hearing “On the highway, he drove the car much fa#tor” would seem to bias a listener toward hearing “faster,” but the researchers found that the patients were just as likely to say they heard “factor.”

Since phoneme restoration was essentially instantaneous, and because semantic priming had so little effect, the research team wondered whether brain areas other than the STG might somehow be contributing to the listeners’ perception.

The group was surprised to find that an area toward the front of the brain was selectively active about half a second before the STG signals associated with phoneme restoration were seen. This activity actually predicted which word patients would report hearing, suggesting that this region somehow helped to drive that perception.

“Whether you hear a bistable stimulus as factor or faster on a given trial seems to depend on random fluctuations in the brain’s state at that moment, something you really don’t have any control over,” Leonard said. “We don’t have a definitive idea of what this frontal signal is yet, but we’ll be exploring that question in future research.”

Taken together, said Leonard, the new results show that “there are brain mechanisms that are constantly working behind the scenes to make sure we don’t get tripped up every time there’s a sound that could prevent us from understanding speech.”