Wednesday, December 1, 2010

Why the obsession with intelligibility in speech processing studies?

There was a very interesting speech/language session at SfN this year organized by Jonathan Peelle. Talks included presentations by Sophie Scott, Jonas Obleser, Sonia Kotz, Matt Davis, and others, spanning an impressive range of methods and perspectives on auditory language processing. Good stuff and a fun group of people. It felt kind of like a joint lab meeting with lots of discussion.

I want to emphasize one of the issues that came up, namely, the brain's response to intelligible speech and what we can learn from it. Here's a brief history.

2000 - Sophie Scott, Richard Wise and colleagues published a very influential paper which identified a left anterior temporal lobe region that responded more to intelligible speech (clear and noise-vocoded sentences) than to unintelligible speech (spectrally rotated versions of the intelligible speech stimuli). It was argued that this is the "pathway for intelligible speech".

2000-2006 - Several more papers from Scott/Wise's group replicated this basic finding, but additional areas started creeping into the picture, including left posterior regions and right hemisphere regions. The example figure below is from Spitsyna et al. 2006.

2007 - Hickok & Poeppel again reviewed the broader literature on speech perception, including lesion work as well as studies that attempted to isolate phonological-level processes more specifically. They concluded, yes you guessed it, that Hickok & Poeppel 2000 were pretty much correct in their claim of a bilaterally organized posterior temporal speech perception system.

2009 - Rauschecker and Scott publish their "Maps and Streams" review paper arguing just as strongly that speech perception is left lateralized and is dependent on an anterior pathway. As far as I can tell, this claim is based on (i) analogy to the ventral stream pathway projection in monkeys (note: we might not yet fully understand the primate auditory system and given that monkeys don't have speech, the homologies may be less than perfect), and (ii) the fact that the peak activation in intelligible minus unintelligible sentences tends to be greatest in the left anterior temporal lobe.

2010 - Okada et al. publish a replication of Scott et al. 2000 using a much larger sample than any previous study (n=20, compared to n=8 in Scott et al. 2000) and find robust bilateral anterior and posterior activations in the superior temporal lobe for intelligible compared to unintelligible speech. See figure below, which shows the group activation (top) and peak activations in individual subjects (bottom). Note that even though it doesn't show up in the group analysis, activation extends to right posterior STG/STS in most subjects.

So that's the history. As the SfN session revealed, controversy still remains, despite what I thought was fairly compelling evidence against an exclusively anterior-going projection pathway.

Here's what came out at the conference.

I presented lesion evidence collected with my collaborators Corianne Rogalsky, Hanna Damasio, and Steven Anderson, which showed that destruction of the left anterior temporal lobe "intelligibility area" has zero effect on speech perception (see figure below). This example patient performed with 100% accuracy on a test of auditory word comprehension (4AFC, word to picture matching with all phonemic foils, including minimal pairs), and 98% accuracy on a minimal pair syllable discrimination test. Combine this with the fact that auditory comprehension deficits are most strongly associated with lesions in the posterior MTG (Bates et al. 2003) and this adds up to a major problem for the Scott et al. theory.

The counter-argument from the Scott camp was addressed exclusively at the imaging data. I'll try to summarize their main points as accurately as possible. Someone correct me if I've got them wrong.

1. Left ATL is the peak activation in intelligible vs. unintelligible contrasts.

2. Okada et al. did not use sparse sampling acquisition (true), which increased the intelligibility processing load (possible), thus recruiting posterior and right hemisphere involvement.

3. Okada et al. used an "active task" which affected the activation pattern (we asked subjects to press a button indicating whether the sentence was intelligible or not).

First and most importantly, none of these counter-arguments provides an account of the lesion data. We have to look at all sources of data in building our theories.

Regarding point #2: I will admit that it is possible that the extra noise taxed the system more than normal and this could have increased the signal throughout the network. However, these same regions are showing up in the reports of Scott and colleagues, even in the PET scans, and the regions that are showing up (bilateral pSTG/STS) are the same as those implicated in lesion work and in imaging studies that target phonological level processes.

Regarding point #3: I'm all for paying close attention to the task in explaining (or explaining away) activation patterns. However, if the task directly assesses the behavior of interest (which is not the case in many studies), this argument doesn't hold. The goal of all this work is to map the network for processing intelligible speech. If we are asking subjects to tell us if the sentence is intelligible, this should drive the network of interest. Unless, I suppose, you think that the pSTG is involved in decision processes, which is highly dubious.

This brings us to point #1: Yes, it does appear that the peak activation in the intell vs. unintell contrast is in the left anterior temporal lobe. This tendency is what drives the Scott et al. theory. But why the obsession with this contrast? There are two primary reasons why we shouldn't be obsessed with it. In fact, these points question whether there is any usefulness to the contrast at all.

1. It's confounded. Intelligible speech differs from unintelligible speech on a host of dimensions: phonemic, lexical, semantic, syntactic, prosodic, and compositional semantic content. Further, the various intelligibility conditions are acoustically different: just listen to them, or note that A1 activity can reliably distinguish each condition from the others (Okada et al. 2010). It is therefore extremely unclear what the contrast is isolating.

2. By performing this contrast, one is assuming that any region that fails to show a difference between the conditions is not part of the pathway for intelligible speech. This is clearly an incorrect assumption: in the extreme case, peripheral hearing loss impairs the ability to understand speech even though the peripheral auditory system does not respond exclusively to intelligible speech. Closer to the point, even if it were the case that the left pSTG/STS did not show an activation difference between intelligible and unintelligible speech, it could still be THE region responsible for speech perception. In fact, if the job of a speech perception network is to take spectrotemporal patterns as input and map these onto stored representations of speech sound categories, one would expect activation of this network across a range of spectrotemporal patterns, not only those that are "intelligible".

I don't expect this debate to end soon. In fact, one suggestion for the next "debate" at the NLC conference is Scott vs. Poeppel. That would be fun.

7 comments:

Hi Greg - Instead of contrasting intelligible vs. unintelligible speech in a group analysis, I suspect that it might be particularly revealing to compare the activation peaks for 'intelligible speech vs. baseline' to peaks for 'unintelligible speech vs. baseline.' There are different ways that you could examine dispersion differences between the two conditions (i.e., anterior vs. posterior focus). Later - Julius

Yes, that would certainly change the picture. I suspect that the reason the anterior focus is so robust is that the intell-unintell contrast at least partly subtracts out the phonological-level stuff that so strongly activates the pSTG. But the real question is why use these types of stimuli in the first place? At this point in the game I would hope we were beyond such global contrasts and on toward the goal of specifying the organization of more specific levels of processing. Can anyone tell me what computational process we might be isolating by comparing intelligible sentences with unintelligible sentences?

I'd love to see a Scott vs Poeppel debate. Another variation on the theme might be a "what does the dorsal stream do?" debate.

I agree that "intelligibility" is kind of a cluttered construct. We ran a meta-analysis earlier this year on speech-vs-nonspeech studies. To try to keep it relatively cognitively clean, we only used studies looking at sublexical speech stimuli, and then checked whether the specific nonspeech stimulus mattered and whether the task mattered. Of course they both did, and the full results supported the Hickok & Poeppel model better than others. In case you're interested, here's the link: http://www.ncbi.nlm.nih.gov/pubmed/20413149

I'm confused. Please help me out. I also have not been lucky enough to attend SfN, so I may seem ignorant.

My interpretation of the literature discussed is that the Scott and Wise mafia highlight the ATL as important for semantic intelligibility (as in semantic content (intelligible) vs. no semantic content (unintelligible)); they argue that this ventral route is critical for accessing semantic info. They agree with the HP mafia that auditory discrimination occurs in the posterior regions, but hold that the semantic processing is in the ATL.

If we use the neuropsych dual-route model of processing from the dyslexia literature, your patient who had a destroyed ATL but intact minimal pair discrimination and word-to-picture matching with all phonemic foils, including minimal pairs, could have been using their phonological (direct) route and not relying on their semantic (indirect) route. If you used semantic foils, did they perform worse than with phonological foils?

As I said, maybe I misunderstand, please advise if I have, but it seems like the two mafia gangs should actually be the best of friends and work together to oust the real enemy? :)

I haven't asked Richard his opinion on this but Sophie's position is that it's all happening in the left ATL, including phonological stuff. So she should predict significant deficits in speech perception and comprehension in a patient with a lesion in the left ATL.

The patient with the left ATL lesion performed perfectly on a word-to-picture matching test with all semantic foils. The only deficits we noted were on sentence comprehension and naming.

Interesting re: deficits on sentence comprehension and naming... perhaps a semantic hub after all? When and where do you plan to publish this patient? There are lots of people who would be very interested.

Blog Moderators

Greg Hickok is Professor of Cognitive Sciences at UC Irvine, Editor-in-Chief of Psychonomic Bulletin & Review, and author of The Myth of Mirror Neurons. David Poeppel, after several years as Professor of Linguistics and Biology at the University of Maryland, College Park, is now Professor of Psychology at NYU. Hickok and Poeppel first crossed paths in 1991 at MIT in the McDonnell-Pew Center for Cognitive Neuroscience, where Hickok was a postdoc and Poeppel a grad student. Meeting up again a few years later at a Cognitive Neuroscience Society Meeting in San Francisco, they began a collaboration aimed at developing an integrated model of the functional anatomy of language. Research in both the Hickok and Poeppel labs is supported by NIDCD.