I’m not sure I get it. When I watch it I hear “ba ba” even though he’s mouthing what looks like “ga ga,” and when I listen to it I obviously just hear “ba ba.” Hmm…Maybe bad kung-fu movie dubbing has spoiled aural illusions for me? *lol*

I actually study cognitive psychology and I remember learning about that illusion, if you will, in my undergrad cognitive psych class. It has to do with the visual cues that we use to understand language. A lot of people never realize how big a role our visual system plays in auditory processes. The fact that our brain actually processes something completely different when a visual cue is added is interesting indeed. If I remember correctly, that phenomenon is a shortcut that our brain uses to quickly process the sound. Our brains need to use shortcuts all the time because of the ridiculous amount of information that they take in at all times. We would be completely overwhelmed if we had to go through every bit of information without finding shortcuts to process it.

I cannot for the life of me remember what it was called when it was introduced to us, but I do remember being very intrigued that day in class. Case in point, though: our brains are awesome!

It’s just “bah-bah” no matter what for me. Visual cues play very little part in *my* understanding of spoken language…unless I am reading lips. Aural cues are principal for me and I am not often thrown by visual distractions.

I do find it very interesting that other folks might be severely affected by this “cognitive dissonance.” It seems especially important in this campaign season.

If you watch the next video, it explains the McGurk Effect. Basically he spliced a video of him saying Gaa Gaa with an audio track of Baa Baa. Our eyes make us think we are hearing Gaa Gaa or Daa Daa due to visual cues, but our ears alone hear the real audio.

And I need to learn that WP uses brain-dead sanitizing rules; let’s see if this works better:

…

which states:

Most adults (98%) think they are hearing “DA” – a so-called “fused response” – where the “D” is a result of an audio-visual illusion. In reality you are hearing the sound “BA”, while you are seeing the lip movements “GA”.

and also points out that

IF you are listening through poor speakers, the illusion might not work. Try connecting external headphones. The effect will also not work if the images are not streaming right, or in other words not in synch with the sound.

so if you cannot hear the difference, you either belong to the 2% minority, or you need a better computer.

Anyone who is compelled by nature or habit to listen to sounds very carefully (as many musicians do with music, for example) is already aware of the similarity between certain sounds. What’s more, many people who have such a habit find it extremely difficult to make any sense of vocalized words in music – they can’t tell you at all what a singer in a song “says”, only that a vocalist is present. These people are also the most prone to asking a speaker to repeat herself in order to catch what was said.

I have long suspected that people may be divided into two groups (although there may be others): those who are verbal-language dominant who can relatively easily assimilate speech, and those who take all sound more or less “raw”.

Which is best? One might think that having a “processing filter” to immediately grasp speech is a positive selection factor. On the other hand, such an “illusion” demonstrates that those who aren’t dominated by such a filter aren’t as easily fooled by whatever they hear.

Wow! That is so freaky. Whenever I look, it is clearly and distinctly “da da da” but when I close my eyes it’s “ba ba ba.” This is really freaking me out, ha! Thanks for posting this, I never realized sound was so… subjective.

Okay, so I watched this a dozen times before reading the comments and couldn’t figure out what the crap was supposed to be so amazing about it. After reading about what I am supposed to see/hear, now I get it. I do the same thing with those Magic Eye pictures. I don’t see anything until someone tells me what to see. Is my brain messed up?

I really can’t tell the difference. That said I can’t really tell what he’s saying in any situation. I can’t make out the consonant sounds.

However I’ve always had an audio language “comprehension” problem. I often find myself not able to understand sung lyrics on the radio. It takes me far longer to comprehend foreign language when it is spoken versus written. I always find much more enjoyment watching movies or television with cc on because I always pick up much more of what’s going on.

Another phenomenon I’ve noticed in myself: in school I always had better retention of a subject when I had a teacher who gave spoken lectures in front of the class, versus other teaching styles like book work or printed notes. But I suspect that had more to do with interaction than with audio/language learning skills.

today and one of the illusions in the challenge deals with this very phenomenon.

*Note: if you happen to be using one of the most advanced web browsers like Safari (or better yet, the latest nightly of WebKit, which is the only browser that will pass the Acid3 test (http://acid3.acidtests.org/)), it’s going to say you have an outdated browser. If you have the debug menu turned on, you can simply change your useragent to something like IE7 and the flash will operate fine ;p

awesome, the first time i watched the video with no sound ( forgot to turn on the speakers) and i couldn’t figure out what he was saying by watching his lips, not that i usually can but it gives an idea anyways.

OK. Odd. If I don’t watch it for a while, and concentrate on the fact he’s saying Bah-bah and listen with my eyes closed, I can just make it out that way. But if I watch and hear, I hear the Dah-Dah, and then immediately listen but not watch, I still hear the Dah-Dah. Maybe it’s a persistence of mental image… or something.

I hear “da da” or “tha tha” watching the video, but “ba ba” when I’m not looking, so I guess I’m one of the “98%”. I think that’s a pretty cool effect. I also am finding it interesting that way more than 2% of the responses here are people claiming that they don’t hear a difference, so I wonder what that says about the people that read sites like this, and how their brains work. Hmmmmm….

What with my bad hearing and clouded eyes, I think he was speaking two languages. I heard Da (Russian for yes) and Bah (English English for an unacceptable situation as in Bah-Humbug). It was the same eyes open or closed.

Huh. It sounded like “tha tha” to me, both ways. I had to listen several times before I started to be able to make out “ba ba” with my eyes closed. I’m still not totally certain whether I’m actually hearing the difference or I’ve just convinced myself that I should be hearing it, though.

Watching I definitely always hear “la la”. Not watching, I now find I can decide in advance whether I’m going to hear “la”, “ba” or “da”, and my brain pretty much co-operates. So I have some control over my internal reality anyway.

I suppose that if you look at his mouth and see that he’s not using his lips, your visual input is supposed to make the ambiguous sound into a “D” consonant. It doesn’t sound like a “D” to me, nor a “B”, but some kind of muddled, guttural “B,D,T”-ish sound. It’s the same if I look at the face or not. Guess my visual processor and aural processor don’t network so well.

One time my wife’s family was sitting around the table having a conversation when her aunt said, “Just a minute, let me get my glasses. I can’t hear you.”

When we moved from New Jersey to California when I was a kid, we made a vacation out of it and drove across the country. Somewhere in New Mexico, my sister (19 at the time) was looking out at the vast wilderness and said, “You ever notice how there’s always a valley between the mountain ranges?”

We all just looked at her as she tried to better articulate her point. To this day she can’t really explain it.

With my eyes open he sounds slightly inarticulate, but not overtly so considering today’s youth fed on American Idol.
With my eyes closed he sounds like BA’s biggest fan finding consolation in his mantra.

Can’t help thinking that the German band Trio might have had a deeper meaning with their old hit: Da Da Da

Geoff (on 20 Jun 2008 at 5:02 pm)
“It didn’t work for me either but I’m a film editor and am used to lip sync issues so maybe that’s it.”

My situation exactly. I edit video (and its audio) professionally and heard “Bah, Bah.” each time and saw his lips saying “Da, Da.” or something. The “B” sound starts with the lips closed (“P” does too). So it looked weird to me from the get-go. He might not even be actually making any sound at all.

This puts Geoff & me in the 2% category. I’m subconsciously looking for lip sync errors I suppose… Working in broadcast TV does that stuff to ya, it’s no problem, really. I don’t watch TV, I study it. The Wikipedia page says the effect is “robust” but it doesn’t work for me at all. It’s the professional in me always seeing with the unblinking eye and hearing with a critical ear. I don’t miss much. It’s hard to pull the wool over an unblinking eye, you might say…
Rich

That’s not so freaky as it sounds. People who have partial hearing losses can make up for it to some degree by lip-reading, often without realizing it. When we put on our glasses, we interpret it as being able to “hear” better because we can pick up on visual cues that fill in for the missing sounds (typically sibilants, because we lose the high frequencies first). I have about a 50% hearing loss in my left ear, and I didn’t realize I was compensating until I got my first pair of glasses.

Yes, your brain does all kinds of processing on sound. One of the more famous cases is the “missing fundamental”. If you play a song that is heavy on the bass on your itty-bitty car speakers, well, those speakers aren’t going to be able to output the low frequency fundamental. However, our brains hear the overtones, and fill in the missing frequency.
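
For anyone who wants to poke at the “missing fundamental” numerically, here is a rough sketch. It only shows the signal side of the story: a tone built from overtones alone still repeats at the period of the absent fundamental, which is the regularity the brain is thought to latch onto. The sample rate, the 100 Hz fundamental, the choice of overtones, and all the function names are assumptions made up for the illustration. It is not a model of hearing.

```python
import math

SAMPLE_RATE = 8000  # samples per second (illustrative choice)
F0 = 100            # the "missing" fundamental in Hz (illustrative choice)


def harmonics_only(n_samples, f0=F0, harmonics=(2, 3, 4), sr=SAMPLE_RATE):
    """Sum of overtones at 2*f0, 3*f0, 4*f0; nothing is generated at f0 itself."""
    return [
        sum(math.sin(2 * math.pi * k * f0 * n / sr) for k in harmonics)
        for n in range(n_samples)
    ]


def amplitude_at(signal, freq, sr=SAMPLE_RATE):
    """Amplitude of one frequency component (a single-bin Fourier sum)."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq * i / sr) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * i / sr) for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n


def estimate_pitch(signal, min_lag=20, max_lag=120, sr=SAMPLE_RATE):
    """Pick the lag where the signal best matches a shifted copy of itself."""
    n = len(signal) - max_lag
    best_lag = max(
        range(min_lag, max_lag + 1),
        key=lambda lag: sum(signal[i] * signal[i + lag] for i in range(n)),
    )
    return sr / best_lag


signal = harmonics_only(920)
whole_periods = signal[:800]  # exactly 10 periods of F0, so the Fourier sums are clean

print("amplitude at 100 Hz:", amplitude_at(whole_periods, 100))
print("amplitude at 200 Hz:", amplitude_at(whole_periods, 200))
print("repetition rate (Hz):", estimate_pitch(signal))
```

The 100 Hz component is essentially zero, yet the waveform repeats 100 times per second, because 100 Hz is the largest frequency that divides 200, 300 and 400 Hz evenly.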

@John Kennel. The big bang depends on your temporal perspective I suppose. If you look at it as time’s arrow moves back, instead of forward, it would definitely look like a contractive event. Apparently however, thanks to recent research, and even articles in Sci Am, perhaps it’s also possible to view the big bang in our temporal perspective as well. As the universe becomes progressively expansive and cold, perhaps hundreds of trillions of years later, a quantum fluctuation in a specific area, can cause a new universe to suddenly collapse out of it, and if successful, an inflationary event will proceed, as time in this space, can move in either direction (from our perspective, which supposedly occurs half the time, we see it moving one way, it would be the reverse in half the other created universes). Anyway, still unclear about many things here, but hope that helps push the discussion you attempted to start.

I’m not in the 2%. I’ve been hearing impaired for many years. I can actually “hear” it three different ways: 1) sound off – ga ga; 2) sound + video, a Japanese sounding – dtha dtha; 3) sound only – ba ba. The video might make a good screening test for hearing loss.

Part of why it is hard to make sense of somebody over a phone: we depend more than we realize on the visual cues of language in addition to the auditory ones. Listening on a cell phone hands-free doesn’t stop this effect. And I think we listen more attentively to the phone rather than the radio because we are expected to respond. The radio is not any clearer: witness “‘Scuse me, while I kiss this guy!” it’s just that we aren’t expected to reply coherently.

Sounds exactly the same to me but the computer you’re using is more than 50% of how you will perceive it. My computer has handled youtube differently since they went to their new version and reduced screen options.

I wonder what supposed category it puts me in if I’m not ‘fooled’ by the illusion, but can still see enough of the glitch to tell where one might be tripped up? Oddly enough, I think I might know why I’m not falling for it. As a long-time fan of video games, characters were once animated in two or three frames (mouth open/closed/somewhere in-between) as text scrolled. When games began using real audio voices, for a while, visuals didn’t catch up, meaning real voices often didn’t match the mouth movements. Perhaps growing up, I simply learned to disassociate and not rely too much on the visual cues?

Others have noted that a career or hobby with video editing might impart a similar ability, and I think perhaps watching a lot of animated material might do the same.

a few hours ago i first watched the video with headphones on, i couldn’t hear ANY difference whether my eyes were opened or closed, i heard “bah bah” in both cases, i had to read the comments to figure out what the illusion was supposed to be.

i’ve just tried again, but on loudspeakers this time and… (don’t laugh) with earplugs (there’s a music festival today and a band is playing very loud just down my window, so i’ve been wearing them for 2 hours). Well, now i can hear the illusion: if i look at the video i clearly hear “dah dah”, if not, “bah bah”, funny thing is that was not intentional (watching it with earplugs).

As other commenters have suggested, it’s probably directly related to the viewer’s hearing. Earplugs don’t attenuate all frequencies equally; they tend to act like a low-pass filter, muffling the upper end of the spectrum but letting low frequencies pass, which is sort of what happens with aging (the maximum frequency you can hear gets lower and lower over the years).
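
To make the “low pass filter” comparison concrete, here is a small sketch using the simplest low-pass filter there is (a one-pole smoother). Real earplugs have their own attenuation curves, so the filter, the `alpha` value, the sample rate and the two test tones are all stand-in assumptions, not measurements of any actual earplug.

```python
import math

SAMPLE_RATE = 16000  # samples per second (illustrative choice)


def sine(freq, n_samples, sr=SAMPLE_RATE):
    """A unit-amplitude test tone."""
    return [math.sin(2 * math.pi * freq * n / sr) for n in range(n_samples)]


def one_pole_lowpass(signal, alpha=0.2):
    """Each output moves a fraction alpha of the way toward the current input."""
    out, y = [], 0.0
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def gain(freq, alpha=0.2, n_samples=1600, skip=800):
    """Output level divided by input level, ignoring the start-up transient."""
    tone = sine(freq, n_samples)
    filtered = one_pole_lowpass(tone, alpha)
    return rms(filtered[skip:]) / rms(tone[skip:])


low_gain = gain(200)    # a low tone
high_gain = gain(4000)  # a high tone, up where many consonant cues live

print("gain at 200 Hz: ", round(low_gain, 2))
print("gain at 4000 Hz:", round(high_gain, 2))
```

The low tone passes almost untouched while the high tone is cut to a small fraction of its level, which is the kind of muffling described above.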

For those of you who heard “bah bah” in both cases: if you have earplugs handy you should give it a try

Phil should set up a poll to investigate this illusion further :p
i bet the average age of those who hear “bah bah” in both cases (without impairing their hearing artificially that is) is lower than those for whom the illusion works.

I can hear the difference, but so what? I think that our minds “sort out” what we hear so any ambiguity between, say “dah” and “bah” is smoothed out by context. It’s like reading – good readers don’t look at every word, they anticipate what is likely to come next through contextual clues. Why shouldn’t our hearing do the same? All we have in the video is a single, ambiguous syllable. But people very rarely say a single ambiguous syllable without other clues supplied by context, tone, pitch and expression.

Tom Marking, that Sagan/Matrix video was incredibly funny! Expectedly, the lip synch was off a little, but still fairly close. And even the hand motions matched!
************************
For the record, for this thread’s video, I heard “Thah-thah, thah-thah, thah-thah” while looking, “Bah-bah, bah-bah, bah-bah” while not looking. If I closed my eyes & opened them during the video, the sounds would switch depending upon what I could see. (Still “thah” for sighted, “bah” for unsighted).

I wonder whether either your language skills or your native language has anything to do with what you specifically hear? I am a native Finnish speaker, but I am quite as comfortable with English – I consider myself multilingual.

In any case, an interesting phenomenon. Thanks to the commenter waaay above for the wiki-link.

Ba-ba, Ba, ba. That was what we heard, right. I could tell the lips were making completely different motions. It was like watching a low-budget dubbed movie, where the time and motion of lips to sound are completely unrelated.
But seeing his lips move didn’t change what I heard at all.

Perhaps not surprisingly, there do not appear to be many psycholinguists or phoneticians around here.

Well, I’m one of the former breed. The McGurk effect has been well known and thoroughly described. It is an exquisite example of the brain’s use of different sensory modalities to extract information from the environment, and use that information to generate a mental representation of some event. It depends, among other things, on a feature of our perceptual systems known as categorical perception: The physical differences between the phonological instantiation of BA versus PA are in fact linear and continuous, taking place across about 20 milliseconds of “voicing” (the onset of vocal cord vibration). But we do not perceive the differences between BA and PA as continuous; there is a sharp cut-off in perception, with no middle ground. Interestingly, in the 70’s Patricia Kuhl at Univ. Washington showed that this perceptual mechanism, which is clearly crucial for language, is also present in chinchillas, suggesting that spoken human language is built out of phylogenetically old parts.

McGurk described this sensory integration effect, showing how visual stimuli interact with our automated phonetic category assignments. We know now that this must depend on very fine-grained integration of activity in multiple brain regions that do not have direct links to one another, such as the hippocampus and the primary visual cortex. Individual differences in the wiring of such networks might help explain why some people don’t experience the effect in the demo, but we don’t know much about this yet.

@Yojimbo: “What do you hear? It reminds me of the ambiguous sound the Japanese use for our R, L and D.”

Huh? Japanese has a perfectly normal D sound, the same as English. It doesn’t have an exact equivalent of the English R or L; it does have a sound *not* found in (American) English that can make do as a way to express R or L. But that’s its own clear sound, and isn’t “ambiguous” in the least.

Ah, thank you Santiago for delivering a virtual haircut to my office. That was quite enjoyable, but I still can’t see the difference

BTW, I’m in the Ga Ga, Ba Ba camp, though I was confused for a bit trying to find some meaning in the ‘word’ being said. Then I started reading the comments and finally understood what I was supposed to be noticing. – g^2

P.S. Glad you’re having a good time at TAM6 – I finally got around to twittering. . .

Hi folks..
I’m a member of the 2% minority.. I’ve got my theory on this.

– I’m not a native English speaker, so to me it sounds much more like “papa” said with a foreign accent (a word that means “daddy” in French). I can relate the sound to a “real” everyday word, so no matter what, I hear this word. The video seems to be a wrongly dubbed “gaga” to me, but I still hear the sound “papa”.

– I’m a musician, so the sound comes first to me, and the video can’t trick me, as the visual perception isn’t as strong as audio. My ears dominate my eyes.

But I must admit that , on a longer example, as you posted some weeks ago ( featuring the choir singing what could be a Purcell or Haendel’s Anthem with fake subtitles), I can easily be tricked: My english skills aren’t achieved enough to “fight” the subtitles.

So greetings from France to everyone, from a geek-lady passionate about languages, etymology, and anything that involves sound.

Interesting… but I was confused because phil said the guy was saying a single word… da da da da da da or ba ba ba ba ba ba…. I wasn’t considering that string of sounds a “word”…. now I see that he was just repeating a single sound…. makes more sense! 😛

Strangely, if I try it with one eye closed and then the other I “hear” it ever so slightly differently. But only on one eye. The other eye is the same as both eyes open. I wonder if that has anything to do with hand-dominance?

Meh, I don’t get it. No matter how I look (or don’t look) or turn my head, it never sounds like anything even remotely English. It seems there’s lots of “Da-Da” going on….. hurray how nifty and exciting

“Huh? Japanese has a perfectly normal D sound, the same as English. It doesn’t have an exact equivalent of the English R or L; it does have a sound *not* found in (American) English that can make do as a way to express R or L. But that’s its own clear sound, and isn’t “ambiguous” in the least.”

Not to belabor an off-topic point (and I see you are a fluent speaker), but, isn’t it true that in Japanese there is no “D” sound per se at all? Or any other pure consonant, for that matter – only consonant/vowel combinations (except the “n” sound). Their sound of “dah” is exactly like ours, but I don’t believe there is an equivalent of “dee” – is there a “doo”?

Anyway, what I meant by ambiguous is that it would be very hard for an English speaker to hear a difference in the Japanese pronunciation of “ray”, “lay”, or “day” (or, in this case, “rah”, “la” and “dah”).

@Yojimbo & Traveller:
Funnily enough I first saw this on a Japanese science program (Tameshite Gatten?) a few weeks ago and my Japanese wife and I saw “ma” and “da” but heard “ba” (The Japanese person on the screen was voicing “ma”)

Santiago: Thank you for freaking me the hell out. I’ve heard 3D sound demos before but that was just too scary, especially when they put the bag over your head.

I’m a radio broadcast engineer by trade and have always been audio oriented, so I heard the sound the same way, with and without watching the video. Blind friends have told me that I’m more audio oriented than most sighted people. So I think the difference is how audio oriented the person is.

I have a severe hearing impairment and have taken lip reading classes which due to my visual difficulties I can’t do well.

But what I do know, and this is very much like it, is that this is what is called “speech reading” where I use both my minimal hearing and focus on the face trying to lip read which then adds together and suddenly helps me to start getting words that I otherwise would miss.

I’ll have to look up the McGurk effect to see if it’s speech reading or something else.