The course will explore the tone combinations that humans consider consonant or dissonant, the scales we use, and the emotions music elicits, all of which provide a rich set of data for exploring music and auditory aesthetics in a biological framework. Analyses of speech and musical databases are consistent with the idea that the chromatic scale (the set of tones used by humans to create music), consonance and dissonance, worldwide preferences for a few dozen scales from the billions that are possible, and the emotions elicited by music in different cultures all stem from the relative similarity of musical tonalities and the characteristics of voiced (tonal) speech. Like the phenomenology of visual perception, these aspects of auditory perception appear to have arisen from the need to contend with sensory stimuli that are inherently unable to specify their physical sources, leading to the evolution of a common strategy to deal with this fundamental challenge.

Taught By

Dale Purves

M.D.

Transcript

In Lesson 3, we're going to talk about the perception of intensity, that is, what we hear as loudness, in empirical terms. And to remind you why this is important, let's go back to this slide that I showed you before to indicate why the question remains. If we don't hear sounds in terms of the physical descriptors in sound signals, why is that? Why would we not want to do that, why can't we do that, and what's the alternative? How are we generating our perception of loudness, or, in the next lesson, our perception of pitch? It's this diagram that I showed you before that's worth going over again. Take a plucked string: a force is applied to the string, it vibrates, and there is a series of vibratory modes. The sound disturbance in the atmosphere heads toward the listener, but the signal diminishes as a function of distance, it hits objects in the environment and is changed thereby, and it is mixed with concurrent sounds coming from other sources, which are routine in any normal, natural environment. The signal that comes to the ear conflates all of the factors that a listener needs, in some sense, to understand in order to behave correctly: the mechanical force being applied to the string, the resonant properties of the sound-source object, the effects of the environment, the interference of concurrent sounds. All of these things are conflated, mixed up, entangled in the stimulus, the sound signal reaching the listener's ear. And somehow the listener has to get around that conflation and sort out what's actually happening in a way that informs behavior. You can see, I think, from a diagram like this that this is a very fundamental problem. It's not limited to audition; it's absolutely fundamental in vision and other sensory modalities, and it's a big challenge for workers in those fields as well.
To look at this from the perspective of the experience we normally have, and to ask whether the accumulation of that experience by trial and error predicts, through its frequency of occurrence, what we actually hear, we need a database of natural sounds. A database that we and many other people use is a database of speech sounds briefly referred to as the TIMIT speech corpus. TIMIT stands for Texas Instruments and MIT, the Massachusetts Institute of Technology, and the corpus was put together a number of years ago. It is widely used as an excellent, carefully constructed source of natural sound signals, speech sounds in particular. Here is a sentence taken from that corpus. There are many sentences spoken by many different speakers in American English, to give a strong baseline of speech information. If you take a piece of that time-varying signal, a 100-millisecond segment of the uttered sentence, blown up in this diagram, you can see that it happens to be a vowel sound, a periodic piece of the signal in the same sense that we talked about last time. There are periodic pieces, which are generally speaking the vowel sounds, and aperiodic pieces, which are generally speaking the consonant sounds, and of course there are silences and concurrent noise in between. You can take that snippet, the sound signal illustrated here in the second panel, and look at its spectrum. The spectrum, remember we talked about this before, is obtained by a simple mathematical trick called Fourier analysis. It takes this snippet and asks, in terms of the distribution of frequencies plotted against amplitudes, where is the power in that sound? At what frequencies does the power exist? Now, I showed you a spectrum like this before in talking about the generation of speech, in a sort of diagrammatic way, and you can see that the real thing is quite a bit more complicated.
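The Fourier analysis described here can be sketched in a few lines. This is an illustration with a synthetic vowel-like snippet standing in for a real TIMIT segment (the 16 kHz sample rate matches TIMIT, but the 120 Hz fundamental and harmonic amplitudes are assumptions chosen to mimic a voiced sound):

```python
import numpy as np

fs = 16000                            # TIMIT's sample rate, Hz
t = np.arange(int(0.1 * fs)) / fs     # a 100 ms snippet

f0 = 120.0                            # assumed voice fundamental, Hz
# Sum a few harmonics with decaying amplitudes to mimic a periodic vowel sound
snippet = sum((1.0 / k) * np.sin(2 * np.pi * k * f0 * t) for k in range(1, 6))

# Fourier analysis: where is the power, as a function of frequency?
spectrum = np.abs(np.fft.rfft(snippet)) ** 2
freqs = np.fft.rfftfreq(len(snippet), d=1 / fs)

# The peaks fall at the harmonic series f0, 2*f0, 3*f0, ...
peak_freq = freqs[np.argmax(spectrum)]
print(round(peak_freq))               # strongest peak at the 120 Hz fundamental
```

A real speech snippet would of course show the noisier, messier spectrum described in the lecture rather than these clean harmonic peaks.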
But these individual peaks here are the frequencies, the harmonic series, of a vowel. Remember, this happens to be a vowel snippet, and this would be the first formant, the second formant, and the third formant here. They are not so pretty as when diagrammed as I had before, and you can see as well that there's lots of noise in this. It's not a perfectly simple example, it's kind of a mess, but that's what people have to deal with in natural speech sounds; the things I talked about already are there, just in a messier way than a simple diagram implies. Again, the point of showing you all this is that there are very good databases out there, TIMIT being one, that you can use to analyze the frequency of occurrence of, in this case, the intensities of sounds in natural source signals like speech. So what's the evidence, then, that what we hear is being generated empirically by the frequency of occurrence of intensities in normal speech? There is a whole series of evidence that supports this idea, but I'm just going to show you one example. Let's look at this first graph. This is a plot of the level above threshold, the physical intensity of speech measured in decibels, the physical metric we talked about last time, plotted against the loudness of the speech, represented here in arbitrary units. You can see that this is not a linear relationship; linear would be a diagonal across the plot. What happens is that we hear relatively less intense sounds as louder than the physics would indicate, and we hear sounds at the more intense end of the plot as less intense than a linear relationship would predict. This is well known; these data were determined decades ago and have been established many times since. Here in this middle panel is the frequency of occurrence of intensities in 20-millisecond snippets taken from the corpus.
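The physical metric in play here, intensity in decibels, is just a logarithmic ratio against a reference pressure. A minimal sketch, using a synthetic snippet (the amplitude is chosen arbitrarily so the result lands near conversational levels; the 20 micropascal reference is the standard one for dB SPL):

```python
import numpy as np

p_ref = 20e-6       # standard reference pressure for dB SPL, in pascals

# Synthetic snippet: a 200 Hz tone with ~20 mPa peak pressure (an assumption
# for illustration, not a calibrated recording)
fs = 16000
snippet = 0.02 * np.sin(2 * np.pi * 200 * np.arange(1600) / fs)

# Decibels are 20 * log10 of the RMS pressure over the reference pressure
rms = np.sqrt(np.mean(snippet ** 2))
db_spl = 20 * np.log10(rms / p_ref)
print(round(db_spl))   # 57, i.e. roughly conversational-speech level
```

The same computation, applied to each 20-millisecond snippet of a corpus, is what produces the distribution of intensities in the middle panel.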
Many thousands of little snippets, asking: what is the frequency of occurrence of intensities in normal speech? These are, again, people uttering many sentences in a normal conversational manner. And here is the plot of those intensities. As you would expect, remember I told you that conversational decibel levels were around 60 decibels, average conversation falls in this range of about 60 decibels. Now the question asked in this series is: if you take the frequency of occurrence of intensities in panel B, can you explain the function relating the loudness that we hear to the decibel intensity of the sound? Does it mimic, does it explain, does it rationalize that function? And the answer is that it does pretty well. Here we take this information and ask: if we generate a curve based on a ranking of intensities, do we get a curve similar to the psychophysical one? The result is that we do, pretty well. This is the rank of the frequency of occurrence of the intensities of sound signals in normal speech, and you can see that when that rank is plotted against the sound pressures taken from these data, this kind of function is generated. Again, this is one piece of evidence among others that makes the point, the one I'm showing you just to indicate that, yes, the difference between the sound signal in terms of its physical intensity in decibels and what we hear in terms of subjective loudness can be explained by the frequency of occurrence of intensities in the normal experience that we have all the time with natural sound signals. That discrepancy falls out.
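The ranking logic described above can be sketched as follows. Synthetic intensities stand in for the measured TIMIT snippets: a normal distribution centered on 60 dB is an assumption made for illustration, not the published data. The cumulative rank of each intensity level then serves as the predicted loudness scale:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for measured intensities of thousands of 20 ms speech snippets,
# clustered around the ~60 dB of normal conversation (assumed distribution)
snippet_db = rng.normal(loc=60.0, scale=10.0, size=10_000)

# Empirical ranking: for each intensity level, what fraction of snippets
# falls at or below it? That cumulative rank is the predicted loudness.
levels = np.linspace(20.0, 100.0, 81)            # 1 dB steps
predicted_loudness = np.array([(snippet_db <= x).mean() for x in levels])

# The function is steep where intensities occur often (near 60 dB) and
# flattens out in the rarely-occurring tails, qualitatively matching the
# nonlinear loudness-vs-decibels curve from psychophysics.
mid_slope = predicted_loudness[41] - predicted_loudness[39]    # near 60 dB
tail_slope = predicted_loudness[79] - predicted_loudness[77]   # near 98 dB
print(mid_slope > tail_slope)   # True: steeper where intensities are common
```

The design choice here is the key empirical idea: loudness is modeled as the rank of an intensity within accumulated experience, rather than as any fixed function of the physical signal.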
