In linguistics, prosody describes all the acoustic properties of speech that cannot be predicted from a local window on the orthographic (or similar) transcription. So, prosody is relative to a default pronunciation of a phoneme/feature bundle/segment/syllable; it does not include coarticulation because coarticulation is predictable from the immediate phonological or orthographic neighborhood. Qualitatively, one can understand prosody as the difference between a well-performed play, and one on first reading.

The term generally covers intonation, isochrony (rhythm), and stress in speech. Acoustically, prosody describes changes in syllable length, loudness, pitch, and certain details of the formant structure of speech sounds. In terms of the speech articulators, it describes changes in the velocity and range of motion of articulators like the jaw and tongue, along with quantities like the air pressure in the trachea and the tensions in the laryngeal muscles. Phonologically, prosody is described by tone, intonation, rhythm, and lexical stress.

A precise definition of prosody and its effects depends upon the language. For instance, some languages make lexical distinctions based on vowel duration. In such languages, syllable length would thus be at least partly predictable from a transcription and thus not completely prosodic. Likewise, in tone languages such as Mandarin, the pitch and/or intonation is at least partially predictable from the lexical tone of a word, and thus not completely prosodic.

Similarly, the formant structure of vowels is primarily determined by a phonological or orthographic transcription, but not entirely. Vowels are generally more completely realized in accented or focussed syllables. From an acoustic point of view, this means that the formant structure is farther from that of a neutral vowel (typically the schwa), and closer to that of the vowels one might see in the stressed syllables of a carefully spoken word. Thus, the precise formant structure of vowels normally contains a mixture of prosodic and lexical information.

The prosodic features of a unit of speech, whether a syllable, word, phrase, or clause, are typically called suprasegmental features because they typically affect all the segments of the unit.

Prosodic units do not always correspond to grammatical units, although both may reflect how the brain processes speech.
Phrases and clauses are grammatical concepts, but they may have prosodic equivalents, commonly called prosodic units, intonation units, or declination units, which are the actual phonetic spurts or chunks of speech. These are often believed to exist as a hierarchy of levels. Such units are characterized by several phonetic cues, such as a coherent pitch contour, and the gradual decline in pitch and lengthening of vowels over the duration of the unit, until the pitch and speed are reset to begin the next unit. Breathing, both inhalation and exhalation, seems to occur only at these boundaries.
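The declination-and-reset cue described above can be illustrated with a short sketch. It is only a toy under stated assumptions: the per-syllable pitch values are hypothetical, and the function name `split_at_pitch_resets` and the 15% upward-jump threshold are illustrative choices, not an established method for finding prosodic units.

```python
from typing import List


def split_at_pitch_resets(f0_hz: List[float], reset_ratio: float = 1.15) -> List[List[float]]:
    """Split a per-syllable pitch sequence into units at upward pitch resets.

    A new unit starts whenever the pitch exceeds the previous syllable's
    pitch by more than `reset_ratio` (an assumed, illustrative threshold).
    """
    if not f0_hz:
        return []
    units = [[f0_hz[0]]]
    for prev, cur in zip(f0_hz, f0_hz[1:]):
        if cur > prev * reset_ratio:
            units.append([cur])    # pitch reset: begin a new unit
        else:
            units[-1].append(cur)  # declination continues within the unit
    return units


# Hypothetical pitch values in Hz, one per syllable (not measured data).
contour = [220.0, 210.0, 200.0, 190.0, 225.0, 212.0, 198.0]
for unit in split_at_pitch_resets(contour):
    print(unit)
```

Real speech would need further cues, such as final lengthening and pauses, since pitch alone also rises for accents and questions.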

Different schools of linguistics describe somewhat different prosodic units. One common distinction is between continuing prosody, which in English orthography we might mark with a comma, and final prosody, which we might mark with a full stop (period). This is the common usage of the IPA symbols for "minor" and "major" prosodic breaks (American English pronunciation):

Jack, preparing the way, went on.

[ˈdʒæk | pɹəˌpɛəɹɪŋ ðə ˈweɪ | wɛnt ˈɒn ‖ ]

Jacques, préparant le sol, tomba.

[ˈʒak | pʁepaʁɑ̃ lɵ ˈsɔl | tɔ̃ˈba ‖ ]

Note that the last syllable with a full vowel in a French prosodic unit is stressed, and that the last stressed syllable in an English prosodic unit has primary stress. This shows that stress is not phonemic in French, and that the difference between primary and secondary stress is not phonemic in English; they are both elements of prosody rather than inherent in the words.

The pipe symbols – the vertical bars | and ‖ – used above are phonetic, and so will often disagree with English punctuation, which only partially correlates with prosody.

However, the pipes may also be used for metrical breaks -- a single pipe being used to mark metrical feet, and a double pipe to mark both continuing and final prosody, as their alternate names "foot group" and "intonation group" suggest. In such usage, each foot group would include one and only one heavy syllable. In English, this would mean one and only one stressed syllable:

Jack, preparing the way, went on.

[ˈdʒæk ‖ pɹəˌpɛəɹɪŋ | ðə ˈweɪ ‖ wɛnt ˈɒn ‖ ]

In many tone languages with downdrift, such as Hausa, [ | ] is often used to represent a minor prosodic break that does not interrupt the overall decline in pitch of the utterance, while [ ‖ ] marks either continuing or final prosody that creates a pitch reset. In such cases, some linguists use only the single pipe, with continuing and final prosody marked by a comma and period, respectively.

In transcriptions of non-tonal languages, the three symbols pipe, comma, and period may also be used, with the pipe representing a break more minor than the comma, the so-called list prosody often used to separate items when reading lists, spelling words, or giving out telephone numbers.
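As a rough illustration of the first convention above (single pipe for a minor break, double pipe for a major break), a transcription can be split into prosodic units mechanically. This is a minimal sketch: the function name `split_prosodic_units` and the labels "minor", "major", and "none" are assumptions made for the example, and it does not handle the metrical-foot or comma-and-period conventions.

```python
import re
from typing import List, Tuple

MINOR_BREAK = "|"  # single pipe: minor (continuing) break
MAJOR_BREAK = "‖"  # double pipe: major (final) break


def split_prosodic_units(transcription: str) -> List[Tuple[str, str]]:
    """Return (unit, break_kind) pairs from a bracketed transcription."""
    text = transcription.strip().strip("[]")
    # A capturing group makes re.split keep the break symbols in the result.
    parts = re.split(r"([|‖])", text)
    pairs = []
    for i in range(0, len(parts) - 1, 2):
        unit = parts[i].strip()
        kind = "minor" if parts[i + 1] == MINOR_BREAK else "major"
        if unit:
            pairs.append((unit, kind))
    tail = parts[-1].strip()
    if tail:
        pairs.append((tail, "none"))  # trailing material with no break symbol
    return pairs


for unit, kind in split_prosodic_units("[ˈdʒæk | pɹəˌpɛəɹɪŋ ðə ˈweɪ | wɛnt ˈɒn ‖ ]"):
    print(kind, unit)
```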

Many people communicate and interpret a great deal through slight colourings, intonation, and rhythm of the voice, which convey emotions and subtle nuances in conversation. However, not everyone is able to fully understand or even notice such tonal characteristics in speech, even in their native language. See Sociolinguistics.


Languages can be classified according to the distinctive prosodic unit that gives a language its rhythm. Languages can be stress-timed, syllable-timed, or mora-timed. Stress-timed languages include English and Dutch, syllable-timed languages include Spanish and Italian, and an example of a mora-timed language is Japanese. This classification assumes that a language has “isochronous rhythm,”[1] meaning that there is an equal amount of time between stressed syllables, syllables, or moras, depending on the category of language.
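One way to make the isochrony assumption concrete is to measure how evenly spaced the relevant units are. The sketch below uses the coefficient of variation of the intervals between onsets; this measure is a choice made for illustration rather than one named in the text, the function name `interval_variability` is hypothetical, and the onset times are invented, not measurements.

```python
from statistics import mean, pstdev
from typing import Sequence


def interval_variability(onsets_s: Sequence[float]) -> float:
    """Coefficient of variation (std / mean) of intervals between onsets.

    A value of 0 means perfectly equal spacing (ideal isochrony);
    larger values mean less regular timing.
    """
    intervals = [b - a for a, b in zip(onsets_s, onsets_s[1:])]
    if len(intervals) < 2:
        raise ValueError("need at least three onsets to compare intervals")
    return pstdev(intervals) / mean(intervals)


# Hypothetical onset times (seconds) of stressed syllables.
evenly_spaced = [0.0, 0.5, 1.0, 1.5]
unevenly_spaced = [0.0, 0.3, 1.0, 1.2]

print(interval_variability(evenly_spaced))
print(interval_variability(unevenly_spaced))
```

Under this sketch, a strictly stress-timed language would score near zero on stressed-syllable onsets, and a strictly syllable-timed one near zero on all syllable onsets.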

The details of a language's prosody depend upon its phonology. For instance, in a language with phonemic vowel length, this must be marked separately from prosodic syllable length. In a similar manner, prosodic pitch must not obscure tone in a tone language if the result is to be intelligible. Although tone languages such as Mandarin have prosodic pitch variations in the course of a sentence, such variations are long and smooth contours, on which the short and sharp lexical tones are superimposed. If pitch can be compared to ocean waves, the swells are the prosody, and the wind-blown ripples in their surface are the lexical tones. The same holds for stress in English. The word dessert has greater stress on the second syllable, compared to the noun desert, which has greater stress on the first (in its "arid land" meaning, but not in its "thing which is deserved" meaning); but this distinction is not obscured when the entire word is stressed by a child demanding "Give me dessert!" Vowels in many languages are likewise pronounced differently (typically less centrally) in a careful rhythm or when a word is emphasized, but not so much as to overlap with the formant structure of a different vowel. Both lexical and prosodic information are encoded in rhythm, loudness, pitch, and vowel formants.

Prosodic features are suprasegmental. They are not confined to any one segment, but occur in some higher level of an utterance. These prosodic units are the actual phonetic "spurts", or chunks of speech. They need not correspond to grammatical units such as phrases and clauses, though they may; and these facts suggest insights into how the brain processes speech.

Prosodic units are marked by phonetic cues. Phonetic cues can include aspects of prosody such as pitch, pauses, and accents, all of which are cues that must be analyzed in context, or in comparison to other aspects of a sentence. Pitch, for example, can change over the course of a sentence. In English, falling intonation indicates a declarative statement while rising intonation indicates an interrogative statement. Pauses are important prosodic units because they can often indicate breaks in a thought and can also sometimes indicate the intended grouping of nouns in a list. Breathing, both inhalation and exhalation, seems to occur only at these pauses where the prosody resets. Prosodic units, along with function words and punctuation, help to mark clause boundaries in speech. Accents, meanwhile, help to distinguish certain aspects of a sentence that may require more attention. English often utilizes a pitch accent, or an emphasis on the final word of a sentence. Focus accents serve to emphasize a word in a sentence that requires more attention, such as if that word specifically is intended to be a response to a question.[1]

"Prosodic structure" is important in language contact and lexical borrowing. For example, in Modern Hebrew, the XiXéX verb-template is much more productive than the XaXáX verb-template because in morphemic adaptations of non-Hebrew stems, the XiXéX verb-template is more likely to retain – in all conjugations throughout the tenses – the prosodic structure (e.g., the consonant clusters and the location of the vowels) of the stem.[2]

Unique prosodic features have been noted in infant-directed speech (IDS) - also known as baby talk, child-directed speech (CDS), or motherese. Adults, especially caregivers, speaking to young children tend to imitate childlike speech by using higher and more variable pitch, as well as an exaggerated stress. These prosodic characteristics are thought to assist children in acquiring phonemes, segmenting words, and recognizing phrasal boundaries. And though there is no evidence to indicate that infant-directed speech is necessary for language acquisition, these specific prosodic features have been observed in many different languages.[3]

Prosody is useful for listeners as they perform sentence parsing. Prosody helps resolve sentence ambiguity. For example, the sentence: “They invited Bob and Bill and Al got rejected,” is ambiguous when written.[1] But when the sentence is read aloud, prosodic cues like pauses and changes in intonation will make the meaning clear. The prosody of an ambiguous sentence biases a listener’s interpretation of that sentence.[1] Moving the intonational boundary in the above example will change the interpretation of the sentence. This result has been found in studies performed in both English and Bulgarian.[4]

Prosody is also useful in expressing (for speakers) and detecting (for listeners) sarcasm. The most useful prosodic feature in detecting sarcasm is a reduction in the mean fundamental frequency relative to other speech for humor, neutrality, or sincerity. While prosodic cues are important in indicating sarcasm, context clues and shared knowledge are also important.[5]

Emotional prosody is the expression of feelings using prosodic elements of speech. It was considered by Charles Darwin in The Descent of Man to predate the evolution of human language: "Even monkeys express strong feelings in different tones – anger and impatience by low, – fear and pain by high notes."[6] Native speakers listening to actors reading emotionally neutral text while projecting emotions correctly recognized happiness 62% of the time, anger 95%, surprise 91%, sadness 81%, and neutral tone 76%. When a database of this speech was processed by computer, segmental features allowed better than 90% recognition of happiness and anger, while suprasegmental prosodic features allowed only 44%–49% recognition. The reverse was true for surprise, which was recognized only 69% of the time by segmental features and 96% of the time by suprasegmental prosody.[7] In typical conversation (no actor voice involved), the recognition of emotion may be quite low, of the order of 50%, hampering the complex interrelationship function of speech advocated by some authors.[8] That said, even if emotional expression through prosody cannot always be consciously recognized, tone of voice may continue to have subconscious effects in conversation. This sort of expression does not stem from linguistic or semantic effects, and can thus be isolated from traditional linguistic content. The average person's ability to decode the conversational implicature of emotional prosody has been found to be slightly less accurate than the ability to discriminate facial expressions; however, the specific ability to decode varies by emotion. These emotions have been determined to be ubiquitous, as they are used and understood across cultures. Various emotions, and their general experimental identification rates, are as follows:[9]

Anger and Sadness: High rate of accurate identification

Fear and Happiness: Medium rate of accurate identification

Disgust: Poor rate of accurate identification

The prosody of an utterance is used by listeners to guide decisions about the emotional affect of the situation. Whether a person decodes the prosody as positive, negative, or neutral plays a factor in the way a person decodes a facial expression accompanying an utterance. As the facial expression becomes closer to neutral, the prosodic interpretation influences the interpretation of the facial expression. A study by Marc D. Pell revealed that 600 ms of prosodic information is necessary for listeners to be able to identify the affective tone of the utterance. At lengths below this, there was not enough information for listeners to process the emotional context of the utterance.[10]

An aprosodia is an acquired or developmental impairment in comprehending or generating the emotion conveyed in spoken language. Aprosodia is often accompanied by an inability to properly use variations in speech, particularly deficits in the ability to accurately modulate pitch, loudness, intonation, and rhythm of word formation.[11] This is sometimes seen in persons with Asperger syndrome.[12]

Producing these nonverbal elements requires intact motor areas of the face, mouth, tongue, and throat. This area is associated with Brodmann areas 44 and 45 (Broca's area) of the left frontal lobe. Damage to areas 44/45 produces motor aprosodia, with the nonverbal elements of speech being disturbed (facial expression, tone, rhythm of voice).

Understanding these nonverbal elements requires an intact and properly functioning right-hemisphere perisylvian area, particularly Brodmann area 22 (not to be confused with the corresponding area in the left hemisphere, which contains Wernicke's area).[13] Damage to the right inferior frontal gyrus causes a diminished ability to convey emotion or emphasis by voice or gesture, and damage to the right superior temporal gyrus causes problems comprehending emotion or emphasis in the voice or gestures of others. The right Brodmann area 22 aids in the interpretation of prosody, and damage to it causes sensory aprosodia, with the patient unable to comprehend changes in voice and body language.