Diphthongs contrast with monophthongs, where the tongue or other speech organs do not move and the syllable contains only a single vowel sound. For instance, in English, the word ah is spoken as a monophthong (/ɑː/), while the word ow is spoken as a diphthong in most dialects (/aʊ/). Where two adjacent vowel sounds occur in different syllables—for example, in the English word re-elect—the result is described as hiatus, not as a diphthong.

Diphthongs often form when separate vowels are run together in rapid speech during a conversation. However, there are also unitary diphthongs, as in the English examples above, which are heard by listeners as single-vowel sounds (phonemes).[2]

Diphthongs use two vowel sounds in one syllable to make a speech sound.[3]

In the International Phonetic Alphabet (IPA), monophthongs are transcribed with one symbol, as in English sun [sʌn], in which ⟨ʌ⟩ represents a monophthong. Diphthongs are transcribed with two symbols, as in English high [haɪ] or cow [kaʊ], in which ⟨aɪ⟩ and ⟨aʊ⟩ represent diphthongs.

Diphthongs may be transcribed with two vowel symbols or with a vowel symbol and a semivowel symbol. In the words above, the less prominent member of the diphthong can be represented with the symbols for the palatal approximant [j] and the labiovelar approximant [w], with the symbols for the close vowels [i] and [u], or the symbols for the near-close vowels [ɪ] and [ʊ]:

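| Symbols used | Transcription | Precision |
|---|---|---|
| vowel and semivowel | ⟨haj kaw⟩ | broader transcription |
| two vowel symbols | ⟨hai̯ kau̯⟩ | broader transcription |
| two vowel symbols | ⟨haɪ̯ kaʊ̯⟩ | narrower transcription |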

Some transcriptions are broader or narrower (less precise or more precise phonetically) than others. Transcribing the English diphthongs in high and cow as ⟨aj aw⟩ or ⟨ai̯ au̯⟩ is a less precise or broader transcription, since these diphthongs usually end in a vowel sound that is more open than the semivowels [j w] or the close vowels [i u]. Transcribing the diphthongs as ⟨aɪ̯ aʊ̯⟩ is a more precise or narrower transcription, since the English diphthongs usually end in the near-close vowels [ɪ ʊ].

The non-syllabic diacritic, the inverted breve below ⟨◌̯⟩,[4] is placed under the less prominent part of a diphthong to show that it is part of a diphthong rather than a vowel in a separate syllable: [aɪ̯ aʊ̯]. When there is no contrastive vowel sequence in the language, the diacritic may be omitted. Other common indications that the two letters are not separate vowels are a superscript, ⟨aᶦ aᶷ⟩,[5] or a tie bar, ⟨a͡ɪ a͡ʊ⟩ or ⟨a͜ɪ a͜ʊ⟩.[6] The tie bar can be useful when it is not clear which letter represents the syllable nucleus, or when they have equal weight.[7] Superscripts are especially used when an on- or off-glide is particularly fleeting.[8]
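In digital text, these notations are built from Unicode combining characters. The following is a minimal sketch of how the non-syllabic diacritic and the two tie bars can be attached to vowel letters; the helper names `with_non_syllabic` and `with_tie_bar` are hypothetical and chosen only for illustration.

```python
import unicodedata

# Combining characters used in IPA diphthong notation.
NON_SYLLABIC = "\u032F"  # COMBINING INVERTED BREVE BELOW
TIE_ABOVE = "\u0361"     # COMBINING DOUBLE INVERTED BREVE
TIE_BELOW = "\u035C"     # COMBINING DOUBLE BREVE BELOW


def with_non_syllabic(nucleus: str, glide: str) -> str:
    """Mark the second letter as the less prominent, non-syllabic element."""
    return nucleus + glide + NON_SYLLABIC


def with_tie_bar(first: str, second: str, below: bool = False) -> str:
    """Join two letters with a tie bar, which is encoded after the first letter."""
    return first + (TIE_BELOW if below else TIE_ABOVE) + second


if __name__ == "__main__":
    for text in (with_non_syllabic("a", "\u026A"),  # a + small capital I
                 with_tie_bar("a", "\u028A"),        # a + upsilon, tie above
                 with_tie_bar("a", "\u028A", below=True)):
        print(text, [unicodedata.name(ch) for ch in text])
```

Because the diacritics are separate code points, the same letters can be rendered with or without them, which matches the practice of omitting the diacritic when no ambiguity arises.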

The period ⟨.⟩ is the opposite of the non-syllabic diacritic: it represents a syllable break. If two vowels next to each other belong to two different syllables (hiatus), meaning that they do not form a diphthong, they can be transcribed with two vowel symbols with a period in between. Thus, lower can be transcribed ⟨ˈloʊ.ər⟩, with a period separating the first syllable, /loʊ/, from the second syllable, /ər/.
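The role of the period can be illustrated with a short sketch that splits a transcription at syllable breaks and reports whether two vowels meet across a break (hiatus). The vowel inventory `VOWELS` and the function names are assumptions made for this example; the set covers only the symbols needed here and is not a complete IPA vowel list.

```python
import unicodedata

# Illustrative, incomplete vowel inventory (an assumption for this sketch).
VOWELS = set("aeiouyɑɐæɛəɪɔʊʌøœɨ")
STRESS_MARKS = "\u02C8\u02CC"  # primary and secondary stress marks


def syllables(transcription: str) -> list[str]:
    """Split a transcription at periods and drop stress marks."""
    return [part.strip(STRESS_MARKS) for part in transcription.split(".") if part]


def base_letters(syllable: str) -> list[str]:
    """Return the letters of a syllable without combining diacritics."""
    return [ch for ch in syllable if not unicodedata.combining(ch)]


def has_hiatus(transcription: str) -> bool:
    """True if a syllable ending in a vowel is followed by one starting with a vowel."""
    syls = [base_letters(s) for s in syllables(transcription)]
    return any(
        bool(left) and bool(right) and left[-1] in VOWELS and right[0] in VOWELS
        for left, right in zip(syls, syls[1:])
    )


if __name__ == "__main__":
    print(syllables("\u02C8lo\u028A.\u0259r"), has_hiatus("\u02C8lo\u028A.\u0259r"))
    print(syllables("ha\u026A"), has_hiatus("ha\u026A"))
```

Under these assumptions, the transcription of lower is split into two syllables with a vowel on each side of the break, while high remains a single syllable containing a diphthong.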

The non-syllabic diacritic is only used when necessary. It is typically omitted when there is no ambiguity, as in ⟨haɪ kaʊ⟩. No words in English have the vowel sequences *[a.ɪ a.ʊ], so the non-syllabic diacritic is unnecessary.

Falling (or descending) diphthongs start with a vowel quality of higher prominence (higher pitch or volume) and end in a semivowel with less prominence, like [aɪ̯] in eye, while rising (or ascending) diphthongs begin with a less prominent semivowel and end with a more prominent full vowel, similar to the [ja] in yard. (Note that "falling" and "rising" in this context do not refer to vowel height; for that, the terms "opening" and "closing" are used instead. See below.) The less prominent component in the diphthong may also be transcribed as an approximant, thus [aj] in eye and [ja] in yard. However, when the diphthong is analysed as a single phoneme, both elements are often transcribed with vowel letters (/aɪ̯/, /ɪ̯a/). Note also that semivowels and approximants are not equivalent in all treatments, and in the English and Italian languages, among others, many phoneticians do not consider rising combinations to be diphthongs, but rather sequences of approximant and vowel. There are many languages (such as Romanian) that contrast one or more rising diphthongs with similar sequences of a glide and a vowel in their phonetic inventory[9] (see semivowel for examples).
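When a diphthong is written with the non-syllabic diacritic, the falling or rising type can be read off from which element carries the mark. The sketch below assumes a two-letter transcription in which exactly one letter bears the diacritic; the function name `prominence_type` is hypothetical.

```python
import unicodedata

NON_SYLLABIC = "\u032F"  # COMBINING INVERTED BREVE BELOW


def prominence_type(diphthong: str) -> str:
    """Classify a two-element diphthong as 'falling' or 'rising'.

    The less prominent element is assumed to be the one marked with the
    non-syllabic diacritic.
    """
    # Group each base letter with the combining marks that follow it.
    segments: list[str] = []
    for ch in diphthong:
        if unicodedata.combining(ch) and segments:
            segments[-1] += ch
        else:
            segments.append(ch)
    if len(segments) != 2:
        raise ValueError("expected exactly two elements")
    first_weak = NON_SYLLABIC in segments[0]
    second_weak = NON_SYLLABIC in segments[1]
    if first_weak == second_weak:
        raise ValueError("exactly one element must be marked non-syllabic")
    return "rising" if first_weak else "falling"


if __name__ == "__main__":
    print(prominence_type("a\u026A\u032F"))  # second element marked
    print(prominence_type("\u026A\u032Fa"))  # first element marked
```

The classification depends only on the position of the diacritic, not on vowel height, which is consistent with the distinction drawn above between falling/rising and opening/closing.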

In closing diphthongs, the second element is more close than the first (e.g. [ai]); in opening diphthongs, the second element is more open (e.g. [ia]). Closing diphthongs tend to be falling ([ai̯]), and opening diphthongs are generally rising ([i̯a]),[10] as open vowels are more sonorous and therefore tend to be more prominent. However, exceptions to this rule are not rare in the world's languages. In Finnish, for instance, the opening diphthongs /ie̯/ and /uo̯/ are true falling diphthongs, since they begin louder and with higher pitch and fall in prominence during the diphthong.

A third, rare type of diphthong, neither opening nor closing, is the height-harmonic diphthong, in which both elements have the same vowel height.[11] These occurred in Old English:

beon [beo̯n] "be"

ceald [kæɑ̯ld] "cold"
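The three height-based types can be sketched as a comparison of the heights of the two elements. The example below assumes a coarse three-level grouping (high, mid, low), which is a simplification of the finer distinctions on the IPA vowel chart; the `HEIGHT` table and the function name `height_type` are illustrative assumptions.

```python
import unicodedata

# Coarse height levels (an assumption): 0 = high, 1 = mid, 2 = low.
HEIGHT: dict[str, int] = {}
for level, letters in enumerate(["iyɨuɪʏʊ", "eøoəɛœɔ", "æaɑɐ"]):
    for letter in letters:
        HEIGHT[letter] = level


def height_type(diphthong: str) -> str:
    """Classify a diphthong as 'closing', 'opening' or 'height-harmonic'."""
    # Ignore diacritics such as the non-syllabic mark.
    letters = [ch for ch in diphthong if not unicodedata.combining(ch)]
    if len(letters) != 2:
        raise ValueError("expected exactly two vowel letters")
    first, second = (HEIGHT[ch] for ch in letters)
    if second < first:
        return "closing"   # second element is higher (closer)
    if second > first:
        return "opening"   # second element is lower (more open)
    return "height-harmonic"


if __name__ == "__main__":
    for d in ("ai", "ia", "eo\u032F", "\u00E6\u0251\u032F"):
        print(d, height_type(d))
```

With a finer height scale the two elements of a diphthong such as the one in ceald would not come out as equal, so the result depends on the grouping chosen.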

A centering diphthong is one that begins with a more peripheral vowel and ends with a more central one, such as [ɪə̯], [ɛə̯], and [ʊə̯] in Received Pronunciation or [iə̯] and [uə̯] in Irish. Many centering diphthongs are also opening diphthongs ([iə̯], [uə̯]).

Diphthongs may contrast in how far they open or close. For example, Samoan contrasts low-to-mid with low-to-high diphthongs.

Narrow diphthongs are the ones that end with a vowel which on a vowel chart is quite close to the one that begins the diphthong, for example Northern Dutch [eɪ], [øʏ] and [oʊ]. Wide diphthongs are the opposite: they require a greater tongue movement, and their offsets are farther away from their starting points on the vowel chart. Examples of wide diphthongs are RP/GA English [aɪ] and [aʊ].

Languages differ in the length of diphthongs, measured in terms of morae. In languages with phonemically short and long vowels, diphthongs typically behave like long vowels, and are pronounced with a similar length.[12][citation needed] In languages with only one phonemic length for pure vowels, however, diphthongs may behave like pure vowels.[citation needed] For example, in Icelandic, both monophthongs and diphthongs are pronounced long before single consonants and short before most consonant clusters.

Some languages contrast short and long diphthongs. In some languages, such as Old English, these behave like short and long vowels, occupying one and two morae, respectively. Languages that contrast three quantities in diphthongs are extremely rare, but not unheard of; Northern Sami is known to contrast long, short and "finally stressed" diphthongs, the last of which are distinguished by a long second element.[citation needed]

While there are a number of similarities, diphthongs are not the same phonologically as a combination of a vowel and an approximant or glide. Most importantly, diphthongs are fully contained in the syllable nucleus[13][14] while a semivowel or glide is restricted to the syllable boundaries (either the onset or the coda). This often manifests itself phonetically by a greater degree of constriction,[15] though the phonetic distinction is not always clear.[16] The English word yes, for example, consists of a palatal glide followed by a monophthong rather than a rising diphthong. In addition, the segmental elements must be different in diphthongs so that [ii̯], when it occurs in a language, does not contrast with [iː] though it is possible for languages to contrast [ij] and [iː].[17]

In words coming from Middle English, most cases of the Modern English diphthongs [aɪ̯, oʊ̯, eɪ̯, aʊ̯] originate from the Middle English long monophthongs [iː, ɔː, aː, uː] through the Great Vowel Shift, although some cases of [oʊ̯, eɪ̯] originate from the Middle English diphthongs [ɔu̯, aɪ̯].

In Pittsburgh English, /aʊ̯/ is monophthongal [aː], leading to the stereotypical spelling "Dahntahn" for "downtown".

Canadian English and some dialects of northern American English exhibit allophony of /aʊ̯/ and /aɪ̯/ called Canadian raising – in some places they have become separate phonemes. GA and RP have raising to a lesser extent in /aɪ̯/.

In several American dialects such as Southern American English, /aɪ̯/ becomes monophthongal [aː] except before voiceless consonants.

The erstwhile monophthongs /iː/ and /uː/ are diphthongized in many dialects. In many cases they might be better transcribed as [uu̯] and [ii̯], where the non-syllabic element is understood to be closer than the syllabic element. They are sometimes transcribed /uw/ and /ij/.

In rhotic dialects, words like pair, poor, and peer can be analyzed as diphthongs, although other descriptions analyze them as vowels with [ɹ] in the coda.

[eɪ̯], [øʏ̯], and [oʊ̯] are normally pronounced as closing diphthongs except when preceding [ɾ], in which case they are either centering diphthongs ([eə̯], [øə̯], and [oə̯]) or are lengthened and monophthongized to [ɪː], [øː], and [ʊː].

The dialect of Hamont (in Limburg) has five centring diphthongs and contrasts long and short forms of [ɛɪ̯], [œʏ̯], [ɔʊ̯], and [ɑʊ̯].[20]

In the varieties of German that vocalize the /r/ in the syllable coda, other diphthongal combinations may occur. These are only phonetic diphthongs, not phonemic diphthongs, since the vocalic pronunciation [ɐ̯] alternates with consonantal pronunciations of /r/ if a vowel follows, cf. du hörst [duː ˈhøːɐ̯st] ‘you hear’ – ich höre [ʔɪç ˈhøːʀə] ‘I hear’. These phonetic diphthongs may be as follows:

Wiese (1996) notes that the length contrast is not very stable before non-prevocalic /r/[21] and that "Meinhold & Stock (1980:180), following the pronouncing dictionaries (Mangold (1990), Krech & Stötzer (1982)) judge the vowel in Art, Schwert, Fahrt to be long, while the vowel in Ort, Furcht, hart is supposed to be short. The factual basis of this presumed distinction seems very questionable."[21][22] He goes on to state that in his own dialect there is no length difference in these words, and that judgements on vowel length in front of non-prevocalic /r/ which is itself vocalized are problematic, in particular if /a/ precedes.[21]

According to the 'lengthless' analysis, the aforementioned 'long' diphthongs are analyzed as [iɐ̯], [yɐ̯], [uɐ̯], [eɐ̯], [øɐ̯], [oɐ̯], [ɛɐ̯] and [aɐ̯]. This makes non-prevocalic /aːr/ and /ar/ homophonous as [aɐ̯] or [aː]. Non-prevocalic /ɛːr/ and /ɛr/ may also merge, but the vowel chart in Kohler (1999:88) shows that they have somewhat different starting points.

Wiese (1996) also states that "laxing of the vowel is predicted to take place in shortened vowels; it does indeed seem to go hand in hand with the vowel shortening in many cases."[21]

In French, /wa/, /wɛ̃/, /ɥi/ and /ɥɛ̃/ may be considered true diphthongs (that is, fully contained in the syllable nucleus: [u̯a], [u̯ɛ̃], [y̯i], [y̯ɛ̃]). Other sequences are considered part of a glide formation process that turns a high vowel into a semivowel (and part of the syllable onset) when followed by another vowel.[24]

In the sequences [ɡw] or [kw] and vowel, e.g. guant, quota, qüestió, pingüí (these exceptional cases even lead some scholars[27] to hypothesize the existence of rare labiovelar phonemes /ɡʷ/ and /kʷ/).[28]

There are also certain instances of compensatory diphthongization in the Majorcan dialect so that /ˈtroncs/ ('logs') (in addition to deleting the palatal plosive) develops a compensating palatal glide and surfaces as [ˈtrojns] (and contrasts with the unpluralized [ˈtronʲc]). Diphthongization compensates for the loss of the palatal stop (part of Catalan's segment loss compensation). There are other cases where diphthongization compensates for the loss of point of articulation features (property loss compensation) as in [ˈaɲ] ('year') vs [ˈajns] ('years').[29] The dialectal distribution of this compensatory diphthongization is almost entirely dependent on the dorsal plosive (whether it is velar or palatal) and the extent of consonant assimilation (whether or not it is extended to palatals).[30]

The Portuguese diphthongs are formed by the labio-velar approximant [w] and palatal approximant [j] with a vowel.[31] European Portuguese has 14 phonemic diphthongs (10 oral and 4 nasal),[32] all of which are falling diphthongs formed by a vowel and a nonsyllabic high vowel. Brazilian Portuguese has roughly the same number, although the European and non-European dialects have slightly different pronunciations ([ɐj] is a distinctive feature of some southern and central Portuguese dialects, especially that of Lisbon). A [w] onglide after /k/ or /ɡ/ and before all vowels as in quando [ˈkwɐ̃du] ('when') or guarda [ˈɡwaɾðɐ ~ ˈɡwaʁdɐ] ('guard') may also form rising diphthongs and triphthongs. Additionally, in casual speech, adjacent heterosyllabic vowels may combine into diphthongs and triphthongs or even sequences of them.[33]

In addition, phonetic diphthongs are formed in most Brazilian Portuguese dialects by the vocalization of /l/ in the syllable coda with words like sol [sɔw] ('sun') and sul [suw] ('south') as well as by yodization of vowels preceding /s/ or its allophone at syllable coda [ʃ ~ ɕ] in terms like arroz [aˈʁojs ~ ɐˈʁo(j)ɕ] ('rice'),[33] and /z/ (or [ʒ ~ ʑ]) in terms such as paz mundial [ˈpajz mũdʒiˈaw ~ ˈpa(j)ʑ mũdʑiˈaw] ('world peace') and dez anos [ˌdɛjˈz‿ɐ̃nu(j)s ~ ˌdɛjˈz‿ɐ̃nuɕ] ('ten years').

Phonetically, Spanish has seven falling diphthongs and eight rising diphthongs. In addition, during fast speech, sequences of vowels in hiatus become diphthongs wherein one becomes non-syllabic (unless they are the same vowel, in which case they fuse together) as in poeta [ˈpo̯eta] ('poet') and maestro [ˈmae̯stɾo] ('teacher'). The Spanish diphthongs are:[34][35]

The second table includes only 'false' diphthongs, since they are composed of a semivowel plus a vowel rather than two vowels. The situation is more nuanced in the first table: a word such as 'baita' is actually pronounced [ˈbaj.ta], and most speakers would syllabify it that way. A word such as 'voi' would instead be pronounced and syllabified as [ˈvo.i], again without a diphthong.

In general, unstressed /i e o u/ in hiatus can turn into glides in more rapid speech (e.g. biennale [bi̯enˈnaːle] 'biennial'; coalizione [ko̯alitˈtsi̯oːne] 'coalition'), with the process occurring more readily in syllables further from stress.[37]

Romanian has two true diphthongs: /e̯a/ and /o̯a/. There are, however, a host of other vowel combinations (more than in any other Romance language) which are classed as vowel glides. As a result of their origin (diphthongization of mid vowels under stress), the two true diphthongs appear only in stressed syllables[38] and make morphological alternations with the mid vowels /e/ and /o/. To native speakers, they sound very similar to /ja/ and /wa/ respectively.[39] There are no perfect minimal pairs to contrast /o̯a/ and /wa/,[9] and because /o̯a/ doesn't appear in the final syllable of a prosodic word, there are no monosyllabic words with /o̯a/; exceptions might include voal ('veil') and trotuar ('sidewalk'), though Ioana Chițoran argues[40] that these are best treated as containing glide-vowel sequences rather than diphthongs. In addition to these, the semivowels /j/ and /w/ can be combined (either before, after, or both) with most vowels. While this arguably[41] forms additional diphthongs and triphthongs, only /e̯a/ and /o̯a/ can follow an obstruent-liquid cluster such as in broască ('frog') and dreagă ('to mend'),[42] implying that /j/ and /w/ are restricted to the syllable boundary and therefore, strictly speaking, do not form diphthongs.

Welsh is traditionally divided into Northern and Southern dialects. In the north, some diphthongs may be short or long according to regular vowel length rules, but in the south they are always short (see Welsh phonology). Southern dialects tend to simplify diphthongs in speech (e.g. gwaith /ɡwaiθ/ is reduced to /ɡwaːθ/).

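| Grapheme | North | South | Example |
|---|---|---|---|
| ae | /ɑːɨ/ | /ai/ | maen 'stone' |
| ai | /ai/ | /ai/ | gwaith 'work' |
| au† | /aɨ/ | /ai/ | haul 'sun' |
| aw | /au, ɑːu/ | /au/ | mawr 'big' |
| ei | /əi/ | /əi/ | gweithio 'to work' |
| eu | /əɨ/ | /əi/ | treulio 'spend' |
| ey | /əɨ/ | /əi/ | teyrn 'tyrant' |
| ew | /ɛu, eːu/ | /ɛu/ | tew 'fat' |
| oe | /ɔɨ, ɔːɨ/ | /ɔi/ | moel 'bald' |
| ou | /ɔɨ, ɔːɨ/ | /ɔi/ | cyffrous 'excited' |
| oi | /ɔi/ | /ɔi/ | troi 'turn' |
| ow | /ɔu, oːu/ | /ɔu/ | brown 'brown' |
| wy | /ʊɨ, uːɨ/ | /ʊi/ | pwyll 'sense' |
| iw | /ɪu/ | /ɪu/ | lliw 'colour' |
| uw | /ɨu/ | /ɪu/ | duw 'god' |
| yw | /ɨu/ | /ɪu/ | llyw 'rudder' |
| yw | /əu/ | /əu/ | tywydd 'weather' |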

† The plural ending -au is reduced to /a/ in the north and /e/ in the south, e.g. cadau 'battles' is /ˈkada/ (north) or /ˈkade/ (south).

All Finnish diphthongs are falling. Notably, Finnish has true opening diphthongs (e.g. /uo/), which are not very common crosslinguistically compared to centering diphthongs (e.g. /uə/ in English). Vowel combinations across syllables may in practice be pronounced as diphthongs, when an intervening consonant has elided, as in näön [næøn] instead of [næ.øn] for the genitive of näkö ('sight').

The diphthong system in Northern Sami varies considerably from one dialect to another. The Western Finnmark dialects distinguish four different qualities of opening diphthongs:

/eæ/ as in leat "to be"

/ie/ as in giella "language"

/oa/ as in boahtit "to come"

/uo/ as in vuodjat "to swim"

In terms of quantity, Northern Sami shows a three-way contrast between long, short and finally stressed diphthongs. The last are distinguished from long and short diphthongs by a markedly long and stressed second component. Diphthong quantity is not indicated in spelling.

Kohler, Klaus J. (1999), "German", Handbook of the International Phonetic Association: A guide to the use of the International Phonetic Alphabet, Cambridge: Cambridge University Press, pp. 86–89, doi:10.1017/S0025100300004874, ISBN 0-521-65236-7