The glottalized stops are preglottalized and voiced: [ʔɓ, ʔɗ] (i.e., the glottis is always closed before the oral closure). This glottal closure is often not released before the oral closure is released, which produces the characteristic implosive pronunciation. However, the glottal closure is sometimes released before the oral release, in which case the stops are pronounced [ʔb, ʔd]. The primary characteristic is therefore preglottalization, with implosion being secondary.

/v/ is generally pronounced [j] in informal speech, but speakers generally pronounce [v] when reading a text aloud. It is always pronounced [v] in loanwords (va li, ti vi, etc.), even in informal speech. Other speakers use [vj, bj, βj]. These pronunciations are remnants of a merger and sound change involving /v/ in southern speech (/v/ is always present in the northern and central regions).

Some speakers don't distinguish /s/ and /ʂ/.

Some speakers don't distinguish /c/ and /tʂ/.

Some speakers pronounce d as [j] and gi as [z]; many speakers pronounce both as [j].

In southern speech, the phoneme /ʐ/ has a number of variant pronunciations that depend on the speaker, and a single speaker may even use more than one. It may occur as a retroflex fricative [ʐ], an alveolar approximant [ɹ], a flap [ɾ], a trill [r], or a fricative tap/trill [ɾ̞, r̝]. This sound is generally represented in Vietnamese linguistics by the letter ⟨r⟩.

The IPA chart of vowel nuclei to the right is based on the sounds of Hanoi Vietnamese; other regions of Vietnam may have different inventories. Vowel nuclei consist of monophthongs (i.e., simple vowels) and three centering diphthongs.

All vowels are unrounded except for the three back rounded vowels: /u, o, ɔ/.

Thompson (1965) notes that in Hanoi the diphthongs iê /iə̯/, ươ /ɨə̯/, and uô /uə̯/ may be pronounced [ie̯, ɨə̯, uo̯], respectively (as the spelling suggests), but before /k, ŋ/ and in open syllables they are always pronounced [iə, ɨə, uə].

Thompson (1965) says that in Hanoi, words spelled with ưu and ươu are pronounced /iw, iəw/, respectively, whereas other dialects in the Tonkin delta pronounce them as /ɨw/ and /ɨəw/. Hanoi speakers who do pronounce these words with /ɨw/ and /ɨəw/ are using a spelling pronunciation.

The monophthongal variants are now mainly heard in the Nghệ An and Hà Tĩnh dialects ([ɔːŋ], [ɔːk], [oːŋ], [oːk]) and a few South Central Coast dialects ([eːŋ], [eːk]); they have been diphthongized in most Northern and Southern varieties (not to be confused with the palatalized realizations of on [ɔːŋ], ot [ɔːk], ôn [oːŋ], ôt [oːk] in Southern varieties).

The pronunciation of syllable-final ch and nh in Hanoi Vietnamese has been analyzed in different ways. One analysis, that of Thompson (1965), treats them as the phonemes /c, ɲ/, where /c/ contrasts with both syllable-final t /t/ and c /k/, and /ɲ/ contrasts with both syllable-final n /n/ and ng /ŋ/. Final /c, ɲ/ are thus identified with syllable-initial /c, ɲ/.

Another analysis has final ⟨ch⟩ and ⟨nh⟩ as representing predictable allophonic variants of the velar phonemes /k/ and /ŋ/ that occur after upper front vowels /i/ (orthographic ⟨i⟩) and /e/ (orthographic ⟨ê⟩). This analysis interprets orthographic ⟨ach⟩ and ⟨anh⟩ as an underlying /ɛ/, which becomes phonetically open and diphthongized: /ɛk/ → [ăi̯k̟], /ɛŋ/ → [ăi̯ŋ̟].[9]
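
The allophonic analysis amounts to a conditioned rule on the velar codas, which can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a complete model: the function names are hypothetical, vowels are written with the IPA symbols used in the paragraph above, "ɛ" stands for the underlying vowel assumed for ⟨ach⟩/⟨anh⟩, and only the vowel contexts that paragraph names are covered.

```python
# Hypothetical sketch of the allophonic analysis of final <ch>/<nh>.
# Underlying vowels use the IPA symbols from the text; "ɛ" is the
# underlying vowel this analysis assumes for orthographic <ach>/<anh>.
PRE_VELAR = "\u031f"             # combining plus sign below, as in [k̟]
FRONT_TRIGGERS = {"i", "e", "ɛ"}  # vowels after which ch/nh are spelled


def _check_velar(coda: str) -> None:
    if coda not in ("k", "ŋ"):
        raise ValueError(f"expected a velar coda /k/ or /ŋ/, got {coda!r}")


def spell_velar_final(vowel: str, coda: str) -> str:
    """Orthographic final for an underlying vowel + velar coda."""
    _check_velar(coda)
    if vowel in FRONT_TRIGGERS:
        return "ch" if coda == "k" else "nh"
    return "c" if coda == "k" else "ng"


def realize_velar_final(vowel: str, coda: str) -> str:
    """Approximate surface rime for an underlying vowel + velar coda."""
    _check_velar(coda)
    if vowel == "ɛ":
        # /ɛk/ -> [ăi̯k̟], /ɛŋ/ -> [ăi̯ŋ̟]: the vowel opens and diphthongizes
        return "\u0103i\u032f" + coda + PRE_VELAR
    if vowel in ("i", "e"):
        # after /i, e/ the velar is fronted (pre-velar)
        return vowel + coda + PRE_VELAR
    return vowel + coda
```

For example, `spell_velar_final("i", "k")` gives "ch" (as in ⟨ich⟩) and `spell_velar_final("ɛ", "ŋ")` gives "nh" (the final of ⟨anh⟩). Keeping spelling and phonetic realization in separate functions mirrors the analysis itself: one underlying /k/ or /ŋ/, two different surface consequences.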

Arguments for the second analysis include the limited distribution of final [c] and [ɲ]; the gap in the distribution of [k] and [ŋ], which do not occur after [i] and [e]; the pronunciation of ⟨ach⟩ and ⟨anh⟩ as [ɛc] and [ɛɲ] in certain conservative central dialects;[8] and the patterning of [k]~[c] and [ŋ]~[ɲ] in certain reduplicated words. Additionally, final [c] is not usually articulated as far forward as initial [c]: final [c] and [ɲ] are pre-velar [k̟, ŋ̟].

The first analysis closely follows the surface pronunciation of a slightly different Hanoi dialect than the second. In this dialect, the /a/ in /ac/ and /aɲ/ is not diphthongized but is articulated further forward, approaching a front vowel [æ]. This results in a three-way contrast between the rimes ăn [æ̈n] vs. anh [æ̈ɲ] vs. ăng [æ̈ŋ]. For this reason, a separate phonemic /ɲ/ is posited.

While the variety of Vietnamese spoken in Hanoi has preserved finals faithfully from old Vietnamese, the variety spoken in Saigon has drastically changed its finals. There were two steps in the development of rimes in the Saigonese dialect from old Vietnamese. In the first step, -ch and -nh merged with -t and -n, while rimes ending in /t, n/ (except after /i, e/) merged with /k, ŋ/. In the second step, vowels in alveolar rimes became centralized,[10] analogous to the diphthongization in the Hanoian dialect.
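
The first step of the Saigon merger can be modeled as a single lookup on the original coda. The sketch below is illustrative only: the function and table names are hypothetical, codas are written phonemically (c for -ch, ɲ for -nh, k for -c, ŋ for -ng), the only conditioning modeled is the "except after /i, e/" clause given above, and the second step (centralization of vowels in alveolar rimes) is not modeled at all.

```python
# Hypothetical sketch of step one of the Saigon final merger.
# Codas are phonemic: c = -ch, ɲ = -nh, t, n, k (= -c), ŋ (= -ng).
STEP_ONE = {"c": "t", "ɲ": "n", "t": "k", "n": "ŋ"}
KEEPS_ALVEOLAR = {"i", "e"}  # old /t, n/ after these vowels are not backed


def saigon_step_one(vowel: str, coda: str) -> str:
    """Return the Saigon reflex of an old Vietnamese coda (step one only).

    The lookup is made once against the *original* coda, so both mergers
    apply simultaneously: an old -ch becomes -t and stops there.
    """
    if coda in ("t", "n") and vowel in KEEPS_ALVEOLAR:
        return coda
    # Codas not involved in the merger (e.g. /m, p, k, ŋ/) pass through.
    return STEP_ONE.get(coda, coda)
```

Applying the two changes one after the other (first -ch → -t, then -t → -k) would wrongly send an old -ch all the way to -k; a single table lookup avoids that ordering problem, which matches the description of -ch/-nh and -t/-n shifting in the same step.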

Unlike the tones of many Native American, African, and Chinese languages, Vietnamese tones do not rely solely on pitch contour. Vietnamese instead often uses a register complex (a combination of phonation type, pitch, length, vowel quality, etc.). It may therefore be more accurate to describe Vietnamese as a register language rather than a "pure" tonal language.[11]

In Vietnamese orthography, tone is indicated by diacritics written above or below the vowel.

There is much variation among speakers concerning how tone is realized phonetically. There are differences between varieties of Vietnamese spoken in the major geographic areas (i.e. northern, central, southern) and smaller differences within the major areas (e.g. Hanoi vs. other northern varieties). In addition, there seems to be variation among individuals. More research is needed to determine the remaining details of tone realization and the variation among speakers.

Ngang tone:

The ngang tone is level at around the mid level (33) and is produced with modal voice phonation (i.e. with "normal" phonation). Alexandre de Rhodes (1651) describes this as "level"; Nguyễn (1997) describes it as "high (or mid) level".

Huyền tone:

The huyền tone starts low-mid and falls (21). Some Hanoi speakers start at a somewhat higher point (31). It is accompanied by breathy (or lax) voice phonation in some speakers but not in others.[12] For example, bà = [ɓa˨˩]. Alexandre de Rhodes (1651) describes this as "grave-lowering"; Nguyễn (1997) describes it as "low falling".

Hỏi tone:

The hỏi tone starts at mid level and falls. It starts with modal voice phonation, which moves increasingly toward tense voice with accompanying harsh voice (although the harsh voice seems to vary by speaker). In Hanoi, the tone is mid falling (31). For other northern speakers, the tone falls from mid and then rises back to the mid level (313 or 323). This contour gives the tone its traditional description as "dipping". However, the falling-rising contour is most obvious in citation forms or when syllable-final; in other positions and in fast speech, the rise is negligible. The hỏi tone is also relatively short compared with the other tones, but not as short as the nặng tone. Alexandre de Rhodes (1651) describes this as "smooth-rising"; Nguyễn (1997) describes it as "dipping-rising".

Ngã tone:

The ngã tone is mid rising (35). Many speakers begin the vowel with modal voice, followed by strong creaky voice from about the middle of the vowel, which then lessens toward the end of the syllable. Some speakers with more dramatic glottalization have a glottal stop closure in the middle of the vowel (i.e. [VʔV]). In Hanoi Vietnamese, the tone starts at a higher pitch (45) than for other northern speakers. Alexandre de Rhodes (1651) describes this as "chesty-raised"; Nguyễn (1997) describes it as "creaking-rising".

Sắc tone:

The sắc tone starts at mid and then rises (35), much like the ngã tone. It is accompanied by tense voice phonation throughout the duration of the vowel. For some Hanoi speakers, the ngã tone is noticeably higher than the sắc tone, for example: sắc = ˧˦ (34); ngã = ˦ˀ˥ (45). Alexandre de Rhodes (1651) describes this as "acute-angry"; Nguyễn (1997) describes it as "high (or mid) rising".

Nặng tone:

The nặng tone starts mid or low-mid and rapidly falls in pitch (32 or 21). It starts with tense voice that becomes increasingly tense until the vowel ends in a glottal stop closure. This tone is noticeably shorter than the other tones. Alexandre de Rhodes (1651) describes this as "chesty-heavy"; Nguyễn (1997) describes it as "constricted".
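
The numeric contours above are Chao tone numbers (1 = lowest pitch, 5 = highest), which map one-to-one onto IPA tone letters. The hypothetical helper below performs that conversion and tabulates one reading of the Hanoi values given in this section, taking the first value wherever alternatives are listed. As the section stresses, pitch is only part of each tone: phonation (creaky, tense, breathy), glottal closure, and length are not represented here.

```python
# Hypothetical helper: render Chao tone numbers (1 = lowest, 5 = highest)
# as IPA tone letters.
TONE_LETTERS = {
    "1": "\u02e9",  # ˩ extra-low
    "2": "\u02e8",  # ˨ low
    "3": "\u02e7",  # ˧ mid
    "4": "\u02e6",  # ˦ high
    "5": "\u02e5",  # ˥ extra-high
}

# One reading of the Hanoi values described above (first value where the
# text lists alternatives). Pitch only: phonation and length are omitted.
HANOI_PITCH = {
    "ngang": "33",
    "huyền": "21",
    "hỏi": "31",
    "ngã": "45",
    "sắc": "35",
    "nặng": "32",
}


def chao_to_letters(contour: str) -> str:
    """Convert a Chao contour such as "35" into tone letters such as "˧˥"."""
    if not contour or any(d not in TONE_LETTERS for d in contour):
        raise ValueError(f"not a Chao contour: {contour!r}")
    return "".join(TONE_LETTERS[d] for d in contour)


if __name__ == "__main__":
    for tone, contour in HANOI_PITCH.items():
        print(tone, contour, chao_to_letters(contour))
```

For instance, `chao_to_letters("21")` yields ˨˩, the contour used in the transcription bà = [ɓa˨˩], and `chao_to_letters("34")` yields ˧˦, the sắc contour quoted for some Hanoi speakers.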


The Southern variety is similar for all tones except the nặng tone, which is pronounced [˨˧]. Many speakers of Southern dialects do not use the ngã tone and replace it with the hỏi tone.

An older analysis assumes eight tones rather than six.[13] This follows the lead of traditional Chinese phonology. In Middle Chinese, normal syllables allowed three tonal distinctions, but syllables ending in /p/, /t/ or /k/ had no tonal distinctions. Instead, they were consistently pronounced with a short high tone, which was called the entering tone and considered a fourth tone. Similar considerations led to the identification of two additional tones in Vietnamese for syllables ending in /p/, /t/ and /k/. These are not phonemically distinct, however, and hence are not considered separate tones by modern linguists.

According to Hannas (1997), there are 4,500 to 4,800 possible spoken syllables (depending on dialect), and the standard national orthography (Quốc Ngữ) can represent 6,200 syllables (Quốc Ngữ orthography represents more phonemic distinctions than are made by any one dialect).[14] A description of syllable structure and exploration of its patterning according to the Prosodic Analysis approach of J.R. Firth is given in Henderson (1966).[15]

G: The offglide may be /j/ or /w/. Together, V and G must form one of the diphthongs or triphthongs listed in the section on Vowels. The offglide cannot be /w/ if the syllable contains a /w/ onglide, except in the case of 'khuỷu (tay)' (elbow).

Syllables are spoken with an inherent tone contour. All tone contours are possible for open syllables (syllables without consonant codas) and for closed syllables with nasal codas. If the syllable is closed by a labial, coronal, or velar stop /p, t, k/, only two contours are possible: the sắc and the nặng tones.
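
The tone restriction on stop-final syllables is simple enough to state as a lookup. The sketch below is hypothetical (the names are not from any existing library) and works on orthographic finals; it includes "ch" among the stop finals because final ch is a stop under both analyses of ch/nh given earlier, and it treats vowel or glide finals such as the "i" of "ai" as open syllables.

```python
# Hypothetical sketch of the tone/coda co-occurrence restriction.
# Finals are orthographic; "" means an open syllable with no consonant coda.
ALL_TONES = ("ngang", "huyền", "hỏi", "ngã", "sắc", "nặng")
STOP_FINALS = frozenset({"p", "t", "c", "ch"})  # "ch" is a stop final too


def allowed_tones(final: str) -> tuple[str, ...]:
    """Tones that can occur on a syllable with the given orthographic final."""
    if final in STOP_FINALS:
        # Stop-closed syllables take only the two tones listed above.
        return ("sắc", "nặng")
    # Open syllables and nasal codas (m, n, ng, nh) allow every tone.
    return ALL_TONES
```

For example, `allowed_tones("t")` returns only sắc and nặng, while `allowed_tones("ng")` returns all six tones. A fuller validity check would also need the vowel and glide constraints described in this section, which this sketch leaves out.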

The nặng tone mark (dot below) has been added to all rimes in this table for illustration purposes only. It indicates the letter on which tone marks are generally placed, largely according to the "new style" rules of Vietnamese orthography as stated in Quy tắc đặt dấu thanh trong chữ quốc ngữ. In practice, not all of these rimes have real words or syllables with the nặng tone.

The IPA representations are based on Wikipedia's conventions. Different dialects may have different pronunciations.

Below is a table comparing four linguists' different transcriptions of Vietnamese vowels as well as the orthographic representation. Notice that this article mostly follows Han (1966), with the exception of marking short vowels short.

Thompson (1965) says that the vowels [ʌ] (orthographic â) and [ɐ] (orthographic ă) are shorter than all of the other vowels, which is shown here with the length mark [ː] added to the other vowels. His vowels above are only the basic vowel phonemes. Thompson gives a very detailed description of each vowel's various allophonic realizations.

Han (1966) uses acoustic analysis, including spectrograms and formant measuring and plotting, to describe the vowels. She states that the primary difference between orthographic ơ and â, and between a and ă, is one of length (a ratio of 2:1): ơ = /ɜː/, â = /ɜ/; a = /ɐː/, ă = /ɐ/. Her formant plots also seem to show that /ɜː/ may be slightly higher than /ɜ/ in some contexts (but this would be secondary to the main difference of length).

Han's studies also use a rather small number of participants, and although her participants are native speakers of the Hanoi variety, they have all lived outside of Hanoi for a significant period of their lives (e.g. in France or Ho Chi Minh City).

Nguyễn (1997) gives a simpler, more symmetrical description. He says that his work is not a "complete grammar" but rather a "descriptive introduction." His chart above is therefore more a phonological vowel chart than a phonetic one.

For example, Nguyễn & Edmondson (1998) show a male speaker from Nam Định with lax voice and a female speaker from Hanoi with breathy voice for the huyền tone, while another male speaker from Hanoi has modal voice for the huyền tone.