Abstract: A large number of studies have established that the vision of typically visible articulators (lips, jaw, face, tongue tip, teeth) eases speech understanding by humans and significantly increases the detection and identification of words in noise. However, not everything can be "read" without ambiguity from the mere vision of the face. "Cued Speech", a system used by an increasing number of hearing-impaired speakers, aims precisely to complement the lip information by means of a set of hand shapes and positions relative to the face that provide most of the missing phonetic information, in particular that related to tongue articulation. This coding system, although efficient in terms of information theory, is arbitrary and not directly related to tongue movements. We have therefore attempted to determine whether direct and full vision of the tongue (information presumably more intuitive) can be used. To this end, we exploited the virtual audiovisual talking head available at the laboratory, which can display all speech articulators, including the tongue. The augmented-reality condition chosen is a cutaway profile view. We elaborated a set of audiovisual VCV stimuli by determining the talking head's control parameters through inversion from the positions of the coils of an electromagnetic articulograph glued to the tongue, jaw and lips of the subject from whom the talking head had been modelled. These stimuli were played in an audiovisual perception test in four different conditions: audio signal alone (AU); audio signal plus a cutaway view of the virtual head along the sagittal plane without the tongue (AVJ); audio signal plus the cutaway view with the tongue (AVT); audio signal plus the complete face with skin texture (AVF). Each condition was played at four different Signal-to-Noise Ratios (SNRs) of white noise added to the sound: −∞ (i.e. no audio), −9 dB, +3 dB, and +∞ (i.e. no noise).
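The white-noise mixing at a target SNR can be sketched as follows. This is a hypothetical illustration of the stimulus-degradation step described above, not the authors' actual code; the function name and the handling of the ±∞ extremes are our own assumptions.

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, rng=None):
    """Mix white noise into `signal` so the result reaches a target SNR (dB).

    Hypothetical helper sketching the stimulus construction described in
    the abstract. snr_db = +inf returns the clean signal (no noise);
    snr_db = -inf returns noise only (audio absent).
    """
    rng = np.random.default_rng() if rng is None else rng
    if snr_db == float("inf"):
        return signal.copy()                      # no noise condition
    noise = rng.standard_normal(len(signal))
    p_signal = np.mean(signal ** 2)               # signal power
    if snr_db == float("-inf"):
        return noise * np.sqrt(p_signal)          # audio absent: noise only
    p_noise = np.mean(noise ** 2)                 # measured noise power
    # Solve 10*log10(p_signal / (k**2 * p_noise)) = snr_db for the gain k.
    k = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return signal + k * noise
```

Because the gain is computed from the measured noise power, the achieved SNR matches the target up to floating-point precision.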
For each stimulus, the subject was forced to choose among eight consonants. In order to assess learning effects, 12 subjects (group I) transcribed the stimuli presented with decreasing SNRs for each condition, while 11 other subjects (group II) identified the stimuli with increasing SNRs, which opened the possibility of learning the relation between audio and video while the audio signal was still clear at the beginning of the test for a given condition. Another set of VCV stimuli (AVT condition, SNR = −9 dB) was finally used at the end of the test in order to assess the generalisation abilities of the subjects in both groups. A series of analyses led to the following results. The identification scores of group II are significantly higher than those of group I, which supports the idea that group II benefited from a stronger implicit learning. All the video presentation conditions give better scores than audio alone. The scores over all SNRs rank, for each group, with statistically significant differences, in decreasing order: AVF, AVT, AVJ, AU. For each SNR, AVF is significantly better decoded than AVJ: the subjects prefer an ecological rendering of the movements to a cutaway view. The AVT condition is not significantly better perceived than the AVJ condition except when the audio signal is absent, for group II, which had benefited from a stronger implicit learning: in this case the AVT score is 18% higher than the AVJ score. This result suggests that "tongue reading" can take over from the audio information when the latter is no longer sufficient to complement lip reading. Moreover, the fairly high identification score on a generalisation test proposed at the end of the session with different VCV stimuli, and the overall difference in performance between the two groups, seem to demonstrate that fast learning can be achieved.
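The forced-choice identification scores compared above amount to a percent-correct rate per (condition, SNR) cell. A minimal sketch, assuming trials are recorded as (condition, SNR, presented consonant, chosen consonant) tuples; the function name and the tuple layout are illustrative, not from the original study:

```python
from collections import defaultdict

def identification_scores(responses):
    """Percent-correct identification per (condition, snr_db) cell.

    `responses` is an iterable of (condition, snr_db, presented, chosen)
    tuples, one per forced-choice trial. Returns a dict mapping
    (condition, snr_db) to the fraction of trials where the chosen
    consonant matched the presented one.
    """
    totals = defaultdict(int)   # trials per cell
    correct = defaultdict(int)  # correct identifications per cell
    for cond, snr_db, presented, chosen in responses:
        totals[(cond, snr_db)] += 1
        correct[(cond, snr_db)] += int(presented == chosen)
    return {cell: correct[cell] / totals[cell] for cell in totals}
```

Score differences between cells or groups would then be tested for statistical significance, as reported in the abstract.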
These very preliminary results need to be complemented by more systematic tests, involving notably visual attention measurements, in order to confirm that natural human tongue reading abilities are weak, or that they are simply dominated by lip reading. However, we envisage elaborating protocols to show that tongue reading can be learned quickly and easily. Our future goal is thus to use the augmented-speech abilities of our virtual talking head for applications in the domain of speech therapy for children with delayed speech, perception and production rehabilitation of hearing-impaired children, and pronunciation training for second-language learners.