We have developed two different methods for using auditory, telephone speech to drive the movements of a synthetic face. In the first method, Hidden Markov Models (HMMs) were trained on a phonetically transcribed telephone speech database. The output of the HMMs was then fed into a rule-based visual speech synthesizer as a string of phonemes together with time labels. In the second method, Artificial Neural Networks (ANNs) were trained on the same database to map acoustic parameters directly to facial control parameters. The target parameter trajectories for training were generated by using phoneme strings from a database as input to the visual speech synthesizer. The two methods were evaluated through audiovisual intelligibility tests with ten hearing-impaired persons, and compared to “ideal” articulations (where no recognition was involved), a natural face, and to the intelligibility of the audio alone. It was found that the HMM method performs considerably better than the audio-alone condition (54% and 34% keywords correct, respectively), but not as well as the “ideal” articulating artificial face (64%). The intelligibility for the ANN method was 34% keywords correct.

In this paper, we discuss an adaptive method of handling fragmented user utterances to a speech-based multimodal dialogue system. Inserted silent pauses between fragments present the following problem: does the current silence indicate that the user has completed her utterance, or is the silence just a pause between two fragments, so that the system should wait for more input? Our system incrementally classifies user utterances as either closing (more input is unlikely to come) or non-closing (more input is likely to come), partly depending on the current dialogue state. Utterances that are categorized as non-closing allow the dialogue system to await additional spoken or graphical input before responding.
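The closing/non-closing decision described above can be sketched as a simple threshold-plus-state rule. This is a hypothetical illustration only: the paper's actual classifier, its features, and its thresholds are not specified here, and the state flags and pause threshold below are invented for the example.

```python
# Hypothetical sketch: classify a silence as "closing" (user is done) or
# "non-closing" (more input likely), combining pause length with a crude
# dialogue-state signal. Threshold and flags are illustrative, not from
# the paper.

PAUSE_THRESHOLD_MS = 1500  # assumed cutoff for a "long" silence

def classify_silence(pause_ms, utterance_is_complete_request, slots_missing):
    """Return 'closing' if the system should respond now, else 'non-closing'."""
    if slots_missing and pause_ms < PAUSE_THRESHOLD_MS:
        # The dialogue state still expects information: wait a bit longer.
        return "non-closing"
    if utterance_is_complete_request or pause_ms >= PAUSE_THRESHOLD_MS:
        return "closing"
    return "non-closing"

print(classify_silence(400, False, slots_missing=True))   # non-closing
print(classify_silence(2000, True, slots_missing=False))  # closing
```

The point of the sketch is that the same pause length is interpreted differently depending on what the dialogue state still expects, which matches the adaptive behaviour the abstract describes.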

When designing multimodal dialogue systems allowing speech as well as graphical operations, it is important to understand not only how people make use of the different modalities in their utterances, but also how the system might influence a user’s choice of modality by its own behavior. This paper describes an experiment in which subjects interacted with two versions of a simulated multimodal dialogue system. One version used predominantly graphical means when referring to specific objects; the other used predominantly verbal referential expressions. The purpose of the study was to find out what effect, if any, the system’s referential strategy had on the user’s behavior. The results provided limited support for the hypothesis that the system can influence users to adopt another modality for the purpose of referring.

In this paper, we compare the distribution of disfluencies in two human--computer dialogue corpora. One corpus consists of unimodal travel booking dialogues, which were recorded over the telephone. In this unimodal system, all components except the speech recognition were authentic. The other corpus was collected using a semi-simulated multi-modal dialogue system with an animated talking agent and a clickable map. The aim of this paper is to analyze and discuss the effects of modality, task and interface design on the distribution and frequency of disfluencies in these two corpora.

This paper examines feedback strategies in a Swedish corpus of multimodal human--computer interaction. The aim of the study is to investigate how users provide positive and negative feedback to a dialogue system and to discuss the function of these utterances in the dialogues. User feedback in the AdApt corpus was labeled and analyzed, and its distribution in the dialogues is discussed. The question of whether it is possible to utilize user feedback in future systems is considered. More specifically, we discuss how error handling in human--computer dialogue might be improved through greater knowledge of user feedback strategies. In the present corpus, almost all subjects used positive or negative feedback at least once during their interaction with the system. Our results indicate that some types of feedback more often occur in certain positions in the dialogue. Another observation is that there appear to be great individual variations in feedback strategies, so that certain subjects give feedback at almost every turn while others rarely or never respond to a spoken dialogue system in this manner. Finally, we discuss how feedback could be used to prevent problems in human--computer dialogue.

This paper is an investigation of repetitive utterances in a Swedish database of spontaneous computer-directed speech. A spoken dialogue system was installed in a public location in downtown Stockholm and spontaneous human–computer interactions with adults and children were recorded [1]. Several acoustic and prosodic features such as duration, shifting of focus and hyperarticulation were examined to see whether repetitions could be distinguished from what the users first said to the system. The present study indicates that adults and children use partly different strategies as they attempt to resolve errors by means of repetition. As repetition occurs, duration is increased and words are often hyperarticulated or contrastively focused. These results could have implications for the development of future spoken dialogue systems with robust error handling.

It is envisioned that autonomous software agents that can communicate using speech and gesture will soon be on everybody’s computer screen. This paper describes an architecture that can be used to design and animate characters capable of lip-synchronised synthetic speech as well as body gestures, for use in for example spoken dialogue systems. A general scheme for computationally efficient parametric deformation of facial surfaces is presented, as well as techniques for generation of bimodal speech, facial expressions and body gestures in a spoken dialogue system. Results indicating that an animated cartoon-like character can be a significant contribution to speech intelligibility are also reported.

A system for rule-based audiovisual text-to-speech synthesis has been created. The system is based on the KTH text-to-speech system, which has been complemented with a three-dimensional parameterized model of a human face. The face can be animated in real time, synchronized with the auditory speech. The facial model is controlled by the same synthesis software as the auditory speech synthesizer. A set of rules that takes coarticulation into account has been developed. The audiovisual text-to-speech system has also been incorporated into a spoken man-machine dialogue system that is being developed at the department.

This thesis presents work in the area of computer-animated talking heads. A system for multimodal speech synthesis has been developed, capable of generating audiovisual speech animations from arbitrary text, using parametrically controlled 3D models of the face and head. A speech-specific direct parameterisation of the movement of the visible articulators (lips, tongue and jaw) is suggested, along with a flexible scheme for parameterising facial surface deformations based on well-defined articulatory targets.

To improve the realism and validity of facial and intra-oral speech movements, measurements from real speakers have been incorporated from several types of static and dynamic data sources. These include ultrasound measurements of tongue surface shape, dynamic optical motion tracking of face points in 3D, as well as electromagnetic articulography (EMA) providing dynamic tongue movement data in 2D. Ultrasound data are used to estimate target configurations for a complex tongue model for a number of sustained articulations. Simultaneous optical and electromagnetic measurements are performed and the data are used to resynthesise facial and intra-oral articulation in the model. A robust resynthesis procedure, capable of animating facial geometries that differ in shape from the measured subject, is described.

To drive articulation from symbolic (phonetic) input, for example in the context of a text-to-speech system, both rule-based and data-driven articulatory control models have been developed. The rule-based model effectively handles forward and backward coarticulation by target under-specification, while the data-driven model uses ANNs to estimate articulatory parameter trajectories, trained on trajectories resynthesised from optical measurements. The articulatory control models are evaluated and compared against other data-driven models trained on the same data. Experiments with ANNs for driving the articulation of a talking head directly from acoustic speech input are also reported.

A flexible strategy for generation of non-verbal facial gestures is presented. It is based on a gesture library organised by communicative function, where each function has multiple alternative realisations. The gestures can be used to signal e.g. turn-taking, back-channelling and prominence when the talking head is employed as output channel in a spoken dialogue system. A device-independent XML-based formalism for non-verbal and verbal output in multimodal dialogue systems is proposed, and it is described how the output specification is interpreted in the context of a talking head and converted into facial animation using the gesture library.

Through a series of audiovisual perceptual experiments with noise-degraded audio, it is demonstrated that the animated talking head provides significantly increased intelligibility over the audio-only case, in some cases not significantly below that provided by a natural face.

Finally, several projects and applications are presented, where the described talking head technology has been successfully employed. Four different multimodal spoken dialogue systems are outlined, and the role of the talking heads in each of the systems is discussed. A telecommunication application where the talking head functions as an aid for hearing-impaired users is also described, as well as a speech training application where talking heads and language technology are used with the purpose of improving speech production in profoundly deaf children.

This paper deals with the problem of modelling the dynamics of articulation for a parameterised talking head based on phonetic input. Four different models are implemented and trained to reproduce the articulatory patterns of a real speaker, based on a corpus of optical measurements. Two of the models (“Cohen-Massaro” and “Öhman”) are based on coarticulation models from speech production theory, and two are based on artificial neural networks, one of which is specially intended for streaming real-time applications. The different models are evaluated through comparison between predicted and measured trajectories, which shows that the Cohen-Massaro model produces trajectories that best match the measurements. A perceptual intelligibility experiment is also carried out, where the four data-driven models are compared against a rule-based model as well as an audio-alone condition. Results show that all models give significantly increased speech intelligibility over the audio-alone case, with the rule-based model yielding the highest intelligibility score.
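The “Cohen-Massaro” approach named above is a dominance-based coarticulation model: each phone contributes a target value weighted by a function that decays with temporal distance from the phone's centre, and the articulatory parameter at any instant is the dominance-weighted average of all targets. A minimal sketch, where the targets, timestamps, and decay constants are purely illustrative (not values from the paper):

```python
import math

# Minimal Cohen-Massaro-style coarticulation sketch. Each phone is a tuple
# (target_value, center_time_ms, alpha, theta); alpha scales dominance and
# theta controls how fast it decays away from the phone's center.

def dominance(t, center, alpha, theta):
    return alpha * math.exp(-theta * abs(t - center))

def parameter_at(t, phones):
    """Dominance-weighted average of the phone targets at time t."""
    num = sum(tgt * dominance(t, c, a, th) for tgt, c, a, th in phones)
    den = sum(dominance(t, c, a, th) for _, c, a, th in phones)
    return num / den

# Hypothetical lip-rounding trajectory for an unrounded-to-rounded sequence:
phones = [(0.1, 50, 1.0, 0.02), (0.9, 250, 1.0, 0.02)]
traj = [parameter_at(t, phones) for t in range(0, 300, 50)]
```

Because every output value is a weighted average, the trajectory glides smoothly between targets without ever overshooting them, which is the property that makes this family of models attractive for articulation.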

This paper reports the results of a preliminary cross-evaluation experiment run in the framework of the European research project PF-Star (1), with the double aim of evaluating the possibility of exchanging FAP data between the involved sites and assessing the adequacy of the emotional facial gestures performed by talking heads. The results provide initial insights into the way people belonging to various cultures react to natural and synthetic facial expressions produced in different cultural settings, and into the potentials and limits of FAP data exchange.

We present our current state of development regarding animated agents applicable to affective dialogue systems. A new set of tools is under development to support the creation of animated characters compatible with the MPEG-4 facial animation standard. Furthermore, we have collected a multimodal expressive speech database including video, audio and 3D point motion registration. One of the objectives of collecting the database is to examine how emotional expression influences articulatory patterns, to be able to model this in our agents. Analysis of the 3D data shows for example that variation in mouth width due to expression greatly exceeds that due to vowel quality.

Beskow, Jonas; Cerrato, Loredana; Granström, Björn; House, David; Nordstrand, Magnus; Svanfeldt, Gunilla (KTH, Department of Speech, Music and Hearing): “The Swedish PFs-Star Multimodal Corpora”. In: Proceedings of LREC Workshop on Models of Human Behaviour for the Specification and Evaluation of Multimodal Input and Output Interfaces, 2004, pp. 34-37. Conference paper (peer-reviewed).

The aim of this paper is to present the multimodal speech corpora collected at KTH, in the framework of the European project PF-Star, and discuss some of the issues related to the analysis and implementation of human communicative and emotional visual correlates of speech in synthetic conversational agents. Two multimodal speech corpora have been collected by means of an opto-electronic system, which allows capturing the dynamics of emotional facial expressions with very high precision. The data has been evaluated through a classification test and the results show promising identification rates for the different acted emotions. These multimodal speech corpora represent a valuable resource for gaining more knowledge about how speech articulation and communicative gestures are affected by the expression of emotions.

Simultaneous measurements of tongue and facial motion, using a combination of electromagnetic articulography (EMA) and optical motion tracking, are analysed to improve the articulation of an animated talking head and to investigate the correlation between facial and vocal tract movement. The recorded material consists of VCV and CVC words and 270 short everyday sentences spoken by one Swedish subject. The recorded articulatory movements are re-synthesised by a parametrically controlled 3D model of the face and tongue, using a procedure involving minimisation of the error between measurement and model. Using linear estimators, tongue data is predicted from the face and vice versa, and the correlation between measurement and prediction is computed.
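The linear-estimator idea above, predicting tongue data from face data and then correlating prediction with measurement, can be illustrated in one dimension with ordinary least squares. The data below are synthetic and the study itself used multivariate estimators over many measurement points; this only shows the fit-predict-correlate loop:

```python
# Toy one-dimensional version of the fit-predict-correlate procedure:
# fit a line mapping a face parameter to a tongue parameter, predict,
# then compute the Pearson correlation of prediction vs. measurement.

def fit_line(x, y):
    """Ordinary least squares for y = b*x + a."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return b, my - b * mx

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

face = [0.0, 1.0, 2.0, 3.0, 4.0]        # e.g. jaw-opening values (synthetic)
tongue = [0.1, 0.9, 2.1, 2.9, 4.1]      # correlated tongue heights (synthetic)
slope, intercept = fit_line(face, tongue)
pred = [slope * f + intercept for f in face]
r = pearson(tongue, pred)
```

A high correlation between `pred` and `tongue` would indicate, as in the study, that one articulator channel carries recoverable information about the other.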

SYNFACE is a telephone aid for hearing-impaired people that shows the lip movements of the speaker at the other telephone, synchronised with the speech. The SYNFACE system consists of a speech recogniser that recognises the incoming speech and a synthetic talking head. The output from the recogniser is used to control the articulatory movements of the synthetic head. SYNFACE prototype systems exist for three languages: Dutch, English and Swedish, and the first user trials have just started.

This article presents an overview of the research activities carried out in the European CAVE project, which focused on text-dependent speaker verification on the telephone network using whole-word Hidden Markov Models. It documents in detail various aspects of the technology and the methodology used within the project. In particular, it addresses the issue of model estimation in the context of limited enrollment data and the problem of a posteriori decision threshold setting. Experiments are carried out on the realistic telephone speech database SESP. State-of-the-art performance levels are obtained, which validates the technical approaches developed and assessed during the project as well as the working infrastructure which facilitated cooperation between the partners.

A recently developed application of Director Musices (DM) is presented. The DM is a rule-based software tool for automatic music performance developed at the Speech, Music and Hearing Dept. at the Royal Institute of Technology, Stockholm. It is written in Common Lisp and is available both for Windows and Macintosh. It is demonstrated that particular combinations of rules defined in the DM can be used for synthesizing performances that differ in emotional quality. Different performances of two pieces of music were synthesized so as to elicit listeners’ associations to six different emotions (fear, anger, happiness, sadness, tenderness, and solemnity). Performance rules and their parameters were selected so as to match previous findings about emotional aspects of music performance. Variations of the performance variables IOI (Inter-Onset Interval), OOI (Offset-Onset Interval) and L (Sound Level) are presented for each rule setup. In a forced-choice listening test, 20 listeners were asked to classify the performances with respect to emotions. The results showed that the listeners, with very few exceptions, recognized the intended emotions correctly. This shows that a proper selection of rules and rule parameters in DM can indeed produce a wide variety of meaningful, emotional performances, even extending the scope of the original rule definition.

This article briefly summarises the author's research on automatic performance, started at CSC (Centro di Sonologia Computazionale, University of Padua) and continued at TMH-KTH (Speech, Music and Hearing Department at the Royal Institute of Technology, Stockholm). The focus is on the evolution of the architecture of an artificial neural networks (ANNs) framework, from the first simple model, able to learn the KTH performance rules, to the final one, which accurately simulates the style of a real pianist performer, including time and loudness deviations. The task was to analyse and synthesise the performance process of a professional pianist, playing on a Disklavier. An automatic analysis extracts all performance parameters of the pianist, starting from the KTH rule system. The system possesses good generalisation properties: applying the same ANN, it is possible to perform different scores in the performing style used for the training of the networks. Brief descriptions of the program Melodia and of the two Java applets Japer and Jalisper are given in the Appendix. In Melodia, developed at the CSC, the user can run either rules or ANNs, and study their different effects. Japer and Jalisper, developed at TMH, implement in real time on the web the performance rules developed at TMH plus new features achieved by using ANNs.

This dissertation presents research in the field of automatic music performance with a special focus on piano.

A system is proposed for automatic music performance, based on artificial neural networks (ANNs). A complex, ecological-predictive ANN was designed that listens to the last played note, predicts the performance of the next note, looks three notes ahead in the score, and plays the current tone. This system was able to learn a professional pianist's performance style at the structural micro-level. In a listening test, performances by the ANN were judged clearly better than deadpan performances and slightly better than performances obtained with generative rules.

The behavior of an ANN was compared with that of a symbolic rule system with respect to musical punctuation at the micro-level. The rule system mostly gave better results, but some segmentation principles of an expert musician were only generalized by the ANN.

Measurements of professional pianists' performances revealed interesting properties in the articulation of notes marked staccato and legato in the score. Performances were recorded on a grand piano connected to a computer. Staccato was realized by a micropause of about 60% of the inter-onset interval (IOI), while legato was realized by keeping two keys depressed simultaneously; the relative key overlap time was dependent on the IOI: the larger the IOI, the shorter the relative overlap. The magnitudes of these effects changed with the pianists' coloring of their performances and with the pitch contour. These regularities were modeled in a set of rules for articulation in automatic piano music performance.

Emotional coloring of performances was realized by means of macro-rules implemented in the Director Musices performance system. These macro-rules are groups of rules that were combined such that they reflected previous observations on musical expression of specific emotions. Six emotions were simulated. A listening test revealed that listeners were able to recognize the intended emotional colorings.

In addition, some possible future applications are discussed in the fields of automatic music performance, music education, automatic music analysis, virtual reality and sound synthesis.

Articulation strategies applied by pianists in expressive performances of the same score are analysed. Measurements of key overlap time and its relation to the inter-onset interval are collected for notes marked legato and staccato in the first sixteen bars of the Andante movement of W. A. Mozart's Piano Sonata in G major, K 545. Five pianists played the piece nine times. First, they played in a way that they considered "optimal". In the remaining eight performances they were asked to represent different expressive characters, as specified in terms of different adjectives. Legato, staccato, and repeated-note articulation applied by the right hand were examined by means of statistical analysis. Although the results varied considerably between pianists, some trends could be observed. The pianists generally used similar strategies in the renderings intended to represent different expressive characters. Legato was played with a key overlap ratio that depended on the inter-onset interval (IOI). Staccato tones had an approximate duration of 40% of the IOI. Repeated notes were played with a duration of about 60% of the IOI. The results seem useful as a basis for articulation rules in grammars for automatic piano performance.
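The reported percentages suggest simple articulation rules of the kind the abstract envisions for automatic piano performance. A sketch follows; the staccato (40% of IOI) and repeated-note (60% of IOI) ratios come from the abstract, while the legato overlap coefficients are illustrative placeholders rather than fitted values:

```python
# Sketch of IOI-based articulation rules distilled from the measurements:
# staccato tones sound for ~40% of the inter-onset interval (IOI), repeated
# notes for ~60%, and legato overlaps the next key with a relative overlap
# that shrinks as the IOI grows (coefficients below are invented).

def tone_duration_ms(ioi_ms, articulation):
    """Sounding duration of a note given its IOI and articulation mark."""
    if articulation == "staccato":
        return 0.40 * ioi_ms
    if articulation == "repeated":
        return 0.60 * ioi_ms
    if articulation == "legato":
        # Relative key overlap decreases with IOI, floored to stay positive.
        overlap_ratio = max(0.05, 0.25 - 0.0002 * ioi_ms)
        return ioi_ms * (1.0 + overlap_ratio)
    raise ValueError(f"unknown articulation: {articulation}")

print(tone_duration_ms(500, "staccato"))  # 200.0
```

Note that legato durations exceed the IOI (the key is still down when the next note starts), which is exactly what "key overlap" means here.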

We propose a music performance tool based on the Java programming language. This software runs in any Java applet viewer (i.e. a WWW browser) and interacts with the local Midi equipment by means of a multi-task software module for Midi applications (MidiShare). Two main ideas are at the base of our project: one is to realise an easy, intuitive, hardware- and software-independent tool for performance, and the other is to achieve an easier development of the tool itself. At the moment there are two projects under development: a system based only on a Java applet, called Japer (Java performer), and a hybrid system based on a Java user interface and a Lisp kernel for the development of the performance tools. In this paper, the first of the two projects is presented.

Recent research on the analysis and synthesis of music performance has resulted in tools for the control of the expressive content in automatic music performance [1]. These results can be relevant for applications other than performance of music by a computer. In this work it is presented how the techniques for enhancing the expressive character in music performance can be used also in the design of sound logos, in the control of synthesis algorithms, and for achieving better ringing tones in mobile phones.

The control of sound synthesis is a well-known problem. This is particularly true if the sounds are generated with physical modeling techniques, which typically need specification of numerous control parameters. In the present work, outcomes from studies on automatic music performance are used for tackling this problem.

Director Musices is a program that transforms notated scores into musical performances. It implements the performance rules emerging from research projects at the Royal Institute of Technology (KTH). Rules in the program model performance aspects such as phrasing, articulation, and intonation, and they operate on performance variables such as tone, inter-onset duration, amplitude, and pitch. By manipulating rule parameters, the user can act as a metaperformer controlling different features of the performance, leaving the technical execution to the computer. Different interpretations of the same piece can easily be obtained. Features of Director Musices include MIDI file input and output, rule palettes, graphical display of all performance variables (along with the notation), and user-defined performance rules. The program is implemented in Common Lisp and is available free as a stand-alone application for both Macintosh and Windows platforms. Further information, including music examples, publications, and the program itself, is located online at http://www.speech.kth.se/music/performance. This paper is a revised and updated version of a previous paper published in the Computer Music Journal in 2000 that was mainly written by Anders Friberg (Friberg, Colombo, Frydén and Sundberg, 2000).
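The rule mechanism can be illustrated with a toy rule in the Director Musices spirit: a note context triggers small deviations in inter-onset duration and sound level, scaled by a user-set rule quantity k. The rule below (lengthen and soften phrase-final notes) and its coefficients are an illustrative stand-in, not an actual DM rule definition:

```python
# Toy performance rule: phrase-final notes get a longer IOI and a slightly
# lower level, scaled by the rule quantity k (the "metaperformer" control).
# Field names and deviation magnitudes are invented for this sketch.

def phrase_final_lengthening(notes, k=1.0):
    """notes: list of dicts with 'ioi_ms', 'level_db', 'phrase_final'."""
    out = []
    for n in notes:
        n = dict(n)  # leave the input score untouched
        if n["phrase_final"]:
            n["ioi_ms"] *= 1.0 + 0.2 * k   # stretch the final IOI by 20% * k
            n["level_db"] -= 2.0 * k       # soften it by 2 dB * k
        out.append(n)
    return out

score = [{"ioi_ms": 500, "level_db": 70, "phrase_final": False},
         {"ioi_ms": 500, "level_db": 70, "phrase_final": True}]
performed = phrase_final_lengthening(score, k=1.0)
```

Setting k to 0 reproduces the deadpan score, while larger k exaggerates the interpretation; chaining many such rules over the same note list is the essence of the rule-palette approach.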

The Expressive Director is a system allowing real-time control of music performance synthesis, in particular regarding expressive and emotional aspects. It allows a user to interact in real time, for example changing the emotional intent from happy to sad, or from a romantic expressive style to a neutral one, while the piece is playing. The Expressive Director was designed to merge the expressiveness models developed at CSC and at KTH. The synthesis is controlled through a two-dimensional space (called the “Control Space”) in which the user can move the mouse pointer continuously from one expressive intention to another. Depending on the position, the system applies suitable expressive deviation profiles. The Control Space can be made to represent the Valence-Arousal space from music psychology research.

Speech and music performance are two important systems for interhuman communication by means of acoustic signals. These signals must be adapted to the human perceptual and cognitive systems. Hence a comparative analysis of speech and music performances is likely to shed light on these systems, particularly regarding basic requirements for acoustic communication. Two computer programs are compared, one for text-to-speech conversion and one for note-to-tone conversion. Similarities are found in the need for placing emphasis on unexpected elements, for increasing the dissimilarities between different categories, and for flagging structural constituents. Similarities are also found in the code chosen for conveying this information, e.g. emphasis by lengthening and constituent marking by final lengthening.

We report on our recent facial animation work to improve the realism and accuracy of visual speech synthesis. The general approach is to use both static and dynamic observations of natural speech to guide the facial modeling. One current goal is to model the internal articulators: a highly realistic palate, teeth, and an improved tongue. Because our talking head can be made transparent, we can provide an anatomically valid and pedagogically useful display that can be used in speech training of children with hearing loss [1]. High-resolution models of palate and teeth [2] were reduced to a relatively small number of polygons for real-time animation [3]. For the improved tongue, we are using 3D ultrasound data and electropalatography (EPG) [4] with error minimization algorithms to educate our parametric B-spline based tongue model to simulate realistic speech. In addition, a high-speed algorithm has been developed for detection and correction of collisions, to prevent the tongue from protruding through the palate and teeth, and to enable the real-time display of synthetic EPG patterns.
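The collision handling described above can be caricatured in two dimensions: clamp each tongue vertex so it never rises above a palate height function. The palate shape (a made-up parabola), the margin, and the vertex data below are purely illustrative, not the actual high-speed algorithm:

```python
# Toy 2D collision correction: keep tongue surface points from protruding
# through the palate by clamping each vertex's height to just below the
# palate surface. Palate shape and margin are invented for this sketch.

def palate_height(x):
    return 1.0 - 0.5 * x * x  # hard palate modeled as a downward parabola

def resolve_collisions(points, margin=0.01):
    """points: list of (x, y) tongue vertices; clamp y below the palate."""
    resolved = []
    for x, y in points:
        limit = palate_height(x) - margin
        resolved.append((x, min(y, limit)))
    return resolved

tongue = [(0.0, 1.2), (0.5, 0.5), (1.0, 0.6)]  # first and last would protrude
fixed = resolve_collisions(tongue)
```

A per-vertex clamp like this is cheap enough to run every frame, which is the requirement the abstract states for real-time animation; the contact points it produces could also mark synthetic EPG-style contact patterns.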

Music has an intimate relationship with motion in several aspects. Obviously, movements are required to play an instrument, but musicians also move their bodies in ways not directly related to note production. In order to explore to what extent emotional intentions can be conveyed through musicians’ movements alone, video recordings were made of a marimba player performing the same piece with the intentions Happy, Sad, Angry and Fearful. 20 observers watched the video clips, without sound, and rated both the perceived emotional content and movement cues. The videos were presented in four viewing conditions, showing different parts of the player. The observers’ ratings of the intended emotions showed that the intentions Happiness, Sadness and Anger were well communicated, while Fear was not. The identification of the intended emotion was only slightly influenced by the viewing condition, although in some cases the head was important. The movement ratings indicate that there are cues that the observers use to distinguish between intentions, similar to the cues found for audio signals in music performance. Anger was characterized by large, fast, uneven, and jerky movements; Happiness by large and somewhat fast movements; Sadness by small, slow, even and smooth movements.

Four percussion players’ strategies for performing an accented stroke were studied by capturing movement trajectories. The players played on a force plate with markers on the drumstick, hand, and lower and upper arm. The rhythmic pattern – an ostinato with interleaved accents every fourth stroke – was performed at different dynamic levels, tempi and on different striking surfaces attached to the force plate. The analysis displayed differences between the movement trajectories for the four players, which were maintained consistently during all playing conditions. The characteristics of the players’ individual movement patterns were observed to correspond well with the striking velocities and timing in performance. The most influential parameter on the movement patterns was the dynamic level, with increasing preparatory heights and striking velocity for increasing dynamic level. The interval beginning with the accented stroke was prolonged, the amount of lengthening decreasing with increasing dynamic level.

The movements and timing when playing an interleaved accent in drumming were studied for three professionals and one amateur. The movement analysis showed that the subjects prepared for the accented stroke by raising the drumstick up to a greater height. The movement strategies used, however, differed widely in appearance.

The timing analysis showed two basic features: a slow change in tempo over a longer time span ("drift"), and a short-term variation between adjacent intervals ("flutter"). Cyclic patterns, with every fourth interval prolonged, could be seen in the flutter. The lengthening of the interval beginning with the accented stroke seems to be a common way for the player to give the accent more emphasis. A listening test was performed to investigate whether these cyclic patterns conveyed information to a listener about the grouping of the strokes. Listeners identified sequences where the magnitude of the inter-onset interval fluctuations was large during the cyclic patterns.
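The drift/flutter decomposition can be approximated by detrending the inter-onset intervals with a moving average: the smoothed curve captures the slow tempo change (drift), and the residual captures the short-term variation (flutter). This is a plausible sketch, not the paper's actual analysis method, and the window size and synthetic data are illustrative:

```python
# Separate IOI timing into "drift" (moving average) and "flutter" (residual).

def drift_and_flutter(iois, window=5):
    """Return (drift, flutter) lists, same length as iois."""
    half = window // 2
    drift = []
    for i in range(len(iois)):
        lo, hi = max(0, i - half), min(len(iois), i + half + 1)
        drift.append(sum(iois[lo:hi]) / (hi - lo))  # local mean, edge-clipped
    flutter = [x - d for x, d in zip(iois, drift)]
    return drift, flutter

# Synthetic IOIs: every fourth interval lengthened (the accent pattern),
# superimposed on a slowly slowing tempo (the drift).
iois = [500 + 2 * i + (40 if i % 4 == 0 else 0) for i in range(16)]
drift, flutter = drift_and_flutter(iois)
```

On this synthetic series, the flutter component isolates the cyclic every-fourth-interval lengthening that the drift component smooths away, mirroring the cyclic patterns reported above.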

Musicians often make gestures and move their bodies expressing their musical intention. This visual information provides a separate channel of communication to the listener. In order to explore to what extent emotional intentions can be conveyed through musicians’ movements, video recordings were made of a marimba player performing the same piece with four different intentions, Happy, Sad, Angry and Fearful. Twenty subjects were asked to rate the silent video clips with respect to perceived emotional content and movement qualities. The video clips were presented in different viewing conditions, showing different parts of the player. The results showed that the intentions Happiness, Sadness and Anger were well communicated, while Fear was not. The identification of the intended emotion was only slightly influenced by viewing condition. The movement ratings indicated that there were cues that the observers used to distinguish between intentions, similar to cues found for audio signals in music performance.

In this paper, an overview of the Higgins project and the research within the project is presented. The project incorporates studies of error handling for spoken dialogue systems on several levels, from processing to dialogue level. A domain in which a range of different error types can be studied has been chosen: pedestrian navigation and guiding. Several data collections within Higgins have been analysed along with data from Higgins' predecessor, the AdApt system. The error handling research issues in the project are presented in light of these analyses.

A three-dimensional (3D) tongue model has been developed using MR images of a reference subject producing 44 artificially sustained Swedish articulations. Based on the difference in tongue shape between the articulations and a reference, the six linear parameters jaw height, tongue body, tongue dorsum, tongue tip, tongue advance and tongue width were determined using an ordered linear factor analysis controlled by articulatory measures. The first five factors explained 88% of the tongue data variance in the midsagittal plane and 78% in the 3D analysis. The six-parameter model is able to reconstruct the modelled articulations with an overall mean reconstruction error of 0.13 cm, and it specifically handles lateral differences and asymmetries in tongue shape. In order to correct articulations that were hyperarticulated due to the artificial sustaining in the magnetic resonance imaging (MRI) acquisition, the parameter values in the tongue model were readjusted based on a comparison of virtual and natural linguopalatal contact patterns, collected with electropalatography (EPG). Electromagnetic articulography (EMA) data was collected to control the kinematics of the tongue model for vowel-fricative sequences and an algorithm to handle surface contacts has been implemented, preventing the tongue from protruding through the palate and teeth.