We examine the utility of speech and lexical features for predicting student emotions in computer-human spoken tutoring dialogues.
We first annotate student turns for negative, neutral, positive and mixed emotions.
We then extract acoustic-prosodic features from the speech signal, and lexical items from the transcribed or recognized speech.
We compare the results of machine learning experiments using these features alone or in combination to predict various categorizations of the annotated student emotions.
Our best results yield a 19-36% relative error reduction over a baseline.
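For concreteness, relative error reduction measures what fraction of the baseline's error the model eliminates; a minimal sketch of the computation, using hypothetical error rates not taken from the paper:

```python
def relative_error_reduction(baseline_error: float, model_error: float) -> float:
    """Fraction of the baseline's classification error eliminated by the model."""
    return (baseline_error - model_error) / baseline_error

# Hypothetical example: a baseline error rate of 50% reduced to 35%
# corresponds to a 30% relative error reduction.
print(relative_error_reduction(0.50, 0.35))  # 0.3 (up to floating point)
```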
Finally, we compare our results with emotion prediction in human-human tutoring dialogues.