This paper investigates the use of speech-to-text methods for assigning an emotion class
to a given speech utterance. Previous work shows that emotion extracted from text can
provide evidence complementary to the information extracted by classifiers based on
spectral or other non-linguistic features. As speech-to-text usually requires
significantly more computational effort, in this study we investigate the degree of
speech-to-text accuracy needed for reliable detection of emotions from an automatically
generated transcription of an utterance. We evaluate the use of speech-to-text hypotheses
in both training and testing, and compare several classification approaches on the same
task. Our
results show that emotion recognition performance stays roughly constant as long as word
accuracy does not fall below a reasonable value, making the use of speech-to-text viable
for training emotion classifiers based on linguistic features.