Abstract

Unforeseen user intents can account for a significant portion of unsuccessful calls in an automatic voice response system. Discovering these unforeseen semantic intents usually requires expensive manual transcription. We propose a method that clusters the acoustics from logged calls by their estimated semantic intents. This is achieved by training a mixture of language models in an unsupervised manner. Each cluster is presented to the application developer together with a suggested language model that covers the semantic intent of the data in that cluster. The application developer validates the cluster and its suggested language model, and then updates the application. A quantitative evaluation on a corporate voice-dialer application shows that updating the application in this manner yields a 13.4% relative reduction in semantic error rate.
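
As a rough illustration of what training a mixture of language models in an unsupervised manner can look like, the following minimal sketch runs expectation-maximization (EM) over a mixture of unigram language models. It is not the method evaluated here: it assumes text word sequences (for example, recognizer output) rather than acoustics, and it uses unigram models with additive smoothing. The function name `train_unigram_mixture`, its parameters, and the example utterances are all hypothetical.

```python
import math
from collections import Counter


def train_unigram_mixture(utterances, num_clusters, iterations=20, smoothing=0.1):
    """EM for a mixture of unigram language models (illustrative assumption:
    inputs are whitespace-tokenized text, not acoustics)."""
    docs = [Counter(u.split()) for u in utterances]
    vocab = sorted({w for d in docs for w in d})
    # Deterministic initialization: utterance i starts in cluster i mod K.
    posteriors = [
        [1.0 if i % num_clusters == k else 0.0 for k in range(num_clusters)]
        for i in range(len(docs))
    ]
    for _ in range(iterations):
        # M-step: re-estimate cluster priors and smoothed unigram probabilities
        # from the current soft assignments.
        priors = [
            (sum(p[k] for p in posteriors) + smoothing)
            / (len(docs) + smoothing * num_clusters)
            for k in range(num_clusters)
        ]
        word_probs = []
        for k in range(num_clusters):
            counts = {w: smoothing for w in vocab}
            for d, p in zip(docs, posteriors):
                for w, c in d.items():
                    counts[w] += p[k] * c
            total = sum(counts.values())
            word_probs.append({w: c / total for w, c in counts.items()})
        # E-step: posterior probability of each cluster given each utterance,
        # computed in the log domain for numerical stability.
        new_posteriors = []
        for d in docs:
            log_scores = [
                math.log(priors[k])
                + sum(c * math.log(word_probs[k][w]) for w, c in d.items())
                for k in range(num_clusters)
            ]
            top = max(log_scores)
            weights = [math.exp(s - top) for s in log_scores]
            norm = sum(weights)
            new_posteriors.append([w / norm for w in weights])
        posteriors = new_posteriors
    return priors, word_probs, posteriors


# Hypothetical logged utterances for a voice dialer.
calls = [
    "call john smith",
    "call mary jones",
    "cancel that please",
    "dial john smith",
    "cancel that",
    "call mary",
]
priors, word_probs, posteriors = train_unigram_mixture(calls, num_clusters=2)
# Hard cluster label per utterance: the cluster with the highest posterior.
labels = [max(range(2), key=lambda k: p[k]) for p in posteriors]
```

In this sketch, each cluster's unigram distribution in `word_probs` plays the role of the suggested language model that a developer would inspect before updating the application.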