
Article Structure

Abstract

When a system fails to correctly recognize a voice search query, the user will frequently retry the query, either by repeating it exactly or rephrasing it in an attempt to adapt to the system’s failure.

Introduction

With ever more capable smartphones connecting users to cloud-based computing, voice has been a rapidly growing modality for searching for information online.

Related Work

Previous work in voice-enabled information retrieval has investigated the problem of identifying voice retries, and some has taken the additional step of taking corrective action in instances where the user is thought to be retrying an earlier utterance.

Data and Annotation

Features

Prediction task

Conclusion

We have presented a method for characterizing retries in an unrestricted voice interface to a search system.

Topics

edit distance

Appears in 4 sentences as: edit distance (5)

In Detecting Retries of Voice Search Queries

We calculate the edit distance between the two transcripts at the character and word level, as well as the two most similar phonetic rewrites.

Page 3, “Features”
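The character- and word-level distances above can be sketched with a standard Levenshtein dynamic program (this is an illustration of the feature, not the authors' code; the phonetic-rewrite distance is omitted):

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

q1, q2 = "call mom", "call tom"
char_dist = edit_distance(q1, q2)                   # character level
word_dist = edit_distance(q1.split(), q2.split())   # word level
```

Normalizing by the length of the longer transcript, as the normalized variants of these features suggest, keeps the distance comparable across short and long queries.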

Of the similarity features, the ones that contributed significantly in the final model were character edit distance (normalized) and phoneme edit distance (raw and normalized); as expected, retries are associated with more similar query pairs.

Page 4, “Prediction task”

T-tests between the two categories showed that all edit distance features—character, word, reduced, and phonetic; raw and normalized—are significantly more similar between retry query pairs. Similarly, the number of unigrams the two queries have in common is significantly higher for retries.

Page 4, “Prediction task”

Most notably, all edit distance features are significantly greater for rephrases.

error rate

In particular, we seek to measure and minimize the word error rate (WER) of a system, with a WER of zero indicating perfect transcription.

Page 1, “Introduction”
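WER is the word-level edit distance between reference and hypothesis, divided by the number of reference words; a minimal sketch (example queries are illustrative, not from the paper's data):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1] / len(ref)

wer("play jazz music", "play jazz music")   # 0.0: perfect transcription
wer("play jazz music", "play chess music")  # one substitution in three words
```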

We do not have retry annotations for this larger set, but we have transcriptions for the first member of each query pair, enabling us to calculate the word error rate (WER) of each query’s recognition hypothesis, and thus obtain ground truth for half of our retry definition.

language model

Retry cases are identified with joint language modeling across multiple transcripts, with the intuition that retry pairs tend to be closely related or exact duplicates.

Page 2, “Related Work”

While we follow this work in our usage of joint language modeling, our application encompasses open domain voice searches and voice actions (such as placing calls), so we cannot use simplifying domain assumptions.

Page 2, “Related Work”

We look at the language model (LM) score and the number of alternate pronunciations of the first query, predicting that a misrecognized query will have a lower LM score and more alternate pronunciations.

Page 4, “Features”

In addition, the language model likelihood for the first query was, as expected, significantly lower for retries.
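The paper's LM score comes from the recognizer's language model, which is not specified in these excerpts; as a stand-in, a smoothed unigram log-likelihood illustrates why misrecognized transcripts tend to score lower (the corpus, vocabulary size, and smoothing constant below are all assumptions for the sketch):

```python
import math
from collections import Counter

# Toy corpus standing in for the recognizer's LM training text.
corpus = "call mom call dad play music play jazz music".split()
counts = Counter(corpus)
total = sum(counts.values())

def lm_log_score(query, vocab_size=10_000, alpha=1.0):
    """Add-alpha-smoothed unigram log-likelihood of a query transcript."""
    return sum(math.log((counts[w] + alpha) / (total + alpha * vocab_size))
               for w in query.split())

# A transcript of in-vocabulary words scores higher than one containing an
# unseen word, matching the intuition that misrecognitions score lower.
lm_log_score("call mom") > lm_log_score("call bob")
```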

unigrams

We also count the number of unigrams the two transcripts have in common and the length, absolute and relative, of the longest unigram overlap.

Page 3, “Features”
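The overlap counts above can be sketched as follows; "longest unigram overlap" is read here as the longest word shared by the two transcripts, with its length taken relative to the longer transcript (an interpretation, since the excerpts do not define it precisely):

```python
def unigram_overlap_features(q1, q2):
    """Shared-unigram features between two query transcripts (a sketch)."""
    w1, w2 = q1.split(), q2.split()
    common = set(w1) & set(w2)
    n_common = len(common)
    longest = max(common, key=len, default="")
    abs_len = len(longest)
    rel_len = abs_len / max(len(q1), len(q2)) if longest else 0.0
    return n_common, abs_len, rel_len

# Shares "directions" and "to"; longest shared unigram is "directions".
unigram_overlap_features("directions to boston", "directions to austin")
```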

In addition, we look at the number of characters and unigrams and the audio duration of each query, with the intuition that the length of a query may be correlated with its likelihood of being retried (or a retry).

Page 4, “Features”

T-tests between the two categories showed that all edit distance features—character, word, reduced, and phonetic; raw and normalized—are significantly more similar between retry query pairs. Similarly, the number of unigrams the two queries have in common is significantly higher for retries.