Recent years have seen large-scale adoption of speech recognition by the
public, particularly on mobile devices. Google, with its Android
operating system, has integrated speech recognition as a key input modality.
The decade's worth of speech that our recognizer processes each day is a clear
indication of the popularity of this technology with the public. This talk
will describe the current mobile speech applications in more detail. In
particular, it will cover the Deep Neural Network (DNN) technology that is
used as the acoustic model in this system, along with its distributed,
asynchronous training infrastructure. Because a DNN is a static classifier, it
is ill-matched to speech recognition, which is a sequence classification
problem. The asynchrony that is inherent to our distributed
training infrastructure further complicates the optimization of such models.
Our recent research efforts have focused on optimizing the DNN model so that it
is better matched to the speech recognition problem. This has resulted in three related
algorithmic improvements: first, a novel way to bootstrap the training of a DNN
model; second, the use of a sequence-based rather than a frame-based
optimization metric; and third, the successful application of a recurrent
neural network structure to our large-scale, large-vocabulary application.
These novel algorithms have proven effective even in light of the asynchrony in our
training infrastructure. The algorithms have reduced the error rate of our
system by 10% or more compared with DNNs well optimized with a frame-based
objective. This trend holds across all 48 languages in which we support speech
recognition as an input modality.
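
To make the contrast between a frame-based and a sequence-based objective concrete, the following toy sketch may help. It is not the system described in this talk: the function names, the transition-score matrix, and the brute-force normalization over all label sequences are assumptions chosen only for illustration, and the abstract does not state which sequence criterion is actually used. The frame-based loss scores each frame independently, while the sequence-based loss scores the whole label sequence against every competing sequence.

```python
import math
from itertools import product


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def frame_loss(frame_scores, labels):
    """Frame-based objective: sum of independent per-frame cross-entropies.

    frame_scores[t][k] is the unnormalized score of class k at frame t.
    """
    return sum(log_sum_exp(s) - s[y] for s, y in zip(frame_scores, labels))


def sequence_score(frame_scores, transitions, labels):
    """Unnormalized score of one whole label sequence.

    transitions[i][j] is a hypothetical score for moving from class i to j.
    """
    score = frame_scores[0][labels[0]]
    for t in range(1, len(labels)):
        score += transitions[labels[t - 1]][labels[t]]
        score += frame_scores[t][labels[t]]
    return score


def sequence_loss(frame_scores, transitions, labels):
    """Sequence-based objective: negative log posterior of the full sequence.

    Normalizes over every possible label sequence by brute force, which is
    only feasible for toy sizes.
    """
    n_classes = len(frame_scores[0])
    n_frames = len(frame_scores)
    all_scores = [
        sequence_score(frame_scores, transitions, path)
        for path in product(range(n_classes), repeat=n_frames)
    ]
    return log_sum_exp(all_scores) - sequence_score(
        frame_scores, transitions, labels
    )


if __name__ == "__main__":
    scores = [[2.0, 0.5, -1.0], [0.1, 1.2, 0.3], [-0.5, 0.0, 2.5]]
    labels = [0, 1, 2]
    no_transitions = [[0.0] * 3 for _ in range(3)]
    print("frame loss:", frame_loss(scores, labels))
    print("sequence loss, zero transitions:",
          sequence_loss(scores, no_transitions, labels))
```

When all transition scores are zero, the normalization factorizes over frames and the two losses coincide. They diverge once the sequence structure carries information, which is the sense in which a frame-based criterion is mismatched to a sequence classification problem.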