Brian Hutchinson

We introduce a novel deep architecture, the Tensor Deep Stacking Network (T-DSN), in which multiple blocks are stacked on top
of one another and where a bilinear mapping from hidden representations to the output in each block is used to incorporate
higher-order statistics of the input features. Using an efficient and scalable parallel learning algorithm, we train a T-DSN
to classify standard three-state monophones in the TIMIT database. The T-DSN outperforms an alternative pretrained Deep
Neural Network (DNN) architecture in frame-level classification (both state and phone) and in the cross-entropy measure. For
continuous phonetic recognition, T-DSN performs equivalently to a DNN, without the need for a hard-to-scale fine-tuning step.
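The bilinear mapping at the heart of each block can be illustrated with a small sketch. This is not the paper's implementation; the dimensions, random weights, and sigmoid nonlinearity are illustrative assumptions. The idea shown is that two parallel hidden branches produce representations whose outer product is mapped to the output by a third-order tensor, so the prediction depends on multiplicative (higher-order) interactions between hidden units:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not taken from the paper):
# input features, two hidden branches, and output classes
d_in, d_h1, d_h2, d_out = 39, 50, 50, 183

x = rng.standard_normal(d_in)

# Two parallel linear-sigmoid hidden branches (assumed form)
W1 = rng.standard_normal((d_h1, d_in))
W2 = rng.standard_normal((d_h2, d_in))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
h1 = sigmoid(W1 @ x)
h2 = sigmoid(W2 @ x)

# Bilinear mapping: a third-order tensor U contracts with the outer
# product h1 h2^T, so each output unit sees products of hidden units
U = rng.standard_normal((d_out, d_h1, d_h2))
y = np.einsum('oij,i,j->o', U, h1, h2)

print(y.shape)  # (183,)
```

Because `y` is bilinear in `(h1, h2)`, fixing one branch makes the map linear in the other, which is what allows the convex, closed-form per-block training that makes the architecture amenable to parallel learning.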