Reconnaissance de la parole par reseaux connexionnistes

This is my first neural network paper.
It describes the application of a time-delay neural network (TDNN) to the recognition of isolated words in speech signals, with performance comparable to LIMSI's state-of-the-art dynamic time warping (DTW) method. Besides describing one of the first subsampled convolutional neural networks, this paper explains how to correctly initialize the weights (page 9) and performs data augmentation with elastic time warps (section 3.4). Some of these experiments were carried out while I was visiting Geoff Hinton's lab in Toronto during the summer of 1988. They had an early Sparc machine that proved very convenient (we only had 68K workstations in France at the time). Note that the contemporary work of Waibel et al. (1988) could only manage the BDG consonants using an Alliant supercomputer. The key enabler here was stochastic gradient descent instead of batch gradient descent. Sometimes it pays to have limited resources.
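
For readers unfamiliar with the terms, here is a minimal sketch of what a subsampled time-delay layer computes: the same weights are applied to every window of consecutive frames, and the window advances by more than one frame at a time. It is not the network from the paper. The function names, layer sizes, tanh nonlinearity, and the 1/sqrt(fan-in) initialization range are all assumptions chosen for illustration; the paper's own initialization recipe is the one on page 9.

```python
import math
import random


def init_weights(n_out, n_in, width, rng):
    """Uniform weights in [-1/sqrt(fan_in), 1/sqrt(fan_in)].

    This scaling is an assumed, commonly used choice, not a quote
    of the paper's recipe.
    """
    fan_in = n_in * width
    bound = 1.0 / math.sqrt(fan_in)
    return [[[rng.uniform(-bound, bound) for _ in range(width)]
             for _ in range(n_in)]
            for _ in range(n_out)]


def tdnn_layer(frames, weights, stride):
    """Apply shared weights to each time window, advancing by `stride` frames.

    frames:  list of T feature vectors, each of length n_in
    weights: weights[o][i][d] for output unit o, input feature i, delay d
    """
    n_in = len(weights[0])
    width = len(weights[0][0])
    outputs = []
    # A stride greater than 1 is what makes the layer "subsampled".
    for t in range(0, len(frames) - width + 1, stride):
        row = []
        for unit in weights:
            total = sum(unit[i][d] * frames[t + d][i]
                        for i in range(n_in) for d in range(width))
            row.append(math.tanh(total))
        outputs.append(row)
    return outputs


rng = random.Random(0)
# Hypothetical input: 20 frames of 4 acoustic features each.
frames = [[rng.random() for _ in range(4)] for _ in range(20)]
weights = init_weights(n_out=3, n_in=4, width=5, rng=rng)
hidden = tdnn_layer(frames, weights, stride=2)
print(len(hidden), len(hidden[0]))  # prints "8 3"
```

With 20 input frames, a window of 5 delays, and a stride of 2, the layer produces 8 output frames of 3 units each, so the time axis shrinks as the signal moves up the network.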