S2S Report Summary

Final Activity Report Summary - S2S (Sound to Sense)

Speech is studied by linguists, engineers, psychologists, sociologists and many others. Researchers in one area are often unaware of insights from another, or may lack the technical competence to make full use of them. S2S gave its young researchers a cross-disciplinary foundation that improved communication between disciplines and reduced fragmentation within them. Complementary methods were integrated in novel ways, resulting in valuable insights.

Deeper insights into the nature of speech communication (an essential part of our humanity) include: challenging the idea that some information is "linguistic" and other information is "paralinguistic"; a better understanding of how native and foreign listeners differ in their use of the speech signal, and of how native speakers adjust to the different needs of listeners, e.g. when addressing foreigners or speaking in noisy conditions; a fuller view of differences between languages in how sounds change in connected speech, especially the process of assimilation (common in many languages); showing that intonational detail traditionally seen as irrelevant is actually important in conveying meaning; bringing together theories about short speech units (individual sounds) and long ones (e.g. syllables, intonational phrases); and the first computer model to use fine acoustic detail to differentiate syllables that belong to different grammatical categories but whose component sounds have identical formal descriptions, i.e. identical phonemes.

Our audio-visual work likewise deepens understanding of the complex interactions between audio and visual signals, their effect on speech intelligibility and on how conversation is managed. Combining phonetic detail with gestural analysis is novel; it develops more holistic and realistic accounts of human speech communication and informs work on machine speech.

We built or developed speech corpora in six languages: three well-studied languages (Dutch, English, French) and three less-studied minority languages (Czech, Norwegian, Romanian). Good-quality language resources are essential for research; many of ours facilitate comparisons between languages as well as within them. Many S2S corpora include spontaneous and conversational speech as well as the read or elicited (and often less natural) speech typical of most corpora used by psychologists, linguists and engineers. S2S corpora are available freely or in exchange for similar material.

A new open-source approach to Automatic Speech Recognition (speech recognition by computer) was developed. Existing approaches rely heavily on "top-down" knowledge, i.e. a pre-programmed language model which is used to interpret the acoustic speech signal. This leads to errors if the speech does not fit the language model, e.g. if it is ungrammatical or spoken in a foreign accent. The new approach balances the signal and prior knowledge more equally: it interprets the acoustic signal in a more phonetically informed way, thus reducing reliance on prior top-down knowledge. Machine recognition methods developed for English were extended to Romanian.

Much speech research is traditionally done manually, so it is slow and expensive. We set up an independent, ongoing international working group, Stelaris, to develop tools for automatic larger-scale research in the future. We also developed tools to automate and support research, most of which are available online. Some examples are: software to automatically align the acoustic signal with an orthographic transcription for casual speech, with published applications; a toolbox to automatically transcribe speech, with output readable by Praat (popular sound analysis freeware); techniques for visual as well as audio annotation of speech corpora, and for searching multi-speaker conversations; a tool for transcribing and annotating multi-speaker conversations; software to automatically identify errors made by an automatic speech recogniser; and a manual and tutorial on applying the statistical technique Functional Data Analysis to speech, a novel application with published output.