Speech and Audio Processing

The Speech and Audio Processing work spans a number of fundamental and applied research areas, including signal processing for separation, recognition, transcription, enhancement, coding and synthesis, as well as applications to advanced fixed and wireless communication systems. The various research activities have received funding support from EPSRC, DSTL, RAENG, BBC, EU, BAE Systems, etc.

Speech and audio separation

(Wang and Jackson)

It is well known that humans are generally skilful at isolating a speech source of interest from the sound mixtures observed in a cocktail party environment, where background noise and interfering sounds are present simultaneously. It is, however, difficult for machines to replicate this capability. Solutions to this problem are likely to have an impact on hearing aids and cochlear implants, automatic speech recognition in uncontrolled natural environments, advanced human-computer interaction, and security and defence related applications. Our efforts in this area have centred on the development of algorithmic solutions for the separation or extraction of speech or music sources from their mixtures, using primarily the techniques of blind source separation, independent component analysis, time-frequency masking, non-negative matrix/tensor factorization, and computational auditory scene analysis. Our interests in this direction are summarised as follows:

Speech and audio recognition

(Jackson and Wang)

Automatic recognition of speech, which aims to recognise from speech recordings what has been said by the speaker, has been studied for many years. Although state-of-the-art techniques perform well for clean speech, they are still very limited when presented with noisy speech that contains background noise and interfering sounds. Recent developments in audio recognition have shifted towards the recognition and detection of sound events in general sounds, such as environmental sound recognition, audio event detection from sound mixtures, and anomaly event detection from sound recordings (e.g. a patient's cough). Applications of this research include human-computer interaction, healthcare, etc. Our interests in this area include:
• Emotion recognition from speech
• Environmental sound recognition
• Audio event classification and recognition
• Anomaly audio event detection
• Voice activity detection

Speech and Audio Processing activities in the I-LAB

I-LAB has traditionally been known as one of the world-leading speech coding groups, having contributed to many national and international research and development activities as well as standardisation programmes. These standards include the ETSI GSM Full and Half rate speech and channel coding systems, the ETSI/3GPP AMR speech and channel coding system, INMARSAT-M and mini-M, NATO STANAG, etc. In addition to speech coding, I-LAB's research activities now cover Blind Source Separation, Advanced Spatial Audio Coding based on ABS, Speaker recognition systems, Speech and data tunnelling for secure GSM/3G communication systems, simplified WFS audio rendering, and audio-video synchronisation for P2P and DVB networks, all of which are leading-edge research activities of international standing.