An Initial Investigation into the Real-Time Conversion of Facial Surface EMG Signals to Audible Speech

by L. Diener, C. Herff, M. Janke, T. Schultz

Abstract:

This paper presents early-stage results of our investigations into the direct conversion of facial surface electromyographic (EMG) signals into audible speech in a real-time setting, enabling novel avenues for research and system improvement through real-time feedback. The system uses a pipeline approach to enable online acquisition of EMG data, extraction of EMG features, mapping of EMG features to audio features, synthesis of audio waveforms from audio features and output of the audio waveforms via speakers or headphones. Our system allows for performing EMG-to-Speech conversion with low latency and on a continuous stream of EMG data, enabling near-instantaneous audio output during audible as well as silent speech production. In this paper, we present an analysis of the latency incurred by our system's components, as well as the trade-offs between conversion quality, latency and training duration required.
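
The five pipeline stages named in the abstract (acquisition, feature extraction, feature mapping, waveform synthesis, output) can be illustrated with a minimal streaming sketch. This is not the authors' implementation: the function names, the two toy features, and the frame and hop sizes are all hypothetical placeholders, chosen only to show how a continuous sample stream can be processed frame by frame so that output is produced while input is still arriving.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List

Frame = List[float]


def frames(samples: Iterable[float], frame_len: int, hop: int) -> Iterator[Frame]:
    """Cut a continuous sample stream into overlapping frames as data arrives."""
    buf: deque = deque(maxlen=frame_len)  # sliding window over the stream
    since_last = 0                        # samples seen since the last emitted frame
    for s in samples:
        buf.append(s)
        since_last += 1
        if len(buf) == frame_len and since_last >= hop:
            since_last = 0
            yield list(buf)


def extract_features(frame: Frame) -> List[float]:
    """Placeholder features (assumption): mean absolute value and zero-crossing count."""
    mav = sum(abs(x) for x in frame) / len(frame)
    zero_crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return [mav, float(zero_crossings)]


def run_pipeline(
    samples: Iterable[float],
    frame_len: int,
    hop: int,
    extract: Callable[[Frame], List[float]],
    mapper: Callable[[List[float]], List[float]],
    synthesize: Callable[[List[float]], List[float]],
    output: Callable[[List[float]], None],
) -> None:
    """Push each frame through all stages before the next frame is read."""
    for frame in frames(samples, frame_len, hop):
        emg_features = extract(frame)           # EMG features
        audio_features = mapper(emg_features)   # EMG-to-audio feature mapping (hypothetical)
        output(synthesize(audio_features))      # waveform chunk sent to the output sink


# Usage with stand-in stages: identity mapping and a constant-valued "waveform" chunk.
played: List[float] = []
run_pipeline(
    samples=range(8),
    frame_len=4,
    hop=2,
    extract=extract_features,
    mapper=lambda f: f,
    synthesize=lambda a: [a[0]] * 2,
    output=played.extend,
)
```

As a general property of frame-based processing, rather than a result taken from the paper, no frame can be handled before its last sample has arrived, so the frame length sets a floor on the achievable latency.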

@INPROCEEDINGS{diener2016initial,
author={Diener, L. and Herff, C. and Janke, M. and Schultz, T.},
booktitle={Engineering in Medicine and Biology Society (EMBC), 2016 38th Annual International Conference of the IEEE},
title={An Initial Investigation into the Real-Time Conversion of Facial Surface EMG Signals to Audible Speech},
year={2016},
url={http://www.csl.uni-bremen.de/cms/images/documents/publications/DienerEMBC_16.pdf},
poster={http://www.csl.uni-bremen.de/cms/images/documents/publications/DienerEMBC_16_poster.pdf},
month={Aug},
abstract={This paper presents early-stage results of our investigations into the direct conversion of facial surface electromyographic (EMG) signals into audible speech in a real-time setting, enabling novel avenues for research and system improvement through real-time feedback. The system uses a pipeline approach to enable online acquisition of EMG data, extraction of EMG features, mapping of EMG features to audio features, synthesis of audio waveforms from audio features and output of the audio waveforms via speakers or headphones. Our system allows for performing EMG-to-Speech conversion with low latency and on a continuous stream of EMG data, enabling near-instantaneous audio output during audible as well as silent speech production. In this paper, we present an analysis of the latency incurred by our system's components, as well as the trade-offs between conversion quality, latency and training duration required.}
}