Narrowband vs. Wideband Speech

The quality of today's telephone speech was designed to achieve sufficient intelligibility, which is why it is often called "conversational quality". The acoustic bandwidth in telephony systems is typically limited to the frequency range between 300 Hz and 3.4 kHz; such band-limited speech is referred to as narrowband (NB) speech.

However, in certain situations we clearly become aware of the impact of this bandwidth limitation. For example, the limited intelligibility of syllables becomes apparent when we try to understand unknown words or names on the phone. In these cases we often need a spelling alphabet, especially to distinguish between certain unvoiced or plosive utterances, such as /s/ and /f/ or /p/ and /t/. Another drawback is that many speaker-specific characteristics are not retained transparently in the narrowband (NB) speech signal. It is therefore sometimes difficult to tell a mother from her daughter on the phone.

The bandwidth of wideband (WB) transmission, typically 50 Hz to 7 kHz, is comparable to that of amplitude-modulated (AM) radio transmission, and it allows excellent speech intelligibility and very good speech quality. An example of unvoiced speech with significant frequency content beyond 3.4 kHz is given in the figure, which shows a spectral comparison of the original speech with the corresponding NB and WB versions. A closer look at the figure reveals that NB speech may lack significant parts of the spectrum, and that the difference between WB speech and the original speech is still noticeable. If these remaining frequency components (beyond 7 kHz) are to be covered as well, a so-called "super-wideband" transmission (e.g., 50 Hz to 15 kHz) becomes necessary.
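The effect of the band limits described above can be illustrated with a small, purely synthetic sketch. It is not derived from the figure: the test signal (three equal-amplitude tones standing in for a fricative-like spectrum), the sampling rate, and the function names are assumptions made for illustration, and the 50 Hz lower edge of the WB band is likewise assumed. The code measures which fraction of the signal energy falls inside the NB and WB bands.

```python
import cmath
import math

NB_BAND = (300.0, 3400.0)  # narrowband telephone band, as stated in the text
WB_BAND = (50.0, 7000.0)   # wideband band (lower edge assumed for this sketch)


def dft(x):
    """Plain O(n^2) discrete Fourier transform (sufficient for a short frame)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def band_energy_fraction(x, fs, band):
    """Fraction of the one-sided spectral energy of x that lies inside band (Hz)."""
    n = len(x)
    spectrum = dft(x)
    lo, hi = band
    total = 0.0
    inside = 0.0
    for k in range(n // 2 + 1):
        freq = k * fs / n
        energy = abs(spectrum[k]) ** 2
        total += energy
        if lo <= freq <= hi:
            inside += energy
    return inside / total if total > 0.0 else 0.0


# Synthetic test frame: three equal tones at 2, 5 and 9 kHz (hypothetical values).
FS = 32000
N = 320  # 10 ms frame, 100 Hz bin spacing, so each tone sits exactly on a bin
frame = [
    sum(math.sin(2 * math.pi * f * t / FS) for f in (2000.0, 5000.0, 9000.0))
    for t in range(N)
]

nb_fraction = band_energy_fraction(frame, FS, NB_BAND)
wb_fraction = band_energy_fraction(frame, FS, WB_BAND)
print(f"energy kept by NB: {nb_fraction:.2f}, by WB: {wb_fraction:.2f}")
```

For this artificial signal, the NB band keeps only the 2 kHz tone, the WB band additionally keeps the 5 kHz tone, and the 9 kHz tone would require super-wideband transmission.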

Bandwidth extension (BWE) algorithms can in principle be realized either stand-alone or with the additional transmission of some side information.

a) BWE without Side Information

Bandwidth extension without side information (stand-alone) can be realized by employing a statistical estimation scheme. To this end, certain "features" are extracted from the narrowband speech signal. In conjunction with the statistical model, these features make it possible to identify the parameters of a wideband speech production model. These model parameters are typically spectral (or also temporal) envelopes of the speech signal. The corresponding speech fine structure can either be reproduced from the narrowband signal or synthesized completely.
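As a rough illustration of the estimation step, the following sketch maps a narrowband feature vector to high-band envelope parameters with a nearest-neighbour codebook lookup. This is a deliberately simple stand-in for a statistical model, not the scheme used by the authors. The two features, the codebook entries, and all numeric values are hypothetical; in a real system the paired codebook would be trained on wideband speech.

```python
import math

# Hypothetical paired codebook: each entry couples a narrowband feature
# centroid with high-band envelope gains in dB for two sub-bands.
# All values are made up for illustration.
CODEBOOK = [
    ((0.9, 0.1), (-30.0, -40.0)),  # voiced-like frame: weak high band
    ((0.2, 0.8), (-10.0, -12.0)),  # fricative-like frame: strong high band
    ((0.5, 0.5), (-20.0, -25.0)),  # mixed frame
]


def estimate_highband_envelope(nb_features):
    """Return the high-band envelope of the codebook entry whose
    narrowband centroid is closest (Euclidean distance) to nb_features."""
    best = min(CODEBOOK, key=lambda entry: math.dist(entry[0], nb_features))
    return best[1]


# Example: a frame whose (hypothetical) features resemble a fricative.
print(estimate_highband_envelope((0.25, 0.7)))
```

The estimated envelope would then shape a high-band excitation signal that is either derived from the narrowband signal or generated synthetically, as described above.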

The performance of a "stand-alone BWE" is naturally limited. Nevertheless, considerable speech quality improvements can be obtained with such algorithms.

b) BWE with Side Information

Here, the wideband parameters from a) are not (exclusively) estimated, but rather transmitted in coded form. For this reason, bandwidth extension with side information is closely related to so-called parametric speech coding. The achievable speech quality is good or even very good. One such algorithm, developed by the IKS, was standardized by the ITU in spring 2006 for the VoIP speech codec G.729.1.
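To make the idea of transmitting wideband parameters "in coded form" more concrete, here is a minimal sketch of a uniform scalar quantizer for a single high-band energy value. It does not reproduce the G.729.1 scheme; the bit budget, the dB range, and the function names are assumptions chosen only for illustration.

```python
import math

BITS = 5                        # assumed bit budget per parameter
MIN_DB, MAX_DB = -60.0, 0.0     # assumed dynamic range of the high-band energy
LEVELS = 2 ** BITS
STEP = (MAX_DB - MIN_DB) / (LEVELS - 1)


def encode_gain(energy):
    """Encoder side: map a linear high-band energy to a quantizer index."""
    db = 10.0 * math.log10(max(energy, 1e-12))  # guard against log(0)
    db = min(max(db, MIN_DB), MAX_DB)           # clip to the coded range
    return round((db - MIN_DB) / STEP)


def decode_gain(index):
    """Decoder side: reconstruct the energy in dB from the received index."""
    return MIN_DB + index * STEP


index = encode_gain(0.01)       # 0.01 corresponds to -20 dB
print(index, decode_gain(index))
```

With these assumed settings, each coded parameter costs 5 bits per frame, and the reconstruction error inside the coded range is at most half a quantizer step.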

c) BWE with Embedded Side Information

Transmitting additional side information conflicts with the requirement of backwards compatibility with current communication systems. To retain this compatibility, the IKS examines algorithms that hide the side information in the narrowband speech signal or in the bitstream by using methods of digital watermarking.

A receiver without knowledge of this watermark can still decode the narrowband signal (possibly with minor quality degradation), while a receiver that is aware of the embedded watermark information can produce a wideband signal of considerable quality.
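The principle of hidden side information can be sketched with least-significant-bit (LSB) substitution in PCM samples. LSB substitution is a much simpler data-hiding technique than the watermarking methods referred to above and is used here only as a stand-in; it also assumes the samples reach the receiver bit-exactly. The function names and sample values are hypothetical.

```python
def embed_bits(samples, bits):
    """Hide one side-information bit in the LSB of each leading PCM sample."""
    if len(bits) > len(samples):
        raise ValueError("not enough samples to carry all bits")
    marked = list(samples)
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & ~1) | (bit & 1)  # overwrite only the LSB
    return marked


def extract_bits(samples, count):
    """Watermark-aware receiver: read the hidden bits back from the LSBs."""
    return [s & 1 for s in samples[:count]]


pcm = [1000, -2001, 37, 0, -1, 512]   # hypothetical 16-bit PCM samples
side_info = [1, 0, 1, 1, 0, 1]        # hypothetical coded wideband parameters

marked = embed_bits(pcm, side_info)
print(marked, extract_bits(marked, len(side_info)))
```

A legacy receiver simply plays the marked samples, each of which differs from the original by at most one quantization step, while an aware receiver extracts the bits and uses them to reconstruct the high band.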

[heese12] Florian Heese, Bernd Geiser, and Peter Vary. Intelligibility Assessment of a System for Artificial Bandwidth Extension of Telephone Speech. Proceedings of German Annual Conference on Acoustics (DAGA), March 2012.