Abstract: Perceived speech quality is most directly measured by subjective listening tests. These tests are often slow and expensive, and numerous attempts have been made to supplement them with objective estimators of perceived speech quality. These attempts have met with limited success, primarily in analog and higher-rate, error-free digital environments where speech waveforms are preserved or nearly preserved. How to objectively measure the perceived quality of highly compressed digital speech, possibly with bit errors or frame erasures, has remained an open question. We describe a new approach to this problem that uses a simple but effective perceptual transformation and a hierarchy of measuring normalizing blocks to compare perceptually transformed speech signals. The resulting estimates of perceived speech quality were correlated with the results of nine subjective listening tests. Together, these tests include 219 4-kHz bandwidth speech encoders/decoders, transmission systems, and reference conditions, with bit rates ranging from 2.4 to 64 kb/s. Compared with six other estimators, the new approach showed significant improvements in many cases, particularly at lower bit rates and when bit errors or frame erasures were present. These hierarchical structures of measuring normalizing blocks, or other structures of measuring normalizing blocks, may also address open issues in perceived audio quality estimation, layered speech or audio coding, automatic speech or speaker recognition, audio signal enhancement, and other areas.

Disclaimer: Certain commercial equipment, components, and software may be identified in this report to specify adequately the technical aspects of the reported results. In no case does such identification imply recommendation or endorsement by the National Telecommunications and Information Administration, nor does it imply that the equipment or software identified is necessarily the best available for the particular application or uses.