Abstract

The video coding problem is essentially an operational distortion-rate problem in which the underlying input pixel data, probability distributions and dimensions are discrete, unknown and not smooth. In the low bit rate case, the high-resolution assumptions for vector quantization are not strictly valid, which exacerbates the problem. However, considering the rate-constrained operational points on sets of self-organizing neural maps (SOMs) provides a methodology for selecting locally optimal vector quantizers. The learning process of the standard SOM algorithm is modified to minimize the distortion subject to an approximate entropy constraint. The training set is adapted to suit the proposed coding environment. Operating in the discrete wavelet transform (DWT) domain is well suited to the inclusion of a psychovisual model. The spatial frequency response, the multiresolution scene analysis and the central focusing aspects of the visual cortex are incorporated into the model. The resulting video coding algorithm is bit rate scalable from 10 kbit/s and provides subjectively acceptable video at a fixed frame rate of 10 frames per second (f.p.s.) at QCIF pixel resolution.
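
As a rough illustration of what modifying SOM learning under an entropy constraint can look like, the sketch below selects the winning codeword by a Lagrangian cost (squared error plus a weighted approximate code length) and then applies the usual SOM neighborhood update. This is not the paper's algorithm: the 1-D map, the function name `train_ec_som`, the multiplier `lam`, and the count-based probability estimates are all assumptions made for the example.

```python
import math
import random

def train_ec_som(data, n_codes=8, lam=0.1, epochs=10, lr=0.2, sigma=1.0, seed=0):
    """Hypothetical entropy-constrained SOM on a 1-D map (illustrative only)."""
    rng = random.Random(seed)
    # Initialize codewords from randomly chosen training vectors.
    codebook = [list(v) for v in rng.sample(data, n_codes)]
    counts = [1.0] * n_codes  # smoothed usage counts, so no probability is zero
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            total = sum(counts)
            # Cost = squared error + lam * approximate code length in bits.
            costs = [
                sum((c - xi) ** 2 for c, xi in zip(code, x))
                - lam * math.log2(counts[k] / total)
                for k, code in enumerate(codebook)
            ]
            win = costs.index(min(costs))
            counts[win] += 1.0
            # Standard SOM step: pull the winner and its map neighbors toward x.
            for k, code in enumerate(codebook):
                h = math.exp(-((k - win) ** 2) / (2.0 * sigma ** 2))
                for d in range(len(code)):
                    code[d] += lr * h * (x[d] - code[d])
    total = sum(counts)
    probs = [c / total for c in counts]
    # Entropy of the estimated codeword usage, in bits per vector.
    entropy = -sum(p * math.log2(p) for p in probs)
    return codebook, probs, entropy
```

Setting `lam` to zero reduces the winner selection to the standard nearest-codeword rule; larger values favor frequently used codewords, trading distortion for a lower estimated rate.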

Item Type:

Conference or Workshop Item (Paper)
