The H.320 Recommendation Overview

As videoconferencing moves into the mainstream computer
market, the impact of standards and emerging pipelines is
creating a situation ripe for explosive growth in both business
and consumer use of real-time audio and video communications.
Videoconferencing is moving from its status of a vertical niche
application to that of a horizontal enabling technology. One
aspect of this shift is the continuing battle between host-based
processing and DSP-based processing for the audio and video
algorithms needed to videoconference. While host processors,
notably the Pentium-class semiconductors, continue to offer
more and more power, the cost of dedicated silicon for
audio/video processing continues to drop. DSP engineers,
videoconferencing designers, and computer OEMs are faced with a
wide variety of hardware/software tradeoffs. The good news is
that products can be tailored easily to meet a correspondingly
wide variety of customer price and performance levels.

This article presents H.320, the first ITU recommendation for
visual telephony, officially passed in December 1990 and since
modified several times. H.320 covers audio/video telephony on
switched digital
circuits. The most common such services are known as
switched-56 (which has many brand names according to the
different telephone companies), and ISDN, which is most often
used in its basic rate version (BRI) and which most people
think of as a 128 kbps connection.

The Marketing Perspective

When the videoconferencing industry first got started, a
major challenge facing equipment designers was how to compress
audio and video signals into data streams that were small
enough to fit onto affordable telephone networks. Speech and
video signals typically require from hundreds of thousands to
hundreds of millions of bits per second in order to faithfully
reproduce the original content, while most communications
channels offer but a tiny fraction of this requirement.
Software engineers competed feverishly to develop new
algorithms to maximize audio and video quality and minimize
required bandwidth while hardware designers developed
silicon-based engines which could execute the algorithms in
real time. The result was a Tower of Babel: each manufacturer
could talk with only his own equipment because only his own
equipment understood the compression schemes used.

While this situation spurred advances in audio and video
compression technology, the lack of standards hindered market
acceptance. It was not possible to simply make the call; the
user had to ensure beforehand that the network and receiving
equipment were compatible. A major technology advance in the
early 1980s made it feasible to squeeze real-time audio/video
calls into data streams of approximately 1.5 Mbps, a speed
which matched the capability of common and less expensive T1/E1
digital phone connections. This development sparked the growth
of the videoconferencing industry and fueled additional
interest in a truly international standard for video calls. The
ultimate result was the 1990 adoption by the ITU (the
International Telecommunication Union, then called the CCITT)
of H.320, officially titled "Narrow-band visual telephone
systems and terminal equipment."

What is H.320

H.320 is what is known as an umbrella recommendation; it is
a suite of individual recommendations, each of which covers a
different aspect of communications. In 1990, H.320 consisted of
H.261 for video, G.711 for audio, and three other
recommendations for mixing signals and call control, but the
document is a living document, and other important components
have been added in the intervening years. The most recent
revision of H.320 is dated 3/96.

H.320 applies to videoconferencing calls made over switched
digital telephone networks. The H.320 architecture is based on
communications channels with data throughput rates that are
integer multiples of 64 kbps. This is known as Px64, where P
can be from 1 to 30. Hence, H.320 requires a bandwidth of at
least 64 kbps, and can support networks running all the way up
to 1920 kbps. These 64 kbps chunks exactly match the
capabilities of today's ISDN telephone networks, so H.320 and
ISDN go pretty much hand-in-hand. Because the minimum bandwidth
is 64 kbps, H.320 does not work over the plain old analog
telephone system, a network which "maxes out" at approximately
33.6 kbps, the top speed of today's V.34 analog modems. The
standard for videoconferencing over analog phone lines is
H.324.
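The Px64 arithmetic above can be sketched in a few lines of Python. This is only an illustration of the numbers quoted in the text; the function and constant names are hypothetical and do not come from the Recommendation.

```python
# Illustrative names only; none of these identifiers come from H.320 itself.
B_CHANNEL_KBPS = 64       # one Px64 increment
MAX_P = 30                # upper limit of P quoted in the text
V34_MODEM_KBPS = 33.6     # analog modem ceiling quoted in the text


def px64_rate_kbps(p: int) -> int:
    """Return the aggregate rate, in kbps, for P channels of 64 kbps."""
    if not 1 <= p <= MAX_P:
        raise ValueError(f"P must be between 1 and {MAX_P}, got {p}")
    return p * B_CHANNEL_KBPS


def can_carry_h320(link_kbps: float) -> bool:
    """A link needs at least one full 64 kbps channel (P = 1)."""
    return link_kbps >= px64_rate_kbps(1)


if __name__ == "__main__":
    for p in (1, 2, 6, 12, 30):
        print(f"P={p:2d} -> {px64_rate_kbps(p)} kbps")
    print("V.34 modem sufficient:", can_carry_h320(V34_MODEM_KBPS))
```

With P = 2, 6, and 12 the helper yields the 128, 384, and 768 kbps configurations that are common on ISDN.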

H.320 plays a crucial role as the lingua franca of the
videoconferencing industry. Today, virtually all group and room
videoconferencing systems support H.320. Many vendors also
support their own proprietary algorithms, but H.320 guarantees
customers that their equipment will be able to talk with
equipment from any other vendor, or use conferencing services
from any other network or third party provider. All of the
current ISDN-based desktop systems support H.320 as well,
making these lower-cost, personal systems interoperable with
their group system cousins.

The H.320 Recommendation is also flexible. Some of the
component recommendations are required, others are optional,
and within individual recommendations, only some operating modes
are required. For example, three voice coders with different
bandwidth, quality, and computational complexity are now
included under H.320, though only the original G.711 is
required; and the H.261 video CODEC specifies two fixed image
sizes, though support for only one (known as QCIF) is required.
Many implementation details are left up to individual vendors;
the result is that there is a wide variety of price and quality
points in the market, though the equipment is all
H.320-compliant. When two videoconferencing CODECs are
communicating, the CODEC with the superior performance
automatically drops down to match the capabilities of the
other.

H.320 Benefits

CODEC Standards
H.320 establishes standards for compression/decompression of
audio and video data streams as well as standards for
multiplexing of data streams, call set-up, and call tear
down. By adhering to standards, vendors will ensure that
their equipment will work with that from other vendors.

Vendor-to-Vendor Interoperability
Users want to "make the call" without having to worry about
whether the receiving equipment is from the "right"
manufacturer or will be able to handle the data. Besides
ensuring that data is compressed in a way that a receiver can
decompress it, H.320 establishes methods for receiving
terminal equipment to communicate its capabilities to the
sender equipment.

Desktop-to-Room System Interoperability
Virtually all room systems support H.320, so H.320-compliant
desktop systems will be able to communicate with these
endpoints as well.

Network Independence
H.320 is designed to ride on top of ISDN and to work in
common 128, 384, and 768 kbps configurations.

Platform and Application Independence
H.320 is not tied to any hardware or operating system.

Multipoint Support
Two recommendations added in 1992 standardize the functions
of a multipoint control unit (MCU) and the communications
protocol used between terminals and the MCU. Multipoint
bridges in H.320 can be cascaded.

The Technical Perspective

In December 1990, the ITU established standards for
videotelephony over switched digital telephone services when it
ratified the H.320 standard (consisting at the time of H.261
and four other recommendations; others have been added in
the intervening years), now officially titled "Narrow-band
visual telephone systems and terminal equipment." Study Group
XV had worked on the relevant technical problems over a series
of meetings lasting six years. The Recommendation specifies
technical requirements for narrow-band visual telephone systems
and terminal equipment, typically for videoconferencing and
videophone services. It describes a generic system
configuration consisting of a number of elements which are
specified by respective ITU-T Recommendations, definition of
communication modes and terminal types, call control
arrangements, terminal aspects and interworking requirements.
The document is a living document, however, and is revised to
reflect the progress in relevant H-Series Recommendations such
as H.233, H.234, H.242, H.243, H.244, and so on. The most
recent revision of H.320 is dated 3/96.

H.320 is an umbrella recommendation which covers telephony
for channels with data throughput rates that are integer
multiples of 64 kbps (Px64, where P = 1,2,3...30). Hence the
committee was once known as the Px64 committee. This is also
known as narrowband-ISDN, or N-ISDN. Today, virtually all
videoconferencing vendors support H.320 (sometimes in addition
to proprietary algorithms and sometimes without proprietary
algorithm options). This has created an environment which
addresses customers' major concern: vendor-to-vendor
interoperability as well as interoperability between desktop
systems and group or conference room systems. H.320 is
broadening the market for videoconferencing over switched
digital networks.

As an umbrella recommendation, H.320 does not really specify
details of coding or control algorithms, but rather references
other ITU recommendations which do this job. H.320 also
specifies which other recommendations and which functions are
required, and which are optional. Many implementation details
are left up to individual vendors and most independent
observers would agree that there is a wide variety of price and
quality points available in the market, though the equipment is
all H.320-compliant.

The H.320 standard includes three levels of compliance:
minimum, optional, and maximum, so not all CODECs are the same.
Competition and differentiation exist. The H.320 minimum level
provides a baseline so that all standards-compliant
videoconferencing systems can communicate with each other. Any
system that meets the minimum level can be touted as H.320
compliant. When two videoconferencing CODECs are communicating,
the CODEC with the superior performance automatically drops
down to match the capabilities of the other, similar to the way
modems establish a handshake and mutually acceptable bit
rate.

Minimum Level            Optional Level                Maximum Level
-----------------------  ----------------------------  --------------------------
QCIF (176 x 144)         CIF (352 x 288)               CIF (352 x 288)
7.5 frames/sec           15 frames/sec                 30 frames/sec
300 - 3400 Hz audio      50 - 7000 Hz audio            50 - 7000 Hz audio
56/64 kbps data rate     up to 384 kbps data rate      up to 1.544 Mbps data rate
no motion compensation   limited motion compensation   full motion compensation
no noise reduction       no noise reduction            noise reduction

Table 1: H.320 allows for different levels of compliance
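The "drop down to match" behavior can be pictured as taking, attribute by attribute, the lesser of what the two terminals support. The Python sketch below is a simplification under that assumption; the `Capabilities` class and `common_mode` function are hypothetical names, and the real exchange is carried out through the capability signalling described later, not through a data structure like this one.

```python
from dataclasses import dataclass

# Ordering of the two H.261 picture formats, smallest first.
FORMAT_RANK = {"QCIF": 0, "CIF": 1}


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical summary of what one terminal can do."""
    picture_format: str     # "QCIF" or "CIF"
    max_frame_rate: float   # frames per second
    max_rate_kbps: int      # channel data rate


def common_mode(a: Capabilities, b: Capabilities) -> Capabilities:
    """Pick, per attribute, the lesser of what the two terminals support."""
    fmt = min(a.picture_format, b.picture_format, key=FORMAT_RANK.__getitem__)
    return Capabilities(
        picture_format=fmt,
        max_frame_rate=min(a.max_frame_rate, b.max_frame_rate),
        max_rate_kbps=min(a.max_rate_kbps, b.max_rate_kbps),
    )


# Values taken from the minimum and maximum columns of Table 1.
MINIMUM = Capabilities("QCIF", 7.5, 64)
MAXIMUM = Capabilities("CIF", 30, 1544)

if __name__ == "__main__":
    print(common_mode(MINIMUM, MAXIMUM))
```

A call between a minimum-level and a maximum-level system therefore settles on the minimum-level mode, which is what makes the baseline useful as a guarantee of interoperability.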

H.320 Architectural Overview

H.320 was the first true international standard for visual
telephony. The recommendation established the structure and
philosophy for other ITU recommendations covering other
networks.

The recommendation specifies signalling and control
procedures as well as the audio and video standards that any
receiver must be able to decode. Recommendations include both
mandatory and optional capabilities. H.320 has proven itself to
be comprehensive and flexible at the same time. The original
media recommendations were H.261 for video and G.711 for audio.
In the intervening years, T.120 data support as well as G.722
and G.728 audio support and H.263 video CODECs have been added
to the umbrella recommendation. Capabilities are negotiated
during call set up.

The H.320 recommendation includes three standards for audio
coding: G.711, G.722, and G.728. The G.711 and G.722 standards
have been used extensively in Px64 systems operating at high
bit rates such as 1.544 Mbps and 768 kbps. With these high
bandwidths, devoting 64 kbps for audio alone is not a problem.
On the desktop, however, where 384 kbps is somewhat common, and
128 kbps and 112 kbps are even more common, 64 kbps for audio
leaves too little for video. G.728, the most recent audio
recommendation, and one which was not available when H.320 was
originally approved, compresses audio to 16 kbps. This solves
the problem, although the compute horsepower to implement G.728
is significantly higher than that needed for G.711 and
G.722.

Recommendation   Audio Bandwidth   Bit Rate (kbps)   Coding Approach
--------------   ---------------   ---------------   ----------------------------------
G.711            3 kHz             64                PCM
G.722            7 kHz             64, 56            Sub-band adaptive differential PCM
G.728            3 kHz             16                CELP

Table 3: Audio recommendations included within H.320
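The effect of the audio coder on the video budget is simple subtraction. The sketch below is a rough illustration only: it ignores framing and data-channel overhead, treats G.722 at its 64 kbps rate, and uses a hypothetical function name.

```python
# Audio bit rates in kbps, as listed for the H.320 audio coders.
AUDIO_KBPS = {"G.711": 64, "G.722": 64, "G.728": 16}


def video_budget_kbps(channel_kbps: int, audio_coder: str) -> int:
    """Return the kbps left for video once audio is carved out.

    Framing and data-channel overhead are ignored in this simplification.
    """
    remaining = channel_kbps - AUDIO_KBPS[audio_coder]
    if remaining < 0:
        raise ValueError("channel too small for this audio coder")
    return remaining


if __name__ == "__main__":
    for channel in (112, 128, 384, 768):
        for coder in ("G.711", "G.728"):
            left = video_budget_kbps(channel, coder)
            print(f"{channel:4d} kbps with {coder}: {left} kbps for video")
```

On a 128 kbps call, G.711 leaves half the channel for video, while G.728 leaves 112 kbps.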

H.320 Video Recommendations

The core of the H.320 standard is the H.261 video
compression recommendation, which presents the general rules
for encoding and decoding digital video information. H.261
requires that CODECs be able to communicate across bandwidths
which may vary from 64 kbps to 2.048 Mbps (in multiples of 64
kbps). Like MPEG, H.261 encoding is DCT-based and calls for
fully-encoding only certain frames. Since a normal video
sequence has little variation from one frame to the next, H.261
encodes only the difference between a frame and the previous
frame. H.261 works on groups of pixels (16x16 macroblocks): for
a group in the current frame, the encoder identifies the group
in the previous frame that best matches it, then codes the
difference along with a vector that describes the offset of
that group. H.261 specifies
two fixed image sizes, either common interchange format (CIF),
which is 352x288, or quarter CIF (QCIF), which is 176x144.

The H.261 CODEC is intended specifically for telephony and
therefore minimizes encoding and decoding delay while achieving
a fixed data rate. H.261 implementations allow a trade-off
between frame rate and picture quality. As the motion content
of the video increases, the CODEC has to do more computations
and usually has to give up on image quality to maintain frame
rate, or the reverse. Furthermore H.320, through H.221,
supports dynamic bandwidth allocation at 20 ms intervals.

A 1995 revision to H.320 adds support for H.263 video, an
algorithm developed as part of the effort to create standards
for videoconferencing over the analog phone system. Details of
H.261 and H.263 are provided elsewhere in this report.

Other H.320 Recommendations

H.221 defines the frame structure for single or multiple B or
H0 channels, specifying what information is in the bit stream
so that each CODEC can keep track of video frames. It describes
the order
in which the bits are woven together and lined up or
multiplexed before they are transmitted. H.221 also describes
how to label the bits of transmitted information as either
audio, visual, or control data. H.221 is often implemented on a
simple controller chip. H.221 specifies synchronous operation;
the coder and decoder handshake and agree upon timing.
Synchronization can be arranged for individual B channels or
for bonded connections. H.221 allows configurations to be
changed at 20 ms intervals. Configurations are signalled by
repeatedly transmitting codewords, not by an acknowledgment
signal. The code is secure, protected by a double-error
correcting code.

H.221 structures a 64 kbps channel into octets transmitted at
8 kHz. Each bit position within the octet can be regarded as an
8 kbps sub-channel. One of these sub-channels is called the
Service Channel and contains the frame alignment signal (FAS),
which frames the 80 octets of information in a B channel, and
the bit-rate allocation signal (BAS), an 8-bit code which
describes the capability of a terminal. Hence the BAS codes
support hundreds (256) of
different configurations and allow a receiver to understand the
information coming across the channel. Each frame of 640 bits
(transmitted 100x per second) loses 8 bits each to the FAS and
BAS channels, so for a single B channel devoted to video, only
62.4 kbps are available for video data.
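The framing arithmetic in this paragraph can be checked directly. The short Python sketch below uses only the figures given in the text; the constant and function names are illustrative.

```python
OCTET_RATE_HZ = 8000       # octets per second on a 64 kbps channel
BITS_PER_OCTET = 8
OCTETS_PER_FRAME = 80      # octets framed by one frame alignment signal
FAS_BITS_PER_FRAME = 8     # frame alignment signal
BAS_BITS_PER_FRAME = 8     # bit-rate allocation signal


def frame_bits() -> int:
    """Total bits in one H.221 frame on a single B channel."""
    return OCTETS_PER_FRAME * BITS_PER_OCTET


def frames_per_second() -> int:
    """How many frames fit into one second of the octet stream."""
    return OCTET_RATE_HZ // OCTETS_PER_FRAME


def payload_bps() -> int:
    """Bits per second left after FAS and BAS are removed from each frame."""
    overhead = FAS_BITS_PER_FRAME + BAS_BITS_PER_FRAME
    return (frame_bits() - overhead) * frames_per_second()


if __name__ == "__main__":
    print("bits per frame:", frame_bits())
    print("frames per second:", frames_per_second())
    print("payload:", payload_bps() / 1000, "kbps")
    print("distinct 8-bit BAS codes:", 2 ** BAS_BITS_PER_FRAME)
```

Each frame lasts 10 ms, and the 16 overhead bits per frame add up to 1.6 kbps, which is the gap between the 64 kbps channel and the 62.4 kbps payload.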

While H.221 defines the frame structure, H.230 defines how
frames of audio, video, data, and control information are
multiplexed onto a digital channel. H.230 also defines the
control and indication (C&I) symbols related to audio,
video, maintenance, and multipoint conferences. H.230 contains
the table of bit-rate allocation signal (BAS) escape codes that
clarifies the circumstances under which some C&I functions
are mandatory and others optional. C&I bits are woven into
the data stream with the audio, video, and data bits in order
to provide control (the bits actually cause a change to take
place somewhere in the system) and to indicate to the users the
status or condition of some operating element.

H.231 (1992) defines a multipoint control unit (MCU) that
serves as a bridge in multipoint connections. The communication
protocol used between an H.320 (Px64) terminal and an H.231 MCU
is defined in Recommendation H.243, which is referenced in
H.231. H.231 MCUs provide audio mixing and video switching.

H.233 describes the confidentiality part of a privacy system
for H.320, but does not include any of the actual encryption
algorithms.

H.234 describes the authentication and key management
methods for a privacy system.

H.242 deals with call setup and disconnect (including adding
and deleting channels during a call, and call transfers). Like
the modem handshake, it determines the methods by which the two
communicating CODECs tell each other what capabilities they
have so that both sides will independently set themselves in
the same mode. H.242 includes methods for connecting,
disconnecting, and transferring calls, procedures for
activating and deactivating data channels, mode initialization,
dynamic mode switching, and operation of terminals in
restricted networks.

H.243 (1992) defines a system for establishing communication
between three or more audiovisual terminals using digital
channels up to 2 Mbps.

Many vendors of H.320-compliant videoconferencing systems
are moving to incorporate T.120 standards for data sharing.

H.320 Network Considerations

Recommendation H.320, narrow-band visual telephone systems
and terminal equipment, is a standard for switched digital
networks. H.320 runs on connections made with 64 kbps B
(bearer) channels, 16 kbps or 64 kbps D channels, and H
channels, which can be configured to have 384, 1536, or 1920
kbps capacity.

The B channel carries the data and is the basic unit of
circuit switching, so all audio, video, or other data on a
single B channel must be destined for the same end point. The D
channel carries the signalling information that controls the
circuit-switched call and may also be used for packet switching
at low speeds. H channels are for higher bit rates. A major use
for H.320 is over basic rate ISDN, which offers the customer
2B+D channels over a single twisted pair from the telco, if the
local loop is qualified.

There have been some reported problems with H.320
videoconferencing relating to the different implementations of
ISDN, particularly in North America. ISDN in North America has
always supported 64 kbps between the subscriber and the
central office switch; in many locations, however, the
implementation provides 56 kbps. The 56 kbps problem arose
because in North America, T-carrier systems used something
called robbed-bit signalling (the least significant bit of
every sixth voice/data sample in each channel) for call
supervision. Our friends in Europe (E-carrier) used a separate
channel of the carrier system, so they didn't have the problem.
As systems in the U.S. switch over to SS7, the problems are
disappearing. In many locations where ISDN was not available,
carriers offered Dual Switched 56 service, which required two
four-wire circuits and a pair of CSU/DSUs at the customer
premises; in short, a real hardware kluge. Switched-56 is
slowly being phased out in
many U.S. locations and the telcos are encouraging customers to
switch to ISDN instead.
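The 64 versus 56 kbps figures follow from the 8 kHz sample rate of a digital voice channel. The sketch below rests on one assumption that the text does not spell out: data equipment on a robbed-bit path avoids the least significant bit of every sample, leaving 7 usable bits. The function name is hypothetical.

```python
SAMPLE_RATE_HZ = 8000   # samples per second on one digital voice channel


def usable_kbps(usable_bits_per_sample: int) -> float:
    """Channel rate in kbps when this many bits of each sample carry data."""
    return SAMPLE_RATE_HZ * usable_bits_per_sample / 1000


if __name__ == "__main__":
    print("clear channel:", usable_kbps(8), "kbps")
    # Assumption: the least significant bit of every sample is left unused.
    print("robbed-bit path:", usable_kbps(7), "kbps")
    print("Dual Switched 56:", 2 * usable_kbps(7), "kbps")
```

Two 56 kbps circuits together give the 112 kbps rate mentioned earlier as common on the desktop.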