19.4. Multimedia Protocols

Up to this point, we've been discussing methods of exchanging
real-time messages in text. There are also real-time messaging
systems that allow the exchange of other kinds of data; these include
Internet telephones, video conferencing systems, and
application-sharing systems. These types of data require a great deal
more bandwidth than plain text and often have more security
implications.

Multimedia protocols tend to have several common characteristics.
First, they normally use more than one port. They use multiple data
streams in order to separate data with different characteristics and
in order to maximize the efficiency with which they use network
resources. Thus, they normally separate audio data from video data
and use different channels for data going in different directions.
They also separate the actual data from administrative commands, so
that the port used to send video is not the same as the port used to
say "Stop sending me video, I can't take it any
more"; this maximizes the chances that the administrative
commands will actually get through. The administrative functions are
normally known as call control.

Most multimedia protocols use different lower-level protocols for
data and for call control. Data is almost always sent over UDP, while
call control is almost always sent over TCP. The data needs to arrive
as quickly as possible; it doesn't matter if some packets are lost,
as long as the packets that do get through can be used as soon as
they arrive. Call control, on the other hand, happens less often but
must not get lost, so it's worth paying the higher overhead of TCP in
order to be guaranteed that commands will arrive.

Multimedia protocols are very difficult to protect adequately with
firewalls. It would be hard to support any protocol that involved a
large number of channels, going in both directions, and using both
connection-oriented and connectionless protocols, but multimedia
protocols further complicate the picture by requiring very high
performance.

19.4.1. T.120 and H.323

T.120 and H.323[104] are International
Telecommunications Union (ITU) standards for conferencing. T.120
covers file transfer, chat, whiteboard, and application sharing;
H.323 covers audio and video conferencing. These are both
higher-level standards that use a number of lower-level protocols for
various purposes, and you will occasionally hear people talk about
Q.931, G.711, H.245, H.261, and H.263 in particular as parts of
H.323, and T.122 through T.127 as parts of T.120. For most purposes,
you don't need to worry about these lower-level protocols
separately, since they are used in conjunction with the
higher-level standards.

[104]In case you're curious, the
letters "T" and "H" are the designators for
the ITU subcommittees that produced the standard, and subcommittee
designators are just given out in alphabetical order. They're
not short for anything.

Neither the H.323 nor the T.120 standard requires implementors to
provide any security. H.323 is used to carry audio and video data
that will be presented to the user. Although this presents a risk of
information leaks, it's not directly dangerous to the client
except in the ways all protocols are dangerous to clients. Because
H.323 sets up a large number of incoming data channels, both UDP and
TCP, there's a significant risk that allowing H.323 will allow
people to attack other, more vulnerable services.

T.120, on the other hand, is inherently dangerous. Both file transfer
and application sharing are directly attackable applications.

19.4.1.1. Packet filtering characteristics of T.120

When running over TCP/IP, T.120 uses a straightforward TCP connection
on port 1503. (This is actually specified by T.123, which is the
transport standard associated with T.120.)
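To make the packet filtering characteristics concrete, here is a
minimal Python sketch of a stateless rule check for T.120 on TCP port
1503, written as a function instead of a rule table. The `Packet`
type, the function name, and the `allow_incoming` switch are
hypothetical. The logic follows the usual pattern for a TCP service:
packets coming back from the server must have ACK set, because only
the first packet of a connection lacks it, so outsiders cannot use
"replies" to open new connections.

```python
from dataclasses import dataclass

T120_PORT = 1503  # TCP port used by T.120 over TCP/IP (specified by T.123)


@dataclass(frozen=True)
class Packet:
    direction: str  # "in" (external to internal) or "out" (internal to external)
    protocol: str   # "tcp" or "udp"
    src_port: int
    dst_port: int
    ack: bool       # TCP ACK flag; clear only on a connection's first packet


def allow_t120(pkt: Packet, allow_incoming: bool = False) -> bool:
    """Stateless T.120 rule check: outgoing sessions always, incoming only if enabled."""
    if pkt.protocol != "tcp":
        return False
    to_server = pkt.src_port > 1023 and pkt.dst_port == T120_PORT
    from_server = pkt.src_port == T120_PORT and pkt.dst_port > 1023
    if pkt.direction == "out":
        if to_server:          # internal client talking to an external server
            return True
        # internal server answering an external client (incoming sessions only)
        return allow_incoming and from_server and pkt.ack
    if pkt.direction == "in":
        if from_server and pkt.ack:  # external server answering our client
            return True
        return allow_incoming and to_server  # external client calling in
    return False
```

Given the recommendation against allowing T.120 at all, treat
`allow_incoming=True` as an illustration of what the rules would look
like, not as advice.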


19.4.1.2. Proxying characteristics of T.120

Because T.120 uses a single TCP connection on a well-defined port, it
is quite easy to allow through proxies. However, since T.120 allows
both relatively safe uses (chat and whiteboard) and dangerous uses
(file transfer and application sharing), it would be wise to have a
T.120-aware proxy to enforce some security. Such proxies do not
appear to be available yet.

19.4.1.3. Network address translation characteristics of T.120

T.120 will work transparently with network address translation.

19.4.1.4. Packet filtering characteristics of H.323

H.323 uses at least three ports per connection. A TCP connection at
port 1720 is used for call setup. In addition, each data stream
requires one dynamically allocated TCP port (for call control) and
one dynamically allocated UDP port (for data). Audio and video are
sent separately, and data streams are one-way; this means that a
normal video conference will require at least eight dynamically
allocated ports (a TCP control port and a UDP data port for outgoing
video, another pair for outgoing audio, another pair for incoming
video, and a final pair for incoming audio). Figure 19-3 shows the
connections involved in a generic H.323 conference. Note that four of
the dynamically allocated ports will be established from the outside
to the inside (regardless of which side initiated the conversation).
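The port arithmetic above can be checked with a short Python sketch
that enumerates the dynamically allocated channels for a conference.
It assumes, for illustration, that each one-way stream gets one TCP
control port and one UDP data port, and that the side sending a
stream opens both of its channels. The text does not say exactly
which four ports are opened from outside, so that attribution is an
assumption; the function name is hypothetical.

```python
from itertools import product

H323_CALL_SETUP_PORT = 1720  # fixed TCP port for call setup


def h323_dynamic_channels(media=("audio", "video"),
                          directions=("outgoing", "incoming")):
    """Return (media, direction, transport, opened_from) for each dynamic port.

    Assumptions (illustrative only): every one-way stream gets one TCP
    control port and one UDP data port, and the side that sends a stream
    opens both of its channels.
    """
    channels = []
    for kind, direction in product(media, directions):
        opened_from = "outside" if direction == "incoming" else "inside"
        for transport in ("tcp", "udp"):
            channels.append((kind, direction, transport, opened_from))
    return channels


channels = h323_dynamic_channels()
print(len(channels), "dynamic ports, plus TCP", H323_CALL_SETUP_PORT)
print(sum(1 for c in channels if c[3] == "outside"), "opened from outside")
```

For a two-way audio and video conference this yields eight dynamic
channels, four of them opened from outside, on top of the fixed TCP
port 1720; an audio-only call would halve the dynamic count.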


The extensive use of dynamically allocated ports makes H.323 very
hard to deal with via packet filtering; in fact, Microsoft's
instructions for NetMeeting (which is based upon H.323 and mentioned
later) suggest allowing all UDP and TCP connections in either
direction where both the source and destination ports are above
1024. This configuration is extremely insecure, and we don't
recommend it. However, it is
the only way to allow H.323 through a nonstateful packet filtering
firewall.
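Writing the NetMeeting rule down as code makes its breadth obvious.
This Python sketch (the function name is hypothetical) implements
"allow any TCP or UDP traffic where both ports are above 1024" and
checks it against two well-known services that listen above 1024:
NFS on port 2049 and the X Window System on port 6000.

```python
def netmeeting_style_rule(protocol: str, src_port: int, dst_port: int) -> bool:
    """The broad rule: any TCP or UDP packet, either direction, both ports above 1024."""
    return protocol in ("tcp", "udp") and src_port > 1024 and dst_port > 1024


# Well-known services that listen above 1024 and that the rule would expose.
OTHER_SERVICES = {"NFS": ("udp", 2049), "X11": ("tcp", 6000)}

for name, (proto, port) in OTHER_SERVICES.items():
    # An attacker can simply pick a source port above 1024.
    print(name, "admitted:", netmeeting_style_rule(proto, 40000, port))
```

Both services are admitted, which is exactly the risk of letting
H.323 open the door to other, more vulnerable services.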

A stateful packet filter that can monitor the H.323 port negotiation
would be capable of allowing only the needed data ports. Note that
straightforward tricks like allowing only UDP responses will not work
for H.323 because the incoming data streams from the remote host will
not meet the normal criteria to be considered a response; the packet
filtering must be H.323-aware. Unfortunately, H.323 is not
particularly easy to parse, so H.323-aware packet filters are rare,
although high-end packet filtering systems do offer them.
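The difference between ordinary response tracking and H.323-aware
filtering can be sketched in Python. The class below is hypothetical
and deliberately leaves out the hard part: it assumes some other
component has already parsed the H.323 control traffic and calls
`learn()` with each negotiated remote address and local port. An
incoming media stream fails the `is_response()` test when nothing was
previously sent from that local port to that remote port, but it is
admitted once the negotiation has been recorded.

```python
import time


class H323AwareFilter:
    """Sketch of the state an H.323-aware dynamic packet filter keeps.

    Hypothetical: parsing the H.323 control channel (the hard part) is not
    shown; a parser is assumed to call learn() for each negotiated port.
    """

    def __init__(self, lifetime=300.0, clock=time.monotonic):
        self.lifetime = lifetime
        self.clock = clock
        self.sent = {}        # (local_port, remote_ip, remote_port) -> expiry
        self.negotiated = {}  # (remote_ip, local_port) -> expiry

    def outbound_udp(self, local_port, remote_ip, remote_port):
        """Record an outgoing UDP packet, as ordinary response tracking does."""
        self.sent[(local_port, remote_ip, remote_port)] = self.clock() + self.lifetime

    def learn(self, remote_ip, local_port):
        """Record a media port announced in the H.323 negotiation."""
        self.negotiated[(remote_ip, local_port)] = self.clock() + self.lifetime

    def _live(self, table, key):
        expiry = table.get(key)
        return expiry is not None and expiry > self.clock()

    def is_response(self, remote_ip, remote_port, local_port):
        return self._live(self.sent, (local_port, remote_ip, remote_port))

    def allow_inbound_udp(self, remote_ip, remote_port, local_port):
        return (self.is_response(remote_ip, remote_port, local_port)
                or self._live(self.negotiated, (remote_ip, local_port)))
```

Entries expire after `lifetime` seconds so that ports opened for one
call do not stay open indefinitely; the injectable `clock` exists only
to make that behavior easy to test.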

Because H.323 does not have any built-in authentication, allowing
H.323 through a packet filter is not very secure, even if you use a
dynamic packet filtering system that understands H.323. If you are
concerned about transmitting confidential data, or about the security
of your clients, you would be better off using a proxy that provides
authentication features.

19.4.1.5. Proxying characteristics of H.323

H.323 has almost every characteristic that makes a protocol hard to
proxy; it uses both TCP and UDP, it uses multiple ports, it uses
dynamically allocated ports, it creates connections in both
directions, and it embeds address information inside packets. The
only good news is that the protocol provides a space where clients
can specify a desired destination, making it easy for a proxy to
figure out where connections should be directed.

One way of getting around the problems with proxying H.323 is to use
what the standard calls a Multipoint Control Unit (MCU) and place it
in a publicly accessible part of your network. These systems are
designed primarily to control many-to-many connections, but they do
it by having each person in the conference connect to them. This
means that if you put one on a bastion-host network, you can allow both
internal and external callers to connect to it, and only to it, and
still get conferencing going. If this machine is well configured, it
is relatively safe. However, it's not a true proxy. The
external users have to be able to connect directly to the multipoint
control unit; one multipoint control unit will not connect to
another. The end result is that two sites that both use this
workaround can't talk to each other. It works only if exactly
one site in the conversation uses it. Several systems are available
that provide this functionality, under various names.
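On the filtering side, the multipoint control unit workaround reduces
to a single condition: conference traffic is allowed only if one of
its endpoints is the MCU. The Python sketch below uses hypothetical
addresses for the MCU, an internal client, and an external caller;
substitute your own.

```python
from ipaddress import ip_address

# Hypothetical addresses; substitute your own.
MCU = ip_address("192.0.2.10")  # MCU on the bastion-host network


def allow_conference_traffic(src: str, dst: str) -> bool:
    """Allow conference traffic only when the MCU is one of the two endpoints."""
    return MCU in (ip_address(src), ip_address(dst))


print(allow_conference_traffic("10.1.2.3", "192.0.2.10"))      # internal caller
print(allow_conference_traffic("198.51.100.7", "192.0.2.10"))  # external caller
print(allow_conference_traffic("198.51.100.7", "10.1.2.3"))    # direct: refused
```

The same rule shows why two sites using this workaround cannot talk:
each site's filter admits only its own MCU, and the MCUs do not
connect to each other.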

It is also possible to get true H.323 proxies, which usually provide
multipoint control and security features as well. In general, these
are special-purpose products, not included with generic proxying
packages. As we've pointed out, proxying H.323 is considerable
work; it's not a minor modification to a normal proxy. However,
vendors like Cisco and Microsoft that offer wide product ranges do
offer H.323 proxying as part of specialized video conferencing
products.

19.4.1.6. Network address translation characteristics of H.323

Because H.323 uses embedded IP addresses to set up the
server-to-client connections, it will not work with straightforward
network address translation. You will need a network address
translator that is H.323-aware. These translators are rare because
the IP address is not embedded in a fixed location; the network
address translator has to actually parse the packets in order to be
able to do the translation. This functionality is included in some of
the H.323 proxies.
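Why a translator has to parse the packet, instead of patching a fixed
offset, is easiest to see with a toy message. The format in this
Python sketch is hypothetical and much simpler than the real H.323
encoding: a variable-length caller name precedes the embedded IPv4
address, so the address moves whenever the name length changes. The
sketch rewrites only the address; a real translator would usually
have to remap the embedded port as well.

```python
import socket
import struct


def build_setup(name: str, ip: str, port: int) -> bytes:
    """Build a toy control message: name length, name, IPv4 address, port."""
    raw = name.encode()
    return bytes([len(raw)]) + raw + socket.inet_aton(ip) + struct.pack("!H", port)


def translate_embedded_address(msg: bytes, inside_ip: str, outside_ip: str) -> bytes:
    """Parse the toy message and rewrite the embedded address if it is inside_ip."""
    if not msg:
        raise ValueError("empty message")
    addr_off = 1 + msg[0]  # the address follows a variable-length name
    if len(msg) < addr_off + 6:
        raise ValueError("truncated message")
    if msg[addr_off:addr_off + 4] != socket.inet_aton(inside_ip):
        return msg
    return msg[:addr_off] + socket.inet_aton(outside_ip) + msg[addr_off + 4:]


original = build_setup("alice", "10.1.2.3", 49170)
translated = translate_embedded_address(original, "10.1.2.3", "203.0.113.5")
print(socket.inet_ntoa(translated[6:10]))  # offset 6 only because "alice" is 5 bytes
```

A translator that assumed the address always sat at offset 6 would
corrupt any message from a caller whose name was not five bytes long.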

19.4.1.7. Summary of recommendations for T.120 and H.323

Do not allow T.120 through your firewall.

Use a special-purpose H.323 proxy that provides security features to
allow H.323.

19.4.2. The Real-Time Transport Protocol (RTP) and the RTP Control Protocol (RTCP)

RTP is an IETF standard for transmitting real-time data (notably,
audio and video). Its most common use is as a lower-level protocol in
conjunction with H.323. The RTP standard actually defines a pair of
protocols: RTP transfers data, and RTCP is the control protocol. Some
products that talk about RTP mean RTP in conjunction with RTCP, while
others truly mean that they use RTP only, using some other protocol
for control.
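For reference, RFC 3550 defines a 12-byte fixed RTP header. This
Python sketch decodes it; a filter or proxy that wanted to sanity-check
traffic claiming to be RTP could, for example, insist on version 2.
The parser is illustrative and not taken from any particular product.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("too short for an RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,              # top two bits
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,         # number of contributing-source IDs
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,       # identifies the audio/video encoding
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,                    # synchronization source identifier
    )
```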

19.4.2.1. Packet filtering characteristics of RTP and RTCP

RTP and RTCP may use any underlying protocol. In TCP/IP
implementations, they are normally UDP-based; they may use any pair
of UDP ports, but RTP is supposed to use an even-numbered port with
RTCP at the next higher port number. If RTP is at an odd-numbered
port, RTCP will use the next lower port number instead, so that they
are always at two successive ports with the lower one being even
numbered. RTP is assigned port number 5004 and RTCP 5005, but they
also often use 24032 and 24033.
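The port-pairing convention can be written as a small Python helper,
which is the calculation a filter rule or proxy would need in order to
open both halves of an RTP session. The function names are
hypothetical.

```python
RTP_PORT = 5004   # assigned RTP port
RTCP_PORT = 5005  # assigned RTCP port


def rtcp_port_for(rtp_port: int) -> int:
    """RTCP port implied by an RTP port: successive ports, the lower one even."""
    if not 2 <= rtp_port <= 65535:
        raise ValueError("port out of range")
    return rtp_port + 1 if rtp_port % 2 == 0 else rtp_port - 1


def session_port_pair(rtp_port: int) -> tuple[int, int]:
    """The (even, odd) UDP port pair a filter rule for one session would cover."""
    rtcp = rtcp_port_for(rtp_port)
    return (min(rtp_port, rtcp), max(rtp_port, rtcp))


print(session_port_pair(RTP_PORT))  # (5004, 5005)
print(session_port_pair(24033))     # (24032, 24033): RTP on odd port, RTCP below
```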


19.4.2.2. Proxying characteristics of RTP and RTCP

RTP and RTCP are straightforward protocols, based on UDP. It would
not be particularly difficult for a generic proxy system that
supported UDP to allow them, but dedicated proxies for them are not
widely available.

19.4.2.3. Network address translation of RTP and RTCP

RTCP may contain embedded hostnames and/or IP addresses as part of
the sender description. This is not used to set up the connection but
may reveal information that you wished to conceal. Aside from that,
network address translation does not pose a problem for RTP or RTCP.
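The embedded names usually appear in the CNAME item of RTCP source
description (SDES) packets, which RFC 3550 says should normally take
the form `user@host`, where the host part may be a domain name or a
numeric address. This Python sketch, written for illustration against
the RFC 3550 packet layout, pulls the CNAMEs out of a single SDES
packet so you can see what would be revealed; the helper
`build_sdes()` exists only to produce a test packet.

```python
import struct

RTCP_SDES = 202          # RTCP packet type for source description
SDES_END, SDES_CNAME = 0, 1


def sdes_cnames(packet: bytes) -> dict[int, str]:
    """Return {SSRC: CNAME} for one RTCP SDES packet (RFC 3550, section 6.5)."""
    if len(packet) < 4:
        raise ValueError("too short")
    b0, pt, length = struct.unpack("!BBH", packet[:4])
    if b0 >> 6 != 2 or pt != RTCP_SDES:
        raise ValueError("not an RTCP SDES packet")
    end = min(4 * (length + 1), len(packet))  # length is in 32-bit words minus one
    pos, names = 4, {}
    for _ in range(b0 & 0x1F):               # source count
        if pos + 4 > end:
            raise ValueError("truncated chunk")
        (ssrc,) = struct.unpack("!I", packet[pos:pos + 4])
        pos += 4
        while pos < end and packet[pos] != SDES_END:
            if pos + 2 > end:
                raise ValueError("truncated item")
            item_type, item_len = packet[pos], packet[pos + 1]
            if item_type == SDES_CNAME:
                text = packet[pos + 2:pos + 2 + item_len]
                names[ssrc] = text.decode("utf-8", "replace")
            pos += 2 + item_len
        pos = (pos // 4 + 1) * 4              # skip null terminator and padding
    return names


def build_sdes(ssrc: int, cname: str) -> bytes:
    """Build a one-chunk SDES packet carrying only a CNAME (for testing)."""
    raw = cname.encode()
    chunk = struct.pack("!I", ssrc) + bytes([SDES_CNAME, len(raw)]) + raw + b"\x00"
    chunk += b"\x00" * (-len(chunk) % 4)      # pad the chunk to a 32-bit boundary
    header = struct.pack("!BBH", 0x81, RTCP_SDES, (4 + len(chunk)) // 4 - 1)
    return header + chunk


print(sdes_cnames(build_sdes(0x1234ABCD, "alice@10.1.2.3")))
```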

19.4.2.4. Summary of recommendations for RTP and RTCP

You are unlikely to encounter RTP and RTCP being used by themselves;
they are normally used in conjunction with other protocols as part of
a larger package. They are not inherently terribly dangerous, so your
approach to them will depend on your approach to the rest of the
package.