H.323 versus SIP: A Comparison

This is, frankly, the best comparison of H.323 and SIP
available anywhere. Virtually all of the others
are misleading, out-of-date, and just plain wrong. To compound the
problem—to further propagate the error, as it were—we have also seen
several papers written by naive students and rank-and-file engineers
that blindly parrot what they have read in these comparisons.
Furthermore, many, many people have formed their opinions of H.323 and
SIP based not on each protocol's merits but solely on the misinformation
provided by these comparisons and through other information provided by
largely the same sources.

To counter this misinformation, we decided to put together this
thorough, up-to-date comparison. As with ours, please consider the
financial interests of the source of any information on this subject, be
it an author, speaker, institution, forum, company, web site, or
conference. Are the people providing information on this issue involved
in both of these—and other—protocols and have nothing besides
perhaps an honest academic interest in one or the other protocol, or have
they otherwise "hitched their wagon" to one?

Like everything else on the web, this is a living document which we will
be updating as the standards evolve. In fact, there is much work in
progress for both H.323 and SIP, but, in order to compare apples to
apples and make this comparison meaningful, we have chosen to focus on
what is currently defined rather than on what might be defined in the
future. Also, note that commentary that is not vital to the main
comparison text appears in a smaller font immediately below it.

H.323

SIP

Philosophy

H.323 was designed with a good understanding of the requirements
for multimedia communication over IP networks, including audio,
video, and data conferencing. It defines an entire, unified system
for performing these functions, leveraging the strengths of the
IETF and
ITU-T protocols.

As a result, it might be reasonable for users to expect about the
same level of robustness and interoperability as is found on the
PSTN today, although this admittedly varies across the globe.

H.323 was designed to scale to add new functionality. The
most widely deployed use of H.323 is "Voice over IP"
followed by "Videoconferencing", both of which are
described in the H.323 specifications.

SIP was designed to set up a "session" between two points
and to be a modular, flexible component of the Internet
architecture. It has a loose concept of a call (that being a
"session" with media streams), and has no intrinsic support
for multipoint multimedia conferencing (though implementers have built
conferencing services to provide conferencing support).

SIP and H.323 are roughly the same age, and both of these systems
are really old.
Though there have been some efforts to create more modern
communication standards, companies thus far have either elected to
keep using these old systems, create proprietary systems that do not
interwork to a significant degree, or focus effort on web-based
conferencing (i.e., WebRTC) where, unfortunately,
solutions are still proprietary.

Complexity

H.323 is limited to multimedia conferencing, so the complexity of
the system is constrained accordingly. No communication system is
simple, but H.323 attempts to clearly define the basic set of
functionality that all devices must support.

SIP was initially focused on voice communication and then expanded
to include video, application sharing, instant messaging, presence, etc.
With each capability, complexity increases and,
unfortunately, there are no strict guidelines as to what functionality
any given device must support. This leads to more complex systems
with more interoperability problems. SIP was "marketed" as a
simple protocol, in spite of the fact it only looks simple on
the surface. Telephony is a hard problem and, regardless of how
one wants to deliver it, the total system is going to have a certain
level of complexity out of necessity.

Reliability

H.323 has defined a number of features to handle failure of
intermediate network entities, including "alternate
gatekeepers", "alternate endpoints", and a means of
recovering from connection failures.

SIP has not defined procedures for handling device failure. If a
proxy fails, the user agent detects this through timer expiration.
It is the responsibility of the user agent to resend the INVITE to
another proxy, leading to long delays in call establishment.
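To make the delay concrete, here is a minimal sketch of timer-driven failover as described above. It is a simulation only: the function name, the `responds` callback, and the proxy names are hypothetical, and the 32-second figure is an assumption modelled on the default INVITE transaction timeout in RFC 3261 (64 x T1, with T1 = 500 ms).

```python
from typing import Callable, Optional, Sequence, Tuple

# Assumption: default INVITE transaction timeout from RFC 3261 (64 * T1).
# Real deployments may tune this value.
INVITE_TIMEOUT_S = 32.0


def first_responsive_proxy(
    proxies: Sequence[str],
    responds: Callable[[str], bool],
    timeout_s: float = INVITE_TIMEOUT_S,
) -> Tuple[Optional[str], float]:
    """Try proxies in order and return (proxy, seconds spent waiting).

    The proxy is None if every attempt timed out.
    """
    waited = 0.0
    for proxy in proxies:
        if responds(proxy):
            return proxy, waited
        waited += timeout_s  # a dead proxy costs one full timer expiry
    return None, waited
```

With one failed proxy ahead of a working one, the caller waits a full timer period before the call even begins to progress.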

Message Definition

ASN.1,
a standardized, extremely precise, easy-to-understand
structural notation that is used by many other systems.

H.323 encodes messages in a compact binary format that is suitable
for narrowband and broadband connections. Messages are
efficiently encoded and decoded by machines, with decoders widely
available (e.g., Ethereal, since renamed Wireshark).

SIP messages are encoded in ASCII text format, suitable for
humans to read. As a consequence, the messages are large and less
suitable for networks where bandwidth, delay, and/or processing
are a concern.

SIP messages get so large that they sometimes
exceed the MTU size when going over WAN links, resulting in
delays, packet loss, etc. As a result, effort has been made
to compress SIP messages (e.g., RFC 3485 and RFC 3486).
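As a rough illustration of the size concern, the sketch below measures a text-encoded SIP message in bytes and checks it against the 1300-byte threshold at which RFC 3261 tells senders to move off UDP when the path MTU is unknown. The helper names are hypothetical and the message is assembled naively; this is not a SIP parser or stack.

```python
from typing import Sequence, Tuple

# RFC 3261 requires a congestion-controlled transport such as TCP when a
# request is larger than 1300 bytes and the path MTU is unknown.
UDP_SIZE_LIMIT = 1300


def sip_message_size(
    start_line: str, headers: Sequence[Tuple[str, str]], body: str = ""
) -> int:
    """Size in bytes of a text-encoded SIP message as sent on the wire."""
    lines = [start_line] + [f"{name}: {value}" for name, value in headers]
    # Header section ends with an empty line; the body (if any) follows.
    message = "\r\n".join(lines) + "\r\n\r\n" + body
    return len(message.encode("utf-8"))


def needs_tcp(size: int, limit: int = UDP_SIZE_LIMIT) -> bool:
    """True when the message is too large to send safely over UDP."""
    return size > limit
```

Every header name is spelled out in full in every message, so the size grows with each Via, Record-Route, and extension header that intermediaries add.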

H.323 is extended with non-standard features in such a way as to
avoid conflicts between vendors. Globally unique identifiers
prevent feature and data element collision.

SIP is extended by adding new header lines or message bodies that
may be used by different vendors to serve different purposes, thus
risking interoperability problems.

The risk is admittedly small, but this problem has
already been seen in the real world with similar extension
schemes.

Extensibility - Standard

H.323 is extended by the standards community to add new features
to H.323 in such a way as to not impact existing features.
However, new revisions of H.323 are published periodically, which
introduce new functionality that is mandatory, yet done in such a
way as to preserve backward compatibility.

SIP is extended by the standards community to add new features to
SIP in such a way as to not impact existing features. However, new
revisions of SIP are potentially not backward compatible (e.g.,
RFC 3261 was not entirely compatible with RFC 2543). In
addition, several extensions are "mandatory" in some
implementations, which causes interoperability problems.

Scalability - Load Balancing

H.323 has the ability to load balance endpoints across a number of
alternate gatekeepers in order to scale a local point of presence.
In addition, endpoints report their available and total capacity
so that calls going to a set of gateways, for example, may be best
distributed across those gateways.
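A minimal sketch of how a gatekeeper might use those capacity reports follows. The `Gateway` record and the selection rule (largest free share of capacity) are assumptions made for illustration; H.323 defines the reporting, not this particular policy.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Gateway:
    name: str
    available: int  # call capacity currently free, as reported
    total: int      # total call capacity, as reported


def pick_gateway(gateways: Iterable[Gateway]) -> Optional[Gateway]:
    """Choose the gateway with the largest free share of its capacity."""
    candidates = [g for g in gateways if g.total > 0 and g.available > 0]
    if not candidates:
        return None  # nothing can take the call
    return max(candidates, key=lambda g: g.available / g.total)
```

Because the load is known before the call is routed, a full gateway is simply never selected, and no call attempt is wasted on it.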

SIP has no notion of load balancing, except "trial and
error" across pre-provisioned devices or devices learned from
DNS SRV records. There is no means of detecting the load on a
particular gateway or to know whether a device has failed, meaning
that proxies simply have to try a PSTN gateway, wait for the call
to time out, and then try another.

Scalability - Call Signaling

When an H.323 gatekeeper is used, it may simply provide address
resolution through one RAS message exchange, or it may route all
call signaling traffic. In large networks, the direct call model
may be used so that endpoints connect directly to one another.

When using a SIP proxy to perform address resolution for the SIP
device, the proxy is required to handle at least 3 full message
exchanges for every call. In large networks, such as
IMS networks, the number of messages on the wire may be
excessive. A basic call between two users may require as many as
30 messages on the wire!

Scalability - Statelessness

An H.323 gatekeeper can be stateless using the direct call model.

A SIP proxy can be stateless if it does not fork, use TCP, or use
multicast.

Scalability - Address Resolution

H.323 defines an interface between the endpoint and gatekeeper for
address resolution using ARQ or LRQ. The H.323 gatekeeper may use
any number of protocols to discover the destination address of the
callee, including LRQs to other gatekeepers, Annex
G/H.225.0, TRIP, ENUM,
and/or DNS. The endpoint does not have to be
concerned with the mechanics of this process, and the processing
requirements for address resolution placed on the gatekeeper by
H.323 are for just a single message exchange.

Although out of scope of H.323, an H.323 endpoint may
perform its own address resolution using ENUM and/or
DNS and then place a direct call to the resolved
address or provide the resolved address to the gatekeeper as an
"alias".

While SIP has no address-resolution protocol, per se, a SIP user
agent may route its INVITE message through a proxy or redirect
server in order to resolve addresses. The SIP proxy may use
various protocols to discover the destination address of the
callee, including TRIP,
ENUM, and/or DNS. The endpoint does
not have to be concerned with the mechanics of this process.
Unfortunately, the processing requirements placed on the SIP proxy
are higher than with H.323 because at least 3 message exchanges
must take place between the SIP device, SIP proxy, and the next
hop.

Although out of scope of SIP, a SIP user agent may perform
its own address resolution using ENUM and/or
DNS and then place a direct call to the resolved
address or through a proxy.

H.323 also supports overlap sending with no additional overhead,
except conveyance of the newly received digits in a single
message.

SIP only understands URI-style addresses. This works fine for
SIP-SIP devices, but causes some confusion when trying to
translate various dialed digits. The unofficial
convention is that a "+" sign is inserted in the SIP URI
(e.g., "sip:+18005551212@example.com") in order to
indicate that the number is in E.164 format, versus a user ID that
might be numeric.
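The "+" convention can be sketched as follows. Both helpers are hypothetical, and the builder assumes the dialed digits already form a complete international number; it does not apply any dial plan. The pattern assumes an E.164 number is a "+" followed by at most 15 digits, the first of which is not zero.

```python
import re

# Assumption: "+" followed by 2 to 15 digits, first digit non-zero.
_E164_USER = re.compile(r"^\+[1-9]\d{1,14}$")


def e164_sip_uri(digits: str, domain: str) -> str:
    """Build a SIP URI whose user part is marked as E.164 with a leading "+"."""
    number = re.sub(r"\D", "", digits)  # drop spaces, dashes, parentheses
    return f"sip:+{number}@{domain}"


def user_part_is_e164(uri: str) -> bool:
    """True if the user part of a sip: or sips: URI looks like an E.164 number."""
    match = re.match(r"^sips?:([^@;]+)@", uri)
    return bool(match and _E164_USER.match(match.group(1)))
```

Without the "+", a receiver has no way to tell whether "18005551212" is a telephone number or simply a numeric user ID.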

SIP has support for overlapped signaling
defined in RFC 3578, though each additional digit received requires
transmission of three messages on the wire (a new INVITE, a 484
response to indicate that the address is incomplete, and an ACK).
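The overhead difference is simple arithmetic. The sketch below uses the per-digit message counts stated in this comparison and assumes, for illustration, that each additional digit arrives separately; the names are hypothetical.

```python
H323_MESSAGES_PER_DIGIT = 1  # one message conveying the newly received digits
SIP_MESSAGES_PER_DIGIT = 3   # new INVITE, 484 response, ACK


def overlap_messages(extra_digits: int, per_digit: int) -> int:
    """Messages on the wire for digits received after the first call attempt."""
    if extra_digits < 0:
        raise ValueError("extra_digits must not be negative")
    return extra_digits * per_digit
```

For four digits dialed after the initial attempt, that is 4 messages with H.323 and 12 with SIP.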

Billing

Even with H.323's direct call model, the ability to successfully
bill for the call is not lost because the endpoint reports to the
gatekeeper the beginning and end time of the call via the RAS
protocol. Various pieces of billing information may be
present in the ARQ and DRQ messages at the start and end of the
call.
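As a simplified sketch of the idea, a gatekeeper could pair the start and end reports for each call and derive a duration from them. The class below is hypothetical: it keys records on a call identifier string and takes timestamps as plain numbers, whereas real ARQ and DRQ messages carry considerably more information.

```python
from typing import Dict, Optional


class CallAccounting:
    """Hypothetical gatekeeper-side record of calls in progress."""

    def __init__(self) -> None:
        self._started: Dict[str, float] = {}

    def on_arq(self, call_id: str, timestamp: float) -> None:
        """Record the start time reported when the call is admitted."""
        self._started[call_id] = timestamp

    def on_drq(self, call_id: str, timestamp: float) -> Optional[float]:
        """Return the call duration in seconds, or None for an unknown call."""
        start = self._started.pop(call_id, None)
        if start is None:
            return None
        return timestamp - start
```

Note that nothing here requires the gatekeeper to sit in the call signaling path; two RAS exchanges are enough.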

If the SIP proxy wants to collect billing information, it has no
choice but to stay in the call signaling path for the entire
duration of the call so that it can detect when the call
completes. Even then, the statistics are skewed because the call
signaling may have been delayed. Otherwise, there is no mechanism
in SIP to perform any accounting/billing function.

Most real-world flows are more complex, as they often pass through
one or more proxy devices, have intermediary response messages,
and "negotiate" capabilities through a "trial and
error" process that is far from scientific. Here is a more
real-life SIP call flow.

Capability Negotiation

H.323 entities may exchange capabilities and negotiate which
channels to open, including audio, video, and data channels.
Individual channels may be opened and closed during the call
without disrupting the other channels.

SIP entities have limited means of exchanging capabilities.
RFC 3407 is the state of the art, which is more or less a
"declaration" mechanism, not a negotiation procedure.
The end result is still a "trial and error" approach in
case the called party does not support the proposed media.

Call Forking

An H.323 gatekeeper can control the call signaling and may fork the
call to any number of devices simultaneously.

SIP proxies can control the call signaling and may fork the call
to any number of devices simultaneously.

PSTN Interworking

H.323 borrows from traditional PSTN protocols, e.g., Q.931, and is
therefore well suited for PSTN integration. However, H.323 does
not employ the PSTN's circuit-switched technology--like
SIP, H.323 is completely packet-switched. How Media Gateway
Controllers fit into the overall H.323 architecture is
well-defined within the standard.

SIP has no commonality with the PSTN and such signaling must be
"shoe-horned" into SIP. SIP has no architecture that describes
the decomposition of the gateway into the Media Gateway Controller
and the Media Gateways. This has been a recent study of 3GPP and
others in the form of IMS. Presently, there are about 4
"IMS" variants: 3GPP, ITU NGN, 3GPP2, and PacketCable.
Pick the architecture you like best, I suppose.

Services

Services may be provided to the endpoint through a web-browser
interface using HTTP or a feature server using Megaco/H.248. In
addition, services may be provided to an endpoint as it places a
call, as a call arrives, or during the middle of a call by a
gatekeeper or other entity that routes the call signaling. As a
result, H.323 is well-suited to providing new services.

SIP devices can receive service from a SIP proxy as the endpoint
places a call, as a call arrives, or during the middle of a call.
There is no defined way within SIP of providing services via a web
browser or a feature server, as everything is done within the
context of a "session".

One may provide ad-hoc services through other means, such
as XML, SOAP, or CPL. However, there are no standards for
this.

Video and Data Conferencing

H.323 fully supports video and data conferencing. Procedures are
in place to provide control for the conference as well as lip
synchronization of audio and video streams.

SIP has limited support for video and no support for data
conferencing protocols like T.120. SIP has no protocol to
control the conference and there is no mechanism within SIP
for lip synchronization. There is no standard means of recovering
from packet loss in a video stream (to parallel H.323's
"video fast update" command).

Administrative Requirements

H.323 does not require a gatekeeper. A call can be made directly
between two endpoints.

However, most devices do utilize a gatekeeper for the purpose of
registration and address resolution.

SIP does not require a proxy. A call can be made directly between
two user agents.

However, most devices do utilize a SIP proxy
for the purpose of registration, address resolution, and call
routing.

Codecs

H.323 supports any codec, standardized or proprietary. No
registration authority is required to use any codec in H.323.

SIP supports any IANA-registered codec (as a legacy feature) or
other codec whose name is mutually agreed upon.

Firewall/NAT support

Provided by H.323 "proxy" or by the endpoint, both in
conjunction with a gatekeeper residing in the public network.
H.323 also supports direct point-to-point media flows between
devices that are located behind a NAT/FW. Refer to H.460.17,
H.460.18, H.460.19, H.460.23, and H.460.24.

SIP does not define a NAT/FW traversal mechanism, as
this is left to other standards. Some standards that have been
defined or are being defined are STUN,
TURN, ANAT, and ICE.
ANAT is popular as a means of addressing IPv4/IPv6 interworking
and appears to be widely implemented. As of January 2011,
ICE is still not so widely adopted.

Transport protocol

Reliable or unreliable, e.g., TCP or UDP. Most H.323 entities use
a reliable transport for signaling.

Reliable or unreliable, e.g., TCP or UDP. Most SIP entities use an
unreliable transport for signaling.

Loop Detection

Routing gatekeepers can detect loops by looking at the
CallIdentifier and destinationAddress fields in call-processing
messages. If the combination of these matches an existing call, it
is a loop. Infinite loops may be prevented by utilizing the
hopCount field in the SETUP message.
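The check described above can be sketched in a few lines. The class and method names are hypothetical, and the handling of hopCount (refuse to route once the budget reaches zero) is an assumption made for illustration rather than a quotation of the standard.

```python
from typing import Set, Tuple


class LoopDetector:
    """Hypothetical routing-gatekeeper check for looping SETUP messages."""

    def __init__(self) -> None:
        self._active: Set[Tuple[str, str]] = set()

    def admit(self, call_identifier: str, destination_address: str, hop_count: int) -> bool:
        """Return True if the SETUP may be routed onward."""
        key = (call_identifier, destination_address)
        if key in self._active:
            return False  # same call and destination seen again: a loop
        if hop_count <= 0:
            return False  # hop budget exhausted
        self._active.add(key)
        return True

    def release(self, call_identifier: str, destination_address: str) -> None:
        """Forget a call once it has ended."""
        self._active.discard((call_identifier, destination_address))
```

The same CallIdentifier with a different destinationAddress is not treated as a loop, which allows a call to be legitimately re-routed through the same gatekeeper.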

The Via header facilitates this. However, there has been talk
about deprecating Via as a means of loop detection due to its
complexity. Instead, the Max-Forwards header seems to be the
preferred method of limiting hops and therefore loops. In November
2005, a presentation was given on issues with Max-Forwards. So, what
is the right solution?
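For reference, the Max-Forwards rule itself is short. The sketch below follows the proxy behavior in RFC 3261 (add the header with a value of 70 if it is absent, refuse to forward at zero, otherwise decrement by one); the function name and return shape are hypothetical.

```python
from typing import Optional, Tuple

DEFAULT_MAX_FORWARDS = 70  # value RFC 3261 recommends when adding the header


def forward_decision(max_forwards: Optional[int]) -> Tuple[bool, Optional[int]]:
    """Return (forward?, Max-Forwards value to send onward).

    Pass None when the incoming request has no Max-Forwards header.
    """
    if max_forwards is None:
        return True, DEFAULT_MAX_FORWARDS  # header absent: the proxy adds one
    if max_forwards <= 0:
        return False, None  # proxy answers 483 (Too Many Hops)
    return True, max_forwards - 1
```

This bounds the number of hops but does not identify a loop as such; a looping request simply circulates until the counter runs out.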

Multicast Signaling

Yes, location requests (LRQ) and auto gatekeeper discovery (GRQ).

Yes, e.g., through group INVITEs.

Third-party Call Control

Yes, through third-party pause and re-routing which is defined
within H.323. More sophisticated control is defined by the
related H.450.x series of standards.

Yes, an MC is required for this, but it could be co-located in a
participating endpoint, or all endpoints could contain an MC. A
stand-alone conference bridge may provide this functionality and
H.323 has well-defined procedures for such entities.

What distinguishes H.323 is not that it requires
yet another onerous physical entity for conferencing (it does not)
but that it just has a name for this functionality, an
"MC," and that it provides a flexible means of
implementing that functionality.

No; however, SIP user agents may perform conferencing
themselves. A stand-alone conference bridge may also provide
this functionality.

Original Title

"VISUAL TELEPHONE SYSTEMS AND EQUIPMENT FOR LOCAL AREA NETWORKS WHICH PROVIDE A NON-GUARANTEED QUALITY OF SERVICE"

It is now, "Packet-based multimedia communications
systems."

Despite the word, "VISUAL," in the original
title, H.323 has never described just a videoconferencing
solution--support for video and data has always been optional. And
the reference to LANs may be misleading because H.323 was intended
from the start to support simple and "complex
topologies" and not just single-segment networks, which
"LOCAL AREA NETWORKS" may imply.

Note that the "multimedia conferences" referred to in
the original title are loosely coupled multicast conferences,
à la MBone. This is because SIP was intended to be just a
point-to-point version of SAP and not the
"carrier-class solution addressing a wide area" that
many would have you believe.

Lineage

H.323 is based on H.324, not H.320.
However, H.324 was designed to be a better H.320.

1995 - H.323 working draft circulated.

1996 - H.323 approved.

1998 - H.323v2 approved.

1999 - H.323v3 approved.

2000 - H.323v4 approved.

2003 - H.323v5 approved.

2006 - H.323v6 approved.

SIP is frequently allied with the Internet and the World Wide Web
by way of HTTP.

Yes, via HTTP (Digest and Basic), SSL, PGP, S/MIME, or various
other means.

Encryption

Yes, via H.235 (including use of SRTP, TLS, IPSec, etc.).

Yes, via SSL, PGP, S/MIME, or various other means.

DTMF Carriage

H.245 User Input Indication, RFC 4733, or via the audio stream.
The alphanumeric choice of the H.245 UserInputIndication message is the
baseline carriage common to all H.323 endpoints, so
interoperability is assured.

There is no baseline carriage, which presents issues of
interoperability. Transport of DTMF via the INFO method,
RFC 4733, KPML, or the audio stream are all options.