AVT WG P. Zimmermann
Internet-Draft Phil Zimmermann and Associates LLC
Expires: September 6, 2006 A. Johnston, Ed.
SIPStation
J. Callas
PGP Corporation
March 5, 2006
ZRTP: Extensions to RTP for Diffie-Hellman Key Agreement for SRTP
draft-zimmermann-avt-zrtp-01
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on September 6, 2006.
Copyright Notice
Copyright (C) The Internet Society (2006).
Abstract
This document defines ZRTP, RTP (Real-time Transport Protocol) header
extensions for a Diffie-Hellman exchange to agree on a session key
and parameters for establishing Secure RTP (SRTP) sessions. The ZRTP
protocol is completely self-contained in RTP and does not require
support in the signaling protocol or assume a Public Key
Zimmermann, et al. Expires September 6, 2006 [Page 1]

Internet-Draft ZRTP March 2006
1. Introduction
ZRTP is a key agreement protocol that performs Diffie-Hellman key
exchange during call setup in-band in the Real-time Transport
Protocol (RTP) [1] media stream which has been established using some
other signaling protocol such as Session Initiation Protocol (SIP)
[11]. This generates a shared secret which is then used to generate
keys and salt for a Secure RTP (SRTP) [2] session. ZRTP borrows
ideas from PGPfone [7]. A reference implementation of ZRTP is
available as Zfone [8].
The ZRTP protocol has some nice cryptographic features lacking in
many other approaches to media session encryption. Although it uses
a public key algorithm, it does not rely on a public key
infrastructure (PKI). In fact, it does not use persistent public
keys at all. It uses ephemeral Diffie-Hellman (DH) with hash
commitment, and allows the detection of Man in the Middle (MitM)
attacks by displaying a short authentication string for the users to
read and compare over the phone. It has perfect forward secrecy,
meaning the keys are destroyed at the end of the call, which
precludes retroactively compromising the call by future disclosures
of key material. But even if the users are too lazy to bother with
short authentication strings, we still get fairly decent
authentication against a MitM attack, based on a form of key
continuity. It does this by caching some key material to use in the
next call, to be mixed in with the next call's DH shared secret,
giving it key continuity properties analogous to SSH. All this is
done without reliance on a PKI, key certification, trust models,
certificate authorities, or key management complexity that bedevils
the email encryption world. It also does not rely on SIP signaling
for the key management, and in fact does not rely on any servers at
all. It performs its key agreements and key management in a purely
peer-to-peer manner over the RTP packet stream.
Most secure phones rely on a Diffie-Hellman exchange to agree on a
common session key. But since DH is susceptible to a man-in-the-
middle (MitM) attack, it is common practice to provide a way to
authenticate the DH exchange. In some military systems, this is done
by depending on digital signatures backed by a centrally-managed PKI.
A decade of industry experience has shown that deploying centrally
managed PKIs can be a painful and often futile experience. PKIs are
just too messy, and require too much activation energy to get them
started. Setting up a PKI requires somebody to run it, which is not
practical for an equipment provider. A service provider like a
carrier might venture down this path, but even then you have to deal
with cross-carrier authentication, certificate revocation lists, and
other complexities. It is much simpler to avoid PKIs altogether,
especially when developing secure commercial products. It is
therefore more common for commercial secure phones to augment the DH
exchange with a Short Authentication String (SAS) combined with a
hash commitment at the start of the key exchange, to shorten the
length of SAS material that must be read aloud. No PKI is required
for this approach to authenticating the DH exchange. The AT&T 3600,
Eric Blossom's COMSEC secure phones [9], PGPfone [7], and CryptoPhone
[10] are all examples of products that took this simpler lightweight
approach.
The main problem with this approach is inattentive users who may not
execute the voice authentication procedure, or unattended secure
phone calls to answering machines that cannot execute it.
Additionally, some people worry about voice spoofing (the "Rich
Little" attack), and some worry about trying to use it between people
who don't know each other's voices. This is not as much of a problem
as it seems, because it isn't necessary that they recognize each
other by their voice, it's only necessary that they detect that the
voice used for the SAS procedure matches the voice in the rest of the
phone call. These concerns are not enough reason to embrace PKIs as
an alternative, in my opinion.
A popular and field-proven approach is used by SSH (Secure Shell)
[12], which Peter Gutmann likes to call the "baby duck" security
model. SSH establishes a relationship by exchanging public keys in
the initial session, when we assume no attacker is present, and this
makes it possible to authenticate all subsequent sessions. A
successful MitM attacker has to have been present in all sessions all
the way back to the first one, which is assumed to be difficult for
the attacker. All this is accomplished without resorting to a
centrally-managed PKI.
We use an analogous baby duck security model to authenticate the DH
exchange in ZRTP. We don't need to exchange persistent public keys,
we can simply cache a shared secret and re-use it to authenticate a
long series of DH exchanges for secure phone calls over a long period
of time. If we read aloud just one SAS, and then cache a shared
secret for later calls to use for authentication, no new voice
authentication rituals need to be executed. We just have to remember
we did one already.
If we ever lose this cached shared secret, it is no longer available
for authentication of DH exchanges, so we would have to do a new SAS
procedure and start over with a new cached shared secret. Then we
could go back to omitting the voice authentication on later calls.
A particularly compelling reason why this approach is attractive is
that SAS is easiest to implement when a GUI or some sort of display
is available, which raises the question of what to do when no display
is available. We envision some products that implement secure VoIP
via a local network proxy, which lacks a display in many cases. If
we take an approach that greatly reduces the need for a SAS in each
and every call, we can operate in GUI-less products with greater
ease.
It's a good idea to force your opponent to have to solve multiple
problems in order to mount a successful attack. Some examples of
widely differing problems we might like to present him with are:
Stealing a shared secret from one of the parties, being present on
the very first session and every subsequent session to carry out an
active MitM attack, and solving the discrete log problem. We want to
force the opponent to solve more than one of these problems to
succeed.
The protocol can make use of different kinds of shared secrets. Each
type of shared secret is determined by a different method. All of
the shared secrets are hashed together to form a session key to
encrypt the call. An attacker must defeat all of the methods in
order to determine the session key.
First, there is the shared secret determined entirely by a Diffie-
Hellman key agreement. It changes with every call, based on random
numbers. An attacker may attempt a classic DH MitM attack on this
secret, but we can protect against this by displaying and reading
aloud a SAS, combined with adding a hash commitment at the beginning
of the DH exchange.
Second, there is an evolving shared secret, or ongoing shared secret
that is automatically changed and refreshed and cached with every new
session. We will call this the cached shared secret, or sometimes
the retained shared secret. Each new image of this ongoing secret is
a non-invertible function of its previous value and the new secret
derived by the new DH agreement. It's possible that no cached shared
secret is available, because there were no previous sessions to
inherit this value from, or because one side loses its cache.
There are other approaches for key agreement for SRTP that compute a
shared secret using information in the signaling. For example, [14]
describes how to carry a MIKEY (Multimedia Internet KEYing) [15]
payload in SDP [16]. Or [13] describes directly carrying SRTP keying
and configuration information in SDP. ZRTP does not rely on the
signaling to compute a shared secret, but if a client does produce a
shared secret via the signaling, and makes it available to the ZRTP
protocol, ZRTP can make use of this shared secret to augment the list
of shared secrets that will be hashed together to form a session key.
This way, any security weaknesses that might compromise the shared
secret contributed by the signaling will not harm the final resulting
session key.
There may also be a static shared secret that the two parties agree
on out-of-band in advance. A hashed passphrase would suffice.
The shared secret provided by the signaling (if available), the
shared secret computed by DH, and the cached shared secret are all
hashed together to compute the session key for a call. If the cached
shared secret is not available, it is omitted from the hash
computation. If the signaling provides no shared secret, it is also
omitted from the hash computation.
No DH MitM attack can succeed if the ongoing shared secret is
available to the two parties, but not to the attacker. This is
because the attacker cannot compute a common session key with either
party without knowing the cached secret component, even if he
correctly executes a classic DH MitM attack. Mixing in the cached
shared secret for the session key calculation allows it to act as an
implicit authenticator to protect the DH exchange, without requiring
additional explicit HMACs to be computed on the DH parameters. If
the cached shared secret is available, a MitM attack would be
instantly detected by the failure to achieve a shared session key,
resulting in undecryptable packets. The protocol can easily detect
this. It would be more accurate to say that the MitM attack is not
merely detected, but thwarted.
When adding the complexity of additional shared secrets beyond the
familiar DH key agreement, we must make sure the lack of availability
of the cached shared secret cannot prevent a call from going through,
and we must also prevent false alarms that claim an attack was
detected.
An added benefit of using these cached shared secrets to mix in with
the session keys is that it augments the entropy of the session key.
Even if limits on the size of the DH exchange produce a session key
with less than 256 bits of real work factor, the added entropy from
the cached shared secret can bring up all the subsequent session keys
to the full 256-bit AES key strength, assuming no attacker was
present in the first call.
We could have authenticated the DH exchange the same way SSH does it,
with digital signatures, caching public keys instead of shared
secrets. But this approach with caching shared secrets seemed a bit
simpler, and has the added benefit of adding more entropy to the
session keys.
The following sections provide an overview of the ZRTP protocol and
describe the key agreement algorithm and the RTP header extensions.
2. Terminology
In this document, the key words "MUST", "MUST NOT", "REQUIRED",
"SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
and "OPTIONAL" are to be interpreted as described in RFC 2119 and
indicate requirement levels for compliant implementations.
3. Protocol Description
3.1. Overview
This section provides a description of how ZRTP works. This
description is non-normative in nature but is included to build
understanding of the protocol.
ZRTP is negotiated the same way a conventional RTP session is
negotiated. Using SIP, the AVP/RTP profile is used in SDP. The ZRTP
protocol begins after two endpoints have utilized a signaling
protocol such as SIP and are ready to send or have already begun
sending RTP packets. This specification defines a new RTP extension
header, which is used to carry the ZRTP messages between the
endpoints. Since RTP endpoints ignore unknown extension headers, the
protocol is fully backwards compatible - a ZRTP endpoint attempting
to perform key agreement with a non-ZRTP endpoint will simply receive
normal RTP responses and can then inform the user that a secure
session is not possible and either continue with the insecure session
or terminate the session depending on the user's security policy.
The ZRTP exchange begins at the same time that the first RTP packets
are exchanged between the endpoints. ZRTP messages can be embedded
in RTP messages containing actual media samples, or they may be sent
in separate RTP messages. For example, if the RTP payload or codec
supports silence or no-op messages, then these can be used for RTP
transport. If none of these are supported, an RTP packet containing
comfort noise can be generated to carry a ZRTP message.
A ZRTP endpoint initiates the exchange by sending a ZRTP Hello
message to the other endpoint. The purpose of the Hello message is
to discover if the other endpoint supports the protocol and to see
what algorithms the two ZRTP endpoints have in common.
The Hello message contains the SRTP configuration options, and the
ZID. Each instance of ZRTP has a unique 96-bit random ZRTP ID or ZID
that is generated once at installation time. It is used to look up
retained shared secrets in a local cache. A single global ZID for a
single installation is the simplest way to implement ZIDs, and may be
required in applications where the encryption is being done by a
"bump in the cord" proxy that does not know who is being called.
However, it is specifically not precluded for an implementation to
use multiple ZIDs, up to the limit of a separate one per callee.
This then turns it into a long-lived "association ID" that does not
apply to any other associations between a different pair of parties.
It is a goal of this protocol to permit both options to interoperate
freely.
A response to a ZRTP Hello message is a ZRTP HelloACK message. The
HelloACK message simply acknowledges receipt of the Hello message and
indicates support for the ZRTP protocol. Since RTP uses best effort
UDP transport, ZRTP has retransmission timers in case of lost
datagrams. There are two timers, both with exponential backoff
mechanisms. One timer is used for retransmissions of Hello messages
and the other is used for retransmissions of all other messages after
receipt of a HelloACK which indicates support of ZRTP by the other
endpoint.
After both endpoints exchange Hello and HelloACK messages, the key
agreement exchange can begin with the ZRTP Commit message. An
example call flow is shown in Figure 1 below. Note that the order of
the Hello/HelloACK exchanges in F1/F2 and F3/F4 may be reversed.
Also, an endpoint that receives a Hello message and wishes to
immediately begin the ZRTP key agreement can omit the HelloACK and
send the Commit instead. In Figure 1, this would result in messages
F2, F3, and F4 being omitted. Note that the endpoint which sends the
Commit message is considered the initiator of the ZRTP session and
drives the key agreement exchange.

key type, and sas algorithms are supported. In addition, each
endpoint sends and discovers ZIDs. The received ZID is used to
retrieve previous retained shared secrets, rs1 and rs2. If the
endpoint has other secrets, then they are also collected. The
signaling secret (sigs) is passed from the signaling protocol used
to establish the RTP session. For SIP, it is the dialog identifier
of a Secure SIP (SIPS) session: a string composed of Call-ID, to tag,
and from tag. From the definitions in RFC 3261 [11]:
sigs = hash(call-id | to-tag | from-tag)
Note: the dialog identifier of a non-secure SIP session should not be
considered a signaling secret as it has no confidentiality
protection. For the SRTP secret (srtps), it is the SRTP master key
and salt. This information may have been passed in the signaling
using MIKEY or SDP Security Descriptions, for example:
srtps = hash(SRTP master key | SRTP master salt)
Additional shared secrets can be defined and used as other_secret.
If no secret of a given type is available, a random value is
generated and used for that secret to ensure a mismatch in the hash
comparisons in the DHPart1 and DHPart2 messages. This prevents an
eavesdropper from knowing how many shared secrets are available
between the endpoints.
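As an illustration, the secret values described above could be
assembled as follows. This is a minimal sketch, not a normative
procedure: it assumes SHA-256 as the negotiated hash, treats "|" as
plain concatenation of octet strings, and uses hypothetical helper
names (h, signaling_secret, srtp_secret, secret_or_random) that do not
appear in this specification.

```python
import hashlib
import os
from typing import Optional

def h(*parts: bytes) -> bytes:
    """hash(a | b | ...): SHA-256 over the concatenated octets (assumed)."""
    return hashlib.sha256(b"".join(parts)).digest()

def signaling_secret(call_id: bytes, to_tag: bytes, from_tag: bytes) -> bytes:
    """sigs = hash(call-id | to-tag | from-tag); only for a SIPS dialog."""
    return h(call_id, to_tag, from_tag)

def srtp_secret(master_key: bytes, master_salt: bytes) -> bytes:
    """srtps = hash(SRTP master key | SRTP master salt)."""
    return h(master_key, master_salt)

def secret_or_random(secret: Optional[bytes], length: int = 32) -> bytes:
    """Use a random value when a secret of this type is unavailable.

    The random value forces a mismatch in the later ID comparison, so an
    eavesdropper cannot tell how many secrets are really shared.
    """
    return secret if secret is not None else os.urandom(length)
```

The 32-octet default length for the random filler is also an
assumption; it simply matches the SHA-256 output size used here.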
A Hello message can be sent at any time, but is usually sent at the
start of an RTP session to determine if the other endpoint supports
ZRTP, and also if the SRTP implementations are compatible. A Hello
message is retransmitted using timer T1 and an exponential backoff
mechanism detailed in Section 5 until the receipt of a HelloACK
message or a Commit message.
3.2.2. Hash Commitment
The hash commitment is performed by the initiator of the ZRTP
exchange. From the intersection of the algorithms in the sent and
received Hello messages, the initiator chooses a hash, cipher, public
key type, and sas algorithm to be used.
The key agreement begins with the initiator choosing a fresh random
Diffie-Hellman (DH) secret value (svi) based on the chosen public key
type value, and computing the public value. (Note that to speed up
processing, this computation can be done in advance.) For guidance
on generating random numbers, see the section on Random Number
Generation. The Diffie-Hellman secret value, svi, SHOULD be twice as
long as the AES key length. This means, if AES 128 is used, the DH
secret value SHOULD be 256 bits long. If AES 256 is used, the secret
value SHOULD be 512 bits long.
pvi = g^svi mod p
where g and p are determined by the public key type value. The
initiator then computes a hash, hvi, of the public value using the
chosen hash algorithm. The
hvi includes the set of hash, cipher, pkt, and sas types from the
responder's Hello message in the following order:
hvi = hash(pvi | hashr1-5 | cipherr1-5 | pktr1-5 | sasr1-5)
The information from the responder's Hello message is included in the
hash calculation to prevent a bid-down attack by modification of the
responder's Hello message.
Note: If both sides send Commit messages initiating a secure session
at the same time, the Commit message with the lowest hvi value is
discarded and the other side is the initiator. This breaks the tie,
allowing the protocol to proceed from this point with a clear
definition of who is the initiator and who is the responder.
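The commitment and the tie-break rule can be sketched as follows.
This is an illustrative example under stated assumptions: SHA-256
stands in for the negotiated hash, the public value is encoded as a
big-endian octet string without leading zeros, and hello_fields is a
hypothetical parameter holding the responder's Hello values already
concatenated in the order given above. The group parameters p and g
are passed in by the caller, since they depend on the public key type.

```python
import hashlib
import secrets

def int_to_bytes(n: int) -> bytes:
    """Big-endian encoding without leading zero octets (assumed encoding)."""
    return n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")

def make_commit(p: int, g: int, hello_fields: bytes, secret_bits: int = 256):
    """Return (svi, pvi, hvi) for the initiator.

    secret_bits should be twice the AES key length: 256 for AES 128,
    512 for AES 256.
    """
    svi = secrets.randbelow(2**secret_bits - 2) + 2   # uniform in [2, 2^L - 1]
    pvi = pow(g, svi, p)                              # pvi = g^svi mod p
    hvi = hashlib.sha256(int_to_bytes(pvi) + hello_fields).digest()
    return svi, pvi, hvi

def keeps_initiator_role(my_hvi: bytes, peer_hvi: bytes) -> bool:
    """Tie-break for simultaneous Commits: the lowest hvi is discarded."""
    return int.from_bytes(my_hvi, "big") > int.from_bytes(peer_hvi, "big")
```

An endpoint whose own hvi is the lower of the two would drop its
Commit and continue as the responder.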
3.2.3. Diffie-Hellman Exchange
The purpose of the Diffie-Hellman exchange is for the two ZRTP
endpoints to generate a new shared secret, s0. In addition, the
endpoints discover if they have any shared secrets in common. If
they do, this exchange allows them to discover how many and agree on
an ordering for them: s1, s2, etc.
3.2.3.1. Responder Behavior
Upon receipt of the Commit message, the responder generates its own
fresh random DH secret value, svr, and computes the public value.
(Note that to speed up processing, this computation can be done in
advance.) For guidance on random number generation, see the section
on Random Number Generation. The Diffie-Hellman secret value, svr,
SHOULD be twice as long as the AES key length. This means, if AES
128 is used, the DH secret value SHOULD be 256 bits long. If AES 256
is used, the secret value SHOULD be 512 bits long.
pvr = g^svr mod p
The final shared secret, s0, is calculated by hashing the
concatenation of the Diffie-Hellman shared secret (DHSS) followed by
the (possibly empty) set of shared secrets that are actually shared
between the initiator and responder. For computing the hash, the
shared secrets are sorted by ascending order of the initiator's
corresponding shared secret IDs. The remainder of this section
describes an algorithm to accomplish this.
First, an HMAC keyed hash is calculated using the first retained
shared secret, rs1, as the key on the string "Responder" which
generates a retained secret ID, rs1IDr, which is truncated to 64
bits. HMACs are calculated in a similar way for additional shared
secrets:
rs1IDr = HMAC(rs1, "Responder")
rs2IDr = HMAC(rs2, "Responder")
sigsIDr = HMAC(sigs, "Responder")
srtpsIDr = HMAC(srtps, "Responder")
other_secretIDr = HMAC(other_secret, "Responder")
A ZRTP DHPart1 message is generated containing pvr and the set of
keyed hashes (HMACs) derived from the possibly shared secrets.
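The ID calculation above might look like this in practice. The sketch
assumes HMAC-SHA-256 as the keyed hash and takes the leftmost 64 bits
as the truncation; neither choice is stated in the text above, and the
function and constant names are hypothetical.

```python
import hashlib
import hmac

SECRET_NAMES = ("rs1", "rs2", "sigs", "srtps", "other_secret")

def secret_id(secret: bytes, role: str) -> bytes:
    """HMAC(secret, role) truncated to 64 bits.

    role is the ASCII string "Initiator" or "Responder".
    """
    return hmac.new(secret, role.encode("ascii"), hashlib.sha256).digest()[:8]

def secret_ids(secrets_by_name: dict, role: str) -> dict:
    """Map each of the five secret names to its 64-bit ID for the given role."""
    return {name: secret_id(secrets_by_name[name], role) for name in SECRET_NAMES}
```

Because the role string is part of the HMAC input, the same secret
yields different IDs in DHPart1 and DHPart2, so an observer cannot
match them by simple comparison.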
Upon receipt of the DHPart2 message, the responder checks that the
initiator's public DH value is not equal to 1 or p-1. An attacker
might inject a false DHPart2 packet with a value of 1 or p-1 for
g^svi mod p, which would cause a disastrously weak final DH result to
be computed. If pvi is 1 or p-1, the user should be alerted of the
attack and the protocol must be aborted. Otherwise, the responder
then compares the hash of the public DH value in the DHPart2 with the
hash from the Commit. If they are different (hash(pvi) != hvi), a
MitM attack is taking place and the user is alerted.
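The two checks described above can be sketched as follows. The
example assumes SHA-256 as the negotiated hash and recomputes the
commitment over the same input that was hashed when hvi was formed in
Section 3.2.2 (the public value followed by the responder's Hello
fields). The function names and the hello_fields parameter are
hypothetical.

```python
import hashlib
import hmac

def check_public_value(pv: int, p: int) -> None:
    """Reject the degenerate DH public values 1 and p-1."""
    if pv == 1 or pv == p - 1:
        raise ValueError("degenerate DH public value: alert user and abort")

def check_commitment(pvi_octets: bytes, hello_fields: bytes, hvi: bytes) -> None:
    """Verify that the DHPart2 public value matches the earlier Commit."""
    recomputed = hashlib.sha256(pvi_octets + hello_fields).digest()
    # compare_digest avoids leaking the mismatch position through timing
    if not hmac.compare_digest(recomputed, hvi):
        raise ValueError("commitment mismatch: possible MitM attack")
```

Both functions raise rather than return a status so that a caller
cannot continue the exchange by accident after a failed check.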
The responder then calculates the Diffie-Hellman result:
DHResult = pvi^svr mod p
The responder then calculates the Diffie-Hellman shared secret:
DHSS = hash(DHResult)
The set of five shared secret IDs received from the DHPart2 message
are stored as set A.
The responder then calculates the set of secret IDs that are expected
to be received from the initiator in the DHPart2 message:
rs1IDi = HMAC(rs1, "Initiator")
rs2IDi = HMAC(rs2, "Initiator")
sigsIDi = HMAC(sigs, "Initiator")
srtpsIDi = HMAC(srtps, "Initiator")
other_secretIDi = HMAC(other_secret, "Initiator")
The set (rs1IDi, rs2IDi, sigsIDi, srtpsIDi, other_secretIDi) is set
B. Set C is the intersection of set A and set B. Set C is then sorted
in ascending numerical order. Set C will contain between zero and
five secret IDs. Set D is then created as the actual secrets
corresponding to the secret IDs in set C in the same order. The set
D is expanded to 5 values by adding in null secrets: s1, s2, s3, s4,
and s5. The final shared secret, s0, is calculated by hashing the
concatenation of the DHSS and the set of non-null shared secrets. As
a result, the null secrets have no effect on the concatenation
operation:
s0 = hash(DHSS | s1 | s2 | s3 | s4 | s5)
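The responder's set operations can be sketched as follows, assuming
SHA-256 as the hash and octet-string concatenation for "|". The
function name and the shape of its arguments are hypothetical: the
caller supplies set A as a set of 64-bit IDs and set B as a mapping
from each expected initiator ID to the responder's copy of the secret.

```python
import hashlib

def responder_s0(dhss: bytes, received_ids: set, expected: dict) -> bytes:
    """Compute s0 on the responder side.

    received_ids: set A, the IDs carried in the DHPart2 message.
    expected:     set B, as {expected initiator ID: local secret}.
    """
    # Set C: intersection of A and B, in ascending numerical order.
    # Equal-length octet strings sort the same way as big-endian integers.
    set_c = sorted(received_ids & set(expected))
    # Set D: the actual secrets, in the same order as set C.
    set_d = [expected[i] for i in set_c]
    # Null secrets add nothing to the concatenation, so they are omitted.
    return hashlib.sha256(dhss + b"".join(set_d)).digest()
```

With no secrets in common, the result reduces to hash(DHSS), which
matches the case of a first call between two endpoints.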
3.2.3.2. Initiator Behavior
Upon receipt of the DHPart1 message, the initiator checks that the
responder's public DH value is not equal to 1 or p-1. An attacker
might inject a false DHPart1 packet with a value of 1 or p-1 for
g^svr mod p, which would cause a disastrously weak final DH result to
be computed. If pvr is 1 or p-1, the user should be alerted of the
attack and the protocol must be aborted.
If pvr is not 1 or p-1, the initiator looks up any retained shared
secrets associated with the responder's ZID. The final shared
secret, s0, is calculated by hashing the concatenation of the DHSS
followed by the (possibly empty) set of shared secrets that are
actually shared between the initiator and responder. For computing
the hash, the shared secrets are sorted by ascending order of the
initiator's corresponding shared secret IDs. The remainder of this
section describes an algorithm to accomplish this.
First, an HMAC keyed hash is calculated using the first retained
shared secret, rs1, as the key on the string "Initiator" which
generates a retained secret ID, rs1IDi, which is truncated to 64
bits. HMACs are calculated in a similar way for additional shared
secrets:
rs1IDi = HMAC(rs1, "Initiator")
rs2IDi = HMAC(rs2, "Initiator")
sigsIDi = HMAC(sigs, "Initiator")
srtpsIDi = HMAC(srtps, "Initiator")
other_secretIDi = HMAC(other_secret, "Initiator")
The initiator then sends a DHPart2 message containing the initiator's
public DH value and the set of calculated retained secret IDs.
The initiator calculates the same Diffie-Hellman result using:
DHResult = pvr^svi mod p
The initiator then calculates the DH shared secret using:
DHSS = hash(DHResult)
The set of five shared secret IDs received in the DHPart1 message are
stored as set A.
The initiator then calculates the set of secret IDs that are expected
to be received from the responder in the DHPart1 message:
rs1IDr = HMAC(rs1, "Responder")
rs2IDr = HMAC(rs2, "Responder")
sigsIDr = HMAC(sigs, "Responder")
srtpsIDr = HMAC(srtps, "Responder")
other_secretIDr = HMAC(other_secret, "Responder")
The set (rs1IDr, rs2IDr, sigsIDr, srtpsIDr, other_secretIDr) is B.
Set C is the intersection of set A and set B. Set C will contain
between zero and five secret IDs. Set D is then created as the
actual secrets corresponding to the secret IDs in set C. Set E is the
set of secret IDs that corresponds to the secrets in set D sent in
the DHPart2 message. Set E is then sorted in ascending numerical
order. Set D is then sorted to the same order as the corresponding
secrets in set E.
The set D is expanded to 5 values by adding in null secrets: s1, s2,
s3, s4, and s5. The final shared secret, s0, is calculated by
hashing the concatenation of the DHSS and the set of non-null shared
secrets. As a result, the null secrets have no effect on the
concatenation operation:
s0 = hash(DHSS | s1 | s2 | s3 | s4 | s5)
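The initiator's ordering step differs from the responder's because
the matching is done on responder IDs while the sort is done on
initiator IDs. A minimal sketch follows, assuming SHA-256 and
octet-string concatenation; the function name and the tuple layout of
the own argument are hypothetical.

```python
import hashlib

def initiator_s0(dhss: bytes, received_ids: set, own: list) -> bytes:
    """Compute s0 on the initiator side.

    received_ids: set A, the IDs carried in the DHPart1 message.
    own: one (initiator_id, responder_id, secret) tuple per local secret,
         where initiator_id was sent in DHPart2 and responder_id is the
         expected "Responder" ID (a member of set B).
    """
    # Sets C and D: keep secrets whose expected responder ID was received.
    shared = [(i_id, secret) for i_id, r_id, secret in own if r_id in received_ids]
    # Set E: sort by the initiator's IDs, and reorder set D to match.
    shared.sort(key=lambda pair: pair[0])
    return hashlib.sha256(dhss + b"".join(s for _, s in shared)).digest()
```

Sorting on the initiator's IDs is what lets both endpoints arrive at
the same concatenation order without exchanging any further messages.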
3.2.4. Confirmation and Switch to SRTP
The SRTP master key and master salt are then generated using the
shared secret. Separate SRTP keys and salts are used in each
direction for each media stream. Unless otherwise specified, ZRTP
uses SRTP with no MKI, 32 bit authentication using HMAC-SHA1, AES-CM
128 or 256 bit key length, 112 bit session salt key length, 2^48 key
derivation rate, and SRTP prefix length 0.
The ZRTP initiator encrypts and the ZRTP responder decrypts packets
by using srtpkeyi and srtpsalti, which are generated by:
srtpkeyi = HMAC(s0,"Initiator SRTP master key")
srtpsalti = HMAC(s0,"Initiator SRTP master salt")
The ZRTP responder encrypts and the ZRTP initiator decrypts packets
by using srtpkeyr and srtpsaltr, which are generated by:
srtpkeyr = HMAC(s0,"Responder SRTP master key")
srtpsaltr = HMAC(s0,"Responder SRTP master salt")
The HMAC key is generated by:
hmackey = HMAC(s0,"HMAC key")
Both sides now discard the rs2 value and store rs1 as rs2. A new rs1
is calculated from s0:
rs1 = HMAC (s0, "retained secret")
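The derivations above can be collected into one routine. The sketch
below assumes HMAC-SHA-256 and truncates each output to the required
length (16 octets for an AES 128 key, 14 octets for the 112 bit salt).
The truncation rule is an assumption, since the text above does not
say how the HMAC output is shortened, and the function names are
hypothetical.

```python
import hashlib
import hmac

def kdf(s0: bytes, label: str) -> bytes:
    """HMAC(s0, label), with SHA-256 assumed as the negotiated hash."""
    return hmac.new(s0, label.encode("ascii"), hashlib.sha256).digest()

def derive_keys(s0: bytes, key_len: int = 16, salt_len: int = 14) -> dict:
    """Derive per-direction SRTP keys and salts, plus the HMAC key."""
    return {
        "srtpkeyi": kdf(s0, "Initiator SRTP master key")[:key_len],
        "srtpsalti": kdf(s0, "Initiator SRTP master salt")[:salt_len],
        "srtpkeyr": kdf(s0, "Responder SRTP master key")[:key_len],
        "srtpsaltr": kdf(s0, "Responder SRTP master salt")[:salt_len],
        "hmackey": kdf(s0, "HMAC key"),
    }

def rotate_retained(old_rs1: bytes, s0: bytes) -> tuple:
    """Discard the old rs2, keep the old rs1 as rs2, derive a new rs1."""
    new_rs1 = kdf(s0, "retained secret")
    return new_rs1, old_rs1   # (rs1, rs2)
```

Each label produces an independent output, so disclosure of one
derived key does not reveal the others or s0 itself.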
The endpoints can now switch to SRTP and begin packet encryption.
The ZRTP Initiator and Responder use their own keying material for
the SRTP session. No MKI is used and a 32 bit authentication tag is
used.
The ZRTP Confirm1 and Confirm2 messages are sent for two reasons.
First, they confirm that all the key agreement calculations were
successful and the encryption is working, and they enable us to
automatically detect a DH MitM attack from a reckless attacker who
does not know the retained shared secret. Second, they enable us to
transmit the SASflag under cover of SRTP encryption, shielding it
from a passive observer who would like to know if the human users are
in the habit of diligently verifying the SAS.
In the Confirm1 and Confirm2 messages, the sasflag Boolean is
converted to an octet called sasflagoctet (resulting in either 0x00
or 0x01). Confirm1 and Confirm2 messages contain an HMAC of some
known plaintext and the sasflagoctet. The HMAC is explicitly
included in the payload because we may not always be able to rely on
the built-in authentication tag in SRTP, which might be configured to
different sizes, including none.
hmac = HMAC(hmackey, "known plaintext" | sasflagoctet )
This information is not carried in the extension header but inserted
at the start of the SRTP payload.
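A sketch of the Confirm HMAC follows. It assumes HMAC-SHA-256 and
treats "known plaintext" as a literal ASCII string, which is one
possible reading of the formula above rather than a confirmed detail.
The function names are hypothetical.

```python
import hashlib
import hmac

def confirm_hmac(hmackey: bytes, sasflag: bool) -> bytes:
    """hmac = HMAC(hmackey, "known plaintext" | sasflagoctet)."""
    sasflagoctet = b"\x01" if sasflag else b"\x00"
    message = b"known plaintext" + sasflagoctet
    return hmac.new(hmackey, message, hashlib.sha256).digest()

def verify_confirm(hmackey: bytes, sasflag: bool, received: bytes) -> bool:
    """Check a received Confirm1 or Confirm2 HMAC in constant time."""
    return hmac.compare_digest(confirm_hmac(hmackey, sasflag), received)
```

A failed verification here indicates that the two sides did not
derive the same s0, which is the signal used to detect a MitM attacker
who lacks the retained shared secret.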
The Conf2ACK message completes the exchange.
The optional GoClear message is used to switch from SRTP back to RTP.
To avoid relying on the optional SRTP authentication tag, the GoClear
contains an HMAC of the string "GoClear" computed with the hmackey
derived from the shared secret:
clear_hmac = HMAC(hmackey, "GoClear")
A GoClear message receives either a ClearACK message or an Error
message, which indicates that the ZRTP endpoint does not support the
GoClear mechanism or that the GoClear has failed authentication (the
clear_hmac does not validate).
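The receiver's choice between ClearACK and Error can be sketched as
below. HMAC-SHA-256 is assumed, and the function names and the
supported flag are hypothetical; the return value is simply the name
of the message to send back.

```python
import hashlib
import hmac

def clear_hmac(hmackey: bytes) -> bytes:
    """clear_hmac = HMAC(hmackey, "GoClear")."""
    return hmac.new(hmackey, b"GoClear", hashlib.sha256).digest()

def respond_to_goclear(hmackey: bytes, received: bytes, supported: bool = True) -> str:
    """Return "ClearACK" if the GoClear is accepted, otherwise "Error"."""
    if not supported:
        return "Error"   # endpoint does not implement GoClear
    if not hmac.compare_digest(clear_hmac(hmackey), received):
        return "Error"   # clear_hmac does not validate
    return "ClearACK"
```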
3.3. Random Number Generation
The ZRTP protocol uses random numbers for cryptographic key material,
notably for the DH secret exponents, which must be freshly generated
with each session. Whenever a random number is needed, all of the
following criteria must be satisfied:
It MUST be derived from a physical entropy source, such as RF noise,
acoustic noise, thermal noise, high resolution timings of
environmental events, or other unpredictable physical sources of
entropy. Chapter 10 of [4] gives a detailed explanation of
cryptographic grade random numbers and provides guidance for
collecting suitable entropy. The raw entropy must be distilled and
processed through a deterministic random bit generator (DRBG).
Examples of DRBGs may be found in NIST SP 800-90 [5], and in [4].
It MUST be freshly generated, meaning that it must not have been used
in a previous calculation.
It MUST be greater than or equal to two, and less than or equal to
2^L - 1, where L is the number of random bits required.
It MUST be chosen with equal probability from the entire available
number space, e.g., [2, 2^L - 1].
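The range and uniformity criteria can be met with a short routine
such as the one below. It relies on Python's secrets module, which
draws from the operating system's random source; whether that source
satisfies the physical entropy requirement above depends on the
platform and is an assumption of this sketch. The function name is
hypothetical.

```python
import secrets

def random_secret_value(bits: int) -> int:
    """Return a uniformly distributed integer in [2, 2^bits - 1]."""
    if bits < 2:
        raise ValueError("need at least 2 bits")
    # randbelow(n) is uniform on [0, n); adding 2 shifts the range
    # to [2, 2^bits - 1] without introducing bias.
    return secrets.randbelow(2**bits - 2) + 2
```

A fresh value must be drawn for every session; the routine keeps no
state, so reuse can only occur if the caller stores and replays a
result.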
4. RTP Header Extensions
This specification defines a new RTP header extension used for all
ZRTP messages. When used, the X bit is set in the RTP header to
indicate the presence of the RTP header extension.
Section 5.3.1 in RFC 3550 defines the format of an RTP Header
extension. The Header extension is appended to the RTP header. The
first 16 bits are an identifier for the header extension, and the
following 16 bits are the length of the extension header in 32 bit words.
All word lengths referenced in this specification follow RFC 3550 and
are 32 bits or 4 octets. All integer fields are carried in network
byte order, that is, most significant byte (octet) first, commonly
known as big-endian. Each ZRTP message is carried in a single RTP
header extension which is the value of 0x505A.
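As an illustration of the layout just described, the following sketch splits an RTP header extension into its identifier, its length, and its payload. It assumes the caller has already located the start of the extension inside the RTP packet; the names parse_extension_header and ZRTP_EXTENSION_ID are hypothetical.

```python
import struct

ZRTP_EXTENSION_ID = 0x505A  # identifier value given in this section


def parse_extension_header(ext: bytes):
    """Split an RTP header extension into (identifier, length_words, payload).

    `ext` is assumed to begin at the first octet of the extension.
    """
    if len(ext) < 4:
        raise ValueError("extension header needs at least 4 octets")
    # Two 16 bit fields in network byte order: identifier, then length.
    ext_id, length_words = struct.unpack("!HH", ext[:4])
    # The length counts 32 bit words that follow the 4 octet header.
    payload = ext[4:4 + 4 * length_words]
    if len(payload) != 4 * length_words:
        raise ValueError("extension is shorter than its length field claims")
    return ext_id, length_words, payload
```

A receiver would compare the returned identifier with ZRTP_EXTENSION_ID before treating the payload as a ZRTP message.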
4.1. ZRTP Message Formats
ZRTP messages are designed to simplify endpoint parsing requirements
and to reduce the opportunities for buffer overflow attacks. A
security extension should not itself introduce new attack vectors.
ZRTP uses 8 octet blocks (2 words) to encode many ZRTP parameters.
These fixed-length blocks are used for Message Type, Hash Type,
Cipher Type, and Public Key Type. The values in the blocks are ASCII
strings which are extended with spaces (0x20) to make them 8
characters long. Currently defined block values are listed in Tables
1-4 below. Additional block values may be defined and used.
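The fixed-length block encoding can be sketched as follows. The helper names encode_block and decode_block are hypothetical; the only facts taken from the text are the 8 octet length, the ASCII encoding, and the space (0x20) padding.

```python
BLOCK_LEN = 8  # every block is 8 octets (2 words)


def encode_block(name: str) -> bytes:
    """Encode a block value as ASCII, padded with spaces to 8 octets."""
    raw = name.encode("ascii")
    if len(raw) > BLOCK_LEN:
        raise ValueError("block value longer than 8 characters")
    return raw.ljust(BLOCK_LEN, b" ")


def decode_block(block: bytes) -> str:
    """Decode an 8 octet block and drop the trailing space padding."""
    if len(block) != BLOCK_LEN:
        raise ValueError("a block must be exactly 8 octets")
    return block.decode("ascii").rstrip(" ")
```

Because every block has the same length, a parser can read at fixed offsets and never needs to trust a length field supplied by the peer.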
ZRTP uses this ASCII encoding to simplify debugging and make it
"ethereal friendly".
4.1.1. Message Type Block
Currently eleven Message Type Blocks are defined - they represent the
set of ZRTP message primitives. ZRTP endpoints MUST support the
Hello, HelloACK, Commit, DHPart1, DHPart2, Confirm1, Confirm2,
Conf2ACK, and Error block types. They MAY support GoClear and
ClearACK.
Table 2. Hash Block Type Values
4.1.3. Cipher Type Block
All ZRTP endpoints MUST support AES128 and MAY support AES256 or
other Cipher Types. Also, if AES128 is used, DH3072 should be used.
If AES256 is used, DH4096 should be used.
Cipher Type Block | Meaning
---------------------------------------------------
AES128 | AES-CM with 128 bit keys
| as defined in RFC 3711
---------------------------------------------------
AES256 | AES-CM with 256 bit keys
| as defined in RFC 3711
---------------------------------------------------
Table 3. Cipher Block Type Values
4.1.4. Public Key Type Block
All ZRTP endpoints MUST support DH3072 and MAY support DH4096. ZRTP
endpoints MUST use the DH generator function g=2. The choice of AES
key length is coupled to the choice of public key type. If AES 128
is chosen, DH3072 SHOULD be used. If AES 256 is chosen, DH4096
SHOULD be used.
Public Key Type Block| Meaning
---------------------------------------------------
DH3072 | DH with p=3072 bit prime
| as defined in RFC 3526
---------------------------------------------------
DH4096 | DH with p=4096 bit prime
| as defined in RFC 3526
---------------------------------------------------
Table 4. Public Key Block Type Values
4.1.5. SAS Type Block
All ZRTP endpoints MAY support the libase32 Short Authentication
String scheme or other SAS schemes. The optional ZRTP SAS is
described in Section 6.
SAS Type Block | Meaning
---------------------------------------------------
libase32 | Short Authentication String using
| libbase32 encoding defined in Section 6.
---------------------------------------------------
Table 5. SAS Block Type Values
4.2. Hello message
The Hello message has the format shown in Figure 2 below. The header
extension payload contains the ZRTP version number and the list of
algorithms supported by SRTP.
The Hello ZRTP message begins with the ZRTP header extension identifier
followed by the length of the extension in 32 bit words. Next is a
word containing the version (ver) of ZRTP. For this specification,
the version is the string "0.01". Next is the Client Identifier
string (cid) which is 15 octets long and identifies the vendor and
release of the ZRTP software. The Passive bit (P) is a Boolean
normally set to False. A ZRTP endpoint which is configured to never
initiate secure sessions is regarded as passive, and would set the P
bit to True. Next is a list of supported Hash Types, Cipher Types,
public key types, and SAS Type. Five possible algorithms are listed
for each using the Blocks defined in Tables 2, 3, 4, and 5. If fewer
than five algorithms are supported, spaces (0x20) are used to pad out
the 10 words for each type. The last parameter is the ZID, the 96
bit long unique identifier for the ZRTP endpoint.
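The padding rule for the algorithm lists can be sketched as below. This covers only one list (five blocks, 10 words); it does not attempt to build the whole Hello message. The function name encode_algorithm_list is hypothetical.

```python
BLOCK_LEN = 8      # octets per block (2 words)
LIST_SLOTS = 5     # five algorithm slots per type


def encode_algorithm_list(names):
    """Encode up to five block values into a 40 octet (10 word) field.

    Unused slots, and unused octets inside a slot, are filled with
    spaces (0x20) as described in the text.
    """
    if len(names) > LIST_SLOTS:
        raise ValueError("at most five algorithms can be listed per type")
    blocks = []
    for name in names:
        raw = name.encode("ascii")
        if len(raw) > BLOCK_LEN:
            raise ValueError("block value longer than 8 characters")
        blocks.append(raw.ljust(BLOCK_LEN, b" "))
    # Pad the remaining slots with all-space blocks.
    blocks += [b" " * BLOCK_LEN] * (LIST_SLOTS - len(names))
    return b"".join(blocks)
```

For example, an endpoint that supports both cipher types from Table 3 would encode ["AES128", "AES256"], producing two populated blocks followed by three blocks of spaces.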
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 11. Extension header format for GoClear message
4.12. ClearACK message
The optional ClearACK message is sent to acknowledge receipt of a
GoClear. A ClearACK is only sent if the clear_hmac from the GoClear
message is authenticated. Otherwise, an Error message is returned.
The format is shown in Figure 12 below. A ZRTP endpoint that
receives a GoClear message stops sending SRTP packets, generates a
ClearACK in response, and deletes the crypto context for the SRTP
session. The endpoint then renders to the user the information that
the media session has switched to clear mode and waits for
confirmation from the user (e.g., clicking a button, pressing a DTMF
key, etc.). Until that confirmation is received, the ZRTP endpoint
MUST NOT resume sending RTP packets. To prevent pinholes
from closing or NAT bindings from expiring, the ClearACK message
should be resent every 5 seconds while waiting for confirmation from
the user. After the user confirms the notification, the sending of
RTP packets may begin.
Note that if the GoClear/ClearACK mechanism is not supported by a
ZRTP endpoint, an Error message MUST be sent in response to a GoClear
message.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=2 words |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Message Type Block=ClearACK (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 12. Extension header format for ClearACK message
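The octets shown in Figure 12 can be produced with a short sketch. It builds only the header extension (identifier, length, and Message Type Block), not the surrounding RTP header; the function name build_clearack is hypothetical.

```python
import struct


def build_clearack() -> bytes:
    """Return the 12 octet ClearACK header extension from Figure 12."""
    # 0x505A is the ZRTP extension identifier; the length is 2 words,
    # covering the 8 octet Message Type Block that follows.
    header = struct.pack("!HH", 0x505A, 2)
    # "ClearACK" is already 8 ASCII characters, so no space padding is needed.
    return header + b"ClearACK"
```

The first two octets, 0x50 0x5A, correspond to the bit pattern 0101000001011010 drawn in the first row of the figure.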
5. Retransmissions
ZRTP uses two retransmission timers T1 and T2. T1 is used for
retransmission of Hello messages, when the support of ZRTP by the
other endpoint may not be known. T2 is used in retransmissions of
all the other ZRTP messages with the exception of GoClear. The
retransmission of GoClear messages is discussed in the section on
GoClear.
Practical experience has shown that RTP packet loss at the start of
an RTP session can be extremely high. Since the entire ZRTP message
exchange occurs during this period, the retransmission scheme is
deliberately aggressive. Since ZRTP packets, with the exception
of the DHPart1 and DHPart2 messages, are small, this should have
minimal effect on overall bandwidth utilization of the media session.
Hello ZRTP requests are retransmitted at an interval that starts at
T1 and doubles after every retransmission, capping at 200 ms.
A Hello message is retransmitted 20 times before giving up. T1 has a
recommended value of 50 ms. Retransmission of a Hello ends upon
receipt of a HelloACK or Commit message.
Non-Hello ZRTP requests are retransmitted only by the initiator -
that is, only Commit, DHPart2, and Confirm2 are retransmitted if the
corresponding message from the responder (DHPart1, Confirm1, or
Conf2ACK, respectively) is not received. Non-Hello ZRTP messages are
retransmitted at an interval that starts at T2 and doubles
after every retransmission, capping at 600 ms. Each message is
retransmitted 10 times before giving up and resuming a normal RTP
session. T2 has a default value of 150 ms. Each message has a
response message that stops retransmissions, as shown in Table 6.
The high value of T2
means that retransmissions will likely only occur with packet loss.
The receipt of an Error message ends retransmission of the message
identified in the Error message.
Message        Acknowledgement Message
-------        -----------------------
Hello          HelloACK or Commit
Commit         DHPart1
DHPart2        Confirm1
Confirm2       Conf2ACK
GoClear        ClearACK
Table 6. Retransmitted ZRTP Messages and Responses
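The two timer schedules can be worked out with a small sketch. It assumes one reading of the text: the first retransmission is sent one initial interval after the original message, and the interval then doubles until it reaches the cap. The function name retransmission_intervals is hypothetical.

```python
def retransmission_intervals(initial_ms: int, cap_ms: int, count: int):
    """Return the wait, in ms, before each of `count` retransmissions.

    Assumption: the interval doubles after every retransmission and is
    clamped to `cap_ms`.
    """
    intervals = []
    interval = initial_ms
    for _ in range(count):
        intervals.append(interval)
        interval = min(interval * 2, cap_ms)
    return intervals


# T1 = 50 ms, cap 200 ms, 20 retransmissions (Hello)
hello_schedule = retransmission_intervals(50, 200, 20)
# T2 = 150 ms, cap 600 ms, 10 retransmissions (Commit, DHPart2, Confirm2)
other_schedule = retransmission_intervals(150, 600, 10)
```

Under this reading an unanswered Hello is abandoned after 3750 ms of retransmission intervals, and an unanswered non-Hello message after 5250 ms.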
6. Short Authentication String
This section discusses the implementation of the optional Short
Authentication String, or SAS, in ZRTP.
The Short Authentication String (SAS) value is calculated as the hash
of both DH public values and the string "Short Authentication
String".
sasvalue = hash(pvi | pvr | "Short Authentication String")
The rendering of the SAS value depends on the SAS Type agreed upon in
the Commit message. For the SAS Type of libase32, the last 20 bits
of the sasvalue are rendered as a form of base32 encoding known as
libbase32 [6]. The purpose of libbase32 is to represent arbitrary
sequences of octets in a form that is as convenient as possible for
human users to manipulate. As a result, the choice of characters is
slightly different from base32 as defined in RFC 3548. The last 20
bits of the sasvalue result in four libbase32 characters which are
rendered to both ZRTP endpoints. Other SAS Types may be defined to
render the SAS value in other ways.
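The SAS computation and rendering can be sketched as follows, under several stated assumptions. The sketch assumes that SHA-256 is the negotiated hash, that "|" denotes concatenation, and that pvi and pvr are already available as octet strings. The 32 character alphabet is the one commonly published for this human-oriented base-32 encoding and should be checked against [6]; the mapping of the 20 bits to characters (most significant 5 bits first) is also an assumption. The names sas_value and render_sas are hypothetical.

```python
import hashlib

# Assumed alphabet for the human-oriented base-32 encoding; verify against [6].
LIBBASE32_ALPHABET = "ybndrfg8ejkmcpqxot1uwisza345h769"


def sas_value(pvi: bytes, pvr: bytes) -> bytes:
    """sasvalue = hash(pvi | pvr | "Short Authentication String").

    Assumption: the negotiated hash is SHA-256 and "|" is concatenation.
    """
    return hashlib.sha256(pvi + pvr + b"Short Authentication String").digest()


def render_sas(sasvalue: bytes) -> str:
    """Render the last 20 bits of sasvalue as four base-32 characters."""
    # The last 3 octets hold 24 bits; the mask keeps the final 20.
    last20 = int.from_bytes(sasvalue[-3:], "big") & 0xFFFFF
    # Assumption: each character encodes 5 bits, most significant first.
    return "".join(
        LIBBASE32_ALPHABET[(last20 >> shift) & 0x1F] for shift in (15, 10, 5, 0)
    )
```

Both endpoints compute render_sas(sas_value(pvi, pvr)) independently, and the users compare the resulting four characters aloud.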
The sasflag is set based on the user indicating that SAS has been
successfully performed. The sasflag is exchanged securely in the
Confirm1 and Confirm2 messages of the next session. In other words,
each party sends the sasflag from the previous session in the Confirm
message of the current session. It is perfectly reasonable to have a
ZRTP endpoint that never sets the sasflag, because it would require
adding complexity to the user interface to allow the user to set it.
The sasflag is not required to be set, but if it is available to the
client software, it allows for the possibility that the client
software could render to the user that the SAS verify procedure was
carried out in a previous session.
Regardless of whether there is a user interface element to allow the
user to set the sasflag, it is worth caching a shared secret, because
doing so reduces opportunities for an attacker in the next call.
If at any time the users carry out the SAS procedure, and it actually
fails to match, then this means there is a very resourceful man in
the middle. If this is the first call, the MitM was there on the
first call, which is impressive enough. If it happens in a later
call, it means the MitM must also know your cached shared
secret, because you could not have carried out any voice traffic at
all unless the session key was correctly computed and is also known
to the attacker. This implies the MitM must have been present in all
the previous sessions, since the initial establishment of the first
shared secret. This is indeed a resourceful attacker. It also means
that if at any time he ceases his participation as a MitM on one of
your calls, the protocol will detect that the cached shared secret is
no longer valid-- because it was really two different shared secrets
all along, one of them between Alice and the attacker, and the other
between the attacker and Bob. The continuity of the cached shared
secrets makes it possible for us to detect the MitM when he inserts
himself into the ongoing relationship, as well as when he leaves.
Also, if the attacker tries to stay with a long lineage of calls, but
fails to execute a DH MitM attack for even one missed call, he is
permanently excluded. He can no longer resynchronize with the chain
of cached shared secrets.
Some sort of user interface element (maybe a checkbox) is needed to
allow the user to tell the software the SAS verify was successful,
causing the software to set the "SAS verified" flag, which (together
with our cached shared secret) obviates the need to perform the SAS
procedure in the next call. An additional user interface element can
be provided to let the user tell the software he detected an actual
SAS mismatch, which indicates a MitM attack. The software can then
take appropriate action, clearing the "SAS verified" flags and erasing
the cached shared secret from this session. It is up to the
implementer to decide if this added user interface complexity is
warranted.
If the SAS matches, it means there is no MitM, which also implies it
is now safe to trust a cached shared secret for later calls. If
inattentive users don't bother to check the SAS, it means we don't
know whether there is or is not a MitM, so even if we do establish a
new cached shared secret, there is a risk that our potential attacker
may have a subsequent opportunity to continue inserting himself in
the call, until we finally get around to checking the SAS. If the
SAS matches, it means no attacker was present for any previous
session since we started propagating cached shared secrets, because
this session and all the previous sessions were also authenticated
with a continuous lineage of shared secrets.
7. IANA Considerations
If an IANA registry for RTP extension headers were defined, then the
value 0x505A would be reserved for ZRTP.
8. Security Considerations
This document is all about securely keying SRTP sessions. As such,
security is discussed in every section. The next version of this
draft will have a summary of those security properties discussed
throughout the document.
9. Acknowledgments
The authors would like to thank Bryce Wilcox for his contributions to
the design of this protocol, and to thank Jon Peterson, Colin Plumb,
and Hal Finney for their helpful comments and suggestions.
10. Appendix - ZRTP, SIP, and SDP
This section discusses how ZRTP, SIP, and SDP work together.
SIP UAs which support this specification would include the to-be-
defined SDP attribute a=zrtp in their SDP offers and answers. The
presence of this attribute is a hint to another UA that ZRTP is
supported. If a UA supports both ZRTP and another approach to
negotiate an SRTP secret such as [14] or [13], then the presence of
the a=zrtp attribute is critical. If both UAs support ZRTP, they
will first try ZRTP before attempting SRTP. If only one endpoint
supports ZRTP but both support SRTP, then the other method will be
used instead.
Note that ZRTP may be implemented without coupling with the SIP
signaling. For example, ZRTP can be implemented as a "bump in the
wire" or as a "bump in the stack" in which RTP sent by the SIP UA is
converted to ZRTP. In these cases, the SIP UA will have no knowledge
of ZRTP and will not include the a=zrtp attribute. As a result, even
if the other UA does not indicate support for ZRTP, a ZRTP endpoint
SHOULD still send Hello messages.
11. References
11.1. Normative References
[1] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
"RTP: A Transport Protocol for Real-Time Applications", STD 64,
RFC 3550, July 2003.
[2] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
Norrman, "The Secure Real-time Transport Protocol (SRTP)",
RFC 3711, March 2004.
[3] Kivinen, T. and M. Kojo, "More Modular Exponential (MODP)
Diffie-Hellman groups for Internet Key Exchange (IKE)",
RFC 3526, May 2003.
[4] Ferguson, N. and B. Schneier, "Practical Cryptography", Wiley
Publishing 2003.
[5] Barker, E. and J. Kelsey, "Recommendation for Random Number
Generation Using Deterministic Random Bit Generators", NIST
Special Publication 800-90 DRAFT (December 2005).
[6] O'Whielacronx, Z., "human-oriented base-32 encoding", http://cvs.sourceforge.net/viewcvs.py/libbase32/libbase32/
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Disclaimer of Validity
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement
Copyright (C) The Internet Society (2006). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.