1. Introduction

AirPlay is a family of protocols implemented by Apple to view various
types of media content on the Apple TV from any iOS device or iTunes.
In this documentation, “iOS device” refers to an iPhone, iPod touch or
iPad. The following scenarios are supported by AirPlay:

- Display photos and slideshows from an iOS device.

- Stream audio from an iOS device or iTunes.

- Display videos from an iOS device or iTunes.

- Show the screen content of an iOS device or a Mac running OS X Mountain
  Lion. This is called AirPlay Mirroring. It requires hardware capable of
  encoding live video without excessive CPU load, so it is only available
  on the iPhone 4S, iPad 2, the new iPad, and Macs with Sandy Bridge CPUs.

Audio streaming is also supported from an iOS device or iTunes to an
AirPort Express base station or a 3rd party AirPlay-enabled audio
device. Initially this was called AirTunes, but it was later renamed
to AirPlay when Apple added video support for the Apple TV.

This document describes these protocols, as implemented in Apple TV
software version 5.0, iOS 5.1 and iTunes 10.6. They are based on
well-known standard networking protocols such as Multicast DNS,
HTTP, RTSP, RTP or NTP, with custom extensions.

All of this information has been gathered using various reverse
engineering techniques, so it may be somewhat inaccurate and
incomplete. Moreover, this document does not explain how to circumvent
any kind of security implemented by Apple:

- It does not give any RSA keys.

- It does not explain how to decode iTunes videos protected with the
  FairPlay DRM.

- It does not explain the FairPlay authentication (SAPv2.5) used by iOS
  devices and OS X Mountain Lion to protect audio and screen content.

Please don’t e-mail me about this; I won’t reply. In fact, none of this
is actually required to view media content on an Apple TV.

2. Service Discovery

An AirPlay device such as the Apple TV publishes two services. The first
one is RAOP (Remote Audio Output Protocol), used for audio
streaming, and the other one is the AirPlay service, for photo and video
content.

The pw field appears only if the AirPlay server is password protected.
Otherwise it is not included in the TXT record.

The features bitfield defines the following features:

bit  name                  description
0    Video                 video supported
1    Photo                 photo supported
2    VideoFairPlay         video protected with FairPlay DRM
3    VideoVolumeControl    volume control supported for videos
4    VideoHTTPLiveStreams  HTTP Live Streaming supported
5    Slideshow             slideshow supported
7    Screen                mirroring supported
8    ScreenRotate          screen rotation supported
9    Audio                 audio supported
11   AudioRedundant        audio packet redundancy supported
12   FPSAPv2pt5_AES_GCM    FairPlay secure authentication supported
13   PhotoCaching          photo preloading supported

Note that the Apple TV does not support VideoVolumeControl. It has
probably been introduced for the upcoming Apple television.
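To make the bitfield concrete, here is a minimal decoder for the features
value advertised in the TXT record (the example value 0x39f7 is only
illustrative; bits not listed above are ignored):

```python
# Decode the AirPlay "features" bitfield from the mDNS TXT record.
FEATURE_BITS = {
    0: "Video",
    1: "Photo",
    2: "VideoFairPlay",
    3: "VideoVolumeControl",
    4: "VideoHTTPLiveStreams",
    5: "Slideshow",
    7: "Screen",
    8: "ScreenRotate",
    9: "Audio",
    11: "AudioRedundant",
    12: "FPSAPv2pt5_AES_GCM",
    13: "PhotoCaching",
}

def decode_features(value):
    """Return the names of the features whose bit is set in value."""
    return [name for bit, name in sorted(FEATURE_BITS.items())
            if value & (1 << bit)]

# The TXT record carries the value as a hex string, e.g. "0x39f7".
print(decode_features(int("0x39f7", 16)))
```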

The AirPlay server is an HTTP server (RFC 2616). Two connections are
made to this server, the second one being used as a reverse HTTP
connection. This allows a client to receive asynchronous events, such
as playback status changes, from the server.

The X-Apple-Purpose header makes it clear that this connection is used
for sending events to the client, whereas X-Apple-Session-ID links
this connection to the other (non-reverse) one. Events are delivered
as POST requests carrying an XML property list to the /event location.
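For illustration, here is a sketch of the request that turns the second
connection into a reverse (server-to-client) channel; the /reverse location
and the PTTH/1.0 upgrade token are assumptions based on traffic captures,
and the session ID is whatever the client chose for the first connection:

```python
# Build the HTTP upgrade request for the reverse event connection
# (assumed header values; the session ID is a client-chosen UUID).
import uuid

def build_reverse_request(session_id):
    """Return the raw bytes of the reverse-connection handshake."""
    lines = [
        "POST /reverse HTTP/1.1",
        "Upgrade: PTTH/1.0",
        "Connection: Upgrade",
        "X-Apple-Purpose: event",
        "Content-Length: 0",
        "X-Apple-Session-ID: " + session_id,
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

print(build_reverse_request(str(uuid.uuid4())).decode("ascii"))
```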

3. Photos

Photos are JPEG data transmitted using a PUT request to the AirPlay
server. They can be displayed immediately, or cached for future use.

3.1. HTTP requests

GET /slideshow-features

A client can fetch the list of available transitions for slideshows,
then let the user pick one before starting a slideshow. The
Accept-Language header specifies the language in which the transition
names should be returned.

PUT /slideshows/1

Start or stop a slideshow session. When starting, slideshow settings
such as the slide duration and the selected transition theme are
transmitted in an XML property list.

3.3. Photo Caching

AirPlay supports preloading picture data to improve transition latency.
This works by preloading a few pictures (most likely the ones before and
after the current picture) just after displaying one.

Preloading is achieved using the cacheOnly asset action. Upon
receiving this request, a server stores the picture in its cache. Later,
a client can request the display of this picture using the
displayCached asset action and the same asset key. This is much faster
than a full picture upload because no additional data is transmitted.

When asked for a picture which is no longer in the cache, a server
replies with an HTTP 412 error code (Precondition Failed).
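As a sketch, the two asset actions map onto request headers like these
(the X-Apple-AssetAction and X-Apple-AssetKey header names are assumptions
based on captures; the asset key is an identifier chosen by the client and
reused for the later display request):

```python
# Hypothetical helpers building headers for the two photo caching requests.
def cache_only_headers(asset_key):
    """PUT /photo with a JPEG body: store the picture without displaying it."""
    return {
        "Content-Type": "image/jpeg",
        "X-Apple-AssetAction": "cacheOnly",
        "X-Apple-AssetKey": asset_key,
    }

def display_cached_headers(asset_key):
    """PUT /photo with an empty body: display a previously cached picture.
    A server that already evicted the key replies 412 Precondition Failed."""
    return {
        "X-Apple-AssetAction": "displayCached",
        "X-Apple-AssetKey": asset_key,
    }
```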

3.4. Slideshows

Slideshows use the reverse HTTP connection for asynchronous loading of
pictures. Three connections are opened in parallel, with the
X-Apple-Purpose header set to slideshow. A GET request to the
/slideshows/1/assets/1 location is issued to fetch a new picture from
the AirPlay client. A binary property list with the following parameters
is expected as a reply.

POST /play

Start video playback. The body contains the following parameters:

name              type   description
Content-Location  URL    URL for the video
Start-Position    float  starting position between 0 and 1

MP4 movies are supported using progressive download. HTTP Live Streaming
might be supported as well, as indicated by the VideoHTTPLiveStreams
feature flag. The relative starting position, a float value between 0
(beginning) and 1 (end), is used to start playing a video at the exact
position it was at on the client.

A binary property list can also be used instead of text parameters, with
content type application/x-apple-binary-plist.
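A minimal sketch of building the text-parameter body (the exact line
terminators are an assumption; a binary plist with the same keys could be
sent instead):

```python
# Build a POST /play body in the plain-text parameter form.
def play_body(url, elapsed_seconds, duration_seconds):
    """Start-Position is the fraction of the video already played."""
    position = elapsed_seconds / duration_seconds if duration_seconds else 0.0
    return ("Content-Location: %s\r\n"
            "Start-Position: %f\r\n") % (url, position)

body = play_body("http://example.com/movie.mp4", 30.0, 120.0)
print(body)
```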

Sync packets

Sync packets are sent once per second to the control port. They are used to
correlate the RTP timestamps currently used in the audio stream to the NTP time
used for clock synchronization. Payload type is 84, the Marker bit is always
set and the Extension bit is set on the first packet after RECORD or
FLUSH requests. The SSRC field is not included in the RTP header.

Retransmit packets

AirTunes supports resending audio packets which have been lost. Payload
type is 85 for retransmit queries, the Marker bit is always set and the
SSRC field is not included in the RTP header.

bytes  description
8      RTP header without SSRC
2      sequence number of the first lost packet
2      number of lost packets

Retransmit replies have payload type 86, with a full audio RTP packet
after the sequence number.
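Following the layout above, a retransmit query can be packed like this
(the timestamp field of the short RTP header is assumed to be unused in
queries):

```python
# Pack a retransmit query: 8-byte RTP header without SSRC, then the
# first lost sequence number and the number of lost packets.
import struct

def retransmit_query(seqnum, first_lost, count):
    flags = 0x80              # RTP version 2, no padding/extension/CSRC
    marker_pt = 0x80 | 85     # marker bit always set, payload type 85
    timestamp = 0             # assumed unused in queries
    return struct.pack(">BBHIHH", flags, marker_pt, seqnum,
                       timestamp, first_lost, count)
```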

Timing packets

Timing packets are used to synchronize a master clock for audio. This is
useful for clock recovery and precise synchronization of several devices
playing the same audio stream.

Timing packets are sent at 3 second intervals. They always have the
Marker bit set, and payload type 82 for queries and 83 for replies.
The SSRC field is not included in the RTP header, so it takes only 8
bytes, followed by three NTP timestamps (origin, receive and transmit,
as in standard NTP).
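The NTP timestamp format itself is standard: 32 bits of seconds since
1900 and 32 bits of binary fraction. A small conversion sketch:

```python
# Convert between Unix time and the 64-bit NTP timestamp format.
NTP_EPOCH_DELTA = 2208988800  # seconds between 1900-01-01 and 1970-01-01

def unix_to_ntp(t):
    """Pack a Unix time (float seconds) into a 64-bit NTP timestamp."""
    seconds = int(t) + NTP_EPOCH_DELTA
    fraction = int((t - int(t)) * (1 << 32))
    return (seconds << 32) | fraction

def ntp_to_unix(ts):
    """Unpack a 64-bit NTP timestamp back into Unix time."""
    return ((ts >> 32) - NTP_EPOCH_DELTA) + (ts & 0xFFFFFFFF) / float(1 << 32)
```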

5.4. Metadata

Metadata for the current track is sent using SET_PARAMETER requests.
This allows the Apple TV to show the track name, artist, album, cover
artwork and timeline. The RTP-Info header contains an rtptime parameter
with the RTP timestamp corresponding to the time from which the metadata
is valid.

5.6. Remote Control

Audio speakers can send commands to the AirPlay client to change the
current track, pause and resume playback, shuffle the playlist, and
more. This uses a subset of DACP (Digital Audio Control Protocol).
An AirPlay client advertises this capability by including a DACP-ID
header in its RTSP requests, with a 64-bit ID for the DACP server. An
Active-Remote header is included as well, serving as an authentication
token.

The AirPlay server needs to browse the mDNS _dacp._tcp services for a
matching DACP server. Server names look like iTunes_Ctrl_$ID.

Once the DACP server has been identified, HTTP requests can be sent to the
corresponding service port. The Active-Remote header must be included in
these requests, so no additional pairing is required. The location for
remote control commands is /ctrl-int/1/$CMD, where $CMD is the command
name.

6. Screen Mirroring

Screen mirroring is achieved by transmitting an H.264-encoded video
stream over a TCP connection. This stream is packetized with a 128-byte
header. AAC-ELD audio is sent using the AirTunes protocol. The master
clock is synchronized using NTP.

Moreover, as soon as a client starts a video playback, a standard
AirPlay connection is made to send the video URL, and mirroring is
stopped. This avoids decoding and re-encoding the video, which would
incur a quality loss.

6.1. HTTP requests

Screen mirroring does not use the standard AirPlay service. Instead it
connects to an apparently hard-coded port, 7100. This is an HTTP server
which supports the following requests:

GET /stream.xml

Retrieve information about the server capabilities. The server sends an
XML property list with the following properties:

key          type     value     description
height       integer  720       vertical resolution
width        integer  1280      horizontal resolution
overscanned  boolean  true      is the display overscanned?
refreshRate  real     0.01666…  refresh interval in seconds (1/60, i.e. 60 Hz)
version      string   130.14    server version

These properties tell us that the AirPlay server is connected to a
1280x720, 60 Hz, overscanned display.
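Such replies can be parsed with a standard property list parser; the XML
below is a hypothetical server answer shaped like the table above:

```python
# Parse a /stream.xml reply with the standard library plist parser.
import plistlib

reply = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
 "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>height</key><integer>720</integer>
  <key>width</key><integer>1280</integer>
  <key>overscanned</key><true/>
  <key>refreshRate</key><real>0.016666666666666666</real>
  <key>version</key><string>130.14</string>
</dict>
</plist>"""

info = plistlib.loads(reply)
print(info["width"], info["height"])  # 1280 720
```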

POST /stream

Start the live video transmission. The client sends a binary property
list with information about the stream, immediately followed by the
stream itself. At this point, the connection is no longer a valid HTTP
connection.

The following parameters are sent:

key            type     value            description
deviceID       integer  181221086727016  MAC address (A4:D1:D2:80:0B:68)
sessionID      integer  -808788724       session ID (0xcfcadd0c)
version        string   130.16           server version
param1         data     (72 bytes)       AES key, encrypted with FairPlay
param2         data     (16 bytes)       AES initialization vector
latencyMs      integer  90               video latency in ms
fpsInfo        array
timestampInfo  array

The param1 and param2 parameters are optional.
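The integer encodings in the table are easy to undo: deviceID is the
48-bit MAC address stored as an integer, and sessionID is an unsigned
32-bit value that shows up as a negative number when read as a signed
integer. The values below are the ones from the table:

```python
# Recover the MAC address and unsigned session ID from the plist integers.
def device_id_to_mac(device_id):
    """deviceID is the 48-bit MAC address as an integer, MSB first."""
    return ":".join("%02X" % ((device_id >> shift) & 0xFF)
                    for shift in range(40, -8, -8))

def session_id_to_hex(session_id):
    """sessionID is an unsigned 32-bit value, sometimes printed signed."""
    return "0x%08x" % (session_id & 0xFFFFFFFF)

print(device_id_to_mac(181221086727016))  # A4:D1:D2:80:0B:68
print(session_id_to_hex(-808788724))      # 0xcfcadd0c
```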

As soon as the server receives a /stream request, it will send NTP
requests to the client on port 7010, which seems hard-coded as well. The
client needs to export its master clock there, which will be used for
audio/video synchronization and clock recovery.

6.2. Stream Packets

The video stream is packetized using 128-byte headers, followed by an
optional payload. Only the first 64 bytes of each header seem to be
used. Headers start with the following little-endian fields:

size     description
4 bytes  payload size
2 bytes  payload type
2 bytes  0x1e if type = 2, else 6
8 bytes  NTP timestamp

There are 3 types of packets:

type  description
0     video bitstream
1     codec data
2     heartbeat
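Putting the two tables together, a minimal reader for the mirroring
stream could look like this (only the first 16 header bytes are
interpreted; the rest are skipped):

```python
# Read one mirroring packet: a 128-byte header followed by the payload.
import io
import struct

HEADER_SIZE = 128

def read_packet(stream):
    """Read one packet from a file-like object.
    Returns (payload_type, ntp_timestamp, payload), or None at end of stream."""
    header = stream.read(HEADER_SIZE)
    if len(header) < HEADER_SIZE:
        return None
    payload_size, payload_type, _, ntp_timestamp = \
        struct.unpack_from("<IHHQ", header)
    return payload_type, ntp_timestamp, stream.read(payload_size)

# Example: a hand-crafted type-0 packet with a 3-byte payload.
packet = struct.pack("<IHHQ", 3, 0, 6, 42) + b"\x00" * 112 + b"abc"
print(read_packet(io.BytesIO(packet)))  # (0, 42, b'abc')
```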

Codec Data

This packet contains the H.264 extra data in avcC format (ISO/IEC
14496-15). It is sent at the beginning of the stream, each time the
video properties might change, when the screen orientation changes, and
when the screen is turned on or off.

6.3. Time Synchronization

Time synchronization takes place on UDP ports 7010 (client) and 7011
(server), using the NTP protocol (RFC 5905). The AirPlay server runs
an NTP client. Requests are sent to the AirPlay client at 3 second
intervals. The reference date for the timestamps is the beginning of the
mirroring session.

7. Password Protection

An AirPlay server can require a password for displaying any content from
the network. This is implemented using standard HTTP Digest
Authentication (RFC 2617), over RTSP for AirTunes, and HTTP for
everything else. The digest realms and usernames accepted by Apple TV are
the following: