Marker bit (M): 1 bit
Set for the last packet, carried in the current RTP stream, of
the access unit, in line with the normal use of the M bit in
video formats, to allow an efficient playout buffer handling.
When MSM is in use, if an access unit appears in multiple RTP
streams, the marker bit is set on each RTP stream's last
packet of the access unit.
Informative note: The content of a NAL unit does not tell
whether or not the NAL unit is the last NAL unit, in
decoding order, of an access unit. An RTP sender
implementation may obtain this information from the video
encoder. If, however, the implementation cannot obtain
this information directly from the encoder, e.g. when the
bitstream was pre-encoded, and also there is no timestamp
allocated for each NAL unit, then the sender implementation
can inspect subsequent NAL units in decoding order to
determine whether or not the NAL unit is the last NAL unit
of an access unit as follows. A NAL unit naluX is the last
NAL unit of an access unit if it is the last NAL unit of
the bitstream or the next VCL NAL unit naluY in decoding
order has the high-order bit of the first byte after its
NAL unit header equal to 1, and all NAL units between naluX
and naluY, when present, have nal_unit_type in the range of
32 to 35, inclusive, equal to 39, or in the ranges of 41 to
44, inclusive, or 48 to 55, inclusive.
Payload type (PT): 7 bits
The assignment of an RTP payload type for this new packet
format is outside the scope of this document and will not be
specified here. The assignment of a payload type has to be
performed either through the profile used or in a dynamic way.
Informative note: It is not required to use different
payload type values for different RTP streams in MSM.
Sequence number (SN): 16 bits

Timestamp: 32 bits
The RTP timestamp is set to the sampling timestamp of the
content. A 90 kHz clock rate MUST be used.
If the NAL unit has no timing properties of its own (e.g.
parameter set and SEI NAL units), the RTP timestamp MUST be
set to the RTP timestamp of the coded picture of the access
unit in which the NAL unit (according to Section 7.4.2.4.4 of
[HEVC]) is included.
Receivers MUST use the RTP timestamp for the display process,
even when the bitstream contains picture timing SEI messages
or decoding unit information SEI messages as specified in
[HEVC]. However, this does not mean that picture timing SEI
messages in the bitstream should be discarded, as picture
timing SEI messages may contain frame-field information that
is important in appropriately rendering interlaced video.
Synchronization source (SSRC): 32-bits
Used to identify the source of the RTP packets. In SSM, by
definition a single SSRC is used for all parts of a single
bitstream. In MSM, each SSRC is used for an RTP stream
containing a subset of the sub-layers for a single (temporally
scalable) bitstream. A receiver is required to correctly
associate the set of SSRCs that are included parts of the same
bitstream.
Informative note: The term "bitstream" in this document is
equivalent to the term "encoded stream" in [I-D.ietf-
avtext-rtp-grouping-taxonomy].

Single NAL Unit Packets

A single NAL unit packet contains exactly one NAL unit, and
consists of a payload header (denoted as PayloadHdr), a
conditional 16-bit DONL field (in network byte order), and the
NAL unit payload data (the NAL unit excluding its NAL unit
header) of the contained NAL unit, as shown in Figure

0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| PayloadHdr | DONL (conditional) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| NAL unit payload data |
| |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| :...OPTIONAL RTP padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3 The structure a single NAL unit packet
The payload header SHOULD be an exact copy of the NAL unit header
of the contained NAL unit. However, the Type (i.e.
nal_unit_type) field MAY be changed, e.g. when it is desirable to
handle a CRA picture to be a BLA picture [JCTVC-J0107].
The DONL field, when present, specifies the value of the 16 least
significant bits of the decoding order number of the contained
NAL unit. If tx-mode is equal to "MSM" or sprop-max-don-diff is
greater than 0, the DONL field MUST be present, and the variable
DON for the contained NAL unit is derived as equal to the value
of the DONL field. Otherwise (tx-mode is equal to "SSM" and
sprop-max-don-diff is equal to 0), the DONL field MUST NOT be
present.

Aggregation packets (APs) are introduced to enable the reduction
of packetization overhead for small NAL units, such as most of
the non-VCL NAL units, which are often only a few octets in size.
An AP aggregates NAL units within one access unit. Each NAL unit
to be carried in an AP is encapsulated in an aggregation unit.
NAL units aggregated in one AP are in NAL unit decoding order.
An AP consists of a payload header (denoted as PayloadHdr)
followed by two or more aggregation units, as shown in Figure

0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| PayloadHdr (Type=48) | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| |
| two or more aggregation units |
| |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| :...OPTIONAL RTP padding |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure The structure of an aggregation packet
The fields in the payload header are set as follows. The F bit
MUST be equal to 0 if the F bit of each aggregated NAL unit is
equal to zero; otherwise, it MUST be equal to 1. The Type field
MUST be equal to 48. The value of LayerId MUST be equal to the
lowest value of LayerId of all the aggregated NAL units. The
value of TID MUST be the lowest value of TID of all the
aggregated NAL units.
Informative Note: All VCL NAL units in an AP have the same TID
value since they belong to the same access unit. However, an
AP may contain non-VCL NAL units for which the TID value in
the NAL unit header may be different than the TID value of the
VCL NAL units in the same AP.
An AP MUST carry at least two aggregation units and can carry as
many aggregation units as necessary; however, the total amount of
data in an AP obviously MUST fit into an IP packet, and the size
SHOULD be chosen so that the resulting IP packet is smaller than
the MTU size so to avoid IP layer fragmentation. An AP MUST NOT
contain Fragmentation Units (FUs) specified in section 4.8. APs
MUST NOT be nested; i.e. an AP MUST NOT contain another AP.
The first aggregation unit in an AP consists of a conditional 16-
bit DONL field (in network byte order) followed by a 16-bit
unsigned size information (in network byte order) that indicates
the size of the NAL unit in bytes.

Fragmentation Units (FUs)

Fragmentation units (FUs) are introduced to enable fragmenting a
single NAL unit into multiple RTP packets, possibly without
cooperation or knowledge of the HEVC encoder. A fragment of a NAL
unit consists of an integer number of consecutive octets of that
NAL unit. Fragments of the same NAL unit MUST be sent in consecutive
order with ascending RTP sequence numbers (with no other RTP packets
within the same RTP stream being sent between the first and last
fragment).
When a NAL unit is fragmented and conveyed within FUs, it is
referred to as a fragmented NAL unit. APs MUST NOT be
fragmented. FUs MUST NOT be nested; i.e. an FU MUST NOT contain
a subset of another FU.