DASH Live Streaming with Azure Media Services

This article focuses on the live streaming DASH features of Azure Media Services, and how they can be used to deliver live and video-on-demand (VOD) adaptive streaming to Web browsers and to the many new devices that are adding support for the DASH standard. DASH live streaming is now available for public preview, and will graduate to “general availability” with normal service level agreements after the preview period.

DASH output is a runtime option for all live and VOD streaming from Azure Media Services. A player can request a DASH Media Presentation Description (MPD) manifest and compatible ISO Base Media File Format “Media Segments” just by including a DASH format tag in each URL request. The same files or live stream can be delivered in Microsoft Smooth Streaming, Apple HLS, or Adobe HDS by indicating any of those formats in the URL format tag. This enables the introduction of DASH to new browsers and devices while maintaining compatibility with legacy players and formats. The ability to dynamically package Media Segments in real time is essential for low latency live streaming, as well as efficient multiplatform support.

For information on setting up an Azure Media account with a live channel, live encoder, and streaming origin server, see Jason Suess’s excellent blog, “Getting Started with Live Streaming Using the Azure Media Management Portal”. Azure Media origin servers can support millions of simultaneous streams using Content Delivery Networks (CDNs), including the Azure CDN.

What is DASH Streaming?

DASH is an acronym for “Dynamic Adaptive Streaming over HTTP”. DASH is an adaptive streaming protocol specified in the international standard ISO/IEC 23009-1:2014, “Information technology - Dynamic adaptive streaming over HTTP (DASH) - Part 1: Media presentation description and segment formats”. DASH was standardized in MPEG, the Moving Picture Experts Group (ISO/IEC JTC 1/SC 29/WG 11). It includes an XML schema for a Media Presentation Description manifest (MPD), and general options for describing media, including some specific profiles for MPEG-2 Transport Streams and MPEG-4 ISO Base Media files. The ISO Media profiles are similar to Microsoft Smooth Streaming and Adobe HDS. Because of the similarity, Smooth, Flash, and PrimeTime players have been able to add support for the ISO Media profiles of DASH without major modification.

Adaptive streaming has enabled the explosive growth of video over the Internet (now the majority of Internet data) because it allows a huge variety of video devices to each select a compatible audio and video encoding from multiple options, and to dynamically select a higher or lower bitrate in response to network conditions in order to deliver continuous video at the highest quality the network and device can support. HTTP adaptive streaming takes advantage of web servers and Content Delivery Networks (CDNs) to support millions of simultaneous streams from the same cached Media Segments without additional server and backbone load.

The DASH standard was a collaborative effort in which every major video specification organization and industry sector participated to develop a single format for Internet video delivery, similar to other Web standards such as HTTP, PNG, Unicode, MP3, etc. that have made web pages with non-video media types globally interoperable.

Netflix, YouTube, and other large video providers are transitioning to DASH with ISO Media, all major browsers support the W3C APIs for DASH playback, and new mobile devices, connected TVs, and set-top boxes support DASH playback. DASH with ISO Media has been adopted by broadcasters and equipment manufacturers for delivering TV content over the Internet. Browser plugins, such as Silverlight and Flash, have been deprecated for security and complexity reasons, so native support for scripted DASH playback in HTML5 browsers is necessary to replace proprietary streaming formats. The goal for publishers and broadcasters is to create a single HTML5 web page that will stream DASH and their interactive Web experience to all devices with HTML5 browsers. Developing and supporting separate applications for each type of device costs money, reach, and time to market, so a single Web page will likely replace multiple apps when new HTML5 browsers reach sufficient market share.

Does DASH Streaming “Just Work”?

No. DASH is basically a “tool kit” that describes a presentation but does not specify media formats, decoding, adaptive switching behavior, player implementation, or interoperability. Organizations such as DASH Industry Forum, 3GPP, DVB, EBU, DECE, HbbTV, DLNA, etc. have specified DASH “profiles” derived from the “ISO Media Live Profile” and “ISO Media On Demand Profile” included in the DASH standard. The ISO Media Profiles specified in DASH were intended to enable consistent derivation by other organizations that have specific application scenarios and expertise, similar to what has been done in the past with digital TV, DVD, etc., which derived application specifications from generic MPEG standards.

The Profiles in the DASH standard specify in general how an MPD should describe ISO Media Segments that contain ISO Media movie fragments, so that audio and video movie fragments can be downloaded and synchronized on a common presentation timeline. DASH doesn’t specify which Segments will be seamlessly switchable, a basic requirement of adaptive bitrate streaming, or whether they’ll even be decodable. That depends on parser and decoder capabilities that the DASH standard does not specify, so application specifications are needed for encode/decode interoperability.

Specifications like the DASH-IF Implementation Guidelines fill that gap. They define the movie fragment packaging constraints that matter for splicing Segments from different Representations, the codecs, encoding parameters, and encryption parameters, the Adaptation Set constraints that make Representations seamlessly switchable, and the specific DASH tools to use and how to use them. The result is a set of well-defined “interoperability points” that can be supported and identified by DASH content and players for reliable playback. Azure Media Services generates DASH MPDs and Media Segments consistent with DASH-IF guidelines.

Azure Media's specific subset of features provides compatibility with the widest range of delivery scenarios and players, and supports important use cases such as targeted advertising and DRM content protection in highly scalable and available systems (for example, the service has been tested at scale by delivering the Olympics and World Cup to several continents from multiple data centers with 100% redundancy and availability).

Creating a DASH Live MPD

It's automatic! A DASH MPD similar to the one below is created automatically based on the audio and video streams delivered to an Azure Media Channel (an ingest URL). An uplink protocol such as RTMP or Smooth Streaming (HTTP POST) pushes MP4 streams or multibitrate encoded movie fragments to the Channel. These are packaged as CSF movie fragments when Media Segments are requested, after a Program from the Channel has been assigned a Streaming Locator (origin server). For the details, see “Getting Started with Live Streaming Using the Azure Media Management Portal”.

The duration of AVC Coded Video Sequences should be about 2 seconds, since that determines video Segment duration, and 2 second Segments are optimal for rapid bitrate adaptation balanced against video codec efficiency. Two second Segments also allow relatively low latency between the arrival of a video frame at the live encoder and its presentation on a player. Actual latency depends on the buffering logic of each player and the reliability of its network connection. Latency is typically tuned to anywhere from a few seconds to thirty seconds.

Because Azure Media Services uses “realtime” Segment addressing and an MPD Segment Timeline, players know with certainty which Media Segment is the last one available, so they can join the “live edge” of a realtime stream without the risk of requesting Segments that are not yet available. Requesting Segments that are not yet available results in a cascade of HTTP 404 (not found) errors between Content Delivery Networks (CDNs) and origin servers, which can clog a network like a denial of service attack if repeated by thousands of players. DASH addressing schemes that rely on the synchronization of player clocks to the origin server clock must “back off” by an extra safety margin that exceeds the worst case clock error, but Segment Timeline avoids that extra latency.

At the start of the Preview period, simple track combinations of AAC audio and AVC video are supported, but multiple tracks and additional codecs will be validated before General Availability is announced. CEA-608/708 captions may be embedded in SEI messages in the AVC stream for closed caption delivery to players. Single Period MPDs are supported to enable equivalent simultaneous presentations in DASH, Smooth, HLS, and HDS. A consistent event message format allows delivery of program insertion messages already present in broadcast video (e.g. SCTE-35, VMAP) to all platforms so that players can perform ad insertion, etc. without relying on multiple Periods, which is a DASH-specific solution. The ingested media and metadata are used to construct an MPD whenever a manifest with a DASH format tag (format=mpd-time-csf) is requested with a streaming locator such as this:

The MPD only needs to be downloaded on startup, and when the player is notified by an Event Message Box (‘emsg’) in a Media Segment. An update period of zero (MPD@minimumUpdatePeriod="PT0S") and an <InbandEventStream> element indicate that the server will send an MPD update event when necessary (such as at the end of a live presentation). Simple players have the option to ignore update events in the media stream, and instead download an MPD update with a frequency approximately equal to the Segment duration (e.g. every 2 seconds), but this results in lower network efficiency and additional latency.

Each MPD version is identified by MPD@publishTime, which can be compared to Event Message MPD expire time to determine if an MPD update is necessary.
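
The comparison described above can be sketched in a few lines of Python. This is only an illustration, not player code from Azure Media Services: the function names are hypothetical, and it assumes the player has already extracted a UTC timestamp from the Event Message and treats the MPD it holds as stale when its MPD@publishTime is earlier than that timestamp.

```python
from datetime import datetime, timezone


def parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2014-09-12T00:00:00Z' as UTC."""
    if timestamp.endswith("Z"):
        # Older Python versions do not accept a trailing 'Z' in fromisoformat.
        timestamp = timestamp[:-1] + "+00:00"
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are already UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def mpd_update_needed(held_publish_time: str, event_expire_time: str) -> bool:
    """Return True when the MPD the player holds predates the event's time.

    held_publish_time is MPD@publishTime of the MPD in memory;
    event_expire_time is the (assumed) timestamp carried by the Event Message.
    """
    return parse_utc(held_publish_time) < parse_utc(event_expire_time)
```

A player that follows this approach would call mpd_update_needed each time an event arrives and download a new MPD only when it returns True.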

The server maintains a PVR time shift buffer depth of 4 minutes behind the live edge in this example (the default is an infinite PVR buffer, where all of the recorded video is randomly accessible).
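
As a rough sketch of what a 4 minute buffer means for a player, the Python below computes the seekable window behind the live edge. The duration string "PT4M" is an assumed spelling of the buffer depth (the standard attribute is MPD@timeShiftBufferDepth), the helper names are hypothetical, and the small parser only handles the hours/minutes/seconds subset of ISO 8601 durations.

```python
import re

# Handles only durations of the form PT#H#M#S (each part optional).
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?$")


def parse_duration(text: str) -> float:
    """Convert a simple ISO 8601 duration such as 'PT4M' to seconds."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


def seek_window(live_edge_seconds: float, buffer_depth_seconds: float):
    """Return (earliest, latest) seekable presentation times in seconds."""
    earliest = max(0.0, live_edge_seconds - buffer_depth_seconds)
    return (earliest, live_edge_seconds)
```

For a presentation that has been live for 1000 seconds, seek_window(1000.0, parse_duration("PT4M")) gives a window from 760 to 1000 seconds; before the buffer has filled, the window simply starts at zero.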

MPD@availabilityStartTime is the UTC time at which the live presentation started. It also corresponds to zero on the MPD presentation timeline, which is used to synchronize all media tracks/Adaptation Sets.

AdaptationSet@bitstreamSwitching="true", so only one Initialization Segment per Adaptation Set needs to be processed on startup. Re-initialization is not required on bitrate switches, and any continuous sequence of Segments forms a valid ISO Media CSF (Common Streaming Format) file. See DECE DMEDIA “Common File Format & Media Formats Specification”. This results in the best performance with existing decoders, and the most seamless bitrate switching across a wide variety of players, some of which may glitch when their media pipeline and decoders are re-initialized.

<SegmentTemplate> allows a player to calculate the next Segment address without downloading a new playlist to get the next URL. URLs are formed by resolving the $Time$ substitution parameter to the media presentation time of the start of each Segment. The media presentation time is stored in the Track Fragment Decode Time Box (‘tfdt’) in every CSF Media Segment as an integer timestamp expressed in the ISO Media track timescale, which matches the @timescale given for that track's Adaptation Set in the MPD. The format delivered by Azure Media Services is determined by the URL format tag, e.g. “./Fragments(video=$Time$,format=mpd-time-csf)”, which in this case indicates an MPD and Media Segments using time addressing, and CSF Media Segments. URLs are document relative, so the same MPD can be located at different root URLs (stream locator URLs) without editing.
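
The $Time$ substitution and document-relative resolution can be illustrated with a short Python sketch. The template string is the one quoted above; the manifest URL is a hypothetical placeholder (host and path are invented for illustration), and the function name is an assumption rather than part of any player API.

```python
from urllib.parse import urljoin


def segment_url(mpd_url: str, media_template: str, start_time: int) -> str:
    """Build a Media Segment URL from a time-addressed template.

    start_time is the Segment start in track timescale units, i.e. the
    value a player would also find in the Segment's 'tfdt' box.
    """
    relative = media_template.replace("$Time$", str(start_time))
    # Resolve against the MPD location, since template URLs are document relative.
    return urljoin(mpd_url, relative)


# Hypothetical manifest location; only the format tag comes from the article.
MPD_URL = "http://example.invalid/locator-id/stream.ism/manifest(format=mpd-time-csf)"
TEMPLATE = "./Fragments(video=$Time$,format=mpd-time-csf)"

first_segment = segment_url(MPD_URL, TEMPLATE, 0)
next_segment = segment_url(MPD_URL, TEMPLATE, 20000000)
```

Because only the MPD location changes between stream locators, the same template resolves correctly under any root URL.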

<SegmentTimeline> maintains the Adaptation Set’s media timeline, listing the start time and duration of each Segment in the Adaptation Set in order. This provides exact timestamps even when Segment durations vary because of splices, scene detection, dropped Segments, and other live encoding realities. An exact Segment Timeline also enables synchronization of Segment timestamps and URLs between independent encoders and servers operating from the same live feed, and time based synchronization of events, such as ad insertion. The @t attribute identifies the media timestamp of the first Segment in the Period, which is coincident with UTC @availabilityStartTime, in this case with a single Period starting at zero presentation time. The duration, timestamps, and timescale of live encoded streams need not be altered for different presentations (e.g. live to VOD) because Segment Timeline supports flexible media timebases and start points. For accurate audio/video synchronization, video must use negative composition offsets to make the first sample presentation time equal the first sample decode time in each Segment (the first sample decode time is stored in each Segment’s ‘tfdt’ box). The video SegmentTimeline in this example is very concise because movie fragment durations match exactly, indicating 120 two-second Segments (@r=119 indicates 119 repeats of the same exact duration). That fills the rolling 4 minute time shift buffer, and an MPD need not list more than the Segments that were available when it was published.
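
To make the @t, @d, and @r attributes concrete, here is a minimal Python sketch that expands a SegmentTimeline into explicit Segment start times. The XML fragment is reconstructed from the description above (one entry, 119 repeats, 2 second Segments in a 10 MHz timescale), not copied from a real manifest. For brevity it omits the MPD XML namespace that a real MPD carries and does not handle negative @r values.

```python
import xml.etree.ElementTree as ET

# Assumed fragment: 120 two-second Segments in a 10 MHz timescale.
TIMELINE_XML = '<SegmentTimeline><S t="0" d="20000000" r="119"/></SegmentTimeline>'


def expand_timeline(timeline_xml: str):
    """Return a list of (start, duration) pairs in timescale units."""
    root = ET.fromstring(timeline_xml)
    segments = []
    next_start = 0
    for entry in root.findall("S"):
        # @t is optional; without it the Segment follows the previous one.
        start = int(entry.get("t", next_start))
        duration = int(entry.get("d"))
        repeat = int(entry.get("r", 0))
        for _ in range(repeat + 1):
            segments.append((start, duration))
            start += duration
        next_start = start
    return segments


segments = expand_timeline(TIMELINE_XML)
live_edge_start = segments[-1][0]  # start time of the last listed Segment
```

The last pair in the list is the newest Segment the MPD advertises, which is how a player can join the live edge without guessing from its own clock.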

This example has 8 different Representations in a video Adaptation Set, and each is encoded with a bitrate and spatial subsampling that is a fraction of the source bitrate and image size. Precise coding of AVC encoding and cropping parameters enables accurate rescaling and pixel registration so that relatively small bitrate switches are not noticed, and subsampling in proportion to bitrate reduction helps maintain perceived visual quality by avoiding coding artifacts, other than “softening” of the image as the bitrate is reduced.

The audio Adaptation Set has a single Representation, and an unusual pattern of slightly different Segment element <S> @d durations resulting from the 10MHz timebase not being an even multiple of the audio 44.1kHz track timebase, sync frame duration, and Segment duration.
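
The arithmetic behind that pattern can be reproduced with integer math in Python. This is a sketch under stated assumptions, not the encoder's actual algorithm: it assumes 1024-sample AAC frames, assumes each audio Segment ends on the last whole frame at or before each 2 second boundary, and assumes timestamps are truncated to whole 10 MHz ticks.

```python
TIMESCALE = 10_000_000     # 10 MHz timebase, as described in the text
SAMPLE_RATE = 44_100       # audio sample rate in Hz
SAMPLES_PER_FRAME = 1024   # assumption: AAC frame size
SEGMENT_SECONDS = 2        # nominal Segment duration


def frames_before(boundary_index: int) -> int:
    """Whole audio frames that fit before the given 2 second boundary."""
    return (boundary_index * SEGMENT_SECONDS * SAMPLE_RATE) // SAMPLES_PER_FRAME


def ticks_for_frames(frames: int) -> int:
    """Timestamp of the end of `frames` audio frames, truncated to ticks."""
    return (frames * SAMPLES_PER_FRAME * TIMESCALE) // SAMPLE_RATE


def audio_segment_durations(count: int):
    """Durations (in ticks) of the first `count` audio Segments."""
    durations = []
    previous_end = 0
    for index in range(1, count + 1):
        end = ticks_for_frames(frames_before(index))
        durations.append(end - previous_end)
        previous_end = end
    return durations
```

Under these assumptions most Segments hold 86 frames and last just under 2 seconds, with durations differing by a single tick because of truncation, and an occasional 87-frame Segment catches up with the video timeline.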

Channel 2

More Microsoft DASH player options are described in Cenk Dingiloglu's blogs, starting with “Announcing: Microsoft Smooth Streaming Client 2.5 with MPEG DASH support”. His other blogs describe DASH playback on Flash using an OSMF plugin, DASH playback on Windows Phone, etc. The Microsoft PlayReady team also provides development kits for DASH plus PlayReady apps on Android and iOS systems. Azure Media Services announced General Availability of DASH VOD services at NAB 2014, and many DASH players from Microsoft and other sources support it, but players need to be tested with the Azure Media Live DASH streams now available to determine their reliability for live streaming.