RTC API Proposal

This proposal served as input for the formal specification of RTC APIs, which is currently being standardized by the W3C RTC WG. You can find the current version of the spec here. The API presented in this document is preserved for archival purposes.

The WhatWG proposal for real time media streams presents many fine ideas, as does an extension to the Streams API presented in that document (as proposed to the W3C audio working group). This proposal builds on those two documents to present an API for media capture and transmission in web browsers.

The primary motivations for this document are:

Some use-cases are not satisfied with either of the earlier proposals.

Some aspects of earlier proposals are amenable to simplification, and others may present unique implementation challenges, which this proposal takes into account.

Firefox already supports a rich Audio API for manipulating streams and we would like to ensure that subsequent work on video and real time communication plays well with other media APIs.

Use cases

For the purposes of designing this API, we present the following use-cases. We omit use-cases that do not pertain to the RTC working group (such as local-only media capture or audio processing). From an implementation perspective, however, it is important to consider all media-related APIs together for coherence, and the API proposed in this document does take those use-cases into account even though they are not presented here.

Simple video and voice calling site

Broadcasting real time video & audio streams

Browser based MMORPG that enables player voice communication (push to talk)

Video conferencing between 3 or more users (potentially on different sites)

[Fill in more use cases from IETF document]

API Specification

The API proposed in this section is intended to be the baseline provided by the browser, giving web applications the maximum amount of flexibility. Some use-cases (such as a simple video chat application) may be fulfilled by a simpler API that is more intuitive to web developers; however, it is hoped that such an API can be built on top of the proposed baseline. We do not preclude the working group from specifying a simpler API, but we suggest that implementing the following specification be mandatory for browsers, to ensure that all targeted use-cases are satisfied.

We split the specification into three distinct portions for clarity: definition of media streams, obtaining device access, and establishing peer connections. Implementation of all three is required for an end-to-end solution satisfying all targeted use-cases.

Media streams

A media stream is an abstraction over a particular window of time-coded audio or video data (or both). The time-codes are in the stream's own internal timeline. The internal timeline can have any base offset but always advances at the same rate as real time. Media streams are not seekable in any direction.

When the readyState of a media stream is LIVE, the window is advancing in real time. When the state is BLOCKED, the stream does not advance (the user-agent may render silence in its place until the stream is LIVE again); ENDED means that no further data will be received on the stream. MediaStreams have implicit 'sources' and 'sinks'. Whenever you receive a MediaStream from the user-agent or a PeerConnection, the source is set up by the respective party; assigning the stream to an HTML element for rendering sets that element up as a sink.
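
The readyState machine above can be sketched as a toy model. The state names (LIVE, BLOCKED, ENDED) come from the proposal's prose; the class below is our own illustrative stub, not the proposed interface:

```javascript
// Toy model of the MediaStream readyState machine described above.
const LIVE = 'LIVE';
const BLOCKED = 'BLOCKED';
const ENDED = 'ENDED';

class MediaStreamModel {
  constructor() {
    this.readyState = LIVE; // the window is advancing in real time
  }
  block() {
    // While BLOCKED, the stream does not advance; a sink may render silence.
    if (this.readyState === LIVE) this.readyState = BLOCKED;
  }
  resume() {
    if (this.readyState === BLOCKED) this.readyState = LIVE;
  }
  end() {
    this.readyState = ENDED; // terminal: no further data on the stream
  }
}
```

Note that ENDED is terminal in this model: once a stream ends, neither block() nor resume() can revive it, matching the prose's "no further data will be received".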

JSONHints provides hints from the application as to what kind of output it needs from the MediaStreamRecorder, and is described later in the proposal.

A MediaStreamTrack can contain audio, video or data. The origin of the media stream is free to choose how media data is split between multiple tracks. Typically, there will be separate tracks for audio, video, subtitles, DTMF, etc. Tracks can be individually ENABLED or DISABLED. If all tracks in a stream are DISABLED, the stream simply plays nothing.

The programmer can also provide "hints" to the MediaStreamTrack as to the kind of data it is carrying. The MediaStreamTrack's type may then change to accommodate the provided hints, and if this is done, the onTypeChanged event handler for the track will be called. When this happens, a new track will be created with the new codec type and a reference to this new track will be provided in the only argument to the onTypeChanged callback.
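The hint flow above can be illustrated with a stub. The onTypeChanged callback name and the "new track passed as the sole argument" behaviour follow the prose; the provideHints method name, the hint key, and the class itself are our assumptions for illustration:

```javascript
// Illustrative stub of the track hint flow: honouring a hint may cause a
// replacement track with a new type to be created and handed to
// onTypeChanged. Not the proposed interface, just a sketch of the contract.
class MediaStreamTrackModel {
  constructor(type) {
    this.type = type;          // 'audio', 'video' or 'data'
    this.enabled = true;       // tracks can be individually enabled/disabled
    this.onTypeChanged = null;
  }
  provideHints(hints) {
    // If accommodating the hints changes the track's type, a new track is
    // created and passed as the only argument to onTypeChanged.
    if (hints.type && hints.type !== this.type) {
      const replacement = new MediaStreamTrackModel(hints.type);
      if (this.onTypeChanged) this.onTypeChanged(replacement);
    }
  }
}
```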

Streams can be associated with existing HTML media elements such as <video> and <audio>, and video streams with <canvas>. Each of these tags may serve as either the input or output for a media stream, by setting or getting the stream attribute as appropriate.

The caller may set the values in the options JS object depending on its requirements. Note that these are merely suggestions from the caller; the returned stream may not match the request exactly (though the user-agent will make its best effort to provide the requested stream). If either of the requested inputs (audio/video) is not available, the success callback must still be called; thus the application must check the type attribute of the tracks in the stream handed to it to determine whether the stream contains only audio, only video, or both. If no hardware is available to fulfill the request, the error callback is invoked with RESOURCE_UNAVAILABLE; if hardware is available but currently in use by another application, the error callback is invoked with RESOURCE_BUSY. Additionally, the user-agent may offer the user the option of selecting a local file to act as the source of the media stream in place of real hardware.
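The defensive check described above might look like the following sketch. The tracks/type shape follows the prose; the helper name and the stand-in stream object are our own:

```javascript
// Sketch: since a request for both audio and video may be only partially
// satisfied, the success callback should inspect the track types before
// wiring up its UI. classifyStream is a hypothetical helper, not part of
// the proposal.
function classifyStream(stream) {
  const types = stream.tracks.map((t) => t.type);
  return {
    hasAudio: types.indexOf('audio') !== -1,
    hasVideo: types.indexOf('video') !== -1,
  };
}

// Stand-in for a stream handed back when only a microphone was available:
const audioOnlyStream = { tracks: [{ type: 'audio', enabled: true }] };
```

An application requesting both inputs would, for example, hide its self-view element when hasVideo is false rather than treating the callback as an error.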

Peer connections

A peer connection provides a UDP channel of communication between two user-agents.

The configuration string gives the address of a STUN or TURN server used to establish the connection. sendSignal is a function provided by the caller that the user-agent calls whenever it needs to transport an out-of-band signalling message to the remote peer. When a message is received from the remote peer via this channel, it must be handed to the user-agent by calling receivedSignal(). The ordering of messages must be preserved.
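The signalling contract can be sketched as a loopback pair, where each side's sendSignal callback delivers directly to the other side's receivedSignal() (in practice the messages would travel via the application's own server). The constructor shape and the receivedSignal() name follow the prose; the class below is a recording stub, not the proposed PeerConnection, and the configuration string is a made-up example:

```javascript
// Loopback sketch of the out-of-band signalling path. Messages are
// delivered in order, as the proposal requires.
class SignallingPeerStub {
  constructor(configuration, sendSignal) {
    this.configuration = configuration; // e.g. address of a STUN/TURN server
    this.sendSignal = sendSignal;       // app-supplied transport function
    this.received = [];                 // messages handed to the user-agent
  }
  receivedSignal(message) {
    this.received.push(message);
  }
}

let alice, bob;
alice = new SignallingPeerStub('STUN example.net:3478', (m) => bob.receivedSignal(m));
bob = new SignallingPeerStub('STUN example.net:3478', (m) => alice.receivedSignal(m));

// The user-agent would invoke sendSignal internally; we call it directly here:
alice.sendSignal('offer');
alice.sendSignal('candidate');
bob.sendSignal('answer');
```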

When a PeerConnection object is created, its readyState is set to NEW. Peers willing to receive incoming connections may call listen() to indicate this, and their readyState changes to LISTENING. Peers willing to initiate a connection to another peer may call open() to begin this process (their readyState changes to OPENING). The listening end receives a callback via its onIncoming function, in which it may decide to accept() the connection. If a connection is accepted, the readyState changes to ACTIVE and the peer may start transmitting media packets. The readyState on the far end changes to ACTIVE as soon as the first packet from the initiating end is received.
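One reading of the lifecycle above can be modelled as a toy state machine: acceptance makes the initiating end ACTIVE and transmitting, and the listening end goes ACTIVE when the initiator's first packet arrives. The state names follow the prose; addressing, signalling and real packet delivery are elided, and the mechanics below are our own stub:

```javascript
// Toy model of the PeerConnection lifecycle: NEW → LISTENING/OPENING → ACTIVE.
class PeerConnectionModel {
  constructor() {
    this.readyState = 'NEW';
    this.onIncoming = null; // the listening end decides whether to accept()
  }
  listen() {
    this.readyState = 'LISTENING';
  }
  open(listener) {
    this.readyState = 'OPENING';
    if (listener.onIncoming) listener.onIncoming(this);
  }
  // Called on the listening end to accept an incoming connection.
  accept(initiator) {
    // Acceptance makes the initiating end ACTIVE; it may start transmitting.
    initiator.readyState = 'ACTIVE';
    initiator.sendFirstPacket(this);
  }
  sendFirstPacket(remote) {
    // The far end goes ACTIVE once the first packet arrives.
    if (remote.readyState === 'LISTENING') remote.readyState = 'ACTIVE';
  }
}
```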

Using a single PeerConnection to handle multiple incoming connections presents some unique challenges, but also has the desirable property of being able to stream out a single set of MediaStreams to multiple peers (which can also be changed mid-session). However, it remains to be specified how the addresses passed to the open() and accept() calls are obtained by the JS caller.

An alternative programming model, similar to that of UNIX sockets, is to define a separate PeerListener object that in turn creates new PeerConnection objects for every accepted incoming connection. This scheme has the advantage of associating every PeerConnection with only two endpoints (signaling and addressing are very clear in this case), but it makes it harder to use a single set of MediaStreams for multiple clients, since they will have to be set up for every individual connection. The alternative proposal is at RTC_API_Proposal:PeerListener.

Examples

Simple Video Call

Simple video calling between two users A and B. A is making a call to B: