RTCStreamAPI

This document specifies a JavaScript API, and its semantics, that browsers can implement for manipulating point-to-point connections. These connections use ICE, STUN and TURN to set up audio and video flows over RTP.

It is expressed using WebIDL syntax, which should suffice to completely specify a JavaScript API. Other specifications can define concrete APIs that expose this functionality in other languages.

Overview

The goal of this API is to provide a high-level interface that allows developers to create media sessions, such as those used in phone calls, games or multiparty video conferences.

Goals

● Support for voice and video sessions.
● Support for data transfer.
● Support for handling multiple voice and video streams, possibly multiplexing them onto a lower number of transports.
● Support for encryption of media.
● Safe for untrusted content, with user permission.
● High performance; suitable for high-throughput applications such as HD video conferencing.
● Does not depend on any specific signaling service or protocol.
● Able to exchange media with existing RTP/(S)AVP(F) endpoints that are ICE-aware.

Non-Goals

● Complete interoperation with legacy (non-ICE-aware) endpoints.
● Direct support for SIP or XMPP.

History and previous names

This interface was at various times known as ConnectionPeer, having evolved from the ConnectionPeer of the HTML5 proposal (named PeerConnection as of March 2011). The current name, RtcSession, was chosen to reflect that it is intended for real-time connections at a higher level than a transport (“byte transmission”) interface.

The RtcSession interface

This specification replaces the one in the HTML5 specification as of Feb 14, 2011.

An RtcSession is a higher-level object that encompasses all information about the media and data flow between two peers, and knows how to use the signalling path to handle changes in the devices. It may encompass a number of RTP sessions. If mapped into SDP, its state at any given time can be represented by a single SDP session.

It is an interface to underlying subsystems of the browser; it makes no sense to think of it as an object with independent existence. In particular, the data flows that pass through an RtcSession, and that are configured via the Stream objects described below, are not directly accessible to JavaScript; they are handled by internal functions inside the browser.

It is created via a factory object, and initialized via a formatted string.

To illustrate the use of the API, this HTML creates a very simple videophone initiator:

The API for an RtcSession, the object that defines a single group of media connections going to one other participant, is defined below. It uses the concepts of “StreamSource” and “StreamSink” that are described further on in this document.

  // Connection negotiation.
  // The handler will be called when an RtcSession requests an outgoing
  // connection. Details TBD.
  attribute Function onOutgoingNegotiationItem;
  void IncomingNegotiationItem(DOMString item);

  // Functions for querying status are TBD.
  // More functions for detailed manipulation can be added
  // at will.
}

Streams

A Stream is an interface that controls data flows. It supports:

● Identification.
● Connecting a source (StreamSource object) to one or more sinks (StreamSink objects).

The id is guaranteed to be unique across all items of the same class in an app context (there is no guarantee that it is globally unique, or that all StreamSources have ids distinct from all StreamSinks).

The usual usage of a stream is that one creates streams using factory functions of other objects. Some of these return objects that implement the StreamSink interface, some return objects that implement the StreamSource interface, some may return objects that implement both.

A StreamSource and StreamSink can be connected:

ChannelA.add_data_recipient(ChannelB)

Conceptually, one can imagine that this connection is implemented in terms of an “onmessage” event handler in ChannelA that calls a function in ChannelB to pass the data:

ChannelA.onmessage = function(event) {
  ChannelB.send(event.data);
};

but in a practical implementation, the data will not be accessible to outside inspection.

It is possible to connect multiple StreamSources to one StreamSink (mixing). If the sink does not support the requested mixing, the add_data_recipient call will throw an <IncompatibleDestination> exception.

It is possible to connect a StreamSource to multiple StreamSinks (splitting, used for instance to provide a self-view).

When a StreamSink is connected to a StreamSource, the add_data_recipient call may throw an <IncompatibleDestination> exception if the types are incompatible (a codec mismatch, or connecting a video stream to an audio-only device).
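The connection rules above (splitting, mixing and type checking) can be sketched with a small in-memory mock. This is only an illustration under stated assumptions, not the browser implementation: the classes MockSource and MockSink, the kind and maxInputs fields, and the deliver and emit methods are hypothetical names invented here. Only add_data_recipient, data_recipients and the IncompatibleDestination exception come from this document.

```javascript
// Hypothetical mock of the connection rules; not a real browser API.
class IncompatibleDestination extends Error {
  constructor(message) {
    super(message);
    this.name = "IncompatibleDestination";
  }
}

class MockSink {
  // kind: "audio" or "video"; maxInputs: how many sources this sink can mix.
  constructor(kind, maxInputs = 1) {
    this.kind = kind;
    this.maxInputs = maxInputs;
    this.inputs = 0;
    this.received = [];
  }
  deliver(data) {
    this.received.push(data);
  }
}

class MockSource {
  constructor(kind) {
    this.kind = kind;
    this.data_recipients = [];
  }
  add_data_recipient(sink) {
    if (sink.kind !== this.kind) {
      throw new IncompatibleDestination(`cannot connect ${this.kind} to ${sink.kind}`);
    }
    if (sink.inputs >= sink.maxInputs) {
      throw new IncompatibleDestination("sink does not support the requested mixing");
    }
    sink.inputs += 1;
    this.data_recipients.push(sink);
  }
  // Stand-in for the browser-internal data flow, which scripts cannot observe.
  emit(data) {
    for (const sink of this.data_recipients) sink.deliver(data);
  }
}

// Split: one camera feeds both the outgoing transport and a self-view.
const camera = new MockSource("video");
const transport = new MockSink("video");
const selfView = new MockSink("video");
camera.add_data_recipient(transport);
camera.add_data_recipient(selfView);
camera.emit("frame-1");

// Mismatch: a video source cannot feed an audio-only sink.
let mismatch = null;
try {
  camera.add_data_recipient(new MockSink("audio"));
} catch (e) {
  mismatch = e.name;
}
```

In this sketch a sink that can mix declares how many inputs it accepts; a real implementation would decide this from the capabilities of the underlying device or codec.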

interface StreamSource {
  void Mute();
  void Unmute();

  // Define where data will be sent.
  void add_data_recipient(StreamSink recipient);

  readonly attribute DOMString id;

  // Where data are currently being sent.
  // It is not certain that these need to be exposed.
  attribute StreamSink[] data_recipients;

  // Callback when media stops being available
  attribute Function onerror;
}

There are multiple possible destinations and sources - for instance, one may desire to display video using a <video> tag, a <canvas> tag or a WebGL interface. Rather than adapting to each specific form, we define a constructor for StreamSource and StreamSink that gives a StreamSource or a StreamSink for the DOM object that is going to be used.

The protocol used for negotiation is mediated through the onconnectionrequest handler. If the onconnectionrequest handler is set, it is OK to omit the connectionmediator from the initialization string; if neither is given, “connect()” will throw an IllegalConfiguration exception.
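The rule above can be sketched as a precondition check that connect() might run. This is an assumption-laden illustration: the function name checkNegotiationPath and the idea of passing an already-parsed initialization object are hypothetical. Only onconnectionrequest, connectionmediator and IllegalConfiguration are names taken from this document.

```javascript
// Hypothetical precondition check for connect(); not part of the specified API.
class IllegalConfiguration extends Error {
  constructor(message) {
    super(message);
    this.name = "IllegalConfiguration";
  }
}

// session: object that may carry an onconnectionrequest handler.
// parsedInit: assumed result of parsing the initialization string.
function checkNegotiationPath(session, parsedInit) {
  const hasHandler = typeof session.onconnectionrequest === "function";
  const hasMediator = Boolean(parsedInit.connectionmediator);
  if (!hasHandler && !hasMediator) {
    throw new IllegalConfiguration(
      "neither onconnectionrequest nor connectionmediator is given"
    );
  }
  return true;
}
```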

The STUN server may either be an IP address:port literal, or be a domain name. If it is a domain name, the procedure in section 9 of RFC 5389 (SRV record lookup, with fallback to port 3478 (STUN) or 5349 (STUN over TLS)) is used to establish the IP address and port to use for STUN and TURN. If “service” and “protocol” are omitted, they are assumed to be “stun” and “udp” for stun_service, and “turn” and “udp” for turn_service. For TURN, the procedure is defined in RFC 5766 section 6.1. The procedure of RFC 5928 (using S-NAPTR applications) is not used.
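As a rough sketch of the STUN case described above, the function below classifies a server string and returns the lookup plan without performing any DNS queries. The function name planStunLookup and the shape of the returned object are assumptions made for illustration; the literal check is deliberately loose (IPv4 or bracketed IPv6 followed by a port) and is not a full address validator.

```javascript
// Sketch of the lookup plan for a STUN server string; performs no DNS queries.
// Fallback ports as stated in the text: 3478 for STUN, 5349 for STUN over TLS.
const DEFAULT_PORTS = { stun: 3478, stuns: 5349 };

function planStunLookup(server, service = "stun", protocol = "udp") {
  if (!(service in DEFAULT_PORTS)) {
    throw new RangeError(`unsupported service in this sketch: ${service}`);
  }
  // Loose match for "a.b.c.d:port" or "[ipv6]:port" literals.
  const literal = /^(\d{1,3}(?:\.\d{1,3}){3}|\[[0-9a-fA-F:.]+\]):(\d{1,5})$/.exec(server);
  if (literal) {
    return { kind: "literal", host: literal[1], port: Number(literal[2]) };
  }
  // Domain name: try the SRV record first, then fall back to the default port.
  return {
    kind: "domain",
    srvName: `_${service}._${protocol}.${server}`,
    fallbackHost: server,
    fallbackPort: DEFAULT_PORTS[service],
  };
}
```

For example, planStunLookup("example.org") yields the SRV name "_stun._udp.example.org" with fallback port 3478, because service and protocol default to “stun” and “udp”.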

Media configuration string

The media configuration string gives the type of media and any parameters required to refine it further. It is used as part of the input to construct a media negotiation string.

“type” MUST be present. All other attributes are optional. If “label” is present, it MUST be unique within the set of streams of this class. If it is absent, the implementation will generate a unique string. The “type” is one of “video” or “audio”. The implementation may support other types. (TODO: Add “data” once there is an agreed proposal for how to transport data.) The “label” attribute conforms to the syntax of RFC 4574 section 4 “Label” attributes (ASCII with some syntax-sensitive characters disallowed).
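The attribute rules above can be sketched as a validation step over already-parsed attributes. Everything here is illustrative: the function name normalizeMediaConfig, the object representation of the attributes, and the scheme for generated labels are assumptions. The regular expression is an approximation of the SDP “token” character set that RFC 4574 labels are built from, and should be checked against the RFCs before being relied on.

```javascript
// Hypothetical validation of already-parsed media configuration attributes.
// Approximation of the SDP "token" characters allowed in a label.
const LABEL_TOKEN = /^[\x21\x23-\x27\x2A\x2B\x2D\x2E\x30-\x39\x41-\x5A\x5E-\x7E]+$/;

// config: { type, label?, ... }; usedLabels: Set of labels already taken
// by streams of the same class.
function normalizeMediaConfig(config, usedLabels) {
  if (!config.type) {
    throw new Error("media configuration requires a type");
  }
  let label = config.label;
  if (label === undefined) {
    // Generate a label that does not collide with existing ones.
    let n = usedLabels.size;
    do {
      label = `${config.type}-${n++}`;
    } while (usedLabels.has(label));
  } else if (!LABEL_TOKEN.test(label)) {
    throw new Error(`label has disallowed characters: ${label}`);
  } else if (usedLabels.has(label)) {
    throw new Error(`duplicate label: ${label}`);
  }
  usedLabels.add(label);
  return { ...config, label };
}
```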

For video, the attribute “size” gives the width x height in pixels of the largest display area that makes sense to the caller; it is used by the video engine to select a suitable video stream resolution, but it gives no guarantee that the resulting video stream will have exactly that resolution.
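One way a video engine could interpret the “size” hint is sketched below: pick the largest supported resolution that fits inside the hint, and fall back to the smallest one if none fits. This selection policy, the "WIDTHxHEIGHT" string syntax, and the names parseSize and pickResolution are assumptions made for illustration; the text only says that the hint guides selection without guaranteeing an exact match.

```javascript
// Hypothetical resolution picker: the "size" hint is an upper bound, not a guarantee.
function parseSize(size) {
  const m = /^(\d+)x(\d+)$/.exec(size);
  if (!m) throw new Error(`bad size: ${size}`);
  return { width: Number(m[1]), height: Number(m[2]) };
}

// supported: non-empty array of "WIDTHxHEIGHT" strings the engine can produce.
function pickResolution(sizeHint, supported) {
  const max = parseSize(sizeHint);
  const area = (r) => r.width * r.height;
  const all = supported.map(parseSize);
  const fitting = all.filter((r) => r.width <= max.width && r.height <= max.height);
  if (fitting.length > 0) {
    // Largest resolution that still fits inside the display area.
    return fitting.reduce((best, r) => (area(r) > area(best) ? r : best));
  }
  // Nothing fits inside the hint: fall back to the smallest supported size.
  return all.reduce((best, r) => (area(r) < area(best) ? r : best));
}
```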

Media negotiation string

The media negotiation string is passed across the negotiation interface. It contains the information required to negotiate media.

While the SDP format is universally understood to have multiple flaws that mean we should not emulate or require it, it is also relatively common to use SDP in an offer/answer mode to communicate the information needed for setup - which means that it is at least able to represent the information needed. The fields below are picked to make it obvious how they are mapped to SDP fields; there is no assumption that all SDP fields make sense in this format.

Connection establishment event flow

To initiate a call, an application will create an RtcSession object and initialize it to know what sources and sinks to request, and then call the “connect” method. The object will then do internal processing to emit one or more calls to the “onOutgoingNegotiationItem” callback; the media negotiation string will be sufficient to construct (if required) an SDP “offer” for use in a SIP exchange.

The responding peer, if it is of the same type, will construct an RtcSession object and call its IncomingNegotiationItem() function with the passed information. If the negotiation is successful, it will call its onOutgoingNegotiationItem callback, which is assumed to pass the information to the initiating peer.

The initiating peer’s application will then call its IncomingNegotiationItem function; if the answer is acceptable to the initiator, “onconnect” is signalled.
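The event flow above can be sketched with two mock sessions wired back to back. This is a simplified illustration, not the specified behaviour: the class MockRtcSession, its state field, and the literal “offer”/“answer” strings are hypothetical, and the signalling path is replaced by a direct function call. Only connect, onOutgoingNegotiationItem, IncomingNegotiationItem and onconnect are names taken from this document.

```javascript
// Hypothetical mock of the negotiation flow; items are opaque strings here.
class MockRtcSession {
  constructor(name) {
    this.name = name;
    this.state = "new";
    this.onOutgoingNegotiationItem = null; // set by the application
    this.onconnect = null;
  }
  // Initiator: start negotiation by emitting an item for the signalling path.
  connect() {
    this.state = "offered";
    this.onOutgoingNegotiationItem(`offer from ${this.name}`);
  }
  // Either side: process an item received from the other peer.
  IncomingNegotiationItem(item) {
    if (this.state === "new") {
      // Responder: treat negotiation as successful and send an answer back.
      this.state = "connected";
      this.onOutgoingNegotiationItem(`answer from ${this.name}`);
    } else if (this.state === "offered") {
      // Initiator: the answer is acceptable, so signal the connection.
      this.state = "connected";
      if (this.onconnect) this.onconnect();
    }
  }
}

const log = [];
const caller = new MockRtcSession("caller");
const callee = new MockRtcSession("callee");

// The application supplies the signalling path; here it is a direct call.
caller.onOutgoingNegotiationItem = (item) => {
  log.push(item);
  callee.IncomingNegotiationItem(item);
};
callee.onOutgoingNegotiationItem = (item) => {
  log.push(item);
  caller.IncomingNegotiationItem(item);
};
caller.onconnect = () => log.push("caller connected");

caller.connect();
```

In a real deployment the two callbacks would hand the item to whatever signalling channel the application uses, since the API does not depend on any specific signaling service or protocol.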

The session description strings sent to an RtcSession need to contain all the information needed to successfully negotiate a multimedia connection.

Appendix: ICE info in session control protocols

The session description strings need to contain all the information needed to successfully set up a bidirectional datagram transport. This section reproduces readily available examples of how this information is represented in SDP and XMPP. This is included to make sure the expressive power of the ICE info in media negotiation strings is sufficient.