This document defines a set of JavaScript APIs that allow local media,
including audio and video, to be requested from a platform.

This document is not complete. It is subject to major changes and, while
early experimentation is encouraged, it is therefore not intended for
implementation. The API is based on preliminary work done in the
WHATWG.

Introduction

This document defines APIs for requesting access to local multimedia
devices, such as microphones or video cameras.

This document also defines the MediaStream API, which provides the means
to control where multimedia stream data is consumed, and provides some
control over the devices that produce the media. It also exposes
information about devices able to capture and render media.

This specification defines conformance criteria that apply to a single
product: the User Agent that implements the interfaces that it
contains.

Conformance requirements phrased as algorithms or specific steps may be
implemented in any manner, so long as the end result is equivalent. (In
particular, the algorithms defined in this specification are intended to be
easy to follow, and not intended to be performant.)

Implementations that use ECMAScript [[ECMA-262]] to implement the APIs
defined in this specification must implement them in a manner consistent
with the ECMAScript Bindings defined in the Web IDL specification
[[!WEBIDL-1]], as this specification uses that specification and
terminology.

Terminology

HTML Terms:

The EventHandler
interface represents a callback used for event handlers as defined in
[[!HTML52]].

The concepts queue
a task and fire a simple event are
defined in [[!HTML52]].

The terms event handlers
and responsible
document are defined in [[!HTML52]].

The term current settings
object is defined in [[!HTML52]].

The term allowed to
use is defined in [[!HTML52]].

The terms fulfilled,
rejected and
resolved used in the context of Promises
are defined in [[!ES6]].

A source is the "thing" providing the source of a media stream
track. The source is the broadcaster of the media itself. A source can
be a physical webcam, microphone, local video or audio file from the
user's hard drive, network resource, or static image. Note that this
document describes the use of microphone and camera type sources only;
the use of other source types is described in other documents.

An application that has no prior authorization regarding sources is
only given the number of available sources, their type and any
relationship to other devices. Additional information about sources can
become available when applications are authorized to use a source (see
).

Sources do not have constraints; tracks have
constraints. When a source is connected to a track, it must produce
media that conforms to the constraints present on that track. Multiple
tracks can be attached to the same source. User Agent
processing, such as downsampling, MAY be used to ensure that all tracks
have appropriate media.

Sources have constrainable properties which have
capabilities and settings
exposed on tracks. While the constrainable properties are "owned" by the
source, sources MAY be able to accommodate different demands at once.
For this reason, capabilities are common to any (multiple) tracks that
happen to be using the same source, whereas settings MAY differ per
track (e.g., if two different track objects bound to the same source
query capability and settings information, they will get back the same
capabilities, but may get different settings that are tailored to
satisfy their individual constraints).
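
The following non-normative sketch illustrates this sharing: two tracks
cloned from the same camera source report identical capabilities, while
their settings may differ once different constraints are applied (the
variable names are illustrative).

async function compareSharedSource() {
  // Acquire a camera track, then clone it so both tracks share one source.
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const [trackA] = stream.getVideoTracks();
  const trackB = trackA.clone();
  // Constrain only the clone; the source must satisfy both tracks.
  await trackB.applyConstraints({ width: { max: 320 } });
  // Same underlying source, so the capabilities are identical...
  console.log(trackA.getCapabilities(), trackB.getCapabilities());
  // ...but settings are tailored per track and may differ.
  console.log(trackA.getSettings().width, trackB.getSettings().width);
}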

Setting (Source Setting)

A setting refers to the immediate, current
value of the source's constrainable properties. Settings are always
read-only.

A source's settings can change dynamically over time due to
environmental conditions, sink configurations, or constraint changes. A
source's settings must always conform to the current set of basic
(mandatory) constraints on all attached tracks. A source that cannot
conform to these constraints causes affected tracks to become
overconstrained and therefore muted.
A User Agent attempts to ensure that sources adhere to advanced
(optional) constraints as closely as possible, see .

Although settings are a property of the source, they are only
exposed to the application through the tracks attached to the source.
This is exposed via the ConstrainablePattern interface.

Capabilities

For each constrainable property, there is a capability that
describes whether it is supported by the source and if so, the range of
supported values. As with settings, capabilities are exposed to the
application via the ConstrainablePattern interface.

The values of the supported capabilities must be normalized to the
ranges and enumerated types defined in this specification.

A getCapabilities() call on
a track returns the same underlying per-source capabilities for all
tracks connected to the source.

Source capabilities are effectively constant. Applications should be
able to depend on a specific source having the same capabilities for
any browsing session.

This API is intentionally simplified. Capabilities are not capable
of describing interactions between different values. For instance, it
is not possible to accurately describe the capabilities of a camera
that can produce a high resolution video stream at a low frame rate and
lower resolutions at a higher frame rate. Capabilities describe the
complete range of each value. Interactions between constraints are
exposed by attempting to apply constraints.

Constraints

Constraints provide a general control surface that allows
applications to both select an appropriate source for a track and, once
selected, to influence how a source operates.

Constraints limit the range of operating modes that a source can use
when providing media for a track. Without provided track constraints,
implementations are free to select a source's settings from the full
ranges of its supported capabilities. Implementations may also adjust
source settings at any time within the bounds imposed by all applied
constraints.

getUserMedia() uses constraints
to help select an appropriate source for a track and configure it.
Additionally, the ConstrainablePattern interface on tracks
includes an API for dynamically changing the track's constraints at any
later time.

A track will not be connected to a source using getUserMedia() if its initial constraints cannot be
satisfied. However, the ability to meet the constraints on a track can
change over time, and constraints can be changed. If circumstances
change such that constraints cannot be met, the
ConstrainablePattern interface defines an appropriate error to
inform the application. explains how
constraints interact in more detail.

In general, the fewer constraints that are applied, the more flexibility
User Agents have to optimize the media streaming experience, so
application authors are strongly encouraged to use mandatory
constraints sparingly.

For each constrainable property, a constraint exists whose name
corresponds with the relevant source setting name and capability
name.
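
As a non-normative illustration, an application that merely prefers 720p
video can express that preference with ideal values, leaving the User
Agent free to pick the closest supported mode, whereas exact values make
the request fail outright when they cannot be satisfied:

// Hard requirement: the returned promise is rejected if no source
// can produce exactly 1280x720.
const strict = { video: { width: { exact: 1280 }, height: { exact: 720 } } };

// Preference: the User Agent selects the closest available mode instead
// of failing, retaining flexibility to optimize.
const relaxed = { video: { width: { ideal: 1280 }, height: { ideal: 720 } } };

navigator.mediaDevices.getUserMedia(relaxed)
  .then((stream) => { /* use the stream */ })
  .catch((err) => console.error(err.name));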

RTCPeerConnection

RTCPeerConnection is defined in
[[WEBRTC10]].

Permissions

The terms permission, retrieve the permission
state, request permission to
use and create a permission storage
entry are defined in [[!permissions]].

MediaStream API

Introduction

The two main components in the MediaStream API are the
MediaStreamTrack and MediaStream interfaces. The
MediaStreamTrack object represents media of a single
type that originates from one media source in the User Agent, e.g. video
produced by a web camera. A MediaStream is used to
group several MediaStreamTrack objects into one unit
that can be recorded or rendered in a media element.

Each MediaStream can contain zero or more
MediaStreamTrack objects. All tracks in a
MediaStream are intended to be synchronized when
rendered. This is not a hard requirement, since it might not be possible
to synchronize tracks from sources that have different clocks. Different
MediaStream objects do not need to be
synchronized.

While the intent is to synchronize tracks, it could be
better in some circumstances to permit tracks to lose synchronization. In
particular, when tracks are remotely sourced and real-time [[WEBRTC10]],
it can be better to allow loss of synchronization than to accumulate
delays or risk glitches and other artifacts. Implementations are expected
to understand the implications of choices regarding synchronization of
playback and the effect that these have on user perception.

A single MediaStreamTrack can represent
multi-channel content, such as stereo or 5.1 audio or stereoscopic video,
where the channels have a well defined relationship to each other.
Information about channels might be exposed through other APIs, such as
[[WEBAUDIO]], but this specification provides no direct access to
channels.

A MediaStream object has an input and an output
that represent the combined input and output of all the object's tracks.
The output of the MediaStream controls how the object
is rendered, e.g., what is saved if the object is recorded to a file or
what is displayed if the object is used in a video element.
A single MediaStream object can be attached to
multiple different outputs at the same time.

A new MediaStream object can be created from
existing media streams or tracks using the
MediaStream() constructor. The constructor argument
can either be an existing MediaStream object, in
which case all the tracks of the given stream are added to the new
MediaStream object, or an array of
MediaStreamTrack objects. The latter form makes it
possible to compose a stream from different source streams.
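
For example (non-normative), an application might combine the audio of
one stream with the video of another:

// Compose a new stream from existing tracks; the originals are unaffected.
function mixStreams(audioStream, videoStream) {
  return new MediaStream([
    ...audioStream.getAudioTracks(),
    ...videoStream.getVideoTracks(),
  ]);
}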

Both MediaStream and
MediaStreamTrack objects can be cloned. A cloned
MediaStream contains clones of all member tracks from
the original stream. A cloned MediaStreamTrack has a
set of constraints that is
independent of the instance it is cloned from, which allows media from
the same source to have different constraints applied for different
consumers. The MediaStream object is also used in
contexts outside getUserMedia, such as [[WEBRTC10]].

MediaStream

The MediaStream() constructor composes a new
stream out of existing tracks. It takes an optional argument of type
MediaStream or an array of
MediaStreamTrack objects. When the constructor is invoked, the User
Agent must run the following steps:

The tracks of a MediaStream are stored in a
track set. The track set MUST contain the
MediaStreamTrack objects that correspond to the
tracks of the stream. The relative order of the tracks in the set is User
Agent defined and the API will never put any requirements on the order.
The proper way to find a specific MediaStreamTrack
object in the set is to look it up by its id.

An object that reads data from the output of a
MediaStream is referred to as a
MediaStream consumer. The list of
MediaStream consumers currently includes media
elements (such as <video> and
<audio>) [[HTML52]], Web Real-Time Communications
(WebRTC; RTCPeerConnection) [[WEBRTC10]], media recording
(MediaRecorder) [[mediastream-recording]], image capture
(ImageCapture) [[image-capture]], and web audio
(MediaStreamAudioSourceNode) [[WEBAUDIO]].

MediaStream consumers must be able to
handle tracks being added and removed. This behavior is specified per
consumer.

The User Agent may update a MediaStream's track set in response to, for example, an external
event. This specification does not specify any such cases, but other
specifications using the MediaStream API may. One such example is the
WebRTC 1.0 [[WEBRTC10]] specification where the track set of a MediaStream, received
from another peer, can be updated as a result of changes to the media
session.

To add a track, track, to a
MediaStream, stream, the User Agent MUST run
the following steps:

Attributes

When a MediaStream is created, the User
Agent MUST generate an identifier string, and MUST initialize the
object's id
attribute to that string, unless the object is created as part of
a special purpose algorithm that specifies how the stream id must
be initialized. A good practice is to use a UUID [[rfc4122]],
which is 36 characters long in its canonical form. To avoid
fingerprinting, implementations SHOULD use the forms in section
4.4 or 4.5 of RFC 4122 when generating UUIDs.

The id attribute MUST return
the value to which it was initialized when the object was
created.

Methods

getAudioTracks

Returns a sequence of MediaStreamTrack
objects representing the audio tracks in this stream.

The getAudioTracks
method MUST return a sequence that represents a snapshot of all
the MediaStreamTrack objects in this stream's
track set whose kind is equal to
"audio". The conversion from the track set to the sequence is User Agent defined
and the order does not have to be stable between calls.

getVideoTracks

Returns a sequence of MediaStreamTrack
objects representing the video tracks in this stream.

The getVideoTracks
method MUST return a sequence that represents a snapshot of all
the MediaStreamTrack objects in this stream's
track set whose kind is equal to
"video". The conversion from the track set to the sequence is User Agent defined
and the order does not have to be stable between calls.

getTracks

Returns a sequence of MediaStreamTrack
objects representing all the tracks in this stream.

The getTracks method
MUST return a sequence that represents a snapshot of all the
MediaStreamTrack objects in this stream's
track set, regardless of kind. The conversion from
the track set to the sequence is User
Agent defined and the order does not have to be stable between
calls.

getTrackById

The getTrackById
method MUST return either a MediaStreamTrack
object from this stream's track set
whose id is
equal to trackId, or null, if no such track
exists.
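
The following non-normative sketch shows the id-based lookup in use;
since track order in the returned sequence is not stable, the id is the
reliable handle:

function rememberAndRefind(stream) {
  const savedId = stream.getTracks()[0].id; // order is User Agent defined
  // ...later, re-find the same track regardless of enumeration order:
  return stream.getTrackById(savedId);      // the MediaStreamTrack, or null
}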

MediaStreamTrack

A MediaStreamTrack object represents a media
source in the User Agent. An example source is a device connected to the
User Agent. Other specifications may define sources for
MediaStreamTrack that override the behavior specified
here. Several MediaStreamTrack objects can represent
the same media source, e.g., when the user chooses the same camera in the
UI shown by two consecutive calls to getUserMedia().

The data from a MediaStreamTrack object does not
necessarily have a canonical binary form; for example, it could just be
"the video currently coming from the user's video camera". This allows
User Agents to manipulate media in whatever fashion is most suitable on
the user's platform.

A script can indicate that a MediaStreamTrack
object no longer needs its source with the stop() method. When all tracks
using a source have been stopped or ended by some other means, the source
is stopped. If the source is a device
exposed by getUserMedia(), then when
the source is stopped, the UA MUST run the following steps:

Let deviceId be the device's deviceId.

Set [[\devicesLiveMap]][deviceId] to
false.

If the result of retrieving the permission state
of the permission associated with the device's kind and
deviceId is not "granted", then set
[[\devicesAccessibleMap]][deviceId] to
false.

An implementation may use a per-source reference count to keep track
of source usage, but the specifics are out of scope for this
specification.
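
A common non-normative pattern is to stop every track a stream holds,
which in turn stops each underlying device source once no other track
uses it:

function stopAll(stream) {
  for (const track of stream.getTracks()) {
    track.stop(); // source is stopped when its last track stops or ends
  }
}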

Life-cycle and Media Flow

Life-cycle

A MediaStreamTrack has two states in its
life-cycle: live and ended. A newly created
MediaStreamTrack can be in either state depending
on how it was created. For example, cloning an ended track results in a
new ended track. The current state is reflected by the object's
readyState
attribute.

In the live state, the track is active and media is
available for use by consumers (but may be replaced by
zero-information-content if the MediaStreamTrack is
muted or disabled, see below).

A muted or disabled MediaStreamTrack renders
either silence (audio), black frames (video), or a
zero-information-content equivalent. For example, a video element
sourced by a muted or disabled MediaStreamTrack
(contained within a MediaStream) is playing but
the rendered content is the muted output.

If the source is a device exposed by getUserMedia(), then when a track becomes either
muted or disabled, and this brings all tracks connected to the device
to be either muted, disabled, or stopped, then the UA MAY, using the
device's deviceId as deviceId, set
[[\devicesLiveMap]][deviceId] to false,
provided the UA sets it back to true as soon as any
unstopped track connected to this device becomes un-muted or enabled
again.

The muted/unmuted state of a track reflects whether the source
provides any media at this moment. The enabled/disabled state is under
application control and determines whether the track outputs media (to
its consumers). Hence, media from the source only flows when a
MediaStreamTrack object is both unmuted and
enabled.

A MediaStreamTrack is muted when the source is temporarily unable to
provide the track with data. A track can be muted by a user. Often this
action is outside the control of the application. This could be as a
result of the user hitting a hardware switch or toggling a control in
the operating system / browser chrome. A track can also be muted by the
User Agent.

Applications are able to enable or
disable a MediaStreamTrack to prevent it from
rendering media from the source. A muted track will, however, regardless
of the enabled state, render silence and blackness. A disabled track is
logically equivalent to a muted track, from a consumer's point of
view.

For a newly created MediaStreamTrack object, the
following applies. The track is always enabled unless stated otherwise
(for example when cloned) and the muted state reflects the state of the
source at the time the track is created.

A MediaStreamTrack object is said to
end when the source of the track is disconnected or
exhausted.

When a MediaStreamTrack object ends for any
reason (e.g., because the user rescinds the permission for the page to
use the local camera, or because the application invoked the
stop() method on
the MediaStreamTrack object, or because the User
Agent has instructed the track to end for any reason) it is said to be
ended.

When a MediaStreamTrack, track, ends
for any reason other than the stop() method being invoked,
the User Agent MUST queue a task that runs the following steps:

If the track's readyState attribute
has the value ended already, then abort these
steps.

The muted state is outside the control of the application, but it can be
observed by the application by reading the muted attribute and listening
to the associated events mute and unmute. There can be
several reasons for a MediaStreamTrack to be muted:
the user pushing a physical mute button on the microphone, the user
toggling a control in the operating system, the user clicking a mute
button in the browser chrome, the User Agent (on behalf of the user)
muting, etc.

Whenever the User Agent initiates such a change, it MUST queue a
task, using the user interaction task source, to set a track's muted
state to the state desired by the user.

To set a track's muted state to
newState, the User Agent MUST run the following steps:

Let track be the MediaStreamTrack in
question.

Set track's muted attribute to
newState.

If newState is true let
eventName be mute, otherwise
unmute.

Fire a simple event named eventName on
track.
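
An application can observe these transitions (non-normative sketch):

function watchMuting(track) {
  // Muted state is driven by the source/user, not by the application.
  track.addEventListener("mute", () => console.log("muted:", track.muted));
  track.addEventListener("unmute", () => console.log("muted:", track.muted));
}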

Enabled/disabled on the other hand is
available to the application to control (and observe) via the
enabled
attribute.

The result for the consumer is the same in the sense that whenever
MediaStreamTrack is muted or disabled (or both) the
consumer gets zero-information-content, which means silence for audio
and black frames for video. In other words, media from the source only
flows when a MediaStreamTrack object is both
unmuted and enabled. For example, a video element sourced by a muted or
disabled MediaStreamTrack (contained in a
MediaStream) is playing but rendering
blackness.
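
For instance (non-normative), an application implementing a microphone
mute button would toggle enabled rather than stopping the track, so the
source stays attached:

function setMicrophoneEnabled(stream, on) {
  for (const track of stream.getAudioTracks()) {
    track.enabled = on; // disabled tracks render silence to consumers
  }
}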


Tracks and Constraints

Whether Constraints were provided at track
initialization time or need to be established later at runtime, the
APIs defined in the ConstrainablePattern Interface allow the
retrieval and manipulation of the constraints currently established on
a track.

If the overconstrained event is fired, the
track MUST be muted until either new satisfiable constraints are
applied or the existing constraints become satisfiable.
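
A non-normative sketch of one recovery strategy, assuming the
overconstrained event defined in this specification: on overconstrained,
the application relaxes the track's constraints so the source can
satisfy them again and the track can unmute.

function recoverFromOverconstrained(track) {
  track.addEventListener("overconstrained", async () => {
    // Applying an empty constraint set removes the unsatisfiable
    // constraints, allowing the track to unmute.
    await track.applyConstraints({});
  });
}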

When a MediaStreamTrack is created, the
User Agent MUST generate an identifier string, and MUST
initialize the object's id attribute to that
string, unless the object is created as part of a special
purpose algorithm that specifies how the stream id must be
initialized. See MediaStream's
id attribute for
guidelines on how to generate such an identifier.

An example of an algorithm that specifies how the track id
must be initialized is the algorithm to represent an incoming
network component with a MediaStreamTrack
object. [[WEBRTC10]]

The id attribute MUST return the value
to which it was initialized when the object was created.

User Agents MAY label audio and video sources (e.g.,
"Internal microphone" or "External USB Webcam"). The label attribute
MUST return the label of the object's corresponding source, if
any. If the corresponding source has or had no label, the
attribute MUST instead return the empty string.

A camera can report multiple facing modes. For example, in a
high-end telepresence solution with several cameras facing the
user, a camera to the left of the user can report both "left"
and "user". See facingMode for
additional details.

The user agent MAY use cropping and downscaling to offer more
resolution choices than this camera naturally produces. The
reported sequence MUST list all the means the UA may employ to
derive resolution choices for this camera. The value "none" MUST
be present, indicating the ability to constrain the UA from
cropping and downscaling. See
resizeMode
for additional details.

If the source cannot do echo cancellation a single
false is reported. If echo cancellation cannot be
turned off, a single true is reported. If the
script can control the feature, the source reports a list with
both true and false as possible
values. See echoCancellation
for additional details.

If the source cannot do auto gain control a single
false is reported. If auto gain control cannot be
turned off, a single true is reported. If the
script can control the feature, the source reports a list with
both true and false as possible
values. See autoGainControl
for additional details.

If the source cannot do noise suppression a single
false is reported. If noise suppression cannot be
turned off, a single true is reported. If the
script can control the feature, the source reports a list with
both true and false as possible
values. See noiseSuppression
for additional details.

Constrainable Properties

The names of the initial set of constrainable properties for
MediaStreamTrack are defined below.

The following constrainable properties are defined to apply to both
video and audio MediaStreamTrack objects:

deviceId, of type DOMString

The origin-unique identifier for the source of the
MediaStreamTrack. The same identifier MUST be valid
between browsing sessions of this origin, but MUST also be
different for other origins. Some sort of GUID is recommended
for the identifier. Note that the setting of this property is
uniquely determined by the source that is attached to the
MediaStreamTrack. In particular, getCapabilities() will return only a
single value for deviceId. This property can therefore be used
for initial media selection with getUserMedia(). However, it is not useful
for subsequent media control with applyConstraints(), since any attempt to
set a different value will result in an unsatisfiable
ConstraintSet.

groupId, of type DOMString

The browsing session-unique group identifier for the source of
the MediaStreamTrack. Two devices have the same group
identifier if they belong to the same physical device; for
example, the audio input and output devices representing the
speaker and microphone of the same headset would have the same
groupId. Note that the setting of this property is uniquely
determined by the source that is attached to the
MediaStreamTrack. In particular, getCapabilities() will return only a
single value for groupId. Since this property is not stable
between browsing sessions its usefulness for initial media
selection with getUserMedia() is limited. It is not useful
for subsequent media control with applyConstraints(), since any attempt to
set a different value will result in an unsatisfiable
ConstraintSet.
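
For example (non-normative), a previously saved deviceId can be used to
re-request the same camera on a later visit to the same origin:

async function openSavedCamera(savedDeviceId) {
  // "exact" fails rather than falling back to a different device.
  return navigator.mediaDevices.getUserMedia({
    video: { deviceId: { exact: savedDeviceId } },
  });
}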

The following constrainable properties are defined to apply only to
video MediaStreamTrack objects:

The exact frame rate (frames per second) or frame rate range.
If this frame rate cannot be determined (e.g. the source does not
natively provide a frame rate, or the frame rate cannot be
determined from the source stream), then this value MUST refer to
the User Agent's vsync display rate.

This string (or each string, when a list) should be one of
the members of VideoFacingModeEnum. The
members describe the directions that the camera can face, as seen
from the user's perspective. Note that getConstraints may not return
exactly the same string for strings not in this enum. This
preserves the possibility of using a future version of WebIDL
enum for this property.

This string (or each string, when a list) should be one of
the members of VideoResizeModeEnum. The
members describe the means by which the resolution can be derived
by the UA. In other words, whether the UA is allowed to use
cropping and downscaling on the camera output.

The UA MAY disguise concurrent use of
the camera by cropping and/or downscaling to mimic
native resolutions when "none" is used, but only when the camera
is in use in another browsing context.

Note that getConstraints may not return
exactly the same string for strings not in this enum. This
preserves the possibility of using a future version of WebIDL
enum for this property.

enum VideoFacingModeEnum {
"user",
"environment",
"left",
"right"
};

VideoFacingModeEnum Enumeration description:

user: The source is facing toward the user (a self-view camera).

environment: The source is facing away from the user (viewing the
environment).

left: The source is facing to the left of the user.

right: The source is facing to the right of the user.

Below is an illustration of the video facing modes in relation to
the user.
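
As a non-normative example, an application that prefers the rear-facing
camera on a phone can express that with facingMode:

navigator.mediaDevices
  .getUserMedia({ video: { facingMode: { ideal: "environment" } } })
  .then((stream) => { /* attach the stream to a sink */ });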

enum VideoResizeModeEnum {
"none",
"crop-and-scale"
};

VideoResizeModeEnum Enumeration description:

none: This resolution is offered by the camera, its driver, or the OS.
Note: The UA MAY report this value to disguise concurrent use, but only
when the camera is in use in another browsing context.

crop-and-scale: This resolution is downscaled and/or cropped from a
higher camera resolution by the user agent.

The following constrainable properties are defined to apply only to
audio MediaStreamTrack objects:

The volume or volume range, as a multiplier of the linear
audio sample values. A volume of 0.0 is silence, while a volume
of 1.0 is the maximum supported volume. A volume of 0.5 will
result in an approximately 6 dBSPL change in the sound
pressure level from the maximum volume. Note that any
ConstraintSet that specifies values outside of this range of 0 to
1 can never be satisfied.

When one or more audio streams are being played near various
microphones, it is often desirable to attempt to remove the sound being
played from the input signals recorded by the microphones. This is
referred to as echo cancellation. There are cases where it is not needed
and it is desirable to turn it off so that no audio artifacts are
introduced. This allows applications to control this
behavior.

Automatic gain control is often desirable on the input signal
recorded by the microphone. There are cases where it is not
needed and it is desirable to turn it off so that the audio is
not altered. This allows applications to control this
behavior.

Noise suppression is often desirable on the input signal
recorded by the microphone. There are cases where it is not
needed and it is desirable to turn it off so that the audio is
not altered. This allows applications to control this
behavior.

The latency or latency range, in seconds. The latency is the
time from the start of processing (for instance, when sound occurs
in the real world) to the data being available to the next step
in the process. Low latency is critical for some applications;
high latency may be acceptable for other applications because it
helps with power constraints. The number is expected to be the
target latency of the configuration; the actual latency may show
some variation from that.

MediaStreamTrackEvent

The addtrack and removetrack events notify
the script that the track set of a
MediaStream has been updated by the User Agent.

Firing a track event named
e with a MediaStreamTrack, track, means that an event with the name e, which
does not bubble (except where otherwise stated) and is not cancelable
(except where otherwise stated), and which uses the
MediaStreamTrackEvent interface with the
track
attribute set to track, MUST be created and dispatched at the
given target.
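
Consumers and applications can observe these updates (non-normative
sketch):

function watchTrackSet(stream) {
  stream.addEventListener("addtrack", (e) => console.log("added", e.track.id));
  stream.addEventListener("removetrack", (e) => console.log("removed", e.track.id));
}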

Dictionary MediaStreamTrackEventInit Members

The model: sources, sinks, constraints, and settings

Browsers provide a media pipeline from sources to sinks. In a browser,
sinks are the <img>, <video>, and <audio> tags.
Traditional sources include streamed content, files, and web resources. The
media produced by these sources typically does not change over time - these
sources can be considered to be static.

The sinks that display these sources to the user (the actual tags
themselves) have a variety of controls for manipulating the source content.
For example, an <img> tag scales down a huge source image of
1600x1200 pixels to fit in a rectangle defined with
width="400" and height="300".

The getUserMedia API adds dynamic sources such as microphones and
cameras, whose characteristics can change in response to application
needs. A <video> element that displays media from a dynamic source can
either perform scaling or it can feed back information along the media
pipeline and have the source produce content more suitable for display.

Note: This sort of feedback loop is obviously just
enabling an "optimization", but it's a non-trivial gain. This
optimization can save battery, allow for less network congestion,
etc.

Note that MediaStream sinks (such as
<video>, <audio>, and even
RTCPeerConnection) will continue to have mechanisms to further
transform the source stream beyond that which the Settings,
Capabilities, and Constraints described in this specification
offer. (The sink transformation options, including those of
RTCPeerConnection, are outside the scope of this
specification.)

The act of changing or applying a track constraint may affect the
settings of all tracks sharing that source and
consequently all down-level sinks that are using that source. Many sinks
may be able to take these changes in stride, such as the
<video> element or RTCPeerConnection.
Others like the Recorder API may fail as a result of a source setting
change.

The RTCPeerConnection is an interesting object because it
acts simultaneously as both a sink and a source for
over-the-network streams. As a sink, it has source transformational
capabilities (e.g., lowering bit-rates, scaling-up / down resolutions, and
adjusting frame-rates), and as a source it could have its own settings
changed by a track source.

To illustrate how changes to a given source impact various sinks,
consider the following example. This example only uses width and height,
but the same principles apply to all of the Settings exposed in this
specification. In the first figure a home client has obtained a video
source from its local video camera. The source's width and height settings
are 800 pixels and 600 pixels, respectively. Three
MediaStream objects on the home client contain tracks
that use this same deviceId. The three media streams
are connected to three different sinks: a <video>
element (A), another <video> element (B), and a peer
connection (C). The peer connection is streaming the source video to a
remote client. On the remote client there are two media streams with tracks
that use the peer connection as a source. These two media streams are
connected to two <video> element sinks (Y and
Z).

Note that at this moment, all of the sinks on the home client must apply
a transformation to the original source's provided dimension settings. B is
scaling the video down, A is scaling the video up (resulting in loss of
quality), and C is also scaling the video up slightly for sending over the
network. On the remote client, sink Y is scaling the video way
down, while sink Z is not applying any scaling.

In response to applyConstraints() being called, one of the
tracks wants a higher resolution (1920 by 1200 pixels) from the home
client's video source.

Note that the source change immediately affects all of the tracks and
sinks on the home client, but does not impact any of the sinks (or sources)
on the remote client. With the increase in the home client source video's
dimensions, sink A no longer has to perform any scaling, while sink B must
scale down even further than before. Sink C (the peer connection) must now
scale down the video in order to keep the transmission constant to the
remote client.

While not shown, an equally valid settings change request could be made
on the remote client's side. In addition to impacting sink Y and Z in the
same manner as A, B and C were impacted earlier, it could lead to
re-negotiation with the peer connection on the home client in order to
alter the transformation that it is applying to the home client's video
source. Such a change is NOT REQUIRED to change anything related to sink A
or B or the home client's video source.

Note that this specification does not define a mechanism by which a
change to the remote client's video source could automatically trigger a
change to the home client's video source. Implementations may choose to
make such source-to-sink optimizations as long as they only do so within
the constraints established by the application, as the next example
demonstrates.

It is fairly obvious that changes to a given source will impact sink
consumers. However, in some situations changes to a given sink may also
cause implementations to adjust a source's settings. This is illustrated in
the following figures. In the first figure below, the home client's video
source is sending a video stream sized at 1920 by 1200 pixels. The video
source is also unconstrained, such that the exact source dimensions are
flexible as far as the application is concerned. Two
MediaStream objects contain tracks with the same
deviceId, and those MediaStreams
are connected to two different <video> element sinks A
and B. Sink A has been sized to width="1920" and
height="1200" and is displaying the source's video content
without any transformations. Sink B has been sized smaller and, as a
result, is scaling the video down to fit its rectangle of 320 pixels across
by 200 pixels down.

When the application changes sink A to a smaller dimension (from 1920 to
1024 pixels wide and from 1200 to 768 pixels tall), the browser's media
pipeline may recognize that none of its sinks require the higher source
resolution, and needless work is being done both on the part of the source
and sink A. In such a case and without any other constraints forcing the
source to continue producing the higher resolution video, the media
pipeline MAY change the source resolution:

In the above figure, the home client's video source resolution was
changed to the greater of that from sink A and B in order to optimize
playback. While not shown above, the same behavior could apply to peer
connections and other sinks.

It is possible that constraints can be applied to a track which a
source is unable to satisfy, either because the source itself cannot
satisfy the constraint or because the source is already satisfying a
conflicting constraint. When this happens, the promise returned from
applyConstraints()
will be rejected, without applying any of the new constraints. Since
no change in constraints occurs in this case, there is also no required
change to the source itself as a result of this condition. Here is an
example of this behavior.

In this example, two media streams each have a video track that share
the same source. The first track initially has no constraints applied. It
is connected to sink N. Sink N has a resolution of 800 by 600 pixels and is
scaling down the source's resolution of 1024 by 768 to fit. The other track
has a mandatory constraint forcing off the source's fill light; it is
connected to sink P. Sink P has a width and height equal to that of the
source.

Now, the first track adds a mandatory constraint that the fill light
should be forced on. At this point, both mandatory constraints cannot be
satisfied by the source (the fill light cannot be simultaneously on and off
at the same time). Since this state was caused by the first track's attempt
to apply a conflicting constraint, the constraint application fails and
there is no change in the source's settings nor to the constraints on
either track.

Let's look at a slightly different situation starting from the same
point. In this case, instead of the first track attempting to apply a
conflicting constraint, the user physically locks the camera into a mode
where the fill light is on. At this point the source can no longer satisfy
the second track's mandatory constraint that the fill light be off. The
second track is transitioned into the muted state and receives an
overconstrained event. At the same time, the source notes that its
remaining active sink only requires a resolution of 800 by 600 and so it
adjusts its resolution down to match (this is an optional optimization that
the User Agent is allowed to make given the situation).

At this point, it is the responsibility of the application to address
the problem that led to the overconstrained situation, perhaps by removing
the fill light mandatory constraint on the second track or by closing the
second track altogether and informing the user.

MediaStreams in Media Elements

A MediaStream may be assigned to media elements. A
MediaStream is not preloadable or seekable and represents a
simple, potentially infinite, linear media timeline. The timeline starts at
0 and increments linearly in real time as long as the
MediaStream is playing. The timeline does not increment when
the playout of the MediaStream is paused.

User Agents that support this specification MUST support the
srcObject
attribute of the HTMLMediaElement interface defined in
[[!HTML52]], which includes support for playing MediaStream
objects.

The [[!HTML52]] document outlines how the HTMLMediaElement
works with a media provider object. The following applies when the
media provider object is a MediaStream:

Whenever an [[!HTML52]] AudioTrack
or a VideoTrack
is created, the id and label attributes must
be initialized to the corresponding attributes of the
MediaStreamTrack, the kind attribute must be
initialized to "main" and the language attribute to the
empty string.

The User Agent MUST always play the current data from the
MediaStream and MUST NOT buffer.

When the MediaStream state moves from the active
to the inactive state, the User Agent
MUST raise an ended
event on the HTMLMediaElement and set its ended
attribute to true. Note that once ended
equals true the HTMLMediaElement will not
play media even if new MediaStreamTrack's are added
to the MediaStream (causing it to return to the
active state) unless autoplay is true or the
web application restarts the element, e.g., by calling play().

Any calls to the
fastSeek method on a HTMLMediaElement must be
ignored.

The nature of the MediaStream places certain
restrictions on the behavior and attribute values of the associated
HTMLMediaElement and on the operations that can be performed
on it, as shown below:

Setting the loop attribute has no effect since a
MediaStream has no defined end and therefore
cannot be looped.
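
A non-normative sketch of rendering a stream in a media element:

function attachToVideoElement(stream) {
  const video = document.createElement("video");
  video.srcObject = stream; // MediaStream is not preloadable or seekable
  video.autoplay = true;    // keep playing across track additions
  document.body.appendChild(video);
  return video;
}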

Error Handling

This section and its subsections extend the list of Error subclasses
defined in [[!ES6]] following the pattern for NativeError in section 19.5.6
of that specification. Assume the following:

that use of syntax such as [[\something]] and %something% is as used
in [[!ES6]].

that the rules for ECMAScript standard built-in objects ([[!ES6]],
section 17) are in effect in this section.

that the new intrinsic objects %OverconstrainedError% and
%OverconstrainedErrorPrototype% are available as if they had been
included in ([[!ES6]], Table 7) and all referencing sections, e.g.
([[!ES6]], section 8.2.2), thus behave appropriately.

ECMAScript 6 Terminology

The following terms used in this section are defined in [[!ES6]].

Term/Notation: Section in [[!ES6]]

Type(X): 6
intrinsic object: 6.1.7.4
[[\ErrorData]]: 19.5.1
internal slot: 6.1.7.2
NewTarget: various uses, but no definition
active function object: 8.3
OrdinaryCreateFromConstructor(): 9.1.14
ReturnIfAbrupt(): 6.2.2.4
Assert: 5.2
String: 4.3.17-19, depending on context
PropertyDescriptor: 6.2.4
[[\Value]]: 6.1.7.1
[[\Writable]]: 6.1.7.1
[[\Enumerable]]: 6.1.7.1
[[\Configurable]]: 6.1.7.1
DefinePropertyOrThrow(): 7.3.7
abrupt completion: 6.2.2
ToString(): 7.1.12
[[\Prototype]]: 9.1
%Error%: 19.5.1
Error: 19.5
%ErrorPrototype%: 19.5.3
Object.prototype.toString: 19.1.3.6

OverconstrainedError Object

OverconstrainedError Constructor

The OverconstrainedError Constructor is the %OverconstrainedError%
intrinsic object. When OverconstrainedError is called as a
function rather than as a constructor, it creates and initializes a new
OverconstrainedError object. A call of the object as a function is
equivalent to calling it as a constructor with the same arguments. Thus
the function call OverconstrainedError(...)
is equivalent to the object creation expression new
OverconstrainedError(...) with the same
arguments.

The OverconstrainedError constructor is designed to be
subclassable. It may be used as the value of an extends
clause of a class definition. Subclass constructors that intend to
inherit the specified OverconstrainedError behavior must
include a super call to the
OverconstrainedError constructor to create and initialize
the subclass instance with an [[\ErrorData]] internal slot.

OverconstrainedError ( constraint, message )

When the OverconstrainedError function is
called with arguments constraint and message the
following steps are taken:

If NewTarget is undefined, let newTarget be the
active function object, else let newTarget be
NewTarget.

Properties of the OverconstrainedError Constructor

The value of the [[\Prototype]] internal slot of the
OverconstrainedError constructor is the intrinsic object %Error%.

Besides the length property (whose value is 1),
the OverconstrainedError constructor has the following properties:

OverconstrainedError.prototype

The initial value of OverconstrainedError.prototype
is the OverconstrainedError
prototype object. This property has the attributes {
[[\Writable]]: false, [[\Enumerable]]: false,
[[\Configurable]]: false }.

Properties of the OverconstrainedError Prototype Object

The OverconstrainedError prototype object is an ordinary object. It
is not an Error instance and does not have an [[\ErrorData]] internal
slot.

The value of the [[\Prototype]] internal slot of the
OverconstrainedError prototype object is the intrinsic object
%ErrorPrototype%.

OverconstrainedError.prototype.constructor

The initial value of the constructor property of the prototype for
the OverconstrainedError constructor is the intrinsic object %OverconstrainedError%.

OverconstrainedError.prototype.constraint

The initial value of the constraint property of the prototype for
the OverconstrainedError constructor is the empty String.

OverconstrainedError.prototype.message

The initial value of the message property of the prototype for the
OverconstrainedError constructor is the empty String.

OverconstrainedError.prototype.name

The initial value of the name property of the prototype for the
OverconstrainedError constructor is
"OverconstrainedError".

Properties of OverconstrainedError Instances

OverconstrainedError instances are ordinary objects that inherit
properties from the OverconstrainedError prototype object and have an
[[\ErrorData]] internal slot whose value is undefined. The only
specified use of [[\ErrorData]] is by Object.prototype.toString
([[!ES6]], section 19.1.3.6) to identify instances of Error or its
various subclasses.

The following will need to be updated when we finish out the error
definitions.

The following interface is defined for cases when an
OverconstrainedError is raised as an event:

This error event fires for each affected track (when multiple
tracks share the same source) after the User Agent has evaluated
the current constraints against a given source and is not able to
configure the source within the limitations established by the
intersection of imposed constraints.

Due to being over-constrained, the User Agent must mute each
affected track.

The affected track(s) will remain muted until the application adjusts the
constraints to accommodate the source's current effective
capabilities.

ended

Event

The MediaStreamTrack object's source will no
longer provide any data, either because the user revoked the
permissions, or because the source device has been ejected, or
because the remote peer permanently stopped sending data.

Create three internal slots: [[\devicesLiveMap]],
[[\devicesAccessibleMap]], and [[\kindsAccessibleMap]], each
initialized to a different empty object.

Create one internal slot: [[\storedDeviceList]], initialized
to null.

For each kind of device, kind, that getUserMedia() exposes, set
[[\kindsAccessibleMap]][kind] either to true
if the result of retrieving the permission state
of the permission associated with kind (e.g. "camera",
"microphone"), is "granted", or to false otherwise.

For each individual device that getUserMedia() exposes, using the device's
deviceId as deviceId, set
[[\devicesLiveMap]][deviceId] to false, and
set [[\devicesAccessibleMap]][deviceId] either to
true if the result of retrieving the permission state
of the permission associated with the device's kind and
deviceId is "granted", or to false otherwise.

If the transition is to "granted" from another value, then set
[[\devicesAccessibleMap]][deviceId] to true,
if it isn't already true.

If the transition is from "granted" to another value, and the
device is currently stopped, then set
[[\devicesAccessibleMap]][deviceId] to
false.

When new media input and/or output devices are made available, or any
available input and/or output device becomes unavailable, the User Agent
MUST run the following steps in browsing contexts where at least one of
the following criteria are met, but in no other contexts:

Attributes

Methods

enumerateDevices

Collects information about the User Agent's available media
input and output devices.

This method returns a promise. The promise will be
fulfilled with a sequence of
MediaDeviceInfo dictionaries representing the
User Agent's available media input and output devices if
enumeration is successful.

Camera and microphone sources should
be enumerable. Specifications that add additional types of source
will provide recommendations about whether the source type should
be enumerable.

When the enumerateDevices()
method is called, the User Agent must run the following
steps:

Let p be a new promise.

Run the following steps in parallel:

If [[\storedDeviceList]] is not null, then let
resultList be a copy of [[\storedDeviceList]],
and jump to the step labeled Complete
Enumeration.

Let resultList be an empty list.

If this method has been called previously within this
browsing session, let oldList be the
resultList produced by the previous call;
otherwise, let oldList be an empty list.

Probe the User Agent for available media devices, and
run the following sub steps for each discovered device,
device:

If device is represented by a
MediaDeviceInfo object in
oldList, append that object to
resultList, abort these steps and continue
with the next device (if any).

If a stored deviceId exists for
device, initialize deviceInfo's
deviceId to that value.
Otherwise, let deviceInfo's
deviceId member be a
newly generated unique identifier.

If device belongs to the same physical
device as a device already represented in
oldList or resultList,
initialize deviceInfo's
groupId member to the
groupId value of the
existing MediaDeviceInfo object.
Otherwise, let deviceInfo's
groupId member be a
newly generated unique identifier.

If any of the local devices are attached to a
live MediaStreamTrack in the current
browsing context, set list-permission to
"granted", otherwise set list-permission
to the result of retrieving the
permission state of the "device-info"
permission.

If list-permission is not "granted",
let filteredList be a copy of
resultList and all its elements, with
the label member of each element set to the
empty string.

If filteredList is a non-empty list,
then resolve p with
filteredList. Otherwise, resolve p with resultList.

Return p.
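
A non-normative usage sketch; note that label is the empty string until
the origin has been granted access to a matching device:

async function listDevices() {
  const devices = await navigator.mediaDevices.enumerateDevices();
  for (const d of devices) {
    console.log(d.kind, d.deviceId, d.label || "(label withheld)");
  }
}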

Since this method returns persistent
information across browsing sessions and origins via the number
and grouping of media capture devices, it adds to the
fingerprinting surface exposed by the user agent.

Once authorization has been granted to one
of the capture devices, it provides additional persistent
cross-origin information via the human readable labels associated
with available capture devices, which further adds to the
fingerprinting surface.

Access control model

The algorithm described above means that the access to media device
information depends on whether or not permission has been granted to
the page's origin.

If no such access has been granted, the
MediaDeviceInfo dictionary will contain the
deviceId, kind, and groupId.

If access has been granted for a media device, the
MediaDeviceInfo dictionary will contain the deviceId, kind,
label, and groupId.

Attributes

All enumerable devices have an identifier that MUST be unique
to the page's origin. This identifier MUST be un-guessable by
applications of other origins to prevent the identifier from
being used to correlate the same user across different
origins.

If any local devices have been attached to a live
MediaStreamTrack in a page from this origin, or stored permission to access local
devices has been granted to this origin, then this identifier
MUST be persisted, except as detailed below. Unique and stable
identifiers let the application save, identify the availability
of, and directly request specific sources, across multiple
visits.

However, as long as no local device has been attached to a
live MediaStreamTrack in a page from this origin, and no stored permission to access local
devices has been granted to this origin, then the user agent MAY
clear this identifier once the last browsing session from this
origin has been closed. If the user agent chooses not to clear
the identifier in this condition, then it MUST provide for the
user to visibly inspect and delete the identifier, like a
cookie.

Since deviceId may persist
across browsing sessions and to reduce its potential as a
fingerprinting mechanism, deviceId is to be treated
like other persistent storage mechanisms such as cookies
[[COOKIES]], in that user agents MUST NOT persist device
identifiers for sites that are blocked from using cookies, and
user agents MUST reset per-origin device identifiers when other
persistent storage is cleared.

Returns the group identifier of the represented device. Two
devices have the same group identifier if they belong to the same
physical device; for example a monitor with a built-in camera and
microphone.

Methods

getCapabilities()

Returns a MediaTrackCapabilities object
describing the primary audio or video track of a device's
MediaStream (according to its
kind value), in the absence of any user-supplied
constraints. These capabilities MUST be identical to those that
would have been obtained by calling
getCapabilities() on the first
MediaStreamTrack of this type in a
MediaStream returned by
getUserMedia({deviceId: id}) where id
is the value of the deviceId attribute of this
MediaDeviceInfo.

If no access has been granted to any local devices and this
InputDeviceInfo has been filtered with respect to
unique identifying information (see above description of
enumerateDevices() result), then this method returns
null.

Obtaining local multimedia content

This section extends Navigator and
MediaDevices with APIs to request permission to access
media input devices available to the User Agent.

Alternatively, a local MediaStream can be captured
from certain types of DOM elements, such as the video element
[[mediacapture-fromelement]]. This can be useful for automated testing.

When on an insecure origin [[mixed-content]], User Agents are encouraged
to warn about usage of navigator.mediaDevices.getUserMedia,
navigator.getUserMedia, and any prefixed variants in their
developer tools, error logs, etc. It is explicitly permitted for User
Agents to remove these APIs entirely when on an insecure origin, as long as
they remove all of them at once (e.g., they should not leave just the
prefixed version available on insecure origins).

Legacy Interface Extensions

The definition of getUserMedia() in this section reflects two major
changes from the method definition that has existed here for many
months.

First, the official definition for the getUserMedia() method, and
the one which developers are encouraged to use, is now at MediaDevices. This decision
reflected consensus as long as the original API remained available here
under the Navigator object for backwards compatibility reasons, since
the working group acknowledges that early users of these APIs have been
encouraged to define getUserMedia as "var getUserMedia =
navigator.getUserMedia || navigator.webkitGetUserMedia ||
navigator.mozGetUserMedia;" in order for their code to be functional
both before and after official implementations of getUserMedia() in
popular browsers. To ensure functional equivalence, the getUserMedia()
method here is defined in terms of the method under MediaDevices.

Second, the decision to change all other callback-based methods in
the specification to be based on Promises instead required that the
navigator.getUserMedia() definition reflect this in its use of
navigator.mediaDevices.getUserMedia(). Because navigator.getUserMedia()
is now the only callback-based method remaining in the specification,
there is ongoing discussion as to a) whether it still belongs in the
specification, and b) if it does, whether its syntax should remain
callback-based or change in some way to use Promises. Input on these
questions is encouraged, particularly from developers actively using
today's implementations of this functionality.

Note that the other methods that changed from a callback-based
syntax to a Promises-based syntax were not considered to have been
implemented widely enough in any form to have to consider legacy
usage.

MediaDevices Interface Extensions

The definition of getUserMedia() in this section reflects two major
changes from the method definition that has existed under
Navigator for many months.

First, the official definition for the getUserMedia() method, and
the one which developers are encouraged to use, is now the one defined
here under MediaDevices. This decision reflected consensus as long as
the original API remained available at
Navigator.getUserMedia under the Navigator object for
backwards compatibility reasons, since the working group acknowledges
that early users of these APIs have been encouraged to define
getUserMedia as "var getUserMedia = navigator.getUserMedia ||
navigator.webkitGetUserMedia || navigator.mozGetUserMedia;" in order
for their code to be functional both before and after official
implementations of getUserMedia() in popular browsers. To ensure
functional equivalence, the getUserMedia() method under
Navigator is defined in terms of the method here.

Second, the method defined here is Promises-based, while the one
defined under Navigator is currently still callback-based.
Developers expecting to find getUserMedia() defined under
Navigator are strongly encouraged to read the detailed Note
given there.

The getSupportedConstraints method is provided to allow
the application to determine which constraints the User Agent
recognizes.

Methods

getSupportedConstraints

Returns a dictionary whose members are the constrainable
properties known to the User Agent. A supported constrainable
property MUST be represented and any constrainable properties not
supported by the User Agent MUST NOT be present in the returned
dictionary. The values returned represent what the browser
implements and will not change during a browsing session.
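For example, an application might check for a property before relying on it (a sketch; frameRate is just an illustrative property name):

const supported = navigator.mediaDevices.getSupportedConstraints();
if (!supported.frameRate) {
  // This User Agent does not recognize frameRate; any frameRate constraint
  // passed to it would be silently dropped by WebIDL, so fall back here.
}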

getUserMedia

Prompts the user for permission to use their Web cam or other
video or audio input.

This method returns a promise. The promise will be
fulfilled with a suitable MediaStream
object if the user accepts valid tracks as described below.

The promise will be rejected if there is a failure in
finding valid tracks or if the user denies permission, as
described below.
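A sketch of typical usage (the rejection names shown are those defined in this section):

navigator.mediaDevices.getUserMedia({ audio: true, video: true })
  .then((stream) => {
    // The user accepted; stream contains one audio and one video track.
  })
  .catch((err) => {
    // err.name may be "NotAllowedError", "NotFoundError", "NotReadableError",
    // "AbortError", or "OverconstrainedError", as described below.
  });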

When the getUserMedia()
method is called, the User Agent MUST run the following
steps:

Let constraints be the method's first
argument.

Let requestedMediaTypes be the set of media
types in constraints with either a dictionary
value or a value of "true".

If requestedMediaTypes is the empty set, return
a promise rejected with a TypeError. The
word "optional" occurs in the WebIDL due to WebIDL rules, but
the argument MUST be supplied in order for the call to
succeed.

Let p be a new promise, and run the following steps in
parallel:

For each media type T in requestedMediaTypes,
run the following sub-steps:

For each possible source for T, construct an
unconstrained MediaStreamTrack with that source as its
source.

Call this set of tracks the
candidateSet.

If candidateSet is the empty set,
reject p with a new
DOMException object whose
name attribute has the value
NotFoundError and abort these steps.

If the value of the T entry of
constraints is "true", set CS to the empty
constraint set (no constraint). Otherwise, continue
with CS set to the value of the T
entry of constraints.

Run the SelectSettings
algorithm on each track in candidateSet
with CS as the constraint set. If the
algorithm returns undefined, remove the
track from candidateSet. This eliminates
devices unable to satisfy the constraints, by
verifying that at least one settings dictionary
exists that satisfies the constraints.

If candidateSet is the empty set, let
constraint be any required constraint
whose fitness distance was infinity for all settings
dictionaries examined while executing the
SelectSettings algorithm, let
message be either undefined
or an informative human-readable message, and
reject p with a new
OverconstrainedError created by
calling
OverconstrainedError(constraint,
message), then abort these
steps.

This error gives information
about what the underlying device is not capable of
producing, before the user has given any
authorization to any device, and can thus be used as
a fingerprinting surface.

Retrieve the permission state for all
candidate devices in candidateSet that are
not attached to a live MediaStreamTrack
in the current browsing context. Remove from
candidateSet any device for which the
permission state is "denied".

If candidateSet is now empty,
indicating that all devices of this type are in state
"denied", jump to the step labeled
PermissionFailure below.

Add all tracks from candidateSet to
finalSet.

Optionally, e.g., based on a previously-established
user preference, for security reasons, or due to platform
limitations, jump to the step labeled Permission
Failure below.

For the origin identified by
originIdentifier, request permission to
use a PermissionDescriptor with its name member set
to the permission name associated with kind
(e.g. "camera", "microphone"), and, optionally, its
deviceId member set to the device's deviceId,
while considering all devices attached to a
live MediaStreamTrack in the current
browsing
context to have permission status "granted",
resulting in a set of provided media.

The provided media MUST include precisely one track of
each media type in requestedMediaTypes from the
finalSet. The decision of which devices to
choose from the finalSet is completely up to
the User Agent and may be determined by asking the user.
Once selected, the source of a
MediaStreamTrack MUST NOT change.

The User Agent MAY use the value of the computed
"fitness distance" from the SelectSettings
algorithm, or any other internally-available information
about the devices, as an input to the selection
algorithm.

User Agents are encouraged to default to using the
user's primary or system default camera and/or microphone
(when possible) to generate the media stream. User Agents
MAY allow users to use any media source, including
pre-recorded media files.

If the result of the request is "granted", then for
each device that is sourcing the provided media, using
the device's deviceId, deviceId, set
[[\devicesLiveMap]][deviceId] to
true, if it isn’t already true,
and set the
[[\devicesAccessibleMap]][deviceId] to
true, if it isn’t already
true.

If the result is "denied", jump to the step labeled
Permission Failure below. If the user never
responds, this algorithm stalls on this step.

If the user grants permission but a hardware error
such as an OS/program/webpage lock prevents access,
reject p with a new
DOMException object whose
name attribute has the value
NotReadableError and abort these steps.

If the result is "granted" but device access fails for
any reason other than those listed above, reject p with a new DOMException
object whose name attribute has the
value AbortError and abort these steps.

Let stream be the
MediaStream object for which the user
granted permission.

Permission Failure: Reject p with a new DOMException
object whose name attribute has the
value NotAllowedError.

Return p.

In the algorithm above, constraints are checked twice - once at
device selection, and once after access approval. Time may have passed
between those checks, so it is conceivable that the selected device is
no longer suitable. In this case, a NotReadableError will result.

video: If true, it requests that the returned
MediaStream contain a video track. If a Constraints
structure is provided, it further specifies the nature and
settings of the video Track. If false, the
MediaStream MUST NOT contain a video Track.

audio: If true, it requests that the returned
MediaStream contain an audio track. If a
Constraints structure is provided, it further specifies
the nature and settings of the audio Track. If
false, the MediaStream MUST NOT contain an
audio Track.
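For example, a call might request any available microphone alongside a video track whose nature is further constrained (a sketch; values are illustrative):

const constraints = {
  audio: true,                                                // any audio source
  video: { width: { ideal: 1280 }, height: { ideal: 720 } }   // constrained video
};
const stream = await navigator.mediaDevices.getUserMedia(constraints);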

Implementation Suggestions

Resource
reservation

The User Agent is encouraged to reserve
resources when it has determined that a given call to getUserMedia() will be successful. It is preferable
to reserve the resource prior to resolving the returned promise.
Subsequent calls to getUserMedia()
(in this page or any other) should treat the resource that was
previously allocated, as well as resources held by other applications,
as busy. Resources marked as busy should not be provided as sources to
the current web page, unless specified by the user. Optionally, the
User Agent may choose to provide a stream sourced from a busy source
but only to a page whose origin matches the owner of the original
stream that is keeping the source busy.

This document recommends that in the permission
grant dialog or device selection interface (if one is present), the
user be allowed to select any available hardware as a source for the
stream requested by the page (provided the resource is able to fulfill
any specified mandatory constraints). Although not specifically
recommended as best practice, note that some User Agents may support
the ability to substitute a video or audio source with local files and
other media. A file picker may be used to provide this functionality to
the user.

This document also recommends that the user be
shown all resources that are currently busy as a result of prior calls
to getUserMedia() (in this
page or any other page that is still alive) and be allowed to terminate
that stream and utilize the resource for the current page instead. If
possible in the current operating environment, it is also suggested
that resources currently held by other applications be presented and
treated in the same manner. If the user chooses this option, the track
corresponding to the resource that was provided to the page whose
stream was affected must be removed.

Stored
Permissions

When permission is requested for a device, the
User Agent may choose to create a
permission storage entry for later use by the same origin, so that
the user does not need to grant permission again at a later time.
[[!RTCWEB-SECURITY-ARCH]] Section 5.2 requires that such storing MUST
only be done when the page is secure (served over HTTPS and having no
mixed content). It is a User Agent choice whether it offers
functionality to store permission to each device separately, all
devices of a given class, or all devices; the choice needs to be
apparent to the user, and permission must have been granted for the
entire set whose permission is being stored, e.g., to store permission
to use all cameras the user must have given permission to use all
cameras and not just one.

As described, this specification does not
dictate whether or not granting permission results in a stored
permission. When permission is not stored, permission will last only
until such time as all MediaStreamTracks sourced from that device have
been stopped.

Handling multiple
devices

A MediaStream may contain more than one
video and audio track. This makes it possible to include video from two
or more webcams in a single stream object, for example. However, the
current API does not allow a page to express a need for multiple video
streams from independent sources.

It is recommended for multiple calls to
getUserMedia() from the
same page to be allowed as a way for pages to request multiple discrete
video and/or audio streams.

Note also that if multiple getUserMedia() calls are done by a
page, the order in which they request resources, and the order in which
they complete, is not constrained by this specification.

A single call to getUserMedia() will always return
a stream with either zero or one audio tracks, and either zero or one
video tracks. If a script calls getUserMedia() multiple times
before reaching a stable state, this document advises the UI designer
that the permission dialogs should be merged, so that the user can give
permission for the use of multiple cameras and/or media sources in one
dialog interaction. The constraints on each getUserMedia call can be
used to decide which stream gets which media sources.

Constrainable Pattern

The Constrainable pattern allows applications to inspect and adjust the
properties of objects implementing it (the constrainable
object). It is broken out as a separate set of definitions so that it
can be referred to by other specifications. The core concept is the
Capability, which consists of a constrainable property of an object and the
set of its possible values, which may be specified either as a range or as
an enumeration. For example, a camera might be capable of framerates (a
property) between 20 and 50 frames per second (a range) and may be able to
be positioned (a property) facing towards the user, away from the user, or
to the left or right of the user (an enumerated set). The application can
examine a constrainable property's supported Capabilities via the
getCapabilities() accessor.

The application can select the (range of) values it wants for an
object's Capabilities by means of basic and/or advanced ConstraintSets and
the applyConstraints() method. A ConstraintSet consists of the
names of one or more properties of the object plus the desired value (or a
range of desired values) for each property. Each of those property/value
pairs can be considered to be an individual constraint. For example, the
application may set a ConstraintSet containing two constraints, the first
stating that the framerate of a camera be between 30 and 40 frames per
second (a range) and the second that the camera should be facing the user
(a specific value). How the individual constraints interact depends on
whether and how they are given in the basic Constraint structure, which is
a ConstraintSet with an additional 'advanced' property, or whether they are
in a ConstraintSet in the advanced list. The behavior is as follows: all
'min', 'max', and 'exact' constraints in the basic Constraint structure are
together treated as the 'required' set, and if it is not possible to
satisfy simultaneously all of those individual constraints for the
indicated property names, the User Agent MUST reject the returned
promise. Otherwise, it must apply the required constraints. Next, it will
consider any ConstraintSets given in the 'advanced' list, in the order in
which they are specified, and will try to satisfy/apply each complete
ConstraintSet (i.e., all constraints in the ConstraintSet together), but
will skip a ConstraintSet if and only if it cannot satisfy/apply it in its
entirety. Next, the User Agent MUST attempt to apply, individually, any
'ideal' constraints or a constraint given as a bare value for the property.
Of these properties, it MUST satisfy the largest number that it can, in any
order. Finally, the User Agent MUST resolve the returned
promise.

Any constraint provided via this API will only be considered if the given
constrainable property is supported by the browser. JavaScript
application code is expected to first check, via
getSupportedConstraints(), that all the named properties
that are used are supported by the browser. The reason for this is that
WebIDL drops any unsupported names from the dictionary holding the
constraints, so the browser does not see them and the unsupported names
end up being silently ignored. This will cause confusing programming
errors as the JavaScript code will be setting constraints but the browser
will be ignoring them. Browsers that support (recognize) the name of a
required constraint but cannot satisfy it will generate an error, while
browsers that do not support the constrainable property will not generate
an error.

The following examples may help to understand how constraints work. The
first shows a basic Constraint structure. Three constraints are given, each
of which the User Agent will attempt to satisfy individually. Depending
upon the resolutions available for this camera, it is possible that not all
three constraints can be satisfied at the same time. If so, the User Agent
will satisfy two if it can, or only one if not even two constraints can be
satisfied together. Note that if not all three can be satisfied
simultaneously, it is possible that there is more than one combination of
two constraints that could be satisfied. If so, the User Agent will
choose.
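A sketch of such a basic Constraint structure (the exact values are illustrative; bare values in the basic set are treated as ideal):

// Three bare (ideal) values; the User Agent satisfies as many as it can.
const stream = await navigator.mediaDevices.getUserMedia({
  video: { width: 1280, height: 720, aspectRatio: 3 / 2 }
});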

This next example adds a small bit of complexity. The ideal values are
still given for width and height, but this time with minimum requirements
on each as well as a minimum frameRate that must be satisfied. If it cannot
satisfy the frameRate, width or height minimum it will reject the
promise. Otherwise, it will try to satisfy the width, height, and
aspectRatio target values as well and then resolve the promise. Note
that the frameRate minimum might be within the capabilities of the camera
and satisfiable in ideal lighting conditions, but not in low light, and
could therefore result in firing of the onoverconstrained
event handler under poor lighting conditions.
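A sketch of that structure, assuming illustrative values:

const stream = await navigator.mediaDevices.getUserMedia({
  video: {
    width: { min: 640, ideal: 1280 },   // required minimum, preferred target
    height: { min: 480, ideal: 720 },
    aspectRatio: 3 / 2,                 // bare value, treated as ideal
    frameRate: { min: 20 }              // must be satisfied or the promise rejects
  }
});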

This example illustrates the full control possible with the Constraints
structure by adding the 'advanced' property. In this case, the User Agent
behaves the same way with respect to the required constraints, but before
attempting to satisfy the ideal values it will process the 'advanced' list.
In this example the 'advanced' list contains two ConstraintSets. The first
specifies width and height constraints, and the second specifies an
aspectRatio constraint. Note that in the advanced list, these bare values
are treated as 'exact' values. This example represents the following: "I
need my video to be at least 640 pixels wide and at least 480 pixels high.
My preference is for precisely 1920x1280, but if you can't give me that,
give me an aspectRatio of 4x3 if at all possible. If even that is not
possible, give me a resolution as close to 1280x720 as possible."
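A sketch matching that description (4/3 expresses the 4x3 aspect ratio as a number):

const stream = await navigator.mediaDevices.getUserMedia({
  video: {
    width: { min: 640, ideal: 1280 },
    height: { min: 480, ideal: 720 },
    advanced: [
      { width: 1920, height: 1280 },   // first preference, treated as exact
      { aspectRatio: 4 / 3 }           // fallback if the above cannot be satisfied
    ]
  }
});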

The ordering of advanced ConstraintSets is significant. In the preceding
example it is impossible to satisfy both the 1920x1280 ConstraintSet and
the 4x3 aspect ratio ConstraintSet at the same time. Since the 1920x1280
occurs first in the list, the User Agent will attempt to satisfy it first.
Application authors can therefore implement a backoff strategy by
specifying multiple advanced ConstraintSets for the same property. For
example, an application might specify three advanced ConstraintSets, the
first asking for a frame rate greater than 500, the second asking for a
frame rate greater than 400, and the third asking for one greater than 300.
If the User Agent is capable of setting a frame rate greater than 500, it
will (and the subsequent two ConstraintSets will be trivially satisfied).
However, if the User Agent cannot set the frame rate above 500, it will
skip that ConstraintSet and attempt to set the frame rate above 400. If
that fails, it will then try to set it above 300. If the User Agent cannot
satisfy any of the three ConstraintSets, it will set the frame rate to any
value it can get. If the developers wanted to insist on 300 as a lower
bound, they could provide that as a 'min' value in the basic ConstraintSet.
In that case, the User Agent would fail altogether if it couldn't get a
value over 300, but would choose a value over 500 if possible, then try for
a value over 400.
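A sketch of this backoff strategy, using the illustrative frame rates from the text:

const constraints = {
  video: {
    // frameRate: { min: 300 },        // uncomment to make 300 a hard lower bound
    advanced: [
      { frameRate: { min: 500 } },     // tried first
      { frameRate: { min: 400 } },     // tried if 500 cannot be satisfied
      { frameRate: { min: 300 } }      // tried if 400 cannot be satisfied
    ]
  }
};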

Note that, unlike basic constraints, the constraints within a
ConstraintSet in the advanced list must be satisfied together or skipped
together. Thus, {width: 1920, height: 1280} is a request for that specific
resolution, not a request for that width or that height. One can think of
the basic constraints as requesting an 'or' (non-exclusive) of the
individual constraints, while each advanced ConstraintSet is requesting an
'and' of the individual constraints in the ConstraintSet. An application
may inspect the full set of Constraints currently in effect via the
getConstraints() accessor.

The specific value that the User Agent chooses for a constrainable
property is referred to as a Setting. For example, if the application
applies a ConstraintSet specifying that the frameRate must be at least 30
frames per second, and no greater than 40, the Setting can be any
intermediate value, e.g., 32, 35, or 37 frames per second. The application
can query the current settings of the object's constrainable properties via
the getSettings()
accessor.

Interface Definition

Although this specification formally defines
ConstrainablePattern as a WebIDL interface, it is actually a
template or pattern for other interfaces and cannot be inherited directly
since the return values of the methods need to be extended, something
WebIDL cannot do. Thus, each interface that wishes to make use of the
functionality defined here will have to provide its own copy of the
WebIDL for the functions and interfaces given here. However it can refer
to the semantics defined here, which will not change. See MediaStreamTrack Interface
Definition for an example of this.

When the User Agent is no longer able to satisfy the
requiredConstraints from the currently valid Constraints, the
User Agent MUST queue a task that fires an
OverconstrainedErrorEvent, initialized as described in the
following paragraph, at the constrainable object. The event firing
task MAY also be used to update the constrainable object as a
result of the overconstrained situation.

The OverconstrainedErrorEvent references an
OverconstrainedError whose constraint
attribute is set to one of the requiredConstraints that can no
longer be satisfied. The message attribute of the
OverconstrainedError SHOULD contain a string that is useful
for debugging. The conditions under which this error might occur are
platform and application-specific. For example, the user might physically
manipulate a camera in a way that makes it impossible to provide a
resolution or frameRate that satisfies the constraints. The User Agent
MAY take other actions as a result of the overconstrained situation.

Attributes

Methods

getCapabilities

The getCapabilities()
method returns the dictionary of the names of the constrainable
properties that the object supports.

It is possible that the underlying hardware may not exactly
map to the range defined for the constrainable property. Where
this is possible, the entry SHOULD define how to translate and
scale the hardware's setting onto the values defined for the
property. For example, suppose that a hypothetical
fluxCapacitance property ranges from -10 (min) to 10 (max), but
there are common hardware devices that support only values of
"off" "medium" and "full". The constrainable property
definition might specify that for such hardware, the User Agent
should map the range value of -10 to "off", 10 to "full", and 0
to "medium". It might also indicate that given a ConstraintSet
imposing a strict value of 3, the User Agent should attempt to
set the value of "medium" on the hardware, and that
getSettings() should return a
fluxCapacitance of 0, since that is the value defined as
corresponding to "medium".

getConstraints

The getConstraints() method returns the Constraints
that were the argument to the most recent successful invocation
of the applyConstraints algorithm on the object,
maintaining the order in which they were specified. Note that
some of the advanced ConstraintSets returned may not be currently
satisfied. To check which ConstraintSets are currently in effect,
the application should use getSettings. Instead of
returning the exact constraints as described above, the UA MAY
return a constraint set that has the identical effect in all
situations as the applied constraints.

getSettings

The getSettings() method returns the current
settings of all the constrainable properties of the object,
whether they are platform defaults or have been set by the
applyConstraints algorithm. Note that a setting is a target
value that complies with constraints, and therefore may differ
from measured performance at times.
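Taken together, the three accessors let an application compare what a source could do, what was requested, and what was actually chosen (a sketch, assuming track is an object implementing the pattern):

const capabilities = track.getCapabilities();  // what the source can provide
const constraints = track.getConstraints();    // what the application requested
const settings = track.getSettings();          // what the User Agent chose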

The applyConstraints algorithm for applying constraints
is stated below. Here are some preliminary definitions that are
used in the statement of the algorithm:

We use the term settings dictionary for the set of
values that might be applied as settings to the object.

For string valued constraints, we define "==" below to be true
if one of the values in the sequence is exactly the same as the
value being compared against.

We define the fitness distance between a
settings dictionary and a constraint set CS as
the sum, for each constraint provided for a constraint name in
CS, of the following values:

If the constraint is not supported by the browser, the
fitness distance is 0.

If the constraint is required ('min', 'max', or 'exact'),
and the settings dictionary's value for the constraint
does not satisfy the constraint, the fitness distance is
positive infinity.

If the constraint is not required, and does not apply for
this type of device, the fitness distance is 0 (that is, the
constraint does not influence the fitness distance).

If no ideal value is specified, the fitness distance is
0.

For all positive numeric non-required constraints (such as
height, width, frameRate, aspectRatio, sampleRate and
sampleSize), the fitness distance is the result of the formula

(actual == ideal) ? 0 : |actual - ideal| / max(|actual|, |ideal|)

For all string and enum non-required constraints (e.g.
deviceId, groupId, facingMode, resizeMode, echoCancellation),
the fitness distance is the result of the formula

(actual == ideal) ? 0 : 1
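A non-normative sketch of these two formulas as code (the helper names are hypothetical):

// Fitness distance contribution of one non-required numeric constraint.
function numericFitness(actual, ideal) {
  if (ideal === undefined) return 0;   // no ideal value specified
  return actual === ideal
    ? 0
    : Math.abs(actual - ideal) / Math.max(Math.abs(actual), Math.abs(ideal));
}

// Fitness distance contribution of one non-required string/enum constraint.
function stringFitness(actual, ideal) {
  if (ideal === undefined) return 0;
  return actual === ideal ? 0 : 1;
}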

More definitions:

We refer to each element of a ConstraintSet (other than the
special term 'advanced') as a 'constraint' since it is intended
to constrain the acceptable settings for the given property
from the full list or range given in the corresponding
Capability of the ConstrainablePattern object to a value that
is within the range or list of values it specifies.

We refer to the "effective Capability" C of an object O as
the possibly proper subset of the possible values of C (as
returned by getCapabilities) taking into consideration
environmental limitations and/or restrictions placed by other
constraints. For example given a ConstraintSet that constrains
the aspectRatio, height, and width properties, the values
assigned to any two of the properties limit the effective
Capability of the third. The set of effective Capabilities may
be platform dependent. For example, on a resource-limited
device it may not be possible to set properties P1 and P2 both
to 'high', while on another less limited device, this may be
possible.

A settings dictionary, which is a set of values for the
constrainable properties of an object O, satisfies
ConstraintSet CS if the fitness distance between the set and CS
is less than infinity.

A set of ConstraintSets CS1...CSn (n >= 1) can be
satisfied by an object O if it is possible to find a settings
dictionary of O that satisfies CS1...CSn simultaneously.

To apply a set of ConstraintSets CS1...CSn to object O is
to choose such a sequence of values that satisfy CS1...CSn and
assign them as the settings for the properties of O.

We define the SelectSettings algorithm as
follows:

Each constraint specifies one
or more values (or a range of values) for its property. A
property MAY appear more than once in the list of 'advanced'
ConstraintSets. If an empty object or list has been given as
the value for a constraint, it MUST be interpreted as if the
constraint were not specified (in other words, an empty
constraint == no constraint).

Note that unknown properties are discarded by WebIDL,
which means that unknown/unsupported required constraints
will silently disappear. To avoid this being a surprise,
application authors are expected to first use the
getSupportedConstraints() method as shown in the
Examples below.

Let object be the
ConstrainablePattern object on which this
algorithm is applied. Let copy be an unconstrained
copy of object (i.e., copy should behave
as if it were object with all ConstraintSets
removed.)

For every possible settings dictionary of copy,
compute its fitness distance with the basic constraint set in
newConstraints, treating bare values of properties as
ideal values. Let candidates be the set of settings
dictionaries for which the fitness distance is finite.

If candidates is empty, return
undefined as the result of the
SelectSettings() algorithm.

Iterate over the 'advanced' ConstraintSets in
newConstraints in the order in which they were
specified. For each ConstraintSet:

compute the fitness distance between it and each
settings dictionary in candidates, treating
bare values of properties as exact.

If the fitness distance is finite for one or more
settings dictionaries in candidates, keep
those settings dictionaries in candidates,
discarding others.

If the fitness distance is infinite for all settings
dictionaries in candidates, ignore this
ConstraintSet.

Select one settings dictionary from candidates,
and return it as the result of the
SelectSettings() algorithm. The UA SHOULD use
the one with the smallest fitness distance, as
calculated in step 3, but MAY prefer ones with
resizeMode
set to "none" over "crop-and-scale".

When the applyConstraints algorithm is called, the
User Agent MUST run the following steps:

Let p be a new promise.

Let newConstraints be the argument to this
function.

Run the following steps in parallel:

Let successfulSettings be the result of
running the SelectSettings algorithm with
newConstraints as the constraint set.

If successfulSettings is
undefined, let failedConstraint
be any required constraint whose fitness distance was
infinity for all settings dictionaries examined while
executing the SelectSettings
algorithm, let message be either
undefined or an informative human-readable
message, reject p with a new
OverconstrainedError created by calling
OverconstrainedError(failedConstraint,
message), and abort these steps. The
existing constraints remain in effect in this case.

In a single operation, remove the existing constraints
from object, apply newConstraints,
and apply successfulSettings as the current
settings.

The User Agent MAY choose new settings for the constrainable
properties of the object at any time. When it does so it MUST
attempt to satisfy all current Constraints, in the manner
described in the algorithm above.

Any implementation that has the same result as the algorithm
above is an allowed implementation. For instance, the
implementation may choose to keep track of the maximum and
minimum values for a setting that are OK under the constraints
considered, rather than keeping track of all possible values
for the setting.

When picking a settings dictionary, the UA can use any
information available to it. Examples of such information may
be whether the selection is done as part of device selection in
getUserMedia, whether the energy usage of the camera varies
between the settings dictionaries, or whether using a settings
dictionary will cause the device driver to apply
resampling.
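A sketch of typical applyConstraints usage on a track implementing the pattern:

const [track] = stream.getVideoTracks();
try {
  await track.applyConstraints({ frameRate: { max: 30 } });
} catch (e) {
  // e is an OverconstrainedError; e.constraint names a required constraint
  // that could not be satisfied. The previous constraints remain in effect.
}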

Types for Constrainable Properties

The syntax for the specification of the set of legal values depends on
the type of the values. In addition to the standard atomic types
(boolean, long, double, DOMString), legal values include lists of any of
the atomic types, plus min-max ranges, as defined below.

List values MUST be interpreted as disjunctions. For example, if a
property 'facingMode' for a camera is defined as having legal values
["left", "right", "user", "environment"], this means that 'facingMode'
can have the values "left", "right", "environment", and "user". Similarly
Constraints restricting 'facingMode' to ["user", "left", "right"]
would mean that the User Agent should select a camera (or point the
camera, if that is possible) so that "facingMode" is either "user",
"left", or "right". This Constraint would thus request that the camera
not be facing away from the user, but would allow the User Agent to allow
the user to choose other directions.
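A sketch of such a Constraint:

// Ask that the camera not face away from the user; the User Agent may
// still let the user choose another direction.
const stream = await navigator.mediaDevices.getUserMedia({
  video: { facingMode: ["user", "left", "right"] }
});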

Throughout this specification, the identifier
ConstrainDOMString is used to refer to the (DOMString or sequence<DOMString> or
ConstrainDOMStringParameters) type.

Capabilities

Capabilities is a dictionary containing one or more
key-value pairs, where each key MUST be a constrainable property, and
each value MUST be a subset of the set of values allowed for that
property. The exact syntax of the value expression depends on the type of
the property. The Capabilities dictionary specifies which constrainable
properties that can be applied, as constraints, to the constrainable
object. Note that the Capabilities of a constrainable object
MAY be a subset of the properties defined in the Web platform, with a
subset of the set values for those properties. Note that Capabilities are
returned from the User Agent to the application, and cannot be specified
by the application. However, the application can control the Settings
that the User Agent chooses for constrainable properties by means of
Constraints.

An example of a Capabilities dictionary is shown below. In this case,
the constrainable object is a video source with a very limited set
of Capabilities.

{
frameRate: {min: 1.0, max: 60.0},
facingMode: ['user', 'left']
}

The next example below points out that capabilities for range values
provide ranges for individual constrainable properties, not combinations.
This is particularly relevant for video width and height, since the
ranges for width and height are reported separately. In the example, if
the constrainable object can only provide 640x480 and 800x600
resolutions the relevant capabilities returned would be:
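A sketch of what might be returned (the pinned aspectRatio range, 4/3, is what rules out arbitrary width/height combinations):

{
  width: { min: 640, max: 800 },
  height: { min: 480, max: 600 },
  aspectRatio: { min: 4 / 3, max: 4 / 3 }
}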

Note in the example above that the aspectRatio makes clear that
arbitrary combinations of widths and heights are not possible, although it
would still suggest that more than two resolutions were available.

A
specification using the Constrainable Pattern should not subclass the
below dictionary, but instead provide its own definition. See
MediaTrackCapabilities for an example.

dictionary Capabilities {
};

Settings

Settings is a dictionary containing one or more key-value
pairs. It MUST contain each key returned in
getCapabilities() for which the property is defined on the
object type it's returned on; for instance, an audio
MediaStreamTrack has no "width" property. There MUST
be a single value for each key and the value MUST be a member of the set
defined for that property by getCapabilities(). The
Settings dictionary contains the actual values that the User
Agent has chosen for the object's constrainable properties. The exact
syntax of the value depends on the type of the property.

A conforming User Agent MUST support all the constrainable properties
defined in this specification.

An example of a Settings dictionary is shown below. This example is
not very realistic in that a browser would actually be required to
support more constrainable properties than just these.

{
frameRate: 30.0,
facingMode: 'user'
}

A specification using the Constrainable Pattern should not subclass
the below dictionary, but instead provide its own definition. See
MediaTrackSettings for an example.

dictionary Settings {
};

Constraints and ConstraintSet

Due to the limitations of WebIDL, interfaces implementing the
Constrainable Pattern cannot simply subclass Constraints and
ConstraintSet as they are defined here. Instead they must provide their
own definitions that follow this pattern. See MediaTrackConstraints for an example of
this.

dictionary ConstraintSet {
};

Each member of a ConstraintSet corresponds to a
constrainable property and specifies a subset of the property's legal
Capability values. Applying a ConstraintSet instructs the User Agent to
restrict the settings of the corresponding constrainable properties to
the specified values or ranges of values. A given property MAY occur both
in the basic Constraints set and in the advanced ConstraintSets list, and
MAY occur at most once in each ConstraintSet in the advanced list.

Dictionary Constraints Members

This is the list of ConstraintSets that the User Agent MUST
attempt to satisfy, in order, skipping only those that cannot be
satisfied. The order of these ConstraintSets is significant. In
particular, when they are passed as an argument to
applyConstraints, the User Agent MUST try to satisfy
them in the order that is specified. Thus if advanced
ConstraintSets C1 and C2 can be satisfied individually, but not
together, then whichever of C1 and C2 is first in this list will
be satisfied, and the other will not. The User Agent MUST attempt
to satisfy all ConstraintSets in the list, even if some cannot be
satisfied. Thus, in the preceding example, if constraint C3 is
specified after C1 and C2, the User Agent will attempt to satisfy
C3 even though C2 cannot be satisfied. Note that a given property
name may occur only once in each ConstraintSet but may occur in
more than one ConstraintSet.

Examples

This sample code exposes a button. When clicked, the button is
disabled and the user is prompted to offer a stream. The user can cause
the button to be re-enabled by providing a stream (e.g., giving the page
access to the local camera) and then disabling the stream (e.g., revoking
that access).
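A sketch of such code, assuming a button element with id "startBtn" (the names are illustrative):

const startBtn = document.getElementById('startBtn');
startBtn.addEventListener('click', async () => {
  startBtn.disabled = true;
  try {
    const stream = await navigator.mediaDevices.getUserMedia({ video: true });
    // Re-enable the button when the user disables (ends) the stream.
    for (const track of stream.getTracks()) {
      track.addEventListener('ended', () => { startBtn.disabled = false; });
    }
  } catch (err) {
    startBtn.disabled = false;   // permission denied or no suitable device
  }
});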

Privacy Indicator Requirements

Define any<kind>Accessible (e.g.
anyAudioAccessible, anyVideoAccessible) as the
logical OR of the [[\kindsAccessibleMap]][kind] value and all
the [[\devicesAccessibleMap]][deviceId] values for devices of
that kind.

Define any<kind>Live (e.g. anyAudioLive,
anyVideoLive) to be the logical OR of the
[[\kindsLiveMap]][kind] value and all the
[[\devicesLiveMap]][deviceId] values for devices of that
kind.

Define anyAccessible to be the logical OR of all
any<kind>Accessible values.

Define anyLive to be the logical OR of all
any<kind>Live values.

Then the following are requirements on the User Agent:

The User Agent MUST indicate to the user when the value of
anyAccessible changes.

The User Agent MUST indicate to the user when the value of
anyLive changes.

If the User Agent provides indication to the user per
kind, then for each any<kind>Accessible value
and any<kind>Live value, it MUST at minimum indicate
when the value changes.

If the User Agent provides indication to the user per
device, then for each
[[\devicesAccessibleMap]][deviceId] value and
[[\devicesLiveMap]][deviceId] value, it MUST at minimum
indicate when the value changes.

Any false-to-true transition indicated MUST remain observable for a
sufficient time that a reasonably-observant user could become aware of
it.

Any of the above transition indications MAY be combined as long as
the combined indication cannot transition to false if any of its
component indications are still true.

and the following are encouraged behaviors for the User Agent:

The User Agent is encouraged to provide ongoing indication of the
current state of anyAccessible.

The User Agent is encouraged to provide ongoing indication of the
current state of anyLive and to make any generic hardware
device indicator light match.

If the User Agent provides indication to the user per
kind, then for each any<kind>Accessible value
and any<kind>Live value, it is encouraged to provide
ongoing indication of the current state of the value. It is also
encouraged to make any device-type-specific hardware indicator light
match the corresponding any<kind>Live value.

If the User Agent provides indication to the user per
device, then for each
[[\devicesAccessibleMap]][deviceId] value and
[[\devicesLiveMap]][deviceId] value, it is encouraged to
provide ongoing indication of the current state of the value. It is also
encouraged to make any device-specific hardware indicator light match the
corresponding [[\devicesLiveMap]][deviceId] value.

Any of the above ongoing indications MAY be used instead of the
corresponding required transition indication provided the false-to-true
transition requirement is met.

Privacy and Security Considerations

This section is non-normative; it specifies no new behavior, but instead
summarizes information already present in other parts of the
specification.

This document extends the Web platform with the ability to manage input
devices for media - in this iteration, microphones, and cameras. It also
allows the manipulation of audio output devices (speakers and headphones).
Capturing audio and video exposes personally-identifiable information to
applications, and this specification requires obtaining explicit user
consent before sharing it.

Without authorization (to the "drive-by web"), it offers the ability to
tell how many devices there are of each class, and how they are grouped
together (e.g. a microphone and camera belonging to a single Web cam). The
identifiers for the devices are designed to not be useful for a fingerprint
that can track the user between origins, but the number and grouping of
devices adds to the fingerprint surface. This document recommends treating
the per-origin persistent identifier deviceId as other persistent
storage (e.g. cookies) is treated.

When authorization is given, this document describes how to get access
to, and use, media data from the devices mentioned. This data may be
sensitive; advice is given that indicators should be supplied to indicate
that devices are in use, but both the nature of authorization and the
indicators of in-use devices are platform decisions.

Authorization may be given on a case-by-case basis, or be persistent. In
the case of a case-by-case authorization, it is important that the user be
able to say "no" in a way that prevents the UI from blocking user
interaction until permission is given - either by offering a way to say a
"persistent NO" or by not using a modal permissions dialog.

When authorization to any media device is given, application developers
gain access to the labels of all available media capture devices. In most
cases, the labels are persistent across browsing sessions and across
origins that have also been granted authorization, and thus potentially
provide a way to track a given device across time and origins.

For origins to which permission has been granted, the
devicechange event will be emitted across browsing
contexts and origins each time a new media device is added or removed; user
agents can mitigate the risk of correlation of browsing activity across
origins by fuzzing the timing of these events.

Once a developer gains access to a media stream from a capture device,
the developer also gains access to detailed information about the device,
including its range of operating capabilities (e.g. available resolutions
for a camera). These operating capabilities are for the most part
persistent across browsing sessions and origins, and thus provide a way to
track a given device across time and origins.

Once access to a video stream from a capture device is obtained, that
stream can most likely be used to fingerprint uniquely the said device
(e.g. via dead pixel detection). Similarly, once access to an audio stream
is obtained, that stream can most likely be used to fingerprint user
location down to the level of a room or even simultaneous occupation of a
room by disparate users (e.g. via analysis of ambient audio or of unique
audio purposely played out of the device speaker). User-level mitigation
for both audio and video consists of covering up the camera and/or
microphone or revoking permission via browser chrome controls.

It is possible to use constraints so that the failure of a getUserMedia
call will return information about devices on the system without prompting
the user, which increases the surface available for fingerprinting. The
User Agent should consider limiting the rate at which failed getUserMedia
calls are allowed in order to limit this additional surface.

In the case of persistent authorization via a stored permission, it is
important that it is easy to find the list of granted permissions and
revoke permissions that the user wishes to revoke.

Once permission has been granted, the User Agent should make two things
readily apparent to the user:

That the page has access to the devices for which permission is
given

Whether or not any of the devices is presently recording (an "on
air" indicator)

Developers of sites with stored permissions should be careful that
these permissions not be abused. These permissions can be revoked using
the [[permissions]] API.

In particular, they should not make it possible to automatically send
audio or video streams from authorized media devices to an end point that
a third party can select.

Indeed, if a site offered URLs such as
https://webrtc.example.org/?call=user that would
automatically set up calls and transmit audio/video to
user, it would be open for instance to the
following abuse:

Users who have granted stored permissions to
https://webrtc.example.org/ could be tricked to send their
audio/video streams to an attacker EvilSpy by following a
link or being redirected to
https://webrtc.example.org/?call=EvilSpy.

Although [[!RTCWEB-SECURITY-ARCH]] Section 5.2 indicates that
implementations may refuse all access permissions for HTTP origins, it
recommends that implementations allow one-time camera/microphone access.
While allowing one-time access for HTTP origins is convenient, this makes
it possible for an attacker to obtain access to the camera/microphone of
an unsuspecting user.

Extensibility

Although new versions of this specification may be produced in the
future, it is also expected that other standards will need to define new
capabilities that build upon those in this specification. The purpose of
this section is to provide guidance to creators of such extensions.

Any WebIDL-defined interfaces, methods, or attributes in the
specification may be extended. Two likely extension points are defining a
new media type and defining a new constrainable property.

Defining a new media type (beyond the existing Audio and Video
types)

At a minimum, defining a new media type would require

adding a new getXXXXTracks() method for the type to the
MediaStream interface,

describing what a muted or disabled track of that type will render
(see ),

adding the new type as an additional legal value for the
kind attribute on
the MediaStreamTrack interface,

defining any constrainable properties (see ) that are applicable to the media
type,

updating how the HTMLMediaElement works with a MediaStream containing a track of the new media type
(see ),

describing any new security and/or privacy considerations (see
) introduced by the
new type, and

if the new type requires user authorization, defining new
permissions for it, including a new PermissionDescriptor name
associated with the new kind, and defining how these
permissions, along with access starting and ending, as well as
muting/disabling, affect any new and/or existing "on-air" and "device
accessible" indicator states (see ).

Creators of extension specifications are strongly encouraged to notify
the Media Capture Task Force of their extension by emailing the list at
public-media-capture@w3.org.
Future versions of this specification and others created by the Media
Capture Task Force will take into consideration all extensions they are
aware of in an attempt to reduce potential usage conflicts.

It is also likely that new consumers of MediaStreams
or MediaStreamTracks will be defined in the future. The
following section provides guidance.

Change Log

December 12, 2012

November 15 2012

Introduced new representation of tracks in a stream (removed
MediaStreamTrackList).

Updated MediaStreamTrack.readyState to use an enum type (instead of
unsigned short constants).

Renamed MediaStream.label to MediaStream.id (the definition needs
some more work).

October 1 2012

Limited the track kind values to "audio" and "video" only (could
previously be user defined as well).

Made MediaStream extend EventTarget.

Simplified the MediaStream constructor.

June 23 2012

Rename title to "Media Capture and Streams".

Update document to comply with HTML5.

Update image describing a MediaStream.

Add known issues and various other editorial changes.

June 22 2012

Update wording for constraints algorithm.

June 19 2012

Added "Media Streams as Media Elements section".

June 12 2012

Switch to respec v3.

June 5 2012

Added non-normative section "Implementation Suggestions".

Removed stray whitespace.

June 1 2012

Added media constraint algorithm.

Apr 23 2012

Remove MediaStreamRecorder.

Apr 20 2012

Add definitions of MediaStreams and related objects.

Dec 21 2011

Changed to make wanted media opt in (rather than opt out). Minor
edits.

Nov 29 2011

Changed examples to use MediaStreamOptions objects rather than
strings. Minor edits.

Nov 15 2011

Removed MediaStream stuff. Refers to webrtc 1.0 spec for that part
instead.

Nov 9 2011

Created first version by copying the webrtc spec and ripping out
stuff. Put it on github.

Acknowledgements

The editors wish to thank the Working Group chairs and Team Contact,
Harald Alvestrand, Stefan Håkansson, Erik Lagerway and Dominique
Hazaël-Massieux, for their support. Substantial text in this specification
was provided by many people including
Jim Barnett, Harald Alvestrand, Travis
Leithead, Josh Soref, Martin Thomson,
Jan-Ivar Bruaroey, Peter Thatcher,
Dominique Hazaël-Massieux, and Stefan
Håkansson. Dan Burnett would like to acknowledge the significant support
received from Voxeo and Aspect during the development of this
specification.