for off-video display (lyrics), disabled by default in this version, not shown by the UA

for metadata (slide timings, annotation data for app-rendered annotations), enabled by default, not shown by the UA

Tracks that are for visual display or audio playback additionally have a user-facing label and a language.

Tracks that are for visual display have an additional boolean indicating whether they include sound effects and speaker identification (intended for users who are deaf, hard of hearing, or have sound muted) or not (i.e. translations intended for people who have audio enabled but cannot understand the language, or karaoke lyrics).
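The track kinds and per-kind attributes described above could be modeled, purely hypothetically, as plain objects; the property names here are illustrative, not from any specification:

```javascript
// Hypothetical sketch of the per-track data described above; all
// property names are illustrative, not from any specification.
function makeTimedTrack(kind, options = {}) {
  const track = {
    kind, // e.g. "subtitles", "lyrics", "metadata"
    enabledByDefault: options.enabledByDefault ?? false,
    shownByUA: options.shownByUA ?? false,
  };
  // Tracks for visual display or audio playback carry a label and language.
  if (kind !== "metadata") {
    track.label = options.label ?? "";
    track.language = options.language ?? "";
  }
  // Tracks for visual display also record whether they include sound
  // effects and speaker identification (for deaf/hard-of-hearing users).
  if (kind === "subtitles" || kind === "captions") {
    track.forHearingImpaired = options.forHearingImpaired ?? false;
  }
  return track;
}
```

A metadata track built this way would carry no label or language, matching the distinction drawn above.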

Each timed track associated with a media resource, like the media resource itself, can have multiple sources.

Each source for a timed track has:

URL

type (if there are multiple sources)

media
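Source selection for a timed track would presumably mirror media resource selection: pick the first source whose type the UA supports and whose media query currently matches. A sketch, with the two predicates injected so the logic stays self-contained (both are assumptions, not spec behaviour):

```javascript
// Hypothetical source-selection sketch: return the first source whose
// type is supported and whose media query matches. `supportsType` and
// `mediaMatches` are injected predicates standing in for UA behaviour.
function selectTrackSource(sources, supportsType, mediaMatches) {
  for (const source of sources) {
    if (source.type && !supportsType(source.type)) continue;
    if (source.media && !mediaMatches(source.media)) continue;
    return source; // { url, type?, media? }
  }
  return null; // no usable source
}
```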

The media resource can also imply certain timed tracks based on data in the media resource.

Is there a better solution to enabled=false for disabling tracks by default? Do we ever need to disable a track that might be enabled by default?

Visual titles

File format

Based on studying a broad range of timed track formats, there does not appear to be a format that is easy to read and write, supports automatic positioning to avoid overlapping titles while still allowing some level of positioning control, supports temporally overlapping titles, uses video-independent rather than pixel-based (for visual) or frame-based (for temporal) positioning, and supports some inline structure for ruby, italics, bold, and karaoke.

The two formats that are the cleanest in terms of existing syntax, that cover a subset of the above feature set, and that can be extended relatively cleanly in a backwards-compatible way are the FAB Subtitler format and the SRT format. The former, however, is poorly documented; the latter appears to be better known.
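For reference, SRT cue timings use the form HH:MM:SS,mmm (comma before the milliseconds). A minimal parser sketch for that timestamp shape, assuming well-formed two-digit fields:

```javascript
// Minimal parser for an SRT-style timestamp ("HH:MM:SS,mmm"),
// returning a time in seconds, or null on malformed input.
function parseSrtTimestamp(text) {
  const m = /^(\d{2}):(\d{2}):(\d{2}),(\d{3})$/.exec(text.trim());
  if (!m) return null;
  // m.map(Number) turns the full match (slot 0) into NaN; it is discarded.
  const [, hh, mm, ss, ms] = m.map(Number);
  return hh * 3600 + mm * 60 + ss + ms / 1000;
}
```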

CSS extensions

Cues are rendered as block boxes containing inline boxes. Cues have a voice (identified by a keyword or a number). Cues can have a part that is before the current time and a part that is after it.

The block box is matched by the pseudo-element ::cue on the media element (<video>).
Only visible cues are matched (those on tracks enabled and shown by the UA whose start/end time range contains the current time).
The ::cue pseudo-element takes an optional argument that is the voice of the cues it is to match. The keyword "*", matching all voices, is assumed if the argument is absent.

The ::cue pseudo-element, when given _two_ arguments, matches all innermost inline boxes in the cues of the element that match its second argument. Its first argument is a voice; the keyword "*" matches all voices. Its second argument is one of "i", "b", "ruby", "rt" (matching inline boxes immediately inside one of those annotations), or "before" or "after" (matching the fragments before or after the current time).
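The visibility condition above (a cue is matched only while its track is enabled and shown and its time range contains the current time) can be expressed as a simple predicate; the field names are illustrative:

```javascript
// A cue is "visible" (and hence matched by ::cue) when its track is
// enabled and shown by the UA and its [start, end) range contains the
// current playback time. Field names here are illustrative only.
function isCueVisible(cue, track, currentTime) {
  return track.enabled &&
         track.shownByUA &&
         cue.startTime <= currentTime &&
         currentTime < cue.endTime;
}
```

Whether the end time is inclusive or exclusive is an assumption here; the half-open range just avoids two cues being simultaneously "current" at a shared boundary.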

Constructor for MediaCue: new MediaCue(id, startTime, endTime, settings, text); // settings and text get parsed like the cues in the main format, whatever that ends up being
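A hypothetical shape for that constructor is sketched below; since the cue format is still undecided, the parsing of settings and text is deliberately stubbed out and both are stored verbatim:

```javascript
// Hypothetical MediaCue sketch matching the proposed constructor
// signature. Real parsing of `settings` and `text` depends on the
// eventual cue format, so both are kept verbatim here.
class MediaCue {
  constructor(id, startTime, endTime, settings, text) {
    if (!(startTime <= endTime)) {
      throw new RangeError("cue must not end before it starts");
    }
    this.id = String(id);
    this.startTime = startTime;
    this.endTime = endTime;
    this.settings = settings; // would be parsed per the cue format
    this.text = text;         // ditto
  }
}
```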

CueEvent
readonly attribute MediaCue cue;

HTMLTrackElement
readonly attribute MediaTrack track;

Other minor things

We need to make sure that media playback is paused until all enabled timed tracks are locally available.

We need to block cross-origin tracks (eventually blocking only those that aren't CORS-enabled).

Open issues

Synchronised media

For now, sign-language and alternate or additive audio tracks (e.g. audio description tracks) have to be in-band, because UA vendors are refusing to implement synchronisation of external media tracks for now.

However, we should bear it in mind. Adding that kind of thing to the API is going to be non-trivial. The simplest way is probably just to require that authors use multiple <video>/<audio> elements and link them somehow, with one designated as the "sync clock" that they all sync to, rather than having each <video> element expose multiple "buffered", "seekable", etc. attributes.
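The "sync clock" idea amounts to periodically comparing each follower's currentTime against the master's and seeking when the drift grows too large. A rough sketch of just the correction decision (the 0.25 s tolerance is an arbitrary illustrative threshold, not a proposed value):

```javascript
// Sketch of the "sync clock" idea: given the master's currentTime and a
// follower's currentTime, decide whether the follower should seek, and
// to where. Returns null when the drift is within tolerance.
function syncCorrection(masterTime, followerTime, tolerance = 0.25) {
  const drift = followerTime - masterTime;
  if (Math.abs(drift) <= tolerance) return null; // close enough
  return masterTime; // seek target for the follower
}
```

In a real UA this would run on the master's timeupdate (or a timer), with playbackRate nudging preferred over hard seeks for small drift.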

Streaming

Do we need to handle live transcription and streaming titles in external files? If so, how?

For now, it's not clear whether there are any use cases for streaming external timed track resources.

Web-based radio might benefit from serving a live audio stream with the song title and other details like an artist URL, but it's not clear that this needs to be a timed track (it could be a WebSocket or EventSource feed).

Specification approach

Add <track> element

Add concept of a media element's timed tracks list

Add algorithms to update the timed tracks list (based on <track> elements and based on the media resource)
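In outline, the update algorithm in the last step would merge the tracks declared by <track> elements with any in-band tracks exposed by the media resource. A purely illustrative sketch of that merge, with element-declared tracks listed first (the ordering is an assumption):

```javascript
// Outline of "update the timed tracks list": combine tracks declared by
// <track> elements with in-band tracks found in the media resource,
// tagging each entry with its origin. Purely illustrative.
function updateTimedTracksList(trackElementTracks, inBandTracks) {
  const list = [];
  for (const t of trackElementTracks) list.push({ ...t, source: "element" });
  for (const t of inBandTracks) list.push({ ...t, source: "in-band" });
  return list;
}
```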