Timed Text Tracks


The HTML5 track element enables you to add timed text tracks, such as closed captioning, translations, or text commentary, to HTML5 video elements.

The track element

The track element represents a timed text file to provide users with multiple languages or commentary for videos. You can use multiple tracks, and set one as default to be used when the video starts. The text is displayed in the lower portion of the video player.

Track file formats

Text tracks use a simplified version of the Web Video Text Tracks (WebVTT) or Timed Text Markup Language (TTML) timed text file formats. Microsoft Edge also supports in-band closed captioning, in line with the FCC mandate. For more information on how to map in-band 608/708 closed captions to text track cues, see Conversion of 608/708 captions to WebVTT.

The TTML file uses a namespace declaration and the language attribute in the root element (tt). This is followed by the body and a div element. Within the div element are the timing cues. The actual times are set as attributes (begin, end) of the opening paragraph tag (<p>) and the text is delineated by the closing </p> tag. Blank lines and white space are ignored. If there are multiple lines, they are separated by <br/> tags.
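The structure described above can be sketched by generating a small TTML document from a list of cues. This is only an illustration: the buildTtml and escapeXml helpers and the cue shape ({ begin, end, lines }) are hypothetical names chosen for this sketch, and the output covers only the elements mentioned in the paragraph above.

```javascript
// Escape the characters that would otherwise break the XML markup.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build a TTML string: root tt element with a namespace and language,
// then body, div, and one <p begin=... end=...> per cue.
// (Hypothetical helper, not part of any browser API.)
function buildTtml(lang, cues) {
  const paragraphs = cues.map(({ begin, end, lines }) => {
    // Multiple lines of one caption are separated by <br/> tags.
    const text = lines.map(escapeXml).join("<br/>");
    return `      <p begin="${begin}" end="${end}">${text}</p>`;
  });
  return [
    '<?xml version="1.0" encoding="utf-8"?>',
    `<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="${lang}">`,
    "  <body>",
    "    <div>",
    ...paragraphs,
    "    </div>",
    "  </body>",
    "</tt>",
  ].join("\n");
}

const ttml = buildTtml("en", [
  {
    begin: "00:00:01.878",
    end: "00:00:05.334",
    lines: ["Good day everyone, my name is John Smith"],
  },
  {
    begin: "00:00:08.608",
    end: "00:00:15.296",
    lines: ["This video will teach you how to", "build a sand castle on any beach"],
  },
]);
console.log(ttml);
```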

WebVTT

WEBVTT

00:00:01.878 --> 00:00:05.334
Good day everyone, my name is John Smith

00:00:08.608 --> 00:00:15.296
This video will teach you how to
build a sand castle on any beach

The file starts with the tag WEBVTT on the first line, followed by a blank line. Timestamps use the format HH:MM:SS.sss. The start and end times of a cue are separated by a space, two hyphens and a greater-than sign ( --> ), and another space. Each pair of times sits on a line by itself. The caption text follows immediately on the next line and can span one or more lines. The only restriction is that there must be no blank lines between lines of text, because a blank line ends the cue. The MIME type for WebVTT files is "text/vtt".
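To make these format rules concrete, here is a minimal parsing sketch. It assumes only the simple subset described above (a WEBVTT header, timing lines, and caption text separated by blank lines). The names parseTimestampMs and parseWebVtt are hypothetical helpers, not browser APIs; in a page, the browser parses the file for you.

```javascript
// Convert an HH:MM:SS.sss timestamp into whole milliseconds.
function parseTimestampMs(stamp) {
  const match = /^(\d{2,}):(\d{2}):(\d{2})\.(\d{3})$/.exec(stamp.trim());
  if (!match) throw new Error(`Bad timestamp: ${stamp}`);
  const [, hours, minutes, seconds, millis] = match.map(Number);
  return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis;
}

// Parse the simple WebVTT subset described above into cue objects.
function parseWebVtt(source) {
  // Normalize line endings, then split into blocks on blank lines.
  const blocks = source.replace(/\r\n?/g, "\n").trim().split(/\n{2,}/);
  if (!blocks[0].startsWith("WEBVTT")) {
    throw new Error("Missing WEBVTT header");
  }
  const cues = [];
  for (const block of blocks.slice(1)) {
    const lines = block.split("\n");
    const timingIndex = lines.findIndex((line) => line.includes(" --> "));
    if (timingIndex === -1) continue; // not a cue block, skip it
    const [start, rest] = lines[timingIndex].split(" --> ");
    // Anything after the end time on the timing line is ignored here.
    const end = rest.trim().split(/\s+/)[0];
    cues.push({
      startMs: parseTimestampMs(start),
      endMs: parseTimestampMs(end),
      text: lines.slice(timingIndex + 1).join("\n"),
    });
  }
  return cues;
}

const sampleVtt = [
  "WEBVTT",
  "",
  "00:00:01.878 --> 00:00:05.334",
  "Good day everyone, my name is John Smith",
  "",
  "00:00:08.608 --> 00:00:15.296",
  "This video will teach you how to",
  "build a sand castle on any beach",
].join("\n");

const parsedCues = parseWebVtt(sampleVtt);
console.log(parsedCues);
```

Times are kept as integer milliseconds in this sketch to avoid floating-point rounding when comparing values.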


Using multiple track files

More than one timed text file can be used, for instance to provide your users with multiple languages or alternate commentary. If you use multiple tracks, set one as the default; it is used when your page doesn't specify a track and the user hasn't picked a language. Within the video player, the user can choose alternate tracks through a built-in user interface.

The following example shows a video element with three track elements.

In this example, the source element is used to define the video file, and the track elements each specify a text translation. The track elements are children of the video element. The track element accepts the following attributes.
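As a rough sketch of such markup, the snippet below assembles a video element with one source and three track children as an HTML string. The file names (video.mp4, captions-en.vtt, and so on), the chosen languages, and the renderTrack and renderVideo helpers are all hypothetical; the attributes shown (src, kind, srclang, label, default) are the standard track attributes.

```javascript
// Render one <track> element. isDefault is a hypothetical flag for this
// sketch that maps to the boolean "default" attribute.
function renderTrack({ src, kind, srclang, label, isDefault }) {
  const flag = isDefault ? " default" : "";
  return `  <track src="${src}" kind="${kind}" srclang="${srclang}" label="${label}"${flag}>`;
}

// Render a <video> element whose children are a source and its tracks.
function renderVideo(videoSrc, tracks) {
  // This sketch follows the text above and allows a single default track.
  if (tracks.filter((track) => track.isDefault).length > 1) {
    throw new Error("Only one track can be the default");
  }
  return [
    "<video controls>",
    `  <source src="${videoSrc}" type="video/mp4">`,
    ...tracks.map(renderTrack),
    "</video>",
  ].join("\n");
}

const videoMarkup = renderVideo("video.mp4", [
  { src: "captions-en.vtt", kind: "subtitles", srclang: "en", label: "English", isDefault: true },
  { src: "captions-es.vtt", kind: "subtitles", srclang: "es", label: "Spanish" },
  { src: "captions-de.vtt", kind: "subtitles", srclang: "de", label: "German" },
]);
console.log(videoMarkup);
```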

JavaScript and the track element

Like most elements, the track element can be manipulated through JavaScript. The following objects, methods, and properties are available to manage track content and cues. A track is a collection of cues that provides times and text content related to a video.

textTrack and textTrackList objects

The textTrackList is an object associated with the video element that contains a list of the textTrack objects. To get a list of tracks used with a certain video (if any), the video object provides the textTracks property.
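A short sketch of reading that list follows. The listTracks helper and the fakeVideo stand-in are assumptions made so the snippet runs outside a browser; in a page you would pass a real video element, for example one obtained with document.querySelector("video").

```javascript
// Summarize the tracks attached to a video element.
// textTracks is list-like (a length plus indexed items), so Array.from
// can copy it into a plain array.
function listTracks(video) {
  return Array.from(video.textTracks, (track) => ({
    kind: track.kind,
    label: track.label,
    language: track.language,
    mode: track.mode,
  }));
}

// Stand-in object that mimics the shape of video.textTracks
// (hypothetical data for illustration only).
const fakeVideo = {
  textTracks: {
    length: 2,
    0: { kind: "subtitles", label: "English", language: "en", mode: "showing" },
    1: { kind: "subtitles", label: "Spanish", language: "es", mode: "disabled" },
  },
};

const trackSummary = listTracks(fakeVideo);
console.log(trackSummary);
```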

The textTrackCue object provides a method that returns the text track cue text as a document fragment that consists of HTML elements and other Document Object Model (DOM) nodes.

The textTrackCue object also has two events:

Event    Description
Exit     Fires when a cue is done.
Enter    Fires when a cue is active.
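The two cue events can be observed with ordinary event listeners, as in the sketch below. It assumes the DOM event type names are the lowercase enter and exit. The watchCue helper and the fakeCue stand-in (a plain EventTarget with an id) are hypothetical and exist only so the snippet runs without a video; in a page you would pass a cue taken from a track.

```javascript
// Record when a cue becomes active ("enter") and when it is done ("exit").
function watchCue(cue, eventLog) {
  cue.addEventListener("enter", () => eventLog.push(`enter ${cue.id}`));
  cue.addEventListener("exit", () => eventLog.push(`exit ${cue.id}`));
}

// Stand-in for a real cue: an EventTarget carrying an id.
const fakeCue = Object.assign(new EventTarget(), { id: "intro" });
const cueEventLog = [];
watchCue(fakeCue, cueEventLog);

// Simulate what the browser does as playback passes through the cue.
fakeCue.dispatchEvent(new Event("enter"));
fakeCue.dispatchEvent(new Event("exit"));
console.log(cueEventLog);
```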

Working with cues

Using the cues property of a track's textTrack object, you can get a list of all the cues on that track. The textTrack.cues property returns a list of textTrackCue objects. Each textTrackCue object, or cue, includes an ID, the start and end time, and text.
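The sketch below reads every cue on a track into plain objects. The summarizeCues helper and the fakeCueTrack stand-in are hypothetical; the cue fields used (id, startTime, endTime, text) are the ones named above. It also guards against cues being null, which happens when a track's mode is "disabled".

```javascript
// Copy every cue on a track into a plain array of objects.
function summarizeCues(track) {
  // cues is list-like; fall back to an empty list when it is null.
  return Array.from(track.cues ?? [], (cue) => ({
    id: cue.id,
    start: cue.startTime,
    end: cue.endTime,
    text: cue.text,
  }));
}

// Stand-in for a real textTrack (hypothetical data for illustration).
const fakeCueTrack = {
  cues: {
    length: 2,
    0: { id: "1", startTime: 1.878, endTime: 5.334, text: "Good day everyone, my name is John Smith" },
    1: { id: "2", startTime: 8.608, endTime: 15.296, text: "This video will teach you how to\nbuild a sand castle on any beach" },
  },
};

const cueRows = summarizeCues(fakeCueTrack);
console.log(cueRows);
```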

In contrast to the cues property, which gets all cues associated with a track, the activeCues property gets you just the ones that are currently being displayed. The following example displays the startTime and endTime of the subtitle being displayed.
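One way to do this is to listen for the track's cuechange event and read activeCues each time it fires, as sketched here. The showActiveCueTimes helper, the display callback, and the fakeActiveTrack stand-in are assumptions for illustration; in a page, the track would come from video.textTracks and the callback might write into an element on the page.

```javascript
// Report the start and end time of each active cue whenever the set of
// active cues changes.
function showActiveCueTimes(track, display) {
  track.addEventListener("cuechange", () => {
    const active = Array.from(track.activeCues ?? []);
    display(active.map((cue) => `${cue.startTime} - ${cue.endTime}`));
  });
}

// Stand-in for a real textTrack: an EventTarget with an activeCues list.
const fakeActiveTrack = Object.assign(new EventTarget(), {
  activeCues: {
    length: 1,
    0: { startTime: 8.608, endTime: 15.296 },
  },
});

let shownTimes = [];
showActiveCueTimes(fakeActiveTrack, (lines) => {
  shownTimes = lines;
});

// Simulate the browser signalling that the active cues changed.
fakeActiveTrack.dispatchEvent(new Event("cuechange"));
console.log(shownTimes);
```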