WebVTT File specifications

WebVTT Format specifications

⇒ File header

⇒ Cue [cue] format

[one or more characters not containing the substring "-->" or \r, \n, \r\n]
[hh...:]mm:ss.msmsms --> [hh...:]mm:ss.msmsms [settings]
First line
Second line
...

[hh...:] for hour declarations is optional

[settings] for cue setting declarations is optional

Milliseconds separator is a full stop (.)

Cues have to be separated by one (or more) blank lines
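
The timing line rules above can be sketched as a small parser. This is only an illustration in Python, not part of the specs: the names TIMESTAMP, TIMING_LINE, to_millis and parse_timing_line are hypothetical, and the sketch assumes at least one space or tab on each side of "-->" and exactly three millisecond digits.

```python
import re

# "[hh...:]mm:ss.ttt": the hour part is optional and may have two or more digits.
TIMESTAMP = r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})"
# Assumption: spaces or tabs surround "-->", and settings (if any) follow the end time.
TIMING_LINE = re.compile(
    rf"^{TIMESTAMP}[ \t]+-->[ \t]+{TIMESTAMP}(?:[ \t]+(.*))?$"
)


def to_millis(hours, minutes, seconds, millis):
    """Convert the captured timestamp fields to integer milliseconds."""
    total_seconds = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def parse_timing_line(line):
    """Return (start_ms, end_ms, settings_text), or None if the line does not match."""
    match = TIMING_LINE.match(line)
    if match is None:
        return None
    groups = match.groups()
    start = to_millis(*groups[0:4])
    end = to_millis(*groups[4:8])
    settings_text = groups[8] or ""
    return start, end, settings_text
```

Working in integer milliseconds avoids floating point comparisons when start times are later checked against each other.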

Notes, thoughts:

from specs: "A WebVTT timestamp representing the start time offset of the cue. The time represented by this WebVTT timestamp must be greater than or equal to the start time offsets of all previous cues in the file."

thoughts: it would be useful to allow cues with the same start time offset as previous cues in a file, e.g. when we have two speakers and want to apply different settings (see the Cue settings part below) to each of them
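
The ordering rule quoted from the specs can be checked with a short helper. This is a sketch in Python under the assumption that start times are already available as integer milliseconds; the function name out_of_order_cues is hypothetical. Under the quoted wording ("greater than or equal"), two cues with the same start time pass the check.

```python
def out_of_order_cues(start_times):
    """Return the indices of cues that start earlier than some previous cue."""
    offenders = []
    latest = None  # largest start time among the cues accepted so far
    for index, start in enumerate(start_times):
        if latest is not None and start < latest:
            offenders.append(index)
        else:
            latest = start
    return offenders
```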

Example:

WEBVTT

1
00:00:15.000 --> 00:00:18.000
At the left we can see...

2
00:00:18.167 --> 00:00:20.083
At the right we can see the...

3
00:00:20.083 --> 00:00:22.000
...the head-snarlers
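
A file like the example above could be split into cues roughly as follows. This is a minimal sketch in Python, not a complete parser: the function name split_cues and the returned dictionary keys are hypothetical, the header check is assumed from the example (first line starts with "WEBVTT"), and blocks without a timing line are simply skipped.

```python
def split_cues(text):
    """Split WebVTT text into cues using blank lines as separators."""
    # Normalise \r\n and \r line endings to \n before splitting.
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    if not lines[0].startswith("WEBVTT"):
        raise ValueError("missing WEBVTT header")

    # Group consecutive non-blank lines into blocks.
    blocks, current = [], []
    for line in lines[1:]:
        if line.strip() == "":
            if current:
                blocks.append(current)
                current = []
        else:
            current.append(line)
    if current:
        blocks.append(current)

    cues = []
    for block in blocks:
        identifier = None
        # A first line without "-->" is treated as the optional cue identifier.
        if "-->" not in block[0]:
            identifier, block = block[0], block[1:]
        if not block or "-->" not in block[0]:
            continue  # not a cue: no timing line found
        cues.append({"id": identifier, "timing": block[0], "text": block[1:]})
    return cues
```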

⇒ Cue settings [settings]

Settings are added right after the timing, on the same line, separated by one (or more) spaces or tabs

Settings can be combined
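
Splitting a settings string into its individual settings might look like the sketch below. It is written in Python for illustration; the function name parse_settings is hypothetical, and silently skipping malformed tokens is an assumption rather than something these notes prescribe.

```python
def parse_settings(settings_text):
    """Turn a string such as "A:start L:10" into {"A": "start", "L": "10"}."""
    settings = {}
    # str.split() without arguments splits on runs of spaces and tabs.
    for token in settings_text.split():
        name, separator, value = token.partition(":")
        if not separator or not name or not value:
            continue  # assumption: ignore tokens that are not "name:value"
        settings[name] = value
    return settings
```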

Example:

WEBVTT

1
00:00:15.000 --> 00:00:18.000 A:start L:10
At the left we can see...

2
00:00:18.167 --> 00:00:20.083 A:end S:75%
At the right we can see the...

this document differs from the specs in that a [text track cue] is as wide (for horizontal; as high for vertical) as the widest (for horizontal; highest for vertical) [text track cue line] within it

the following settings should be implemented as default [text track cue] settings so they can be omitted from [cue]:

horizontal: by default a [text track cue] is positioned at the bottom center of the [video viewport] with setting L:100% T:50%; text within a [text track cue] is aligned to the center with setting A:middle

vertical: by default a [text track cue] is positioned at the top right (for dir="ltr", bottom right for dir="rtl") of the [video viewport] with setting L:0% T:0%; text within a [text track cue] is vertically aligned to the top (for dir="ltr", bottom for dir="rtl") with setting A:start
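
Merging these defaults into whatever settings a cue declares could be sketched like this. The Python below is only an illustration: the names HORIZONTAL_DEFAULTS, VERTICAL_DEFAULTS and with_defaults are hypothetical, settings are assumed to be a dictionary keyed by setting letter, and the presence of a D setting is assumed to mark a vertical cue. The dir="rtl" variants are not handled.

```python
# Default values taken from the two default descriptions above.
HORIZONTAL_DEFAULTS = {"L": "100%", "T": "50%", "A": "middle"}
VERTICAL_DEFAULTS = {"L": "0%", "T": "0%", "A": "start"}


def with_defaults(settings):
    """Return a new dict in which omitted settings are filled from the defaults."""
    # Assumption: any D setting (vertical or vertical-lr) means a vertical cue.
    defaults = VERTICAL_DEFAULTS if "D" in settings else HORIZONTAL_DEFAULTS
    # Explicit cue settings win over the defaults.
    return {**defaults, **settings}
```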

style sheet:

from specs: "No style sheets are associated with nodes. (The nodes are subsequently restyled using style sheets after their boxes are generated, ...)"

thoughts: most developers will provide a style sheet for the subtitles/captions. If no cue settings are provided, the default settings mentioned in the clarification hints above should apply first, and a developer's style sheet should be able to override them. If cue settings are provided, no developer style sheets should be used.

for cues positioned at the bottom, the player's control bar will be shown above them

not yet clear what to do if width and/or height of [text track cue] box exceeds the [video viewport]

1.1) Text alignment: A:[start|middle|end]

Hints:

where [start|middle|end] means:

if direction is "LTR": start ≘ left; end ≘ right

if direction is "RTL": start ≘ right; end ≘ left
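
The mapping from start/end to a physical side can be expressed as a small lookup. This Python sketch is illustrative only: the function name physical_alignment is hypothetical, and returning "center" for middle is an assumption borrowed from CSS naming.

```python
def physical_alignment(alignment, direction="ltr"):
    """Map A:[start|middle|end] to "left", "center" or "right" for a text direction."""
    if direction not in ("ltr", "rtl"):
        raise ValueError(f"unknown direction: {direction!r}")
    if alignment == "middle":
        return "center"  # assumption: middle is independent of direction
    if alignment not in ("start", "end"):
        raise ValueError(f"unknown alignment: {alignment!r}")
    starts_left = direction == "ltr"
    if alignment == "start":
        return "left" if starts_left else "right"
    return "right" if starts_left else "left"
```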

Text alignment

A:start

A:middle

A:end

cue example

WEBVTT

1
00:00:15.000 --> 00:00:18.000 A:start
Hello
everybody

WEBVTT

1
00:00:15.000 --> 00:00:18.000 A:middle
Hello
everybody

WEBVTT

1
00:00:15.000 --> 00:00:18.000 A:end
Hello
everybody

Notes, thoughts:

A:[start|middle|end] is only for aligning the [text track cue line] blocks within the [text track cue]

1.2) Text position: T:[number]%

Hints:

where [number] is a non-negative integer

Text position

T:0 ≘ T:0%

T:50% ≘ A:middle (see above)

T:100%

cue example

WEBVTT

1
00:00:15.000 --> 00:00:18.000 T:0%
Hello
everybody

WEBVTT

1
00:00:15.000 --> 00:00:18.000 T:50%
Hello
everybody

WEBVTT

1
00:00:15.000 --> 00:00:18.000 T:100%
Hello
everybody

Text alignment and Text position

A:start T:50%

A:middle T:50% ≘ T:50% (see above)

A:end T:50%

cue example

WEBVTT

1
00:00:15.000 --> 00:00:18.000 A:start T:50%
Hello
everybody

WEBVTT

1
00:00:15.000 --> 00:00:18.000 A:middle T:50%
Hello
everybody

WEBVTT

1
00:00:15.000 --> 00:00:18.000 A:end T:50%
Hello
everybody

Notes, thoughts:

for block positioning, the (upcoming) CSS3 Images property "object-position" could be very useful here if browsers already supported it

1.3) Line position: L:[number]%

Hints:

where [number] is a positive integer when "%" (percentage) is present, OR a positive or negative integer when "%" (percentage) is not present

L:[number]% represents a specific position of [text track cue] box relative to the bottom of the [video viewport]

L:[number] represents a line number
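
Telling the two forms of the L value apart could look like the following sketch. It is written in Python for illustration; the function name parse_line_position and the ("percent", n) / ("line", n) return shape are hypothetical. A negative value combined with "%" is rejected, following the hint above.

```python
import re


def parse_line_position(value):
    """Classify an L value as ("percent", n) or ("line", n); None if it is neither."""
    percent = re.fullmatch(r"([0-9]+)%", value)
    if percent:
        return ("percent", int(percent.group(1)))
    line = re.fullmatch(r"-?[0-9]+", value)
    if line:
        return ("line", int(line.group(0)))
    return None
```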

Line position

L:0%

L:50%

L:100% ≘ A:middle T:50% (see above)

cue example

WEBVTT

1
00:00:15.000 --> 00:00:18.000 L:0%
Hello
everybody

WEBVTT

1
00:00:15.000 --> 00:00:18.000 L:50%
Hello
everybody

WEBVTT

1
00:00:15.000 --> 00:00:18.000 L:100%
Hello
everybody

Line position

A:start T:0% L:100%

A:middle T:50% L:50% ≘ L:50% (see above)

A:end T:100% L:0%

cue example

WEBVTT

1
00:00:15.000 --> 00:00:18.000 A:start T:0% L:100%
Hello
everybody

WEBVTT

1
00:00:15.000 --> 00:00:18.000 A:middle T:50% L:50%
Hello
everybody

WEBVTT

1
00:00:15.000 --> 00:00:18.000 A:end T:100% L:0%
Hello
everybody

Notes, thoughts:

for block positioning, the (upcoming) CSS3 Images property "object-position" could be very useful here if browsers already supported it

1.4) Cue size: S:[number]%

Hints:

where [number] is in the range 0 ≤ number ≤ 100

S:[number]% sets the [text track cue] box size as a percentage, so values below 100% decrease it

Notes, thoughts:

default size of [text track cue] box is 100%

size value does not change the text size but the width (when horizontal, height when vertical) of the [text track cue] box
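
As a worked example of the size setting, the helper below computes the extent of the cue box. It is a Python sketch under a loud assumption: the percentage is taken relative to the [video viewport] extent (width when horizontal, height when vertical), which these notes do not state explicitly. The function name cue_box_extent is hypothetical.

```python
def cue_box_extent(viewport_extent, size_percent=100):
    """Return the cue box width (horizontal) or height (vertical) in viewport units."""
    if not 0 <= size_percent <= 100:
        raise ValueError("size must be in the range 0 to 100")
    # Assumption: S is a percentage of the viewport extent; text size is unaffected.
    return viewport_extent * size_percent / 100
```

With a 640 pixel wide viewport, S:75% would give a box 480 pixels wide under this assumption, and omitting S leaves it at the full 640.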

1.5) Vertical alignment: D:vertical OR D:vertical-lr

Hints:

D:vertical represents vertically aligned text that grows from right to left

D:vertical-lr represents vertically aligned text that grows from left to right

Default cue setting for text alignment is A:middle and for text position is T:50% (if omitted from the cue settings)