Rationale

There are two proposals for adding a media transcript feature to HTML: this proposal (henceforth "our proposal", "this proposal", or "the IDREF proposal") and the Introduction of a @transcript=URL attribute proposal (henceforth "the URL proposal"). As these proposals are quite similar (and are based on the same research) I have separated out rationale which is equally applicable to both proposals, and have separately provided rationale for the differences that remain between the two proposals.

Rationale common to both proposals

When transcripts of media files are available, they are useful to all users. Users of Assistive Technology (AT) obviously benefit from transcripts, but transcripts are also useful to other users.

Consider a video of a college lecture. Students can save time by reading the transcript instead of watching the video. It's also much easier for students to search or to skim for specific content in the transcript than to do so with the media file itself. Given this, it is important for transcript links to be readily exposed to all users.

Transcripts need to be programmatically associated with media elements in order for a UA to expose the presence of the transcript in its media controls, in a context menu, or in some other way, and also so that AT can expose the transcript to its users.

Both proposals aim to address two use cases:

UC1 linked transcripts: full text transcripts are provided with the media resource in separate but linked resources.

UC2 same-document transcripts: full text transcripts are provided as text on the same page of the media resource.

Design of a transcript attribute which takes multiple IDREFs

We can associate the media element with visible transcripts (or links to them) somewhere else in the document. To do this, we would add a transcript="" attribute to the media elements which would take a space-separated set of IDREFs. For each such IDREF, if the ID is that of an <a>, <area>, or <iframe> element, the document pointed to by the href="" (in the <a> and <area> cases) or src="" (in the <iframe> case) attribute is taken to be the transcript of the media. If the element with the given ID is not an <a>, <area>, or <iframe> element, the element itself is taken to be the media's transcript.
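As a rough sketch of the design described above (transcript="" is the attribute this proposal would add; the IDs, file names, and transcript text are hypothetical), a media element might point at both a linked transcript and a same-document transcript:

```html
<!-- Sketch only: transcript="" is the proposed attribute, not an existing one. -->
<video src="video.mp4" controls transcript="transcript-link transcript-text"></video>

<!-- An <a> element: the document at its href is taken to be a transcript. -->
<p><a id="transcript-link" href="foo.html">Read the transcript</a></p>

<!-- Any other element: the element itself is taken to be a transcript. -->
<div id="transcript-text">
  <p>Welcome to today's lecture.</p>
</div>
```

An <iframe> with a matching ID would be treated like the <a> above, except that its src="" rather than its href="" identifies the transcript document.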

This design fulfills the basic need for programmatic association of transcripts with media elements, and it's possible to link to same-document transcripts as well as external resources.

This technique is fairly straightforward to author; it is no harder than the existing <label for>/<input id> pattern. This technique closely matches existing content which contains transcript links, so it's exceptionally easy to update existing content which publishes transcripts to use this markup pattern.

The simplest way to ensure that the transcript link is readily exposed to all users (including users of older UAs and ATs) is to encourage or even mandate that authors include this link directly in the visible text of the document, or directly as (part of) the constituent text of the document. Relying on UAs to expose transcript links in a context menu could be problematic on touch devices (which lack context menus). Relying on UAs to expose such links in their default video controls means that users suffer when Web site authors use custom video controls and fail to expose the transcript in their custom controls.

Comparing the two proposals

In the (withdrawn) Introduction of a transcript element Change Proposal, ten requirements for transcripts were defined. Let's examine the merits of each requirement and how the mechanisms in the two remaining Change Proposals fare.

R1 Discoverability

This is a requirement that transcripts be both human-discoverable and machine-discoverable. Our mechanism fulfills this requirement. In the URL proposal, authors may use a direct URL in the transcript="" attribute. If authors do this, sighted users will not be able to discover the transcript in several circumstances:

in existing User Agents (which do not implement a transcript mechanism),

in future User Agents which expose transcript="" to AT but do not provide transcript access in their default media controls,

and in future User Agents which expose transcript="" to AT and expose transcript access in their default media controls, on sites which use custom media controls that do not provide transcript access.
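By contrast, consider markup of the kind the IDREF proposal encourages. The following is a minimal sketch (the ID and link text are hypothetical, and transcript="" is the proposed attribute), in which the attribute points at a visible link:

```html
<!-- Sketch only: the attribute references a visible link by its ID. -->
<video src="video.mp4" transcript="transcript-link"></video>
<a id="transcript-link" href="foo.html">Transcript</a>
```

With markup like this: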

in existing User Agents (which do not implement a transcript mechanism), the user can get at the transcript by simply clicking on the <a> element.

in future User Agents which expose transcript="" to AT but do not provide transcript access in their default media controls, both AT and non-AT users can get at the transcript by simply clicking on the <a> element.

and in future User Agents which expose transcript="" to AT and expose transcript access in their default media controls, on sites which use custom media controls that do not provide transcript access, both AT and non-AT users can get at the transcript by simply clicking on the <a> element.

Now consider the same case, but with the markup advocated by the URL proposal:

<video src=video.mp4 transcript="foo.html"></video>

in existing User Agents (which do not implement a transcript mechanism), the user can't get to the transcript.

in future User Agents which expose transcript="" to AT but do not provide transcript access in their default media controls, the non-AT user can't get to the transcript.

and in future User Agents which expose transcript="" to AT and expose transcript access in their default media controls, on sites which use custom media controls that do not provide transcript access, the non-AT user can't get to the transcript.

Summary: the IDREF proposal fulfills R1 better than the URL proposal.

R2 Choice to consume

This requires that users have the ability to control whether or not they consume a transcript. Both proposals fulfill this requirement.

R3 Rich text transcripts

This is a requirement that transcripts may be expressed in various rich text formats (such as HTML), and not just in plain text. Both proposals fulfill this requirement.

R4 Design aesthetics

This has two sub-requirements: one, that how transcripts are displayed be styleable by authors, and two, that it must be possible to expose transcripts in custom video controls.

Our proposal encourages transcript links to be visible; authors are familiar with their ability to style visible page content.

The URL proposal encourages transcript links to be directly present in the transcript="" attribute. While it is possible to present such attribute content to readers of the page (using JavaScript, or a combination of the CSS attr() function, the content property, and the ::after pseudo-element), doing so is far more difficult than styling ordinary visible content.
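For illustration, here is a minimal sketch of the pattern our proposal encourages (the ID and link text are hypothetical; transcript="" is the proposed attribute):

```html
<!-- Sketch only: the transcript link is ordinary page content. -->
<video src="video.mp4" transcript="transcript-link"></video>
<a id="transcript-link" href="foo.html">Transcript</a>
```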

A transcript link written as an ordinary <a> element is already visible on the page, and the designer can style it however they'd like.

Now consider the same case, but with the markup advocated by the URL proposal:

<video src=video.mp4 transcript="foo.html"></video>

In order to make this transcript visible, the Web author would have to do something like this in a stylesheet:

video[transcript]::after {
  content: attr(transcript);
}

The bare URL is now visible to the user, but it lacks descriptive text and isn't even clickable.

Both proposals make it possible to expose transcripts within custom video controls. However, in the URL proposal, authors are encouraged to directly include a link in the transcript="" attribute. Bare URLs lack descriptive titles and language metadata; custom controls exposing such transcripts would lack any way for the user to know which one to choose (see also R9).

For instance, consider the simple case of two external transcripts, one in English and one in German:
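A sketch of how this case might be marked up under the IDREF proposal (the IDs, file names, and link text are hypothetical):

```html
<!-- Sketch only: one IDREF per transcript link; hreflang hints at each language. -->
<video src="video.mp4" transcript="transcript-en transcript-de"></video>
<p>
  Transcripts:
  <a id="transcript-en" href="foo-en.html" hreflang="en">English</a>
  <a id="transcript-de" href="foo-de.html" hreflang="de">Deutsch</a>
</p>
```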

A User Agent could directly expose both transcripts in its media controls, and indicate to the user (in their own language) what language each transcript is available in. From such controls, the user could directly navigate to the desired transcript.

Now consider the same case, but with the markup advocated by the URL proposal:
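Because the URL proposal's attribute holds a single URL, one plausible rendering of this case points the attribute at a container for the links. This is a sketch only: the #transcripts fragment matches the element named in the discussion that follows, and the file names and link text are hypothetical.

```html
<!-- Sketch only: a single fragment URL pointing at the container of links. -->
<video src="video.mp4" transcript="#transcripts"></video>
<p id="transcripts">
  Transcripts:
  <a href="foo-en.html" hreflang="en">English</a>
  <a href="foo-de.html" hreflang="de">Deutsch</a>
</p>
```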

The User Agent doesn't know that there are multiple transcripts, nor does it know what languages they're in. It can only offer the equivalent of a button that navigates the user to the #transcripts element, at which point the user can click on the appropriate link.

Summary: the IDREF proposal fulfills R4 better than the URL proposal.

R5 Embeddable

This requires that it be possible for transcripts to be expressed as an external document, while also embedded into the document which contains the media element. Both proposals fulfill this requirement (with <iframe>).

R6 Fullscreen support

This requires that it be possible for transcripts to "go fullscreen with the media element." To the extent that I can make sense of this requirement, both mechanisms fulfill it. That is, there is nothing in either mechanism that forbids or prevents this.

R7 Retrofitting

As noted in the URL proposal, the vast majority of existing pages which publish transcripts for media elements show a visible transcript on the same page as the media player. It should be as easy as possible to alter such pages to programmatically associate the visible transcript with the media element. This requirement was a key motivation of the design of the mechanism advocated in this Change Proposal.

Our mechanism readily exposes transcript links to users, which helps it work well in UAs that do not support the <video> element, and also in UAs that support <video> but not transcript="".

The URL proposal matches existing author behavior less closely, so it increases the authorial effort required to retrofit existing pages. While it's possible in the URL proposal to link to in-page transcripts, absolute URLs are allowed in its transcript="" attribute, so authors are much more likely to link directly to the transcript, thus either duplicating the link (see R8) or failing to provide the transcript to users of older browsers (see R1).

Summary: the IDREF proposal fulfills R7 better than the URL proposal.

R8 No link duplication

Our mechanism fulfills this requirement. In fact, this requirement was a key motivation of the design of the mechanism advocated in this Change Proposal.

In the URL proposal, authors who wish to provide a visible transcript link will most likely duplicate the link. We should avoid duplicating the link to the transcript, because such duplicated data tends to bit-rot, thus harming accessibility. [Çelik, Doctorow]
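A sketch of the duplication in question, assuming the author wants both the programmatic association and a visible link (the link text is hypothetical):

```html
<!-- Sketch only: the same URL appears in the attribute and in the visible link. -->
<video src="video.mp4" transcript="foo.html"></video>
<a href="foo.html">Transcript</a>
```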

The link to foo.html has now been duplicated. Should the location of the transcript change, it is more likely that these two links will become out of sync with one another, and AT users will suffer for it.

Summary: the IDREF proposal fulfills R8 better than the URL proposal.

R9 Multiple transcripts

Transcripts may be available in several languages; the mechanism we come up with should straightforwardly allow authors to link to multiple transcripts. Our mechanism fulfills this requirement. In fact, this requirement was a key motivation of the design of the mechanism advocated in this Change Proposal.

You can link to many transcripts, and you can use the existing hreflang="" attribute to hint to the UA about the language that each transcript is in. Because the association is from the media element to the transcript elements, it's especially easy for UAs to find all of the media element's transcripts (without having to process the entire DOM).

The URL proposal does not allow for the programmatic association of multiple transcripts. Even if it were altered to do so, its mechanism fails to provide language, title, or other such metadata for each transcript. This harms the user's ability to choose the correct transcript, and the UA or page author's ability to expose multiple transcripts in custom or built-in video controls.

For instance, consider the simple case of two external transcripts, one in English and one in German:
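Under the IDREF proposal, this case might look something like the following sketch (the IDs, file names, titles, and link text are hypothetical). Each link carries its own hreflang="" and title="" metadata:

```html
<!-- Sketch only: each referenced link supplies language and title metadata. -->
<video src="video.mp4" transcript="transcript-en transcript-de"></video>
<ul>
  <li><a id="transcript-en" href="foo-en.html" hreflang="en"
         title="Lecture transcript">English transcript</a></li>
  <li><a id="transcript-de" href="foo-de.html" hreflang="de"
         title="Vorlesungstranskript">Deutsches Transkript</a></li>
</ul>
```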

A User Agent could directly expose both transcripts in its media controls. AT could directly expose both transcripts to AT users. In both cases, we can indicate to the user (in their own language) what language each transcript is available in. From such a menu, the user could directly navigate to the desired transcript.

Now consider the same case, but with the markup advocated by the URL proposal:
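With only a single URL available, an author following the URL proposal would presumably have to point the attribute at a wrapper element. This is a sketch only: #transcripts is the element named in the discussion that follows, and the file names and link text are hypothetical.

```html
<!-- Sketch only: the attribute cannot name the individual transcripts. -->
<video src="video.mp4" transcript="#transcripts"></video>
<ul id="transcripts">
  <li><a href="foo-en.html" hreflang="en">English transcript</a></li>
  <li><a href="foo-de.html" hreflang="de">Deutsches Transkript</a></li>
</ul>
```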

The User Agent doesn't know that there are multiple transcripts, nor does it know what languages they're in. It can only offer the equivalent of a button that navigates the user to the #transcripts element, at which point the user can click on the appropriate link.

Summary: the IDREF proposal fulfills R9; the URL proposal does not.

R10 Stand alone transcripts

Our mechanism fulfills this requirement; that is, it is possible for UAs to render transcript documents which are not programmatically associated with media elements.

In the URL proposal, authors may use a direct URL in the transcript="" attribute. If authors do this, the transcript link will not be available in browsers that do not support or do not render audio or video elements.

Surviving cross-document copy-and-paste operations

The programmatic association of the <video> with its transcripts might not be maintained through a cross-document copy-and-paste operation, though this is primarily a function of the distance in the DOM between the media element and the element representing the transcript, and not the actual form of programmatic association.

To completely avoid the copy/paste problem, the elements pointed to by transcript="" could be contained within the media element's subtree. Such content will be displayed in browsers which do not support HTML5 media elements, thus serving users of such browsers.
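A minimal sketch of this nesting, assuming the proposed transcript="" attribute (the ID and link text are hypothetical):

```html
<!-- Sketch only: the referenced link lives inside the media element's subtree,
     so copying the <video> element carries the link along with it. -->
<video src="video.mp4" controls transcript="transcript-link">
  <p><a id="transcript-link" href="foo.html">Read the transcript</a></p>
</video>
```

Note that browsers which do support <video> treat this content as fallback and do not render it, so in those browsers the link would be reachable only through whatever the UA or AT exposes for transcript="".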

Sites which provide for the embedding of media often provide a <textarea> for easily copying their embed markup. Such sites can include markup using whatever mechanism we decide on, thus reducing the impact of the copy-paste problem even further.

Summary: the URL proposal handles the copy-paste scenario somewhat better than the IDREF proposal, but this is not a serious problem in practice.

Conclusion

Details

N.B. The spec changes described below are intended to fully describe the sorts of changes necessary, but the exact form of the changes to be made is left to the discretion of the editor(s). (This is not a diff that can be blindly applied to the specification. Should the editor(s) find this description difficult to apply unambiguously, the author of this Change Proposal volunteers to work with them and the WG to resolve any such ambiguities identified.)

New section on transcripts

Add a section defining the transcript="" attribute.

Transcripts for media elements may be provided directly in the text of the page, indirectly by linking to an external document with an <a> or <area> element, or by transclusion with an <iframe> element. To programmatically associate such a transcript with a media element, a transcript="" attribute on the media element may be used.

The media element can be associated with zero or more transcripts, known as the media element's transcripts, by using the transcript attribute.

Except where otherwise specified by the following rules, a media element has no transcript.

The transcript attribute may be specified to indicate a transcript with which the media element is to be associated. If the attribute is specified, the attribute's value, when split on spaces, must be a list of IDs of elements in the same Document as the media element. If the attribute is specified and there is an element in the Document whose ID is equal to one of the entries in the transcript attribute, then that element is one of the media element's transcripts.

Modifications to the-video-element and the-audio-element

In #the-video-element, update the note beginning with the sentence "In particular, this content is not intended to address accessibility concerns". Specifically, change the sentence

For users who would rather not use a media element at all, transcripts or other textual alternatives can be provided by simply linking to them in the prose near the video element.

to reference this new mechanism.

Make a similar edit to the same note in #the-audio-element.

Impact

Positive Effects

By programmatically associating transcripts with media elements, we enable users, both assistive technology users and otherwise, to more easily access transcripts.

It's easy to update existing content to use this markup pattern, so it's easy for authors to adopt this technique.

We avoid duplicating the link to the transcript, thus preventing the link presented to AT users from falling out of sync with the link presented to others.

You can link to many transcripts, and you can use the existing hreflang="" attribute to hint to the UA about the language that each transcript is in.

It's possible to link to same-document transcripts as well as external resources.

It degrades well in UAs that don't support the <video> or <audio> elements, as well as in UAs that support <video> and <audio> but have not yet been updated to support programmatically associated transcripts.

Negative Effects

It's more difficult to programmatically associate a transcript link than it is to simply include the link in prose near a media element. Therefore it's reasonable to expect content authors to not bother with the programmatic association. (This is true for all methods of programmatically associating a transcript with a media element.)

Conformance Classes Changes

The transcript="" attribute is allowed on <audio> and <video> elements.

Risks

UAs might not implement this mechanism, thus causing us to drop it from the specification in due course.