Rationale

ISSUE-194 calls for "a mechanism for associating a full transcript with an audio or video element".

We have analysed a large number of ways in which a transcript can be programmatically associated with a media element (see ISSUE-194/Research). Each design involves different tradeoffs.

Some of the most fundamental unresolved issues in choosing a solution are as follows:

Problem 1: single or multiple links

It is unclear as yet whether we need a solution with a single link or with multiple links.

A single link is easy to expose to assistive technology (AT) such as a screen reader. For example, if there is a single link, a screen reader could announce "Transcript available, hit CTRL+ENTER to view". Such an announcement is almost impossible with multiple links.

A single link can also easily be exposed through a visual indicator on the video element. For example, a small icon could be overlaid on the top left corner of the video, visible whenever the video has focus or is moused over. This indicator could be part of the element's shadow DOM, so publishers who disagree with the default rendering could restyle it. With multiple links, such an indicator is not possible without introducing a menu of links, which would almost certainly have to be part of the video controls, similar to the menu for subtitle tracks.

Finally, support for multiple links may not be necessary at all, since it is always possible to provide a single link to an HTML page that contains all the alternative transcripts. A well-designed page of this kind would load the alternative transcripts into that one page via JavaScript rather than linking off to another set of resources, effectively providing multiple transcripts behind a single link.

While the authors of this change proposal may not agree on whether multiple links or a single link should be associated with the video, we do agree that this discussion has not yet taken place and that more time is needed for it.

Problem 2: difference from longdesc

It is unclear as yet whether there is a fundamental difference between a long description for a video and a transcript.

A long description serves accessibility users as a supplement to a short description. As such, it is meant to provide a complete description of the resource for the benefit of a vision-impaired user.

The best possible long description that we can provide for a video is a "transcript", where the transcript is meant to include a textual transcription of all the words being spoken (i.e. a dialog transcript) and of all the locations and actions taking place (i.e. a scene description).

This means that, for an accessibility user, the one long text description of interest is the transcript. If a set of links programmatically associated different types of transcripts (and other long descriptions) with the video, the only one an accessibility user would need to consult is the most inclusive one, i.e. the transcript. Consequently, if there were a way to link both a transcript and a long description to a video, and a transcript were available, that transcript would be the target of both the long description link and the transcript link. If no transcript were available but a different long description document was, the transcript link would be empty and the long description link would point to that document. In either case, the transcript link provides no additional information and is therefore redundant.

There is, however, a semantic difference between the various types of transcripts, long descriptions and other text documents that are regarded as text alternatives for video. Where such a set of documents exists, it should be exposed to all users in a section underneath the video, and that section should be associated with the video through @aria-describedby. The question, then, is whether that is sufficient or whether additional means are needed for programmatically linking multiple text alternatives to a video. Is there indeed a use case for associating semantic labels like "transcript" or "script" or "longdesc" or ... with individual links to related text documents beyond what @aria-describedby and microdata provide?

While the authors of this change proposal may not agree on whether long descriptions and transcripts need to be separately programmatically associated with the video, we do agree that this discussion has not yet taken place and that more time is needed for it. See the post to the Accessibility Task Force mailing list, which has so far received barely any replies.

In addition, whatever solution is adopted for the long description problem for images may or may not suit the use cases of a transcript. Since a replacement for @longdesc is under discussion for HTML.next, the transcript problem should be resolved in HTML.next, too, so that the two can be considered together.

Problem 3: the need for visual presentation

It is unclear as yet how a transcript link would be visually exposed in the browser.

This is particularly true for some of the options that were analysed. Others have a clear visual presentation underneath the video, yet still call for visual exposure in the video player (possibly in the controls) so that the availability of a transcript is discoverable in fullscreen mode as well.

HTML5 does not prescribe the visual presentation of attributes and elements. However, the lack of a generally accepted way to present @longdesc visually has been the key cause of that attribute's failure. We do not want to repeat that mistake.

Therefore, unless browser vendors demonstrate how they would visually present the availability of transcripts, and show that they are committed to doing so (e.g. through experimental builds with such a feature), the success of transcript links is questionable.

The authors of this change proposal therefore agree that experiments should be done before any specifications are made in this space.

Response to other Change Proposals

As documented in our research, the transcript="" design (advocated by the "Introduce a new attribute: @transcript" Change Proposal) has a number of critical flaws. If we were to adopt a method of programmatically associating media elements with transcripts, several of the other possible designs (e.g. designs 2B, 3A, and 3B) would be far superior.

Conclusion

Rather than prematurely picking a design, we should encourage UAs to build experimental implementations so that we can learn more about how such a feature would work. Additionally, because media elements allow for the construction of custom controls, library authors and other non-UA implementors can also experiment with possible implementation strategies. We should encourage such experimentation as well.

Since integrating lessons learned from such experimental implementations will take longer than the HTML5 schedule allows, we should defer this feature to HTML.next, so that we take the time to do this right.

[It is worth noting that every existing website that uses <video> and publishes transcripts does so by linking to those transcripts in the prose of the page. Associating the div that contains those links with the video through @aria-describedby already provides a programmatic association today, even if possibly an insufficient one.]

Details

No change.

Impact

Transcripts will continue to be published as links near the video element that are not directly programmatically associated with it. These links can be associated with the video, and exposed to AT, through @aria-describedby. They can also be semantically marked up as being of type "transcript" through microdata or RDFa.
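As a rough illustration of how this existing mechanism can be followed programmatically, the TypeScript sketch below shows one way a script (for example, a custom-controls library) might resolve a video's @aria-describedby value to transcript links. It is not a proposal and not how any UA or AT actually behaves. To stay self-contained it models only what it needs instead of the real DOM. The names `PageSection`, `TranscriptLink`, `parseIdRefs`, `getSectionById` and `findTranscriptLinks`, and the use of a microdata property literally named "transcript" on each link, are all illustrative assumptions.

```typescript
// Hypothetical, DOM-free model of the assumed markup. In a real page these would be
// Element objects obtained via document.getElementById() and querySelectorAll().
interface TranscriptLink {
  href: string;
  // Microdata property on the link; "transcript" is an assumed property name.
  itemprop?: string;
}

interface PageSection {
  id: string;
  links: TranscriptLink[];
}

// @aria-describedby holds a whitespace-separated list of ID references.
function parseIdRefs(value: string | null): string[] {
  if (value === null) return [];
  return value.split(/\s+/).filter((id) => id.length > 0);
}

// Returns the first section with the given id, mirroring getElementById().
function getSectionById(sections: PageSection[], id: string): PageSection | undefined {
  for (const section of sections) {
    if (section.id === id) return section;
  }
  return undefined;
}

// Follows each ID reference from the video to a section of the page and collects the
// hrefs of links marked as transcripts. IDs with no matching section are skipped.
function findTranscriptLinks(ariaDescribedBy: string | null, sections: PageSection[]): string[] {
  const hrefs: string[] = [];
  for (const id of parseIdRefs(ariaDescribedBy)) {
    const section = getSectionById(sections, id);
    if (section === undefined) continue;
    for (const link of section.links) {
      if (link.itemprop === "transcript") hrefs.push(link.href);
    }
  }
  return hrefs;
}

// Models <video aria-describedby="alternatives credits"> where only the "alternatives"
// div exists and contains two links, one marked as a transcript.
const sections: PageSection[] = [
  {
    id: "alternatives",
    links: [
      { href: "transcript.html", itemprop: "transcript" },
      { href: "script.html", itemprop: "script" },
    ],
  },
];
const found = findTranscriptLinks("alternatives credits", sections); // ["transcript.html"]
```

Because the association runs through an ordinary ID list, the same lookup works for one linked section or several, which is why the single-versus-multiple-links question remains open.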