
Background

There is both a container and a set of codecs involved for each video format. Typically there's a video codec and an audio codec. The captioning format can be thought of as a timed text codec.

For a given container, you need a defined mapping for muxing a data stream of a given codec into that container, and, for network effects, the mapping needs to be supported by tools and other video players. Typically, such mappings don't exist for all codec/container combinations, but there are established combinations that work.

The format choice is driven by the video codec, so the container is typically the one conventionally paired with the chosen video codec.

The choice of captioning format then depends on what's conventional for the container.

In theory, given a muxing rule, you can put any video codec and any captioning format in any container. In practice, a video codec tends to have a conventional native container, so the video codec dictates the container. Different containers in turn have different conventional timed text formats, and a timed text format might not have muxing rules for non-native containers.

Gecko will embed an Ogg-specific playback framework called liboggplay. It only supports the Ogg container format.

Typically, desktop environments come with a more general timed media playback framework. These frameworks can load extension libraries that enable support for various containers and codecs.

Desktop | Framework
Windows | DirectShow
Mac OS X | QuickTime
Gnome | GStreamer
KDE | Phonon

Example: Ogg and MP4 are containers, whereas Theora and H.264 are codecs. GStreamer and QuickTime are both timed media frameworks, each of which can play various container/codec combinations. Ogg, Theora and CMML are a natural match. MP4, H.264 and 3GPP TT are a natural match. While you could technically define a way to put 3GPP TT inside Ogg, the disadvantage of doing so is that the result might not interoperate well with authoring tools and other players, because the combination is unusual.

Container | Codecs | Authoring tools | Natural captioning format
Ogg | Theora (video), Vorbis (audio) | | CMML
MP4 | H.264 | | 3GPP Timed Text
.flv | | |
WMV | WMV9/VC-1 | |

Note: SubRip is external to the video container and can be used with any format. The main known disadvantage of this is blah, blah. It would make sense to use this if blah.
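The container-driven reasoning in this section can be sketched as a small lookup. This is only an illustration, not anything Gecko implements: the names ContainerProfile, PROFILES, container_for_codec and natural_caption_format are hypothetical, and the data is limited to the pairings listed in this section. Entries with no captioning format listed are recorded as None rather than guessed.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ContainerProfile:
    """Conventional pairing for one container (hypothetical helper type)."""
    codecs: Tuple[str, ...]
    natural_captions: Optional[str]  # None where this page lists no format


# Only the pairings named on this page; nothing here is exhaustive.
PROFILES: Dict[str, ContainerProfile] = {
    "Ogg": ContainerProfile(("Theora", "Vorbis"), "CMML"),
    "MP4": ContainerProfile(("H.264",), "3GPP Timed Text"),
    "WMV": ContainerProfile(("WMV9/VC-1",), None),
}


def container_for_codec(video_codec: str) -> Optional[str]:
    """The video codec dictates the container it is conventionally paired with."""
    for name, profile in PROFILES.items():
        if video_codec in profile.codecs:
            return name
    return None


def natural_caption_format(video_codec: str) -> Optional[str]:
    """The container then dictates the conventional timed text format."""
    container = container_for_codec(video_codec)
    if container is None:
        return None
    return PROFILES[container].natural_captions
```

For instance, natural_caption_format("Theora") resolves through Ogg to CMML, while a codec that is not in the table resolves to None, which mirrors the point that a muxing rule may simply not exist for an unconventional combination.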

Work plan for Captioning

Determine which captioning format should be supported in Mozilla for the natively-supported Ogg video. This needs to take into account the extremely complex map of video formats and players today (see above).

Determine which subset of that format is the most crucial. This can save the Mozilla developers a good deal of work, because captioning formats are complex. Some of the complexity is necessary for Mozilla support and some is not.

Work with HTML 5, web browser development and captioning communities to ensure that the solution will be accepted. We don't want different solutions in each browser. That would either mean one browser would need to redo their work, or that caption developers would have to deal with incompatible solutions in different browsers.

Explore the need to support the following features and ensure support when found necessary:

social caption creation (This poses very different requirements than the idea of making video files intrinsically accessible. Hsivonen 09:06, 4 August 2008 (UTC)) aaronlev Henri also mentioned that potential legal issues could affect technical issues, but we aren't sure. It would be good if WGBH had some background to help understand this as well while devising a captioning solution.

metadata indicating changes in captioning language for search and Braille. (Google seems to be doing better by ignoring author-entered language metadata. Is rendering foreign words into Braille a strong enough use case to justify the complexity of supporting this and authoring with this data? Hsivonen 09:06, 4 August 2008 (UTC))

semantics and style, etc. (This seems like a pretty big departure from the baseline established by TV captioning. Hsivonen 09:06, 4 August 2008 (UTC)) aaronlev I'm not sure -- there are some higher level things such as embedding of a musical note graphic to indicate music. I believe that captioning is moving toward expressing more complex background information.

Ensure the captioning solution is compatible with current authoring tools and, if possible, video conversion tools, so that current and future content can easily use the solution

Ensure the solution is compatible with both existing media repurposed for the Internet (i.e., originating in broadcast and cable TV environments and physical media like DVDs and theatrical motion pictures) and media originally developed for Internet distribution, including user-generated content.

Ensure the solution incorporates the expressed needs and preferences of Internet-based media users with sensory disabilities. (( Aaron: what are these? Can we express these up front? WGBH must already know this info, e.g. why would it change based on internet vs. broadcast? ))

Ensure all solutions, documentation and tests developed are friendly to open source contributors and clean of known IP conflicts

Participate in the relevant deliberations, meetings, standards development activities and proposed work products of HTML 5 WG.

Determine what can be done about supporting captioning when an external back end (GStreamer, QuickTime, DirectShow) is in use. (MP4 containers are the most likely external back end case, so 3GPP Timed Text is a potential format candidate in that case.)

Build a complete set of open-licensed documentation and test cases for developers and content creators. In general, reach out to developers implementing captioning solutions for the web and ensure that issues of captioning (for deaf and hard-of-hearing people) and description (for blind and visually impaired people) are taken into account and are well understood.

Test solutions and file bugs in databases for each browser to drive the necessary work. Attach relevant test cases and documentation. Make sure the developers know what to fix.

Work Plan for Audio Description

TBD

Success Criteria

A complete set of documentation and test cases for captioning and audio description, without unnecessary IP restrictions, is available

Mozilla should implement the proposed solution for both captioning and audio description, in a manner which maximizes usability. For example, there should be a consistent UI for turning captions on or off, no matter which video format is in use.

Authoring tools are available which support the solutions

At least one mainstream source of video content on the web (e.g. Wikimedia) has some content which supports the proposed solution for captioning and audio description