As requested, here are some comments on the Media Accessibility
Requirements document, for those working on it. Most are minor editorial
issues but there are a few significant ones.
Comments on Media Accessibility Requirements
http://www.w3.org/WAI/PF/HTML/wiki/Media_Accessibility_Requirements
Last modified 2010-06-16
(I've used an asterisk to mark comments that seem wide in scope, rather
than being merely about one particular item.)
General
1. I noticed that in two places the document skips heading levels,
and that the navigation links at the bottom are headings, which
doesn't seem appropriate. (The Firefox extension HeadingsMap
highlights these discrepancies.)
2. Link to definitions for screen reader, AT, etc.?
3. * Making things available to AT is explicitly required in a very
few instances (e.g. Transcripts), which doesn't seem an
intentional choice. The two sections devoted to AT compatibility
also call out a few requirements, making it all rather confusing.
I suggest making it a more general requirement applying to almost
everything (e.g. AT access to caption text and its formatting,
hyperlinks, etc.).
4. * In general it doesn't distinguish expected steps (such as
keyboard access and customizable color) from steps that would be
going above and beyond core expectations (such as most of the
steps listed for Autism). This could really mislead and turn off
readers who interpret these as unrealistic expectations for most
media. Are all listed requirements really deserving of core-level
status?
5. * In general it could benefit from forward references to related
sections further down in the document.
Accessible media requirements by type of disability
6. Re dyslexia, mention synchronized highlighting of phrases in text
with audio.
7. Why capitalize "Communication, Social Interaction, and Repetitive
Behaviors"?
8. Incomplete sentence: "Since individuals on the autism spectrum can
be quite visual and learn effectively from social stories."
9. In "Dexterity / Mobility impairment" should note that many users
rely on AT such as on-screen keyboards or speech recognition.
10. In "Accessible media requirements by type of disability" I'd add a
section explaining that many people have multiple disabilities,
and that while deaf-blind is one category, there are others not
specifically called out here. Users for example may have low
vision and difficulty typing.
11. The term "Sensory disability users" isn't used very much, and
might be considered less politically correct than "users with
sensory disabilities".
Audio Description: Voiced, Texted, and Extended
12. * Consider grouping AD, TAD, and EAD in a single section on audio
descriptions, because they have a lot of overlap, are closely
related, and having three headings disproportionately emphasizes
them over other technologies that get a single section. Same with
captioning and extended captioning, and the two sections on AT
compatibility.
13. "They are written to convey objective information (e.g., a yellow
flower) rather than subjective judgments (e.g., a beautiful
flower)" may be correct but seems odd to me. I'm sure the script
called for a beautiful flower, rather than merely a yellow one,
and the beauty is what the writer and director were trying to
convey, so it seems strange to actively avoid conveying it.
14. These two bullet items seem redundant: "Closed descriptions can be
recorded as a separate track containing descriptions only, timed
to play at specific spots in the timeline and played in parallel
with the program-audio track."; "Some audio descriptions can be
given as a separate audio channel mixed in at the player." Are
"track" and "channel" used here as technical terms for different
things, or is it just a linguistic choice?
15. "Audio description is available...; however regulation in the U.S.
and Europe is increasingly focusing on description..." I think you
mean "and" rather than "however".
16. The term "audio/video descriptions" seems misleading as (to me at
least) it sounds like it's discussing both audio descriptions of
visual content (e.g. a second audio track) and visual descriptions
of visual content (e.g. displayed text).
17. Re list introductions like "Systems supporting audio/video
descriptions that are not open must": does the document say anywhere
that systems that provide audio are required to support audio/video
descriptions?
18. "(AD-1) Provide an indication that descriptions are available, and
are active/non-active." seems useful but not necessarily a core
requirement. I believe that most television viewers who try closed
captions are used to just turning them on and waiting to see
whether any captions are actually displayed, which is actually
more convenient than requesting a display telling them whether
there are captions and only then turning on the caption display.
19. "The degree and speed of volume change should be under provider
control": what is meant by "provider" in this case? The term hasn't
been used in the discussion thus far.
20. "(AD-8) Allow the author to provide fade and pan controls to be
accurately synchronized with the original soundtrack." is not
really enough information for novices like me. You might want to
elaborate on the goal. Is it to have the description sound like
the narrator is standing in the same location as the object being
described? Also, one difference between AD-8 and AD-13 is the
former is all about author control, whereas the latter gives
control to both, but still fails to specify that the user
preference should override author preference.
21. Is there supposed to be another document or section that would go
into more details on these requirements? Quite a few of them seem
too high-level to be useful; for example, "(TAD-1) Support
presentation of texted audio description through a screen-reader
or braille device with playback speed control and voice control
and synchronization points with the video."
22. "(AD-10) Allow the user to select from among different languages
of descriptions, if available, even if they are different from the
language of the main soundtrack." I'd add "or from the general
system language setting.", for example choosing audio descriptions
in your native Farsi even if you're using English for your
operating system's primary language and listening to a film with
Japanese audio.
23. I suggest adding something early on letting readers know that
additional, advanced features are discussed in separate sections
below. For example, when I first read this I noted that it lacked
allowing the audio description track (speech or text) to pause and
resume the media with which it's synchronized. (For example, for
video with audio being watched when all viewers want the
descriptions, the user might choose a descriptive track that
pauses the normal content in order to insert more detailed
descriptions than could fit in the main content's normal gaps.) I
wrote a comment about it, only later to find it was included in a
separate section.
24. Shouldn't the audio description requirements (or recommendations)
include the user ability to omit the video altogether, leaving
only the normal audio and descriptions?
25. "Texted audio descriptions are provided as text files with a start
time for a description cue." It would help to mention any
standardized formats used for this purpose.
26. Compare and contrast "(TAD-3) Where possible, support to present a
text or separate audio track privately to those that need it in a
mixed-viewing situation, e.g. through headphones." vs. "(AD-11)
Support the simultaneous playback of both the described and
non-described audio tracks so that one may be directed at separate
outputs (e.g., a speaker and headphones)." The key differences
aren't conveyed clearly.
27. "(TAD-4) Where possible, support for different options for authors
& users to deal with the overflow case: continue reading, stop
reading, and pause the video. Pause the primary audio and video.
The preferred solution from a user POV is to pause the video and
finish reading out the TAD." Consider rephrasing as "pause the
/primary audio and video/ until the TAD catches up."
28. In the discussion of texted audio description, might want to
clarify that every time you say "video" you of course mean both
the primary video and audio content.
29. Reading top to bottom, I kept thinking that the document
overlooked variations until I encountered them further down. I
would recommend that the introduction to audio descriptions
mention that subsequent sections will discuss basic audio
descriptions, texted audio descriptions, and extended audio
descriptions. Similarly, the discussions of AD and TAD might
allude to the fact that sometimes the descriptions are too long
for pauses, and refer the reader to the section on extended audio
descriptions below.
30. EAD-2 (automatically pausing) would be impractical without EAD-3
(automatically resuming), so you might just combine them.
31. TAD-4 and EAD-1 blur the boundary between TAD and EAD. If a system
supports TAD-4 it supports EAD, so you might take out TAD-4 and refer
the reader to the EAD section.
32. EAD section might explicitly say it applies to both AD and TAD.
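To make the overflow question in comments 27, 30 and 31 concrete, here is a
minimal Python sketch of the three options TAD-4 lists (continue reading,
stop reading, pause the primary media). Everything in it is hypothetical:
the Cue class, its field names, the Overflow enum and plan_playback are my
own illustration, not anything defined in the requirements document or in
any player. It only adds up how much description time does not fit in the
available gaps and what each policy would do with it.

```python
from dataclasses import dataclass
from enum import Enum


class Overflow(Enum):
    """The three overflow options named in TAD-4 (labels are illustrative)."""
    CONTINUE = "continue reading"  # description talks over the program audio
    STOP = "stop reading"          # description is cut off when the gap ends
    PAUSE = "pause the video"      # primary audio and video wait for the description


@dataclass
class Cue:
    start: float   # media time in seconds at which the description begins
    gap: float     # length in seconds of the natural pause in the program audio
    spoken: float  # seconds the voice or screen reader needs to read the text


def plan_playback(cues, policy):
    """Return (paused, overlapped, truncated) seconds for the given policy."""
    paused = overlapped = truncated = 0.0
    for cue in cues:
        # Time by which the description overruns the gap it was written for.
        excess = max(0.0, cue.spoken - cue.gap)
        if policy is Overflow.PAUSE:
            paused += excess        # media resumes once the description ends
        elif policy is Overflow.CONTINUE:
            overlapped += excess    # description and program audio collide
        else:
            truncated += excess     # this much description is never heard
    return paused, overlapped, truncated


# Made-up cues: the first fits its gap, the second overruns it by 3 seconds.
cues = [Cue(start=12.0, gap=3.0, spoken=2.5), Cue(start=40.0, gap=2.0, spoken=5.0)]
print(plan_playback(cues, Overflow.PAUSE))
```

With the PAUSE policy the sketch behaves like extended audio description,
which is why TAD-4 and EAD seem to me to overlap: pausing and resuming are
two halves of the same step.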
Clear Audio (CA)
33. "(CA-4) Potentially support pre-emphasis filers" I think you meant
"filters".
Content Navigation by Content Structure (CN)
34. "Short music selections tend to have versus and repeating
choruses" I think you meant "verses".
35. In the section on structured navigation, your discussion of h1
isn't what I would have expected. In HTML documents, h1 is
normally the title of the current document, regardless of the
scope of that document. For example, an online book would
typically be divided into multiple pages and the h1 for the main
page would be the title of the book, while the h1 for a chapter
would be the title of the chapter, and if you can delve more
deeply and reach a page for a section its h1 would be the title of
that section. Thus, where you say "In a news broadcast, the most
global level (analogous to <h1>) might be 'News, Weather, and
Sports.'" I would have expected the h1 equivalent to be more like
"KIRO 7 Eyewitness News at 5PM".
36. "Audio productions of 'The Divine Comedy' may well include
reproductions of famous frescoes or paintings interspersed
throughout the text", did you mean video or multimedia
productions? I don't expect many audio productions to reproduce
the frescoes and paintings :-)
37. "Nowadays, these programs are based on the ANSI/NISO Z39.86
specifications." You might say "ANSI/NISO Z39.86 (DAISY)
specifications" in order to reference its commonly-used friendly name.
38. In the introduction to structured navigation, the final two
paragraphs (UAAG references) seem entirely out of place.
39. In some places the document interleaves requirements for authoring
tools (e.g. CN-1) with requirements for content players (e.g.
CN-2), which is a little confusing.
40. I think I could figure out what "transport bar" means, but then
two paragraphs later "navigation track" comes along and I'm not
sure what the difference would be.
41. Shouldn't structural navigation requirements include providing the
user with a navigable table of contents?
42. "(CN-1) Generally, provide accessible keyboard controls for
navigating a media resource in lieu of clicking on the transport
bar need to be available, e.g. 5sec forward/back, 30sec
forward/back, beginning, end" is in the h3 section titled "Content
Navigation by Content Structure" but isn't about navigating by
structure, nor does it fit in the larger h2 section "Alternative
Content Technologies".
43. If you were going to include CN-1 saying that content navigation
controls need to be keyboard accessible, that would imply that all
sections discussing user input need to have a similar requirement
for keyboard access. Seems better just to refer readers to the
section on keyboard access which requires it for /everything/, and
perhaps provide a non-exhaustive list of instances you think they
might overlook.
44. * Seems odd that there are a lot of things here that I don't
believe are in UAAG. For example, CN-9 requires the user be able
to skip or filter out ancillary content such as sidebars, but I
don't believe UAAG20 requires that Web browsers allow the user to
exclude such things from the keyboard navigation or voicing order.
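As a rough illustration of comments 35 and 41, here is a small Python
sketch of a structure in which the top level is the title of the program
itself and a table of contents can be generated at whatever depth the user
picks. The Section class, table_of_contents and next_entry are hypothetical
names of mine, and the sub-segment titles and all the times are made up for
the example; none of this comes from the requirements document.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    start: float                          # media time in seconds
    children: list = field(default_factory=list)


def table_of_contents(section, max_depth, depth=1):
    """Flatten the structure into (depth, title, start) entries down to max_depth."""
    entries = [(depth, section.title, section.start)]
    if depth < max_depth:
        for child in section.children:
            entries.extend(table_of_contents(child, max_depth, depth + 1))
    return entries


def next_entry(entries, current_time):
    """Return the first entry that starts after current_time, or None."""
    for entry in entries:
        if entry[2] > current_time:
            return entry
    return None


# Level 1 is the program title, as I would expect of an h1 equivalent.
broadcast = Section("KIRO 7 Eyewitness News at 5PM", 0.0, [
    Section("News", 0.0, [Section("Top story", 0.0), Section("Local", 300.0)]),
    Section("Weather", 900.0),
    Section("Sports", 1200.0),
])

coarse = table_of_contents(broadcast, max_depth=2)
fine = table_of_contents(broadcast, max_depth=3)
print(next_entry(coarse, 100.0))  # jumps to the next major segment
print(next_entry(fine, 100.0))    # jumps to the next item within the segment
```

The same flattened list could back both a navigable table of contents and
"next section" commands at different granularities.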
Captioning (CC)
45. "Captions are always written in the same language as the main
audio track." And yet, I've not seen DVD or set-top boxes
distinguish between same-language and different-language captions.
Also, you should discuss here the use of foreign language
captions, rather than only mentioning them in the lead-in sentence
for the requirements. Also, CC-26 explicitly acknowledges that
there can be multiple tracks of captions in different languages.
46. "Closed captions are transmitted as data along with the video...",
wouldn't the category of closed captions also include captions
that are pulled down only on demand, possibly from another source
entirely, rather than transmitted with the video? Or is there
another term for that?
47. "...turn them on, usually by invoking an on-screen control or menu
selection" or a dedicated physical button such as on a remote
control.
48. "Open captions are always visible; they have been merged with the
video track and cannot be turned off." Except by selecting a
different video track.
49. Interesting to note that while users of closed captions may prefer
verbatim text, operas are usually supertitled using shortened
versions of the libretto, to make it easier for readers to follow
along without spending too much time reading each line. This is
true even of same-language supertitles.
50. As noted above it's confusing to first mention subtitles and
foreign language subtitles in the lead-in to the requirements,
without introducing the concepts or clarifying that they'd use the
same technologies as same language captions.
51. * "(CC-10) Render a background in a range of colors, supporting a
full range of opacities." With this and several similar
requirements, do you want to clarify that the caption author
should be able to specify a background color, or do you feel it
would be acceptable for the player to choose what it considers a
background appropriate for the text color and video background?
Should the user be able to override caption attributes such as these?
52. There are several requirements for horizontal languages without
corresponding requirements for vertical languages. For example,
should CC-15 or a parallel equivalent require that captions can be
positioned at least a minimum distance from the side of the screen?
53. "(CC-21) Permit the distinction between different speakers." An
example of one that requires more detail. For example, any system
would allow one to prefix strings with the name of the speaker,
and you already require the author to be able to put strings in
different locations. Do you mean markup so the rendering agent can
apply automatic, distinct formatting styles, or so that assistive
technology examining the captions can convey the distinctions to
users through other means?
54. The lists titled "Formats for captions, subtitles or
foreign-language subtitles must" and "Further, systems that
support captions must" should probably use parallel construction,
as I assume they both relate to all types of captions, including
same language and foreign language, and regardless of whether
they're formatted as subtitles or otherwise.
55. A number of items in the list "Formats for captions, subtitles or
foreign-language subtitles must" seem to be discussing the systems
that display the captions rather than the formats for specifying
them. It may just be a matter of rewording a number of the items,
such as changing "(CC-1) Render text in a time-synchronized
manner, using the audio track as the timebase master." to "(CC-1)
Allow the author to specify the time and duration at which text is
displayed, using the audio track as the timebase master." and
"(CC-11) Render text in a range of colors." to "(CC-11) Allow the
author to specify colors for ranges of text."
56. The list titled "Further, systems that support captions must"
should probably include one or more requirements to support the
wide range of author-specified markup that caption formats are
required to support. For example, having a caption format that
allows the author to specify text color is wasted when a player
ignores those settings.
57. Why is captioning the only section to distinguish requirements for
data formats from requirements for rendering systems? Wouldn't
that distinction apply just as much (or little) for audio
description, sign language, etc.?
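The distinction I am asking for in comments 51, 55 and 56 can be sketched
in a few lines of Python. The first half is the "format" side (what an
author can specify for a cue); the second half is the "system" side (what a
player does with it). All names here (CaptionCue, PLAYER_DEFAULTS,
effective_style, active_cues) are hypothetical, and the precedence I chose,
user setting over author setting over player default, is only one possible
answer to the question raised in comment 51, not something the document
states.

```python
from dataclasses import dataclass
from typing import Optional


# Format side: what the caption author is allowed to specify.
@dataclass
class CaptionCue:
    start: float                      # seconds on the audio track's timebase
    duration: float                   # seconds the text stays on screen
    text: str
    color: Optional[str] = None       # None means "author did not specify"
    background: Optional[str] = None


# System side: what the player falls back to and how it resolves conflicts.
PLAYER_DEFAULTS = {"color": "white", "background": "black"}


def effective_style(cue, user_prefs):
    """Resolve each attribute as: user preference, else author value, else default."""
    style = {}
    for attr, default in PLAYER_DEFAULTS.items():
        author_value = getattr(cue, attr)
        style[attr] = user_prefs.get(attr) or author_value or default
    return style


def active_cues(cues, audio_time):
    """Return the cues that should be visible at the given audio time."""
    return [c for c in cues if c.start <= audio_time < c.start + c.duration]


cue = CaptionCue(start=1.0, duration=2.0, text="Hello", color="yellow")
print(effective_style(cue, {}))                 # author color is honored
print(effective_style(cue, {"color": "cyan"}))  # user setting overrides it
```

A player that skipped the author_value step would be the case described in
comment 56: the format carries the color, but the setting is wasted.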
Extended Captioning
58. It might be helpful to give an example of how this could be used.
For example, an ancillary window could display a scrolling list of
the most recent hyperlinks to be provided in captions, so that the
link doesn't disappear after just a few seconds when the next set
of captions is displayed.
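The ancillary window I have in mind could be as simple as the following
Python sketch. RecentLinks, on_caption and visible are hypothetical names,
the list size is arbitrary, and the example.org addresses are placeholders;
nothing here is taken from the requirements document.

```python
from collections import deque


class RecentLinks:
    """Keeps the most recent hyperlinks seen in captions, newest first."""

    def __init__(self, size=5):
        # A bounded deque drops the oldest link once the list is full.
        self._links = deque(maxlen=size)

    def on_caption(self, links):
        """Call with the hyperlinks found in each caption as it is displayed."""
        for url in links:
            if url in self._links:
                self._links.remove(url)  # a repeated link moves back to the top
            self._links.append(url)

    def visible(self):
        """Links to show in the ancillary window, most recent first."""
        return list(reversed(self._links))


panel = RecentLinks(size=2)
panel.on_caption(["http://example.org/a"])
panel.on_caption([])  # a caption without links leaves the list untouched
panel.on_caption(["http://example.org/b", "http://example.org/c"])
print(panel.visible())
```

The point is only that the links outlive the captions that carried them, so
the user is not racing the next caption to follow one.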
Sign Translation
59. "mixed with the video and offered as an entirely alternate stream"
should be in parentheses instead of commas.
60. "(SL-3) Support the display of sign language video either as
picture-in-picture or alpha-blended overlay..." in these clauses
the use of "or" leaves it ambiguous whether the system needs to
support both methods and allow the author to choose, or whether
the system is allowed to support only one of the options.
Transcripts
61. I believe "Providing a full transcript is a good option in
addition to, but not as a replacement for, timed captioning"
conflicts with UAAG20 where we acknowledge situations where
transcripts are more appropriate than synchronized captions. For
example, transcripts are usually sufficient for pre-recorded
audio-only media.
62. "A transcript can either be presented simultaneously with the
media material, which can assist slower readers or those who need
more time to reference context, but it should also be made
available independently of the media." has inconsistent grammar:
probably want to delete "either".
63. I would suggest avoiding the word "provisioning" because it's
jargon and there are other terms that are more widely understood.
Also, it's not used elsewhere in the document.
System Requirements
64. This section could use an intro paragraph. I assume it's a
catch-all for requirements that don't fit into a single
alternative content technology, all of which were in the previous
section. However, the term "system requirements" parallels that
used under "Captioning" where it meant requirements for players as
distinct from data formats, and that's confusing, especially since
other sections such as that on assistive technology are certainly
system requirements. Any catch-all section should probably be at
the end rather than in the middle.
Keyboard Access to interactive controls / menus
65. As noted above, it should be made clear that access through
keyboards and keyboard emulators is not optional, despite the
phrase "Systems supporting keyboard accessibility must..."
66. The phrase "interactive controls / menus" in the title is
misleading, since it is not limited to things that are
"interactive" as in having input and output, and "controls/menus"
implies things with visual representation on the screen. For
example, if a player supports navigation using mouse gestures,
those should also all have keyboard equivalents.
Granularity Level Control for Structural Navigation
67. "(CNS-3) This control must be input device agnostic." We don't
talk about agnostic elsewhere, so might rephrase it. Since
functionality needs to be available through the keyboard (already
required by KA-1) this essentially says that all keyboard
navigation commands need to also have equivalents for every other
input device (e.g. pointing devices, and on some systems speech or
gestures). Is that what you intended to require?
68. Isn't this entire section redundant to the content navigation section?
Time Scale Modification
69. This is the first list of requirements that isn't scoped with
"Systems supporting such and so must". Does it really rate being
the only universal requirement?
Production practice and resulting requirements
No comments.
Discovery and activation/deactivation of available alternative
content by the user
70. Re "The user agent /can/ facilitate the discovery of alternative
content by following the criteria", this is the first list of
requirements to be described as optional, with the word "can"
instead of "must".
71. Most of these requirements are already covered in their
appropriate sections of the document.
72. "(DAC-3) The user can browse the alternatives, switch between
them." Should be replaced by the newer UAAG wording.
Requirements on making properties available to the accessibility
interface
73. Should refer the reader to the section on assistive technology API
further down in the document, or better yet, come after it.
74. "any media controls need to be connected to that API" should be
"any media controls and text content need to be..."
75. "On self-contained products that do not support assistive
technology, any menus in the content need to provide information
in alternative formats" I'm skeptical of seeming to limit this to
menus when it really means menus and other controls.
76. "make accessibility controls, such as the closed-caption toggle,
as prominent as the volume or channel controls" As I commented on
the 508 Refresh, while this is well intentioned, a quick review of
remote controls for televisions and set-top boxes indicated that
most if not all give special prominence to volume and channel
controls, because they're probably the most commonly used
controls. I don't think that it is necessary for dedicated caption
and video description controls to be equal in prominence, and thus
tied for the most prominent controls on the device. This is
especially true because many people who use captions will turn
them on and leave them on, rather than toggling them frequently.
77. The sentences on remote controls and physical button layout don't
really fit the section title ("making properties available to the
accessibility interface").
78. Technically, the closed-system requirements don't fit the title
either, but at least they thematically go with AT compatibility so
the title could be changed to better incorporate both.
79. I really can't understand any of the requirements in this section
as they're currently written. For example, while "(API-1) Support
to expose the alternative content tracks for a media resource to
the user, i.e. to the browser" is clear with regard to alt text
/when displayed or hidden/, and captions /as they're displayed/,
what does it mean with regard to secondary audio tracks?
Requirements on the use of the viewport
80. Re "(VP-1) If alternative content has a different height or width
to the media content, then the user agent will reflow the
viewport." This seems more relevant to the container than to media
per se. Is it talking about when video is replaced by description
or when a caption field is added below the video field?
81. If we're talking about containers hosting media, then it brings in
a few additional requirements not yet listed here, such as the
ability to move the keyboard focus into and out of media objects.
82. "(VP-5) Captions occupy traditionally the lower-third of the video
- the use of this area for other controls or content needs to be
avoided." This is one of the few "requirements" that is phrased as
a recommendation.
Requirements on the parallel use of alternate content on potentially
multiple devices in parallel
83. The requirements in this section are all about supporting
assistive technology, which doesn't fit with the title or
introduction to this section ("Requirements on the parallel use of
alternate content on potentially multiple devices in parallel").
The title and intro should be changed.