Thanks to Kelly Ford, Jim Allan, Jeanne Spellman, Leonie Watson and Rich
Schwerdtfeger for joining the media sub-team call this week.
I'd like to try to lay this out from a user-requirements perspective
first, to be sure that what I think we need from a user perspective is
clearly understood and agreed to (and I am prepared to hear that I am
wrong). Then I'll examine the history up to where we are today (I
think), all of course from my perspective.
The scenario: we have a web-page that contains, visually, a bounding box
that represents where the video will play.
Above that box we have a title - Gone With The Wind - and below the box we
have a paragraph of text:
"American classic in which a manipulative woman and a roguish man carry on
a turbulent love affair in the American south during the Civil War and
Reconstruction."
In code, we have the following:
<h1>Gone With The Wind</h1>
<video src="movie.mp4"></video>
<p> American classic in which a manipulative woman and a roguish man carry
on a turbulent love affair in the American south during the Civil War and
Reconstruction.</p>
(yes we should also have <track src="caption file"> but assume it's there)
Semantically speaking, we have a short name and a longer description,
both of which address the movie itself. The implied semantics exist, but for
clarity the specific semantics are further defined (perhaps because there
is more than one movie on the page):
<h1 id="movieTitle">Gone With The Wind</h1>
<video src="movie.mp4" aria-labelledby="movieTitle"
aria-describedby="description"></video>
<p id="description"> American classic in which a manipulative woman and a
roguish man carry on a turbulent love affair in the American south during
the Civil War and Reconstruction.</p>
So far, so good.
However, inside of the bounding box there is a static image. Let's not get
bogged down on the source of that imagery (either via @poster, or the
first frame of the video, or whatever), but let's agree that the image is
the original movie poster from Gone With The Wind, as seen here:
http://ia.media-imdb.com/images/M/MV5BMjE1MTk0MTE5NF5BMl5BanBnXkFtZTYwMTUx
Nzg4._V1._SY317_CR2,0,214,317_.jpg
(Note: the graphic image could just as easily be an image of the MGM Lion,
or a Green Screen Parental rating guide, or an advert for tooth-paste. The
choice of the word "poster" has introduced some misunderstandings that
need to be acknowledged as well.)
For the non-sighted users reading this, a longer textual description of
the imagery would be:
"Clark Gable embraces Vivien Leigh, staring into her eyes romantically. In
the background is an ominous fire-red sunset and the silhouette of trees
and a couple arm-in-arm in the distance. The poster reads David O.
Selznick's adaptation of Margaret Mitchell's Gone with the Wind. Winner of
10 Academy Awards."
The semantic question becomes, is this a description of the movie, or of
the movie poster?
If we can agree that it describes the movie poster, and that a means of
linking that descriptive text to the multi-media asset is important, then
*HOW* do we do it? And importantly (as in the case of the whole @longdesc
debate), how do we do it when we know that from a visual/design
perspective most designers will likely not want that rich textual
description visible on screen?
I believe that here we have introduced some new and distinct semantic
information. It is clearly related to the multi-media experience, yet it's
not *really* the movie, it's the precursor to the movie. But it is also a
rich visual experience, further complicated by the fact that there is text
embedded into that image.
*********
Originally, I had proposed we should deal with the uniqueness of this
not-movie visual expression - the "poster" - by introducing a child
element of video, like this:
<h1 id="movieTitle">Gone With The Wind</h1>
<video src="movie.mp4" aria-labelledby="movieTitle"
aria-describedby="description">
<poster
alt="David O. Selznick's adaptation of Margaret
Mitchell's Gone with the Wind. Winner of 10 Academy Awards."
longdesc="file-with-the-rest-of-the-description.html">
</video>
<p id="description"> American classic in which a manipulative woman and a
roguish man carry on a turbulent love affair in the American south during
the Civil War and Reconstruction.</p>
Note that in the example above, the alt text is *more* than the Title in
the <h1>, and there is no SRC attribute, because the imagery would be
derived from the first frame of "movie.mp4"; should the imagery be an
actual discrete JPG, it could then be referenced by SRC, like this:
<poster
alt="David O. Selznick's adaptation of Margaret Mitchell's
Gone with the Wind. Winner of 10 Academy Awards"
longdesc="file-with-the-rest-of-the-description.html"
src="poster.jpg">
(Note: this proposal would have made @poster as an attribute of <video>
obsolete, since specifying the JPG file would move from being an
attribute of the <video> element to being an attribute of the child
element, <poster>, or <firstframe> as I named it in my proposal.)
Here, the not-movie visual imagery has a short name (provided by @alt) and
a means for associating a longer description (using @longdesc).
This proposal was rejected by the Working Group chairs (Issue 142). I'm
not really sure what their claim was, but it suggested that I was
proposing a broken element (presumably because @src could sometimes be
omitted, which is perfectly valid; at least that was my reading of the
decision:
http://lists.w3.org/Archives/Public/public-html/2011Mar/0690.html). They
also failed to note that I was not proposing actual spec text per se (I
even specifically asked for assistance), which is the grounds for my
current, active Formal Objection on Issue 142. (I have indicated that
should we solve the *problem*, I would remove the FO, as results are more
important to me than religion.)
*********
As we returned to this issue, Silvia and I re-examined the
requirements and cooked up a different approach. It leverages ARIA a
little more than my initial suggestion did, but on paper it looked like
it could still solve the larger requirement set. Using the same example,
but rewritten in this new approach, we would have the following:
<h1 id="movieTitle">Gone With The Wind</h1>
<video src="movie.mp4" aria-labelledby="movieTitle"
aria-describedby="description poster">
<p id="poster">David O. Selznick's adaptation of Margaret
Mitchell's Gone with the Wind. Winner of 10 Academy Awards. A full
description of the poster is <a
href="file-with-the-rest-of-the-description.html">also available</a>.</p>
</video>
<p id="description">American classic in which a manipulative woman and a
roguish man carry on a turbulent love affair in the American south during
the Civil War and Reconstruction.</p>
With this, we have again captured what I believe to be all of the discrete
semantics, and while I have some questions about user-experience, I was
generally satisfied that for a 'professional' authoring of this by a
developer, all of the tools the author needed were there.
*********
The questions/concerns I had focus on a few specific behaviors: what, if
anything, we can do, what we should do, and *who* should be doing what.
They are:
1) My concern about the concatenation of the two descriptions into a flat
reading. Is this a problem? When a screen reader focuses on the <video>
element, my understanding today is that what would be read aloud would be:
"American classic in which a manipulative woman and a roguish man
carry on a turbulent love affair in the American south during the Civil
War and Reconstruction. David O. Selznick's adaptation of Margaret
Mitchell's Gone with the Wind. Winner of 10 Academy Awards. A full
description of the poster is also available."
(For example, is the 'pausing' caused by the period after the words "Wind"
and "available" preserved, or will the speech synthesizer just plow on
through as one run-on sentence? Do we need a 'longer' pause between the
description of the movie and the description of the poster? If yes, how do
we do this?)
2) I have a concern that apparently HTML-rich text being passed to the
Accessibility API is being "flattened" - i.e. none of the HTML-richness is
preserved. This would thus kill off the link being provided by: "A full
description of the poster is also available." (This has surfaced in the
@longdesc discussion as well: apparently Firefox is preserving the
richness - needs to be tested/confirmed - but the other browsers are not.
This might be a deal-breaker here.)
3) Order of reading: I presume that the aria-describedby texts are
read/rendered in the order their IDs are authored in the attribute. In
other words, if I reversed the order of the attribute values
(...aria-describedby="poster description">...) then what is passed
forward would be:
"David O. Selznick's adaptation of Margaret Mitchell's Gone with
the Wind. Winner of 10 Academy Awards. A full description of the poster is
also available. American classic in which a manipulative woman and a
roguish man carry on a turbulent love affair in the American south during
the Civil War and Reconstruction."
Is this a problem (I can see where it might be sometimes)? Is this
addressed exclusively as authoring guidance, or is there a way we can
specify rendering order regardless of authoring order? Is this worth
worrying about?
4) We have 2 paragraphs of textual description, describing 2 discrete
things. Yet which paragraph is describing which thing? In the example I
have used IDs of "description" and "poster" for clarity of the examples,
but we already know that IDs are machine readable but carry no semantics -
I could have just as easily used the IDs of "this" and "that" - they would
have worked as association "hooks", but no semantics are being passed
along. My thoughts are that we could either investigate introducing new
aria roles (but I hear concerns of feature creep), or look to use
aria-label, like this:
<p id="that" aria-label="poster description">David O. Selznick's
adaptation of Margaret Mitchell's Gone with the Wind. Winner of 10 Academy
Awards. A full description of the poster is <a
href="file-with-the-rest-of-the-description.html">also available</a>.</p>
Again, is this richness preserved or flattened? As an author, would
writing this make any difference:
<div id="that"><p aria-label="poster description">David O.
Selznick's adaptation of Margaret Mitchell's Gone with the Wind. Winner of
10 Academy Awards. A full description of the poster is <a
href="file-with-the-rest-of-the-description.html">also
available</a>.</p></div>
...where the <div>'s ID provides the association, but the <p> and its
aria-label are semantically preserved? Do we need this? Do we have this?
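As a rough illustration of questions 1 through 3 (not a statement of what
any browser or screen reader actually does), here is a minimal Python
sketch of an ID-ordered, flattened description. It assumes the
aria-describedby IDREFs are resolved in the order they are written in the
attribute, and that only the text content of each referenced element is
passed along. The names TextById and flat_description are made up for this
example, the markup is assumed to be well-formed with explicit closing
tags, and the poster and synopsis texts are shortened placeholders.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they are kept off the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collect the text content of every element that carries an id.

    Hypothetical helper: assumes well-formed markup with explicit end tags.
    """

    def __init__(self):
        super().__init__()
        self.open_ids = []  # id (or None) of each currently open element
        self.text = {}      # id -> list of text chunks found inside it

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        el_id = dict(attrs).get("id")
        self.open_ids.append(el_id)
        if el_id is not None:
            self.text[el_id] = []

    def handle_endtag(self, tag):
        if tag not in VOID and self.open_ids:
            self.open_ids.pop()

    def handle_data(self, data):
        # Text belongs to every open ancestor that has an id.
        for el_id in self.open_ids:
            if el_id is not None:
                self.text[el_id].append(data)


def flat_description(html, idrefs):
    """Join the flattened text of each referenced id, in IDREF order."""
    parser = TextById()
    parser.feed(html)
    parser.close()
    parts = []
    for ref in idrefs.split():
        chunks = parser.text.get(ref)
        if chunks is not None:  # unknown ids are skipped
            # Collapse whitespace; tags such as <a href> are already gone.
            parts.append(" ".join("".join(chunks).split()))
    return " ".join(parts)


# Placeholder page modelled on the second proposal (texts shortened).
PAGE = """
<h1 id="movieTitle">Gone With The Wind</h1>
<video src="movie.mp4" aria-labelledby="movieTitle"
  aria-describedby="description poster">
  <p id="poster">Poster text. A full description is
  <a href="file-with-the-rest-of-the-description.html">also available</a>.</p>
</video>
<p id="description">Movie synopsis.</p>
"""

print(flat_description(PAGE, "description poster"))
print(flat_description(PAGE, "poster description"))
```

Under these assumptions the link inside the poster paragraph survives only
as the words "also available", and swapping the IDREFs swaps the reading
order. Whether real implementations behave this way is exactly what needs
to be tested.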
It has been discussed that some of these issues are browser-implementation
issues, and that bugs need to be filed at that level (Eric reconfirmed
this point on last week's call) - however, to do that, it seems that the
ARIA CR is not specific enough (sorry Rich/PF), so is there something we
can do to address this problem? Does either of these proposals appear to
be superior to the other, or is it ToMayto versus ToMato? Is there another
way forward?
Friends, I truly am agnostic on *how* we solve this problem. While I
continue to think that my initial proposal of introducing a new child of
<video> could work, I am also convinced that if we can work out the
wrinkles of this second proposal, it too would address the
requirements.
And so, thoughts?
JF