I'm going to try to clear up some of my own confusion here.
I think we might need three pieces of information linked to a media (video or audio) element:
* a short text (kinda like alt)
* a long description
* a transcript
In all cases, they should provide equivalents for what someone who can consume the media 'normally' would pick up. (I think this is as true of audio as of video, by the way.)
So, I was sort of right and sort of wrong when I said that the short text should describe the media, not the poster. I was right in that the element is more than its first frame or poster frame. I was wrong in that the normal (in this case sighted) user would have gathered something from that initial frame.
So, not good:
<video poster="TheLeopard.jpg" short-text="A movie poster for The Leopard" src="..." />
because the sighted user will already know it's a video element and that it's offering them the trailer, and "a movie poster" conveys neither.
Way better is to relay some of the information from the poster:
<video poster="TheLeopard.jpg" short-text="Trailer for The Leopard, starring Burt Lancaster" src="..." />
The long description can provide a more narrative version of the trailer, and the transcript a full transcript. This way the short text serves the non-sighted user the same way the poster serves the sighted one:
sighted: see poster, decide it's interesting, watch trailer
non-sighted: get the short-text, decide it's interesting, read the long description and/or transcript
(I'm using non-sighted as a shorthand for someone who, for whatever reason, can't see the video - their eyes are busy elsewhere, their UA is unable to play it, and so on. Hope that's OK).
(I changed from Clockwork Orange because I didn't want to write anything about that great but disturbing movie).
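Putting the three pieces together, the markup might look roughly like the sketch below. Only short-text appears in the examples above; the long-description and transcript attribute names, and the two file names they point at, are placeholders I made up purely for illustration, not proposed syntax.

```html
<!-- Hypothetical sketch only: "long-description" and "transcript" are
     made-up attribute names, and the two .html file names are invented. -->
<video poster="TheLeopard.jpg"
       short-text="Trailer for The Leopard, starring Burt Lancaster"
       long-description="TheLeopard-description.html"
       transcript="TheLeopard-transcript.html"
       src="..."></video>
```

The short text is inline because it needs to be available at a glance, like the poster; the other two are shown as links on the assumption that they are long enough to live in separate documents.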
David Singer
Multimedia and Software Standards, Apple Inc.