Description

[[File:Translating the wiki way.webm|thumb|400px|thumbtime=16:00|Niklas Laxström explains the features that allowed [[translatewiki.net]] to provide MediaWiki with more than 300 locales.|alt=Niklas Laxström, ''Translating the wiki way: Simple, fast, fun'', Wikimania 2012]]

Renders visually as:

That is to say, the alt text is not the caption. Rather, the part that doesn't start with an option name followed by an equals sign is the caption. When editing the above in VisualEditor, it looks fine at first:

But when saving without changing anything, or after changing something unrelated elsewhere on the page, the image is serialised incorrectly. This caused an unsuspecting user to save a dirty diff (revision 851249986) that also unexpectedly inserted <nowiki> tags.

After

[[File:Translating the wiki way.webm|thumb|400px|thumbtime=16:00|Niklas Laxström explains the features that allowed [[translatewiki.net]] to provide MediaWiki with more than 300 locales.|alt=Niklas Laxström, <nowiki>''</nowiki>Translating the wiki way: Simple, fast, fun<nowiki>''</nowiki>, Wikimania 2012]]

In addition to the dirty diff, it also does not roundtrip. After the above edit, re-opening the page in VisualEditor now mistakes the alt text (including the syntax alt= itself) for the caption:

After this second save, the page remains consistently damaged with the caption removed.

There's a misfeature in the PHP parser where "anything which doesn't properly parse as an option" is assumed to be a caption. Parsoid mimics this behavior, for better or for worse. (Predictably, see a new syntax proposal here.)

Anyway, that's just to say that the problem is most likely "alt option fails to parse"; the "interpreted as a caption" part is just the "expected" side-effect of any option failing to parse.
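To make the fallback concrete, here is a minimal Python sketch of the "anything that fails to parse as an option becomes the caption" rule. It is not the PHP parser's or Parsoid's actual code: the option lists, the `classify_options` function and the `value_is_valid` hook are all hypothetical, and the parts are assumed to be already split on the top-level pipes. In this sketch the last unrecognised part wins, which matches the outcome described above where the real caption disappears.

```python
import re

# Hypothetical, heavily simplified option tables. The real parsers know
# many more options and also handle localised option names.
KNOWN_KEYED_OPTIONS = {"alt", "link", "thumbtime"}
KNOWN_BARE_OPTIONS = {"thumb", "frame", "frameless", "left", "right", "center"}
SIZE_RE = re.compile(r"^\d+px$")


def classify_options(parts, value_is_valid=lambda name, value: True):
    """Split pre-tokenised image parameters into options and a caption.

    `value_is_valid` is a stand-in for whatever check makes an option
    value fail to parse; by default every value is accepted.
    """
    options = {}
    caption = None
    for part in parts:
        name, sep, value = part.partition("=")
        if part in KNOWN_BARE_OPTIONS:
            options[part] = True
        elif SIZE_RE.match(part):
            options["size"] = part
        elif sep and name in KNOWN_KEYED_OPTIONS and value_is_valid(name, value):
            options[name] = value
        else:
            # Fallback: anything that is not a valid option is treated as
            # the caption, overwriting any caption seen earlier.
            caption = part
    return options, caption
```

Calling `classify_options` with a validator that rejects `alt` values containing `<nowiki>` makes the whole `alt=...` part, including the `alt=` prefix, come back as the caption, and the original caption is lost.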

Well, a combination of things. It's triggered by the quote marks, but then we're not handling the <nowiki> in the serialized version either. [[File:Foo.jpg|alt=<nowiki>''alt''</nowiki>]] ought to be the "correct" way to get embedded single quotes into the alt value. It looks like we might have a similar issue with the link option as well. I've got a working patch for alt, working on understanding the link issue.

Turns out there's a bug in how core PHP parses [[File:Foo.jpg|link=Foo''s bar''s]] (which is a valid title) or [[File:Foo.jpg|link=''Main Page'']] (where the italics apparently should be stripped). So now I've got a patch for core as well as one for Parsoid...

This turned into a little rathole, but I've come out the other end fixing (a) how Parsoid parses alt/link options (wikitext markup including <nowiki> is allowed), (b) how Parsoid renders alt/link options (consistent stripping), (c) how core renders link options (<nowiki> expansion and stripping consistent with alt), (d) how core handles ampersands in alt/link options (bug in remex), and (e) how Parsoid handles ampersands in alt/link options. Now we've just got to get those three patches merged, starting with the remex bug (T207088: Remex double-decodes HTML entities on PHP (not HHVM)), because the newly-added test cases won't pass on Jenkins until remex is fixed and the fix is packaged and released so Composer can get to it.

Did you check the other issues from https://phabricator.wikimedia.org/T206940#4670526 ? Sounds like you're saying that wt2html is fixed (both PHP and Parsoid agree and do something sensible) but that html2wt for video specifically is still broken since it treats the (invisible) embedded alt attribute as plaintext rather than HTML. Or maybe html2wt is alright but we shouldn't be embedding the invisible alt as HTML but should be doing the same tag-stripping that we would do for a "real" alt attribute. (I think I'm a little partial to this latter.)

Tangentially, I was mainly verifying that we fixed the original issue that was filed.

Sounds like you're saying that wt2html is fixed (both PHP and Parsoid agree and do something sensible) but that html2wt for video specifically is still broken since it treats the (invisible) embedded alt attribute as plaintext rather than HTML.

The attribute that we're stuffing in data-mw claims to be html but it looks to be unparsed (at least the nowikis aren't rendered as spans).

Or maybe html2wt is alright but we shouldn't be embedding the invisible alt as HTML but should be doing the same tag-stripping that we would do for a "real" alt attribute. (I think I'm a little partial to this latter.)
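As a rough illustration of the tag-stripping idea, the following Python sketch reduces an HTML fragment to the plain text one would expect in a "real" alt attribute. It is only an assumption about what such stripping could look like, not what core or Parsoid actually do; `strip_tags` and `_TextOnly` are hypothetical names, and only the standard library's `html.parser` is used.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes and discards every tag."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; exactly once.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(html_fragment):
    """Return the text content of an HTML fragment, with all tags removed."""
    parser = _TextOnly()
    parser.feed(html_fragment)
    parser.close()  # flushes any buffered trailing text
    return "".join(parser.chunks)
```

For example, `strip_tags("Niklas Laxström, <i>Translating the wiki way</i>, Wikimania 2012")` yields the same string without the `<i>` markup, which is the form that could be stored for the invisible alt instead of embedded HTML.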