N.B.: Despite what you might infer from the MathML Spec, I'm not suggesting Content-MML or any of the other suggested uses of the semantics element; just the ability to insert an SVG or (X)HTML subtree in a MathML expression.

Looking at Jacques' first link, <semantics> seems for current implementation purposes to be sort of like an <object> tag, but instead of nested representations, there's a list of <annotation-xml> ones. Is that correct? Are we supposed to render the first one that we can handle? Maybe in principle the browser should be smart enough to pick a "best" child to render, but that won't happen soon (maybe ever?).
Sorry for the dumb questions, I haven't looked at Content MathML much yet. Expert guidance is appreciated!
P.S. Speaking of Firefox MathML, Neil Soiffer once mentioned in an e-mail "... support <semantics> (generated by MathType and other authoring tools) to the point of it not causing an error ... probably only a few hours work (just render the first child)." Perhaps he misspoke, but did Firefox do this, or maybe just render the last child? Should we match whatever behavior Firefox (or others) implemented? (MathJax? MathPlayer?)

Thanks Fred! I thought about asking about <mtext>, but don't you also want <semantics> for listing multiple representations, like <object>? And I don't get the bit about breaking Jacques' testcase - you mention defaulting to <semantics>'s first child if necessary already.
Also, I'm now finding more info in Jacques' second link:
5.1.3: A particularly important case is the use of a presentation MathML expression to indicate a preferred rendering for a content MathML expression. This case may be represented by labeling the annotation with the application/mathml-presentation+xml value for the encoding attribute. For backward compatibility with MathML 2.0, this case can also be represented with the equivalent MathML-Presentation value for the encoding attribute. Note that when a presentation MathML annotation is present in a semantics element, it may be used as the default rendering of the semantics element, instead of the default rendering of the first child.
5.2.1.1: The default rendering of a semantics element is the default rendering of its first child. A renderer may use the information contained in the annotations to customize its rendering of the annotated element.
5.3.1: In particular, any valid presentation expression can be embedded in a content expression by placing it as the first child of a semantics element.
I'd still like to hear more from other experts (David C.?). Do we have to worry about 5.1.3 above, or just render the first child, at least for now? Would this be enough for existing sites like Jacques', and match other implementations well enough?

There's a clear advantage to using <semantics> with <annotation-xml> child(ren) to embed foreign content: namely, as Dave B notes in his comments, you can provide for fallback content (a UA which doesn't support SVG could render a GIF image, instead).
You can't do that with <mtext>.
Note: I got my original implementation from attempting to follow
http://www.w3.org/Math/Documents/Notes/graphics.xml#svg-in-mathml-guidelines
which is, admittedly, kinda ancient. But it was the best guidance available at the time.

I think the correct implementation (as far as I can glean from the Spec) is
1) If the first child is a PMML element, render that.
2) If not, go through the <annotation-xml> and <annotation> children and render the first one that you support (an <annotation-xml> containing SVG or (X)HTML, or an <annotation> containing plain text).
That would work fine with my content (where the first child is <annotation-xml> containing SVG).

(In reply to comment #6)
> Note: I got my original implementation from attempting to follow
>
> http://www.w3.org/Math/Documents/Notes/graphics.xml#svg-in-mathml-guidelines
>
> which is, admittedly, kinda ancient. But it was the best guidance available at the time.
Ah yes, I remember this W3C note now. I understand the importance of <semantics> but it seems to be that the correct structure is
- first child: default rendering.
- other children: annotation/annotation-xml elements providing alternate rendering.
Here you are directly using annotation-xml as the first child and I'm not sure it is what is suggested in the spec. Perhaps David can give better explanation on this?
BTW, I'm wondering what are the "official" encodings. I guess "Presentation-MathML" and "Content-MathML" as defined in the MathML rec. Maybe this SVG1.1 value mentioned in the note. But in general, I guess we can use things like "image/svg+xml".

Not quite.
If the first child is a Content-MML element, it DOESN'T HAVE a default rendering. Instead, the UA is supposed come up with a rendering, based on the subsequent <annotation-xml> or <annotation> child(ren).

(In reply to comment #7)
> I think the correct implementation (as far as I can glean from the Spec) is
>
> 1) If the first child is a PMML element, render that.
> 2) If not, go through the <annotation-xml> and <annotation> children and render the first one that you support (an <annotation-xml> containing SVG or (X)HTML, or an <annotation> containing plain text).
>
> That would work fine with my content (where the first child is <annotation-xml> containing SVG).
Yes, that's the way I was thinking about it. For Gecko implementation, I'm wondering how to know whether the first element is a presentation MathML element but I guess the child frame can just pass the information "I'm a presentation MathML element" to its semantics parent.
It's still not clear to me that the first child can actually be an annotation itself. From the W3C note, it seems that using a dummy <csymbol> would work, but is it allowed to just not use this "first child"?

(In reply to comment #9)
> Not quite.
>
> If the first child is a Content-MML element, it DOESN'T HAVE a default rendering. Instead, the UA is supposed come up with a rendering, based on the subsequent <annotation-xml> or <annotation> child(ren).
It seems to me that if a browser is able to render content MathML, then it is allowed to display this first child or choose among the annotation.

(In reply to comment #11)
> (In reply to comment #9)
> > Not quite.
> >
> > If the first child is a Content-MML element, it DOESN'T HAVE a default rendering. Instead, the UA is supposed come up with a rendering, based on the subsequent <annotation-xml> or <annotation> child(ren).
>
> It seems to me that if a browser is able to render content MathML, then it is allowed to display this first child or choose among the annotation.
Yes certainly.
The usage with no base expression and just annotations is really an abuse of notation that was valid MathML2 because of loose schema enforcement and works in gecko because the spec, when intending to say the default behaviour is to render the "first child" wasn't really expecting a case where there is no base expression and only annotations. Since Gecko simply renders the first child it of course picks up the first annotation in this case.
The MathML3 schema (which is generally rather tighter than the mathml2 schema which was very loose, allowing many things such as a fraction with more than 2 children) makes this invalid.
The MathML3 Schema is:
semantics = element semantics {semantics.attributes,
MathExpression,
(annotation|annotation-xml)*}
I became aware of this usage with only annotations somewhat later when I think some of Jacques' examples were given as examples of integration for html5.
I suppose if there is really a lot of usage of this construct a case could be made for a bug report changing the schema to MathExpression? so making it optional but really I'm not sure it is good usage to be annotating nothing. It has the benefit of working in firefox but structurally, annotating nothing isn't really what is wanted here.
As far as I can see, none of the semantics elements in
http://www.w3.org/Math/Documents/Notes/graphics.xml#svg-in-mathml-guidelines
use a semantics element with only annotations, in fact it explitly says the forst (non annotation) child is mandatatory:
The first child of the semantics element is the MathML
object whith[sic] which the additional information is associated, it can be
a presentation- or content MathML representation. This child is
mandatory in a semantics element, it can be seen as a fallback
representation in MathML, if the MathML application processing the
semantics element cannot process any of the annotation and
annotation-xml children. In our case, we have chosen to represent the
dice as a csymbol element, another choice would have been to give
an alternate text represented in a mtext element.

(In reply to comment #12)
> The usage with no base expression and just annotations is really an abuse of notation that was valid MathML2 because of loose schema enforcement and works in gecko
Indeed, I seem to recall that the more "correct" usage, with a bogus <csymbol> element as the first-child (and the <annotation-xml> element as the second child) DIDN'T work. Omitting the bogus <csymbol> worked fine in Gecko, so that's what I implemented.
> I became aware of this usage with only annotations somewhat later when I think some of Jacques' examples were given as examples of integration for html5.
>
> I suppose if there is really a lot of usage of this construct a case could be made for a bug report changing the schema to MathExpression? so making it optional but really I'm not sure it is good usage to be annotating nothing.
On the other hand, not having a mechanism for fallback content, when constructing such a compound document, is not a good idea either.
(Full disclosure; itex2MML's \begin{svg}...\end{svg} doesn't really provide a convenient mechanism for including fallback content. But more competently-designed authoring software certainly could.)
I could easily modify itex2MML to emit a bogus <csymbol> as the first-child of <semantics>, with the actual SVG payload in the second-child <annotation-xml>. ALL browsers would ignore the <csymbol> element, and the SVG-capable ones would render the SVG.
(Well, OK, Firefox would have to be modified, but it seems that's in the works, anyway.)
>It has the benefit of working in firefox but structurally, annotating nothing isn't really what is wanted here.
Well, I'll note that the Spec
http://www.w3.org/TR/MathML3/chapter3.html#presm.mtext
doesn't exactly go out of its way to indicate that <mtext> is the preferred mechanism for building compound documents. In fact, unless I misunderstand the schema
http://www.w3.org/TR/MathML3/appendixa.html#parsing_mtext
that usage is forbidden.

> --- Comment #13 from Jacques Distler <distler@golem.ph.utexas.edu>
> 2012-10-29 18:31:49 PST --- (In reply to comment #12)
>
>> The usage with no base expression and just annotations is really an
>> abuse of notation that was valid MathML2 because of loose schema
>> enforcement and works in gecko
>
> Indeed, I seem to recall that the more "correct" usage, with a bogus
> <csymbol> element as the first-child (and the <annotation-xml>
> element as the second child) DIDN'T work. Omitting the bogus
> <csymbol> worked fine in Gecko, so that's what I implemented.
Yes Gecko's implementation of just a simple css switch that
unconditionally renders the first child. I thought after posting that I
may have sounded a bit harsh on you which wasn't really fair (or
intended). I have the same problem with my own documents of juggling
between what is "standard" and what works. Choosing what works (and if
necessary getting the standard to change is not an unreasonable strategy.
>
> On the other hand, not having a mechanism for fallback content, when
> constructing such a compound document, is not a good idea either.
well you can still use semantics/annotation-xml to offer alternative
renderings to the mtext/svg so it's not that there is no mechanism it's
just that the markup cycles round a bit.
>
> (Full disclosure; itex2MML's \begin{svg}...\end{svg} doesn't really
> provide a convenient mechanism for including fallback content. But
> more competently-designed authoring software certainly could.)
>
> I could easily modify itex2MML to emit a bogus <csymbol> as the
> first-child of <semantics>, with the actual SVG payload in the
> second-child <annotation-xml>. ALL browsers would ignore the
> <csymbol> element, and the SVG-capable ones would render the SVG.
I'd rather view html+svg+mathml as a combined coherent document format
for scientific documents and see svg directly included in mathml via mtext
rather than as a foreign annotation in annotation-xml. If you do want to
use a semantics/annotation encoding to also work in pure mathml systems
that don't do svg I think the most natural first child now to use would
be an mglyph referencing an image fallback for the svg.
>
> (Well, OK, Firefox would have to be modified, but it seems that's in
> the works, anyway.)
>
>> It has the benefit of working in firefox but structurally,
>> annotating nothing isn't really what is wanted here.
>
> Well, I'll note that the Spec
>
> http://www.w3.org/TR/MathML3/chapter3.html#presm.mtext
>
> doesn't exactly go out of its way to indicate that <mtext> is the
> preferred mechanism for building compound documents. In fact, unless
> I misunderstand the schema
>
> http://www.w3.org/TR/MathML3/appendixa.html#parsing_mtext
>
> that usage is forbidden.
>
The descriptin in chapter 3 and the schema in apendixa is for pure mathml
so doesn't allow element content in the token elements.
The use in a compound (x)html+mathml document is governed by
section 6.4 which says
> When designing a compound document format in which MathML is included
> in a larger document type, the designer may extend the content model
> of MathML to allow additional elements. For example, a common
> extension is to extend the MathML schema such that elements from
> non-MathML namespaces are allowed in token elements, but not in other
> elements. MathML processors that encounter unknown markup should
> behave as follows:> ...> Extending the schema in this way is easily achieved using the Relax
> NG schema described in Appendix A Parsing MathML, it may be as
> simple as including the MathML schema whilst overriding the content
> model of mtext:
>
>
> default namespace m = "http://www.w3.org/1998/Math/MathML"
>
> include "mathml3.rnc" { mtext = element mtext {mtext.attributes,
> (token.content|anyElement)*} }
This is what the xhtml+mathml schema and (perhaps more importantly) the
hardwired rules in the HTML5 parser allow.
Currently the documentation of mathml+html(+svg) is regrettably rather
sparse. The MathML spec takes the view noted above that pure mathml is
just mathml and it is up to the host document type to define compound
formats. In this case the host document type is html5 but (apart from
the parser) the html(5) spec takes the view that everything that occurs
within <math> is deferred to the mathml spec. I've raised this as an
issue on html5 before without any great effect. I suspect we should
first get a good solution that works in webkit and gecko and then write
it up as a note or something so people have a better chance of finding it.
(I suspect we are wandering way off topic for a webkit bug: if the webkit folks rather we move this to www-math@w3.org, or anywhere else, please say)

So it seems that Gecko's implementation and its difference with the various specs have been a confusion for users. I hope we can work with the Webkit folks to implement a consistent behavior. Neil seems to disagree about the fact that the omission of the first child is now forbidden.
> an mglyph referencing an image fallback for the svg.
I guess there are many methods to provide an image fallback: mtext+img, annotation-xml+svg, mglyph, annotation@src etc but again Gecko does not implement all the methods and other MathML systems may use other techniques. For the mglyph (and perhaps related annotation) implementation, see bug 297465 (7 years old!).

Created attachment 209172[details]
Patch V1
The testcase should be fixed by the patch of bug 120058. I attach another experimental patch to attempt to implement the more advanced annotation selection that follows Gecko's behavior. However, it does not work yet.

Created attachment 219782[details]
Archive of layout-test-results from webkit-ews-12 for mac-mountainlion-wk2
The attached test failures were seen while running run-webkit-tests on the mac-wk2-ews.
Bot: webkit-ews-12 Port: mac-mountainlion-wk2 Platform: Mac OS X 10.8.5

Comment on attachment 219758[details]
Patch V3
View in context: https://bugs.webkit.org/attachment.cgi?id=219758&action=review> LayoutTests/mathml/presentation/semantics-3.html:24
> + <p>annotation-xml 7: <math><semantics><csymbol>Content MathML</csymbol><annotation-xml encoding="unknown"><mtext>error</mtext></annotation-xml><annotation-xml encoding="application/mathml-presentation+xml"><mtext>annotation-xml</mtext></annotation-xml><annotation>error</annotation></semantics></math></p>
can you add a case for "application/mathml+xml" since you explicitly called that out as something we should not support
> Source/WebCore/mathml/MathMLElement.h:57
> + return hasTagName(MathMLNames::mtrTag) || hasTagName(MathMLNames::mtdTag) || hasTagName(MathMLNames::maligngroupTag) || hasTagName(MathMLNames::malignmarkTag) || hasTagName(MathMLNames::mencloseTag) || hasTagName(MathMLNames::mglyphTag) || hasTagName(MathMLNames::mlabeledtrTag) || hasTagName(MathMLNames::mlongdivTag) || hasTagName(MathMLNames::mpaddedTag) || hasTagName(MathMLNames::msTag) || hasTagName(MathMLNames::mscarriesTag) || hasTagName(MathMLNames::mscarryTag) || hasTagName(MathMLNames::msgroupTag) || hasTagName(MathMLNames::mslineTag) || hasTagName(MathMLNames::msrowTag) || hasTagName(MathMLNames::mstackTag);
This might be more efficient, if called frequently, if it's a lazily constructed static hash.
My personal opinion is that it might be cleaner to have these implementations in the cpp file
> Source/WebCore/mathml/MathMLSelectElement.cpp:117
> } else {
You should have a comment describing what the "else" case handles.
Even better you might break this if (mactionTag) / else into two methods, so
if (mactionTag)
updateSelectedActionChild()
else
updateSelectedSemanticChild();
> Source/WebCore/mathml/MathMLSelectElement.cpp:120
> + if (child) {
it looks like you already checked if this was 0 before so this is redundant
> Source/WebCore/mathml/MathMLSelectElement.cpp:122
> + bool firstChildIsAnnotation = child->isMathMLElement() && toMathMLElement(child)->isSemanticAnnotation();
should you output something for the inspector to display if this is invalid?
> Source/WebCore/mathml/MathMLSelectElement.cpp:128
> + if (!child->isMathMLElement() && toMathMLElement(child)->isPresentationMathML())
it seems like you're duplicating the checks for the first child. Can you roll the initial checks into the for loop?
> Source/WebCore/mathml/MathMLSelectElement.cpp:132
> + // If the <annotation> element has an src attribute we ignore it.
why do you ignore it if it has a src attribute? (can you comment explain why here instead of what)
> Source/WebCore/mathml/MathMLSelectElement.cpp:141
> + // If the <annotation-xml> element has an src attribute we ignore it.
ditto for comment
> Source/WebCore/mathml/MathMLSelectElement.cpp:149
> + // - Other mime Content-Types for SVG and HTML
these sentences should end in a period.
I believe mime should be MIME
> Source/WebCore/mathml/MathMLSelectElement.cpp:152
> + String value = child->getAttribute(MathMLNames::encodingAttr);
this should be getFastAttribute, which I believe returns an AtomicString

(In reply to comment #22)
> > LayoutTests/mathml/presentation/semantics-3.html:24
> > + <p>annotation-xml 7: <math><semantics><csymbol>Content MathML</csymbol><annotation-xml encoding="unknown"><mtext>error</mtext></annotation-xml><annotation-xml encoding="application/mathml-presentation+xml"><mtext>annotation-xml</mtext></annotation-xml><annotation>error</annotation></semantics></math></p>
>
> can you add a case for "application/mathml+xml" since you explicitly called that out as something we should not support
added in semantics-xml 7. BTW, I also added a new test to verify the dynamic update of the encoding. I had tested it but forgot to write a unit test. I'll look into the similar SVG bug with <switch> while I'm on this...
>
> > Source/WebCore/mathml/MathMLElement.h:57
> > + return hasTagName(MathMLNames::mtrTag) || hasTagName(MathMLNames::mtdTag) || hasTagName(MathMLNames::maligngroupTag) || hasTagName(MathMLNames::malignmarkTag) || hasTagName(MathMLNames::mencloseTag) || hasTagName(MathMLNames::mglyphTag) || hasTagName(MathMLNames::mlabeledtrTag) || hasTagName(MathMLNames::mlongdivTag) || hasTagName(MathMLNames::mpaddedTag) || hasTagName(MathMLNames::msTag) || hasTagName(MathMLNames::mscarriesTag) || hasTagName(MathMLNames::mscarryTag) || hasTagName(MathMLNames::msgroupTag) || hasTagName(MathMLNames::mslineTag) || hasTagName(MathMLNames::msrowTag) || hasTagName(MathMLNames::mstackTag);
>
> This might be more efficient, if called frequently, if it's a lazily constructed static hash.
>
> My personal opinion is that it might be cleaner to have these implementations in the cpp file
>
I think this can be refined later. At the moment isPresentationMathML is only called for semantics and in most cases this will only be for the derived classes which returns true immediately.
> > Source/WebCore/mathml/MathMLSelectElement.cpp:122
> > + bool firstChildIsAnnotation = child->isMathMLElement() && toMathMLElement(child)->isSemanticAnnotation();
>
> should you output something for the inspector to display if this is invalid?
Probably. I think until we restore the standard HTML5 version with <mtext>, it's probably not a good idea to deprecate that.

(In reply to comment #28)
> This method should still be broken into sub-methods
Ah, I was not sure if it was a suggestion or a request...
> looks like a real GTK build failure
So I'll probably try to move the isPresentationMathML implementation in the cpp file. I'm not sure why this failure didn't appear in the previous patch, though.