2.1 MathML Syntax and Grammar

2.1.1 MathML Syntax and Grammar

MathML is an application of [XML], or Extensible
Markup Language, and as such its syntax is governed by the rules of
XML syntax. XML syntax is a notation for rooted labeled planar trees.
Planarity means that the children of a node may be viewed as having a
natural order.
The grammar of MathML is specified in part by a DTD (Document Type
Definition) or, alternatively, by an XML Schema. The details of using
tags, attributes, entity references and the like are defined in the XML
language specification, while the MathML element and attribute names,
which elements can be nested inside each other, and their possible
relationships are specified in the MathML DTD, given in Appendix A Parsing MathML.
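Because XML describes ordered ("planar") trees, any conforming parser returns an element's children in a fixed document order. The sketch below uses Python's standard xml.etree.ElementTree module (our choice of tool, not something MathML requires) to print the tree for a small fragment; the MathML namespace is omitted for brevity, and outline is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# A small presentation-markup fragment for x^2 + 1 (namespace omitted for brevity).
source = "<mrow><msup><mi>x</mi><mn>2</mn></msup><mo>+</mo><mn>1</mn></mrow>"
root = ET.fromstring(source)

def outline(elem, depth=0):
    """Return the tree as indented lines, visiting children in document order."""
    text = (elem.text or "").strip()
    lines = ["  " * depth + elem.tag + (f" {text!r}" if text else "")]
    for child in elem:  # iterating an Element yields its children in order
        lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(root)))
```

The msup element always sees mi before mn, which is what later lets MathML give positional meaning (base, then superscript) to children.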

The W3C, in seeking to increase the flexibility of the use of XML
for the Web, and to encourage modularization of applications built
with XML, found that the basic form of a DTD was not sufficiently
flexible. Therefore, a W3C Working Group was created to develop a
specification for XML Schemas [XMLSchemas], which are
specification documents intended eventually to supersede DTDs.
Accordingly, a schema for MathML is also provided.

However, MathML also specifies some syntax and grammar rules in addition
to the general rules it inherits as an XML application. These rules allow
MathML to encode a great deal more information than would ordinarily be
possible with pure XML, without introducing many more elements or using a
substantially more complex DTD or schema. A grammar for content markup
expressions is given in Appendix B Content Markup Validation Grammar. Of course, one drawback to
using MathML-specific rules is that they are invisible to generic XML
processors and validators.

There are basically two kinds of additional MathML grammar and syntax
rules. One kind involves placing additional criteria on attribute values.
For example, it is not possible in pure XML to require that an attribute
value be a positive integer. The second kind of rule specifies more
detailed restrictions on the child elements (for example on ordering) than
are given in the DTD or even a schema. For example, it is not possible in
XML to specify that the first child be interpreted one way, and the second
in another.
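To make the first kind of rule concrete, here is a minimal sketch of a check that only a MathML-aware tool can perform. It assumes, for illustration, that the columnspan attribute on mtd has the MathML syntax "positive integer"; a generic XML parser, or a DTD that declares the attribute as CDATA, accepts all three values below. The names mathml_errors and POSITIVE_INTEGER are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# All three values are well-formed XML, so a generic parser accepts them.
doc = ET.fromstring(
    '<mtable><mtr>'
    '<mtd columnspan="2"/><mtd columnspan="-1"/><mtd columnspan="two"/>'
    '</mtr></mtable>'
)

# Assumed MathML-level syntax: digits only, not all zero.
POSITIVE_INTEGER = re.compile(r"[0-9]*[1-9][0-9]*")

def mathml_errors(root):
    """Collect columnspan values that violate the (assumed) MathML syntax."""
    errors = []
    for td in root.iter("mtd"):
        value = td.get("columnspan")
        # MathML applications ignore surrounding whitespace in attribute values.
        if value is not None and not POSITIVE_INTEGER.fullmatch(value.strip()):
            errors.append(value)
    return errors

print(mathml_errors(doc))  # ['-1', 'two']
```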

The following sections discuss features both of XML syntax and grammar
in general, and of MathML in particular. Throughout the remainder of the
MathML specification, we will usually take care to distinguish between
usage required by XML syntax and the MathML DTD (and schema) and usage
required by MathML-specific rules. However, we will frequently allude to
"MathML errors" without identifying which part of the
specification is being violated.

2.1.2 Children versus Arguments

Many MathML elements require a specific number of children or
attach additional meanings to child elements in certain positions. As noted
above, these kinds of requirements are MathML-specific, and cannot be
given entirely using XML syntax and grammar. When the children of a
given MathML element are subject to these kinds of additional
conditions, we will often refer to them as arguments
instead of merely as children, in order to emphasize their MathML-specific
usage. Note that, especially in Chapter 3 Presentation Markup, the
term "argument" is usually used in this technical sense,
unless otherwise noted, and therefore refers to a child element.

In the detailed discussions of element syntax given with each
element throughout the MathML specification, the number of required
arguments and their order is implicitly indicated by giving names for
the arguments at various positions. This information is also given for
presentation elements in the table of argument requirements in
Section 3.1.3 Required Arguments, and for content elements in Appendix B Content Markup Validation Grammar.
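The positional naming of arguments can be sketched as a lookup from element name to argument names. The table REQUIRED_ARGS below is a small illustrative subset written from memory of Section 3.1.3 Required Arguments, not a copy of it; REQUIRED_ARGS and name_arguments are hypothetical names, and elements with variable argument counts are simply skipped.

```python
import xml.etree.ElementTree as ET

# Illustrative subset only; Section 3.1.3 holds the authoritative table.
REQUIRED_ARGS = {
    "mfrac": ("numerator", "denominator"),
    "mroot": ("base", "index"),
    "msub": ("base", "subscript"),
    "msup": ("base", "superscript"),
    "msubsup": ("base", "subscript", "superscript"),
}

def name_arguments(elem):
    """Map argument names to child tags, raising if the child count is wrong."""
    names = REQUIRED_ARGS.get(elem.tag)
    if names is None:
        return {}  # no fixed argument list recorded in this sketch
    children = list(elem)
    if len(children) != len(names):
        raise ValueError(
            f"<{elem.tag}> needs {len(names)} arguments, got {len(children)}")
    return {name: child.tag for name, child in zip(names, children)}

frac = ET.fromstring("<mfrac><mi>a</mi><mn>2</mn></mfrac>")
print(name_arguments(frac))  # {'numerator': 'mi', 'denominator': 'mn'}
```

An XML validator sees only "two children"; the names numerator and denominator exist solely at the MathML level.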

A few elements have other requirements on the number or
type of arguments. These additional requirements are described
together with the individual elements.

2.1.3 MathML Attribute Values

An XML attribute's value, which in MathML can in general be a string of
arbitrary characters, must be surrounded by a pair of either double quotes
(") or single quotes ('). The kind of quotes
not used to surround the value may be included within it.
Attribute names are generally shown in a monospaced font within descriptive text in this
specification, just as the monospaced font is used
for examples.

MathML uses a more complicated syntax for attribute values than the
generic XML syntax required by the MathML DTD. These additional rules are
intended for use by MathML applications, and it is a MathML error to
violate them, though they cannot be enforced by XML processing. The MathML
syntax of each attribute value is specified in the table of attributes
provided with the description of each element, using a notation described
below. When MathML applications process attribute values, whitespace
is ignored except to separate letter and digit sequences into
individual words or numbers. Attribute values may contain any MathML
characters (as specified in Section 6.2 MathML Characters) that are
permitted by the syntax restrictions for the attribute. Character data can be
included directly in attribute values, or by using entity references
as described in Section 6.2.1 Unicode Character Data.

In particular, the characters ", ',
& and < can be included in MathML
attribute values (when permitted by the attribute value syntax) using the
entity references &quot;, &apos;,
&amp; and &lt;, respectively.
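When software writes attribute values, these entity references can be produced mechanically. A minimal sketch using Python's xml.sax.saxutils.escape, which always replaces &, < and > and accepts an extra mapping for other characters; here that mapping covers the two quote characters. The helper to_attribute is hypothetical.

```python
from xml.sax.saxutils import escape, unescape

# escape() already handles &, < and >; add the two quote characters.
QUOTES = {'"': "&quot;", "'": "&apos;"}

def to_attribute(value):
    """Return value escaped and wrapped for use as a double-quoted XML attribute."""
    return '"' + escape(value, QUOTES) + '"'

raw = 'a<b & "c"'
print(to_attribute(raw))  # "a&lt;b &amp; &quot;c&quot;"
```

Escaping both quote kinds is slightly more than strictly necessary (only the delimiting quote must be escaped), but it keeps the helper safe regardless of which quote style a writer chooses.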

The MathML DTD provided in Appendix A Parsing MathML declares most
attribute value types as CDATA strings. This permits increased
interoperability with existing SGML and XML software and allows extensions
to the lists of predefined values. Similar considerations apply
to XML schemas.

2.1.3.1 Syntax notations used in the MathML specification

To describe the MathML-specific syntax of permissible
attribute values, the following conventions and notations are
used for most attributes in the present document.

Notation                What it matches

number                  decimal integer or rational number (a string of digits with one decimal point), optionally starting with '-'
#rrggbb                 RGB color value; the three pairs of hexadecimal digits in the example #5599dd define proportions of red, green and blue on a scale of x00 through xFF, which gives a strong sky blue
h-unit                  unit of horizontal length (allowable units are listed below)
v-unit                  unit of vertical length (allowable units are listed below)
css-fontfamily          explained in the CSS subsection below
css-color-name          explained in the CSS subsection below
other italicized words  explained in the text for each attribute
form +                  one or more instances of 'form'
form *                  zero or more instances of 'form'
f1 f2 ... fn            one instance of each form, in sequence, perhaps separated by whitespace
f1 | f2 | ... | fn      any one of the specified forms
[ form ]                an optional instance of 'form'
( form )                same as form
word in plain text      that word, literally present in the attribute value (unless it is obviously part of an explanatory phrase)
quoted symbol           that symbol, literally present in attribute value (e.g. "+" or '+')

The order of precedence of the syntax notation operators is,
from highest to lowest precedence:

form + or form *

f1 f2 ... fn (sequence of forms)

f1 | f2 | ... | fn (alternative forms)
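One way to internalize the notation and its precedence is to translate a few forms into regular expressions. This is a sketch under assumptions: the specification defines these notations in prose, so edge cases such as whether ".5" or "3." count as a number are our reading, and WORD is a hypothetical keyword pattern used only for illustration. The RGB pattern follows the #5599dd example.

```python
import re

# Assumed regex readings of two notations; edge cases are interpretations.
NUMBER = r"-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)"
RRGGBB = r"#[0-9a-fA-F]{6}"
WORD = r"[a-z]+"  # hypothetical keyword pattern

def matches(pattern, value):
    """True if the entire attribute value matches the pattern."""
    return re.fullmatch(pattern, value) is not None

# '+' binds tighter than a sequence, which binds tighter than '|', so
# "number | word +" means number | (word +), never (number | word) +.
# Adjacent keywords must be separated by whitespace.
NUMBER_OR_WORDS = rf"(?:{NUMBER})|(?:{WORD}(?:\s+{WORD})*)"

print(matches(NUMBER, "-3."), matches(NUMBER, "+3"))  # True False
print(matches(RRGGBB, "#5599dd"))                      # True
print(matches(NUMBER_OR_WORDS, "thin thick"))          # True
print(matches(NUMBER_OR_WORDS, "2 thick"))             # False
```

The last line fails precisely because of precedence: the alternation offers either a number alone or a run of words, not a mixture.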

A string can contain arbitrary characters which are
specifiable within XML CDATA attribute values. See Chapter 6 Characters, Entities and Fonts for a full discussion of MathML
characters. No syntax rule in MathML includes a string
as only part of an attribute value; a string can only be the entire value.

Adjacent keywords and numbers must be separated by whitespace in the
actual attribute values, except for unit identifiers (denoted by h-unit
or v-unit syntax symbols)
following numbers. Whitespace is not otherwise required, but is permitted
between any of the tokens listed above, except (for compatibility with
CSS) immediately before unit identifiers, between the '-' signs and digits
of negative numbers, or between # and "rrggbb"
or "rgb".
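The whitespace rules can be enforced by a tokenizer that splits on whitespace and then requires each token to be a whole number-plus-unit. A minimal sketch, assuming the number reading used above; split_lengths and LENGTH are hypothetical names, and checking whether a particular attribute actually requires a unit is left to the caller.

```python
import re

# A number immediately followed by an optional unit identifier. No whitespace
# may appear inside this token: not before the unit, not after the '-'.
LENGTH = re.compile(
    r"(-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+))(em|ex|px|in|cm|mm|pt|pc|%)?")

def split_lengths(value):
    """Split a value such as '0.1em 0.5ex' into (number, unit) pairs."""
    pairs = []
    for token in value.split():  # whitespace separates adjacent values
        m = LENGTH.fullmatch(token)
        if m is None:
            raise ValueError(f"bad length token {token!r}")
        pairs.append((float(m.group(1)), m.group(2)))
    return pairs

print(split_lengths("0.1em 0.5ex"))  # [(0.1, 'em'), (0.5, 'ex')]
```

Inputs such as "1 em", "- 1em" or "1em2ex" are rejected, because the stray unit, the lone sign, or the missing separator cannot form a valid token.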

Numerical attribute values for dimensions that should depend upon the
current font can be given in font-related units, or in named absolute units
(described in a separate subsection below). Horizontal dimensions are
conventionally given in em's, and vertical dimensions in
ex's, by immediately following a number by one of the unit
identifiers "em" or "ex". For
example, the horizontal spacing around an operator such as "+"
is conventionally given in "em"s, though other units
can be used. Using font-related units is usually preferable to using
absolute units, since it allows renderings to grow or shrink in proportion
to the current font size.

For most numerical attributes, only those in a subset of the expressible
values are sensible; values outside this subset are not errors, unless
otherwise specified, but rather are rounded up or down (at the discretion
of the renderer) to the closest value within the allowed subset. The set
of allowed values may depend on the renderer, and is not specified by
MathML.

If a numerical value within an attribute value syntax description is
declared to allow a minus sign ('-'), e.g. number or
integer, it is not a syntax error when one is
provided in cases where a negative value is not sensible. Instead, the
value should be handled by the processing application as described in the
preceding paragraph. An explicit plus sign ('+') is not allowed as part of
a numerical value except when it is specifically listed in the syntax (as a
quoted '+' or "+"), and its presence can change the meaning of the
attribute value (as documented with each attribute which permits it).

The symbols h-unit, v-unit, css-fontfamily,
and css-color-name are explained in the
following subsections.

2.1.3.2 Attributes with units

Some attributes accept horizontal or vertical lengths as numbers
followed by a "unit identifier" (often just called a
"unit"). The syntax symbols h-unit and
v-unit refer to a unit for horizontal or vertical
length, respectively. The possible units and the lengths they refer to are
shown in the table below; they are the same for horizontal and vertical
lengths, but the syntax symbols are distinguished in attribute syntaxes as
a reminder of the direction each is used in.

The unit identifiers and meanings are taken from CSS. However, the
syntax of numbers followed by unit identifiers in MathML is not identical
to the syntax of length values with units in CSS style sheets, since
numbers in CSS cannot end with decimal points, and are allowed to start
with '+' signs.

The possible horizontal or vertical units in MathML are:

Unit identifier   Unit description

em                em (font-relative unit traditionally used for horizontal lengths)
ex                ex (font-relative unit traditionally used for vertical lengths)
px                pixels, or size of a pixel in the current display
in                inches (1 inch = 2.54 centimeters)
cm                centimeters
mm                millimeters
pt                points (1 point = 1/72 inch)
pc                picas (1 pica = 12 points)
%                 percentage of default value
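The absolute units in the table are fixed multiples of one another, so a renderer can normalize them to a single unit. A minimal sketch that converts to points using only the relations stated in the table (1 in = 2.54 cm = 72 pt, 1 pc = 12 pt). The pixel size is an assumption: the table ties px to the current display, so pt_per_px is a parameter whose default of 0.75 is the CSS 2.1 reference pixel (96 px per inch), not a MathML rule. to_points is a hypothetical helper.

```python
# Absolute units expressed in points, from the relations in the table.
PT_PER_UNIT = {
    "in": 72.0,
    "cm": 72.0 / 2.54,
    "mm": 72.0 / 25.4,
    "pt": 1.0,
    "pc": 12.0,
}

def to_points(number, unit, pt_per_px=0.75):
    """Convert an absolute length to points.

    pt_per_px is an assumption (display dependent); 0.75 matches a
    96-pixels-per-inch display. em, ex and % need context and are rejected.
    """
    if unit == "px":
        return number * pt_per_px
    if unit in PT_PER_UNIT:
        return number * PT_PER_UNIT[unit]
    raise ValueError(f"{unit!r} is relative; resolve it against a font or default")

print(to_points(1, "in"), to_points(2, "pc"))  # 72.0 24.0
```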

The typesetting units "em" and "ex"
are defined in Appendix F Glossary, and
discussed further under "Additional notes" below.

% is a "relative unit"; when an attribute
value is given as "n%" (for any numerical value
"n"), the value being specified is the default value for
the property being controlled multiplied by "n"
divided by 100. The default value (or the way in which it is obtained, when
it is not constant) is listed in the table of attributes for each element,
and its meaning is described in the subsequent documentation about that
attribute. (The mpadded element has its own syntax
for % and does not allow it as a unit identifier.)

For consistency with CSS, length units in MathML are rarely
optional. When they are, the unit symbol is enclosed in square brackets in
the attribute syntax, following the number to which it applies,
e.g. number [ h-unit ]. The meaning of specifying no unit is
given in the documentation for each attribute; in general it is that the
number given is a multiplier for the default value of the attribute. (In
such cases, specifying the number "nnn" without a unit
is equivalent to specifying the number "nnn" times 100
followed by %. For example, <mo maxsize="2"> (
</mo> is equivalent to <mo maxsize="200%"> (
</mo>.)

As a special exception (also consistent with CSS), a numerical value
equal to 0 need not be followed by a unit identifier even if the syntax
specified here requires one. In such cases, the unit identifier (or lack of
one) would not matter, since 0 times any unit is 0.
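The three rules above (percentages, unitless multipliers, and the zero exception) can be combined in one resolver. This is a sketch under assumptions: the default is passed in as a (number, unit) pair, the hypothetical default of 1em is chosen only to mirror the maxsize="2" versus maxsize="200%" example, and the mpadded element's separate % syntax is not handled. resolve and VALUE are hypothetical names.

```python
import re

# A number immediately followed by an optional unit (including '%').
VALUE = re.compile(
    r"\s*(-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+))(em|ex|px|in|cm|mm|pt|pc|%)?\s*")

def resolve(value, default, unit_optional=True):
    """Resolve a length attribute against its default, given as (number, unit).

      'n%' -> default * n / 100
      'n'  -> default * n, the same as 'n*100%' (only if the unit is optional)
      '0'  -> zero, even where a unit is otherwise required
    """
    m = VALUE.fullmatch(value)
    if m is None:
        raise ValueError(f"not a length: {value!r}")
    n, unit = float(m.group(1)), m.group(2)
    default_number, default_unit = default
    if unit == "%":
        return (default_number * n / 100, default_unit)
    if unit is None:
        if n == 0:
            return (0.0, default_unit)  # 0 times any unit is 0
        if not unit_optional:
            raise ValueError(f"{value!r} needs a unit identifier")
        return (default_number * n, default_unit)
    return (n, unit)

print(resolve("2", (1.0, "em")), resolve("200%", (1.0, "em")))  # (2.0, 'em') (2.0, 'em')
```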

For most attributes, the typical unit which would be used to describe
them in typesetting is chosen as the one used in that attribute's default
value in this specification; when a specific default value is not given,
the typical unit is usually mentioned in the syntax table or in the
documentation for that attribute. The most common units are em
or ex. However, any unit can be used, unless otherwise
specified for a specific attribute.

2.1.3.2.1 Additional notes about units

Note that some attributes, e.g. framespacing
on a <mtable>,
can contain more than one numerical value, each followed by its own
unit.

It is conventional to use the font-relative unit ex mainly
for vertical lengths, and em mainly for horizontal lengths,
but this is not required. These units are relative to the font and font size
which would be used for rendering the element in whose attribute value they
are specified, which means they should be interpreted after
attributes such as fontfamily and fontsize are processed,
if those occur on the same
element, since changing the current font or font size can change the length
of one of these units.

The definition of the length of each unit, but not the MathML syntax for
length values, is as specified in CSS, except that if a font provides
specific values for em and ex which differ from
the values defined by CSS (the font size and "x"-height
respectively), those values should be used.
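The processing order matters in practice: an em on an element that also sets its own font size must use the new size. A minimal sketch, assuming font metrics are available as fractions of the font size; FontMetrics and font_relative_points are hypothetical names, and the x-height of 0.45 is an invented example value. The em field defaults to 1.0 (the CSS definition) but can be overridden when a font supplies its own value, as the paragraph above allows.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FontMetrics:
    """Hypothetical per-font metrics, as fractions of the font size."""
    x_height: float   # one ex: the font's x-height
    em: float = 1.0   # one em: CSS uses the font size; a font may differ

def font_relative_points(number, unit, font_size_pt, metrics):
    """Resolve an em/ex length once the element's own font size is final."""
    if unit == "em":
        return number * metrics.em * font_size_pt
    if unit == "ex":
        return number * metrics.x_height * font_size_pt
    raise ValueError(f"{unit!r} is not a font-relative unit")

# An element with fontsize="20pt" and a 0.5em length, inside 12pt text.
inherited_size, own_size = 12.0, 20.0
metrics = FontMetrics(x_height=0.45)
print(font_relative_points(0.5, "em", own_size, metrics))        # 10.0
print(font_relative_points(0.5, "em", inherited_size, metrics))  # 6.0, fontsize not yet applied
```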

2.1.3.3 CSS-compatible attributes

Several MathML attributes, listed below, correspond closely to text
rendering properties defined originally in [CSS1].
In MathML 1.01, the names and values of these attributes were aligned
with the CSS Recommendation where possible. This was done so that
renderers in CSS environments could query the environment for the
corresponding property when determining the default values for the
attributes.

Allowing style properties to be set both via MathML attributes and
CSS style sheets has drawbacks. At a minimum, duplication is confusing, and at
worst, it leads to the meaning of equations being inadvertently
changed by document-wide CSS changes. For these reasons, these
attributes have been deprecated.
In their place, MathML 2.0 introduces four new mathematical style
attributes. These attributes use logical values to better capture the
abstract categories of letter-like symbols used in math, and afford a
much cleaner separation between MathML and CSS. See Section 3.2.2 Mathematics style attributes common to token
elements for more details.

For reference, a table showing the correspondence of the deprecated
MathML 1.01 style attributes with their CSS counterparts is given below:

2.1.3.3.1 Order of processing attributes versus style sheets

CSS or analogous style sheets can specify changes to rendering
properties of selected MathML elements. Since rendering properties
can also be changed by attributes on an element, or be changed automatically
by the renderer, it is necessary to specify the order in which changes
requested by various sources should occur. An example of automatic
adjustment is what
happens for fontsize, as explained in the
discussion on scriptlevel in Section 3.3.4 Style Change (mstyle). In the case of "absolute" changes,
i.e., setting a new property value independent of the old value (as
opposed to "relative" changes, such as increments or
multiplications by a factor), the absolute change performed last will
be the only absolute change which is effective, so the sources of
changes which should have the highest priority must be processed
last.

In the case of CSS, the order of processing of changes
from various sources which affect one MathML element's
rendering properties should be as follows:

(first changes; lowest priority)

Automatic changes to properties or attributes based on the type of the
parent element, and this element's position in the parent, as for the
changes to fontsize in relation to scriptlevel mentioned above; such changes will usually
be implemented by the parent element itself before it passes a set of
rendering properties to this element

From a style sheet from the reader: styles which are not
declared "important"

Explicit attribute settings on this MathML element

From a style sheet from the author: styles which are not
declared "important"

From a style sheet from the author: styles which are
declared "important"

From a style sheet from the reader: styles which are
declared "important"

(last changes; highest priority)
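The ordering above can be modelled abstractly: apply changes from the lowest-priority source to the highest, letting each absolute change replace the current value and each relative change scale it. This sketch follows only the text's abstract description of "absolute" and "relative" changes; it is not an implementation of the CSS cascade, where relative values are resolved differently. PRIORITY and apply_changes are hypothetical names.

```python
# Sources from the list above, lowest priority (applied first) to highest.
PRIORITY = (
    "automatic",         # e.g. fontsize adjusted by the parent for scriptlevel
    "reader",            # reader style sheet, not "important"
    "attribute",         # explicit MathML attribute on this element
    "author",            # author style sheet, not "important"
    "author-important",
    "reader-important",
)

def apply_changes(initial, changes):
    """Apply (source, kind, amount) changes to one property in priority order.

    kind "set" is absolute (replaces the value, so the last one wins);
    kind "scale" is relative (multiplies whatever value is current).
    """
    value = initial
    for source in PRIORITY:
        for src, kind, amount in changes:
            if src != source:
                continue
            if kind == "set":
                value = amount
            elif kind == "scale":
                value = value * amount
            else:
                raise ValueError(f"unknown change kind {kind!r}")
    return value

changes = [
    ("reader-important", "set", 14.0),
    ("attribute", "set", 10.0),
    ("automatic", "scale", 0.5),
]
print(apply_changes(12.0, changes))  # 14.0: the reader's "important" setting wins
```

The order in which changes are listed does not matter; only the source's position in PRIORITY does.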

Note that the order of the changes derived from CSS style sheets is
specified by CSS itself (this is the order specified by CSS2).
The following rationale is related only to the
issue of where in this pre-existing order the changes caused by explicit
MathML attribute settings should be inserted.

Rationale: MathML rendering attributes are analogous to HTML rendering
attributes such as align, which the CSS section on
cascading order specifies should be processed with the same priority.
Furthermore, this choice of priority permits readers, by declaring certain
CSS styles as "important", to decide which of their style
preferences should override explicit attribute settings in MathML. Since
MathML expressions, whether composed of "presentation" or
"content" elements, are primarily intended to convey meaning,
with their "graphic design" (if any) intended mainly to aid in
that purpose but not to be essential in it, it is likely that readers will
often want their own style preferences to have priority; the main exception
will be when a rendering attribute is intended to alter the meaning
conveyed by an expression, which is generally discouraged in the
presentation attributes of MathML.

2.1.3.4 Default values of attributes

Default values for MathML attributes are in general given along with the
detailed descriptions of specific elements in the text. Default values
shown in plain text in the tables of attributes for an element are literal
(unless they are obviously explanatory phrases), but when italicized are
descriptions of how default values can be computed.

Default values described as inherited are taken from the
rendering environment, as described under mstyle,
or in some cases (described individually) from the values of other
attributes of surrounding elements, or from certain parts of those
values. The value used will always be one which could have been specified
explicitly, had it been known; it will never depend on the content or
attributes of the same element, only on its environment. (What it means
when used may, however, depend on those attributes or the content.)
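A simplified picture of an inherited default: use the element's own attribute if present, otherwise the nearest enclosing mstyle that sets it, otherwise the value from the outer rendering environment. This sketch ignores the attributes whose defaults come from other elements, and uses mathcolor (one of the MathML 2.0 style attributes) only as an example; PARENT and inherited are hypothetical names, and the parent map is needed because ElementTree elements do not link to their parents.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<math><mstyle mathcolor="red"><mrow><mi>x</mi>'
    '<mstyle mathcolor="blue"><mi>y</mi></mstyle></mrow></mstyle></math>'
)

# ElementTree has no parent pointers, so build a child -> parent map once.
PARENT = {child: parent for parent in doc.iter() for child in parent}

def inherited(elem, name, environment_default):
    """Explicit value on elem, else nearest mstyle ancestor, else the environment."""
    if name in elem.attrib:
        return elem.attrib[name]
    node = PARENT.get(elem)
    while node is not None:
        if node.tag == "mstyle" and name in node.attrib:
            return node.attrib[name]
        node = PARENT.get(node)
    return environment_default

x, y = doc.findall(".//mi")
print(inherited(x, "mathcolor", "black"), inherited(y, "mathcolor", "black"))  # red blue
```

The result depends only on the element's surroundings, never on its own content, matching the rule stated above.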

Default values described as automatic should be computed by
a MathML renderer in a way which will produce a high-quality rendering; how
to do this is not usually specified by the MathML specification. The value
computed will always be one which could have been specified explicitly, had
it been known, but it will usually depend on the element content and
possibly on the rendering environment.

Other italicized descriptions of default values which appear in the
tables of attributes are explained for each attribute individually.

The single or double quotes which are required around attribute values
in an XML start tag are not shown in the tables of attribute value syntax
for each element, but are shown around example attribute values in the
text.

Note that, in general, there is no value which can be given explicitly
for a MathML attribute which will simulate the effect of not specifying the
attribute at all for attributes which are inherited or
automatic. Giving the words "inherited" or
"automatic" explicitly will not work, and is not generally
allowed. Furthermore, even for presentation attributes for which a
specific default value is documented here, the mstyle element
(Section 3.3.4 Style Change (mstyle)) can be
used to change this for the elements it contains. Therefore, the MathML DTD
declares most presentation attribute default values as #IMPLIED,
which prevents XML preprocessors from adding them with any specific default
value. This point of view is carried through to the MathML schema.

2.1.3.5 Attribute values in the MathML DTD

In an XML DTD, allowed attribute values can be declared as general
strings, or they can be constrained in various ways, either by enumerating
the possible values, or by declaring them to be certain special data
types. The choice of an XML attribute type affects the extent to which
validity checks can be performed using a DTD.

The MathML DTD specifies formal XML attribute types for all MathML
attributes, including enumerations of legitimate values in some cases. In
general, however, the MathML DTD is relatively permissive, frequently
declaring attribute values as strings; this is done to provide for
interoperability with SGML parsers while allowing multiple attributes on
one MathML element to accept the same values (such as "true"
and "false"), and also to
allow extension to the lists of predefined values.

At the same time, even though an attribute value may be declared as a
string in the DTD, only certain values are legitimate in MathML, as
described above and in the rest of this specification. For example, many
attributes expect numerical values. In the sections which follow, the
allowed attribute values are described for each element. To determine when
these constraints are actually enforced in the MathML DTD, consult Appendix A Parsing MathML. However, lack of enforcement of a requirement in the DTD
does not imply that the requirement is not part of the MathML
language itself, or that it will not be enforced by a particular MathML
renderer. (See Section 2.3.2 Handling of Errors for a description of how
MathML renderers should respond to MathML errors.)

Furthermore, the MathML DTD is provided for convenience; although it is
intended to be fully compatible with the text of the specification, the
text should be taken as definitive if there is a contradiction. (Any
contradictions which may exist between various chapters of the text should
be resolved by favoring Chapter 6 Characters, Entities and Fonts first, then Chapter 3 Presentation Markup, then Chapter 4 Content Markup, then Section 2.1 MathML Syntax and Grammar,
and then other parts of the text.) For the MathML schema the situation
will be the same: the published Recommendation text takes precedence.
Though this is what is intended to happen, there is a practical difficulty.
If the system processing the MathML uses a validating parser, whether it be
based on a DTD or on a schema, the process will probably simply stop when
it hits something held to be incorrect syntax, whether or not further
MathML processing in full harmony with the specification would have
processed the piece correctly.

2.1.4 Attributes Shared by all MathML Elements

In order to facilitate use with style sheet mechanisms such as
[XSLT] and [CSS2],
all MathML elements accept class, style, and id
attributes in addition to the attributes described
specifically for each element. MathML renderers not supporting CSS may
ignore these attributes. MathML specifies these attribute values as general
strings, even if style sheet mechanisms have more restrictive syntaxes for
them. That is, any value for them is valid in MathML.

In order to facilitate compatibility with linking mechanisms, all
MathML elements accept the xlink:href
attribute.

All MathML elements also accept the xref
attribute for use in parallel markup (Section 5.5 Parallel Markup). The id attribute is also used
in this context.

Because of a legacy from MathML 1.0, every MathML element also
accepts the deprecated attribute
other (Section 2.3.3 Attributes for unspecified data)
which was conceived for passing non-standard attributes without
violating the MathML DTD. MathML renderers are only required to
process this attribute if they respond to any attributes which are not
standard in MathML. However, the use of other
is strongly discouraged when there are already other ways
within MathML of passing specific information.

2.1.5 Collapsing Whitespace in Input

MathML ignores whitespace occurring outside token elements.
Non-whitespace characters are not allowed there. Whitespace occurring
within the content of token elements is "trimmed" from the
ends, i.e., all whitespace at the beginning and end of the content is
removed. Whitespace internal to content of MathML elements is
"collapsed" canonically, i.e., each sequence of 1 or more
whitespace characters is replaced with one space character (sometimes
called a blank character).

Authors wishing to encode whitespace characters at the start or end of
the content of a token, or in sequences other than a single space, without
having them ignored, must use &nbsp; or other
"whitespace" non-marking entities as described in Section 6.2.4 Non-Marking Characters. For example, compare

<mtext> Theorem 1: </mtext>

with

<mtext> &nbsp;Theorem &nbsp;1: </mtext>

When the first example is rendered, there is no whitespace before
"Theorem", one space between "Theorem" and
"1:", and no whitespace after "1:". In the
second example, a single space is rendered before
"Theorem", two spaces are rendered before
"1:", and there is no whitespace after the
"1:".

Note that the xml:space attribute does not apply
in this situation since XML processors pass whitespace in tokens to a
MathML processor; it is the MathML processing rules which specify that
whitespace is trimmed and collapsed.
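The trimming and collapsing rules, applied to the two Theorem strings described above, can be sketched as follows. The key detail is which characters count as whitespace: this sketch assumes the XML whitespace set (space, tab, carriage return, line feed), so U+00A0 (&nbsp;) is preserved. Python's str.split() with no arguments would be the wrong tool here, since it treats U+00A0 as whitespace and would discard it. normalize_token_text is a hypothetical name.

```python
import re

# XML whitespace only; U+00A0 (&nbsp;) is deliberately excluded.
XML_WS = "[ \t\r\n]"

def normalize_token_text(text):
    """Trim whitespace at both ends, then collapse internal runs to one space."""
    trimmed = re.sub(f"^{XML_WS}+|{XML_WS}+$", "", text)
    return re.sub(f"{XML_WS}+", " ", trimmed)

first = normalize_token_text(" Theorem 1: ")
second = normalize_token_text(" \u00a0Theorem \u00a01: ")
print(repr(first))   # 'Theorem 1:'
print(repr(second))  # '\xa0Theorem \xa01:'
```

In the second result, one non-breaking space precedes "Theorem", a collapsed space plus a non-breaking space (two spaces) precede "1:", and nothing follows it.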

For whitespace occurring outside the content of the token elements
mi, mn, mo, ms, mtext,
ci, cn and annotation, an mspace
element should be used, as opposed to an mtext element containing
only "whitespace"
entities.

2.2 Interfacing [from ch 7]

The current chapter describes MathML 2; it will need to be updated
in later drafts.

Resolution

None recorded

To be effective, MathML must work well with a wide variety of
renderers, processors, translators and editors. This chapter addresses
some of the interface issues involved in generating and rendering
MathML. Since MathML exists primarily to encode mathematics in Web
documents, perhaps the most important interface issues are related to
embedding MathML in [HTML4] and
[XHTML].

There are three kinds of interface issues that arise in embedding
MathML in other XML documents. First, MathML must be semantically
integrated. MathML markup must be recognized as valid embedded XML
content, and not as an error. This is primarily a question of
managing namespaces in XML [Namespaces].
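Namespace handling is what lets a processor tell MathML apart from the surrounding document. A minimal sketch using Python's xml.etree.ElementTree: the MathML namespace URI http://www.w3.org/1998/Math/MathML and the XHTML namespace URI http://www.w3.org/1999/xhtml are the standard ones, while the page content is an invented example.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
XHTML_NS = "http://www.w3.org/1999/xhtml"

page = f"""<html xmlns="{XHTML_NS}">
  <body>
    <p>The root is
      <math xmlns="{MATHML_NS}"><msqrt><mn>2</mn></msqrt></math>.
    </p>
  </body>
</html>"""

root = ET.fromstring(page)
# ElementTree writes namespaced names as "{uri}local"; select MathML only.
math_elements = root.findall(f".//{{{MATHML_NS}}}math")
print(len(math_elements))       # 1
print(math_elements[0][0].tag)  # {http://www.w3.org/1998/Math/MathML}msqrt
```

A search for a plain, un-namespaced "math" element finds nothing, which is the point: the namespace, not the local name, identifies the markup as MathML.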

Second, in the case of HTML/XHTML, MathML rendering must be
integrated into browser software. Some browsers already implement
MathML rendering natively, and one can expect more browsers will do so
in the future. At the same time, other browsers have developed
infrastructure to facilitate the rendering of MathML and other
embedded XML content by third-party software. Using these
browser-specific mechanisms generally requires additional interface
markup of some sort to activate them.

Third, other tools for generating and processing MathML must be
able to intercommunicate. A number of MathML tools have been or are
being developed, including editors, translators, computer algebra
systems, and other scientific software. However, since MathML
expressions tend to be lengthy, and prone to error when entered by
hand, special emphasis must be given to ensuring that MathML can be
easily generated by user-friendly conversion and authoring tools, and
that these tools work together in a dependable, platform- and
vendor-independent way.

The W3C Math Working Group is committed to providing support to software
vendors developing any kind of MathML tool. The Working Group monitors
the public mailing list
www-math@w3.org,
and will attempt to answer questions about the MathML specification. The
Working Group works with MathML developer
and user groups. For current information about MathML tools, applications
and user support activities, consult the
home page of the W3C Math Working
Group.

2.3 Conformance

Information is increasingly generated, processed and rendered by
software tools. The exponential growth of the Web is fueling the
development of advanced systems for automatically searching,
categorizing, and interconnecting information. Thus, although MathML
can be written by hand and read by humans, the future of MathML is
largely tied to the ability to process it with software tools.

There are many different kinds of MathML
processors: editors for authoring MathML expressions, translators for
converting to and from other encodings, validators for checking MathML
expressions, computation engines that evaluate, manipulate or compare
MathML expressions, and rendering engines that produce visual, aural
or tactile representations of mathematical notation. What it
means to support MathML varies widely between applications. For
example, the issues that arise with a validating
parser are very different from those for an equation
editor.

In this section, guidelines are given for describing different types
of MathML support, and for quantifying the extent of MathML support in
a given application. Developers, users and reviewers are encouraged
to use these guidelines in characterizing products. The intention
behind these guidelines is to facilitate reuse and interoperability
between MathML applications by accurately characterizing their
capabilities in quantifiable terms.

2.3.1 MathML Conformance

A valid MathML expression is an XML construct determined by the MathML
DTD together with the additional requirements given in this specification.

Define a "MathML processor" to mean any application that
can accept, produce, or "roundtrip" a valid MathML
expression. An example of an application that might round-trip a MathML
expression is an editor that writes a new file even though no
modifications are made.

Three forms of MathML conformance are specified:

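A MathML-input-conformant processor must accept all valid MathML
expressions, and faithfully translate all MathML expressions into
application-specific form allowing native application operations to be
performed.

A MathML-output-conformant processor must generate valid MathML,
appropriately representing all application-specific data.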

A MathML-roundtrip-conformant processor must preserve MathML
equivalence. Two MathML expressions are "equivalent" if and
only if both expressions have the same interpretation (as stated by the
MathML DTD and specification) under any circumstances, by any MathML
processor. Equivalence on an element-by-element basis is discussed
elsewhere in this document.

Beyond the above definitions, the MathML specification makes no demands
of individual processors. In order to guide developers, the MathML
specification includes advisory material; for example, there are many suggested
rendering rules throughout Chapter 3 Presentation Markup. However, in general,
developers are given wide latitude in interpreting what kind of MathML
implementation is meaningful for their own particular application.

To clarify the difference between conformance and interpretation of
what is meaningful, consider some examples:

In order to be MathML-input-conformant, a validating parser needs
only to accept expressions, and return "true" for
expressions that are valid MathML. In particular, it need not render or
interpret the MathML expressions at all.

A MathML computer-algebra interface based on content markup might
choose to ignore all presentation markup. Provided the interface accepts
all valid MathML expressions including those containing presentation markup,
it would be technically correct to characterize the application as
MathML-input-conformant.

An equation editor might have an internal data representation
that makes it easy to export some equations as MathML but not
others. If the editor exports the simple equations as valid MathML,
and merely displays an error message to the effect that conversion
failed for the others, it is still technically
MathML-output-conformant.

2.3.1.1 MathML Test Suite and Validator

As the previous examples show, to be useful, the concept of MathML
conformance frequently involves a judgment about what parts of the
language are meaningfully implemented, as opposed to parts that are
merely processed in a technically correct way with respect to the
definitions of conformance. This requires some mechanism for giving a
quantitative statement about which parts of MathML are meaningfully
implemented by a given application. To this end, the W3C Math Working
Group has provided a test
suite.

The test suite consists of a large number of MathML expressions
categorized by markup category and dominant MathML element being
tested. The existence of this test suite makes it possible, for example,
to characterize quantitatively the hypothetical computer algebra interface
mentioned above by saying that it is a MathML-input-conformant processor
which meaningfully implements MathML content markup, including all of
the expressions in the content markup section of the test suite.

Developers who choose not to implement parts of the MathML
specification in a meaningful way are encouraged to itemize the parts
they leave out by referring to specific categories in the test suite.

For MathML-output-conformant processors, there is also a MathML validator
accessible over the Web. Developers of MathML-output-conformant
processors are encouraged to verify their output using this
validator.

Customers of MathML applications who wish to verify claims as to which
parts of the MathML specification are implemented by an application are
encouraged to use the test suite as part of their decision
processes.

2.3.1.2 Deprecated MathML 1.x Features

MathML 2.0 contains a number of MathML 1.x features which are now
deprecated. The following points define what it means for a
feature to be deprecated, and clarify the relation between
deprecated features and MathML 2.0 conformance.

In order to be MathML-output-conformant, authoring tools must not
generate MathML markup containing deprecated features.

Rendering/reading tools must support deprecated features in order to
be MathML-input-conformant with respect to MathML 1.x. They do not
have to support deprecated features to be considered
MathML-input-conformant with respect to MathML 2.0. However, all tools
are encouraged to support the old forms as much as possible.

In order to be MathML-roundtrip-conformant, a processor need
only preserve MathML equivalence on expressions containing no
deprecated features.
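An authoring tool aiming at MathML-output-conformance could scan its output before writing it. The following Python sketch (hypothetical function names) checks only the two deprecated features mentioned in this chapter, the other attribute and the mode attribute of math; a real checker would need the complete list of deprecated features from the rest of the specification.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    # ElementTree writes namespaced names as "{uri}local"; keep only "local".
    return tag.rsplit("}", 1)[-1]


def find_deprecated(root: ET.Element) -> list[str]:
    """Report the deprecated MathML 1.x features named in this chapter.

    Only two are checked: the 'other' attribute on any element and the
    'mode' attribute on math.
    """
    problems = []
    for elem in root.iter():
        name = local_name(elem.tag)
        if "other" in elem.attrib:
            problems.append(f"{name}: 'other' attribute")
        if name == "math" and "mode" in elem.attrib:
            problems.append("math: 'mode' attribute (use 'display')")
    return problems


doc = ET.fromstring(
    '<math xmlns="http://www.w3.org/1998/Math/MathML" mode="display">'
    '<mi other="hint">a</mi></math>'
)
print(find_deprecated(doc))
# ["math: 'mode' attribute (use 'display')", "mi: 'other' attribute"]
```

A tool could refuse to emit markup, or rewrite it, whenever the returned list is non-empty.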

2.3.1.3 MathML 2.0 Extension Mechanisms and Conformance

MathML 2.0 defines three extension mechanisms: the mglyph
element provides a way of displaying glyphs for non-Unicode
characters, and glyph variants for existing Unicode characters; the
maction element uses attributes from other namespaces to obtain
implementation-specific parameters; and content markup makes use of
the definitionURL attribute to point to external
definitions of mathematical semantics.
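To a namespace-aware XML parser, the three mechanisms look quite different: mglyph is an ordinary MathML element, an maction attribute from another namespace appears under its expanded name, and definitionURL is a plain attribute. The hypothetical Python helper below reports each use, for example so that a maintainer can find markup to revisit when standard alternatives appear. The extension namespace URI and the definition URL in the sample are placeholders.

```python
import xml.etree.ElementTree as ET


def extension_uses(root: ET.Element) -> list[str]:
    """List the places where MathML 2.0 extension mechanisms are used."""
    found = []
    for elem in root.iter():
        name = elem.tag.rsplit("}", 1)[-1]
        if name == "mglyph":
            found.append("mglyph")
        if name == "maction":
            # Attributes from another namespace appear as "{uri}local" keys.
            foreign = [key for key in elem.attrib if key.startswith("{")]
            if foreign:
                found.append("maction: " + ", ".join(foreign))
        if "definitionURL" in elem.attrib:
            found.append(f"{name}: definitionURL={elem.get('definitionURL')}")
    return found


doc = ET.fromstring(
    '<math xmlns="http://www.w3.org/1998/Math/MathML"'
    ' xmlns:my="http://www.example.com/ext">'
    '<maction actiontype="highlight" my:color="red"><mi>x</mi></maction>'
    '<csymbol definitionURL="http://www.example.com/def">f</csymbol>'
    '</math>'
)
print(extension_uses(doc))
# ['maction: {http://www.example.com/ext}color', 'csymbol: definitionURL=http://www.example.com/def']
```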

These extension mechanisms are important because they provide a way
of encoding concepts that are beyond the scope of MathML 2.0, which
allows MathML to be used for exploring new ideas not yet susceptible
to standardization. However, as new ideas take hold, they may become
part of future standards. For example, an emerging character that
must be represented by an mglyph element today may be
assigned a Unicode codepoint in the future. At that time,
representing the character directly by its Unicode codepoint would be
preferable.

Because the possibility of future obsolescence is inherent in the
use of extension mechanisms to facilitate the discussion of new ideas,
MathML 2.0 makes no conformance requirement concerning the use of
extension mechanisms, even when alternative standard markup is
available. For example, using an mglyph element to represent
an 'x' is permitted. However, authors and implementors are
strongly encouraged to use standard markup whenever possible.
Similarly, maintainers of documents employing MathML 2.0 extension
mechanisms are encouraged to monitor relevant standards activity
(e.g., Unicode and OpenMath) and update documents as more
standardized markup becomes available.

2.3.2 Handling of Errors

If a MathML-input-conformant application receives input containing one or
more elements with an illegal number or type of attributes or child
schemata, it should nonetheless attempt to render all the input in an
intelligible way, i.e. to render normally those parts of the input that
were valid, and to render error messages (rendered as if enclosed in
an merror
element) in place of invalid expressions.

MathML-output-conformant applications such as editors and translators may
choose to generate merror expressions to signal
errors in their input. This is usually preferable to generating valid, but
possibly erroneous, MathML.
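A minimal Python sketch of this recovery strategy follows. It checks only the number of children, and only for a hypothetical subset of presentation elements (mfrac, msub and msup take two children, msubsup three); every other element is assumed valid. Invalid expressions are replaced in place by an merror containing an mtext message, so that the valid remainder still renders normally.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/1998/Math/MathML}"
# Hypothetical, deliberately small arity table for a few presentation elements.
ARITY = {"mfrac": 2, "msub": 2, "msup": 2, "msubsup": 3}


def make_merror(message: str) -> ET.Element:
    """Build <merror><mtext>message</mtext></merror>."""
    err = ET.Element(NS + "merror")
    ET.SubElement(err, NS + "mtext").text = message
    return err


def replace_bad_arity(parent: ET.Element) -> int:
    """Replace children with the wrong number of arguments by merror.

    Valid parts are left alone; returns how many elements were replaced.
    """
    replaced = 0
    for i, child in enumerate(list(parent)):
        name = child.tag.removeprefix(NS)
        expected = ARITY.get(name)
        if expected is not None and len(child) != expected:
            err = make_merror(f"{name}: expected {expected} children, found {len(child)}")
            err.tail = child.tail  # keep the surrounding text layout unchanged
            parent[i] = err
            replaced += 1
        else:
            replaced += replace_bad_arity(child)  # only descend into kept elements
    return replaced


expr = ET.fromstring(
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    '<mi>a</mi><mo>+</mo><mfrac><mn>1</mn></mfrac></math>'
)
print(replace_bad_arity(expr))  # 1
```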

2.3.3 Attributes for unspecified data

The MathML attributes described in the MathML specification are
necessary for presentation and content markup. Ideally, the MathML
attributes should be an open-ended list so that users can add specific
attributes for specific renderers. However, this cannot be done within
the confines of a single XML DTD. Although it can be done using
extensions of the standard DTD, some authors will wish to use
non-standard attributes to take advantage of renderer-specific
capabilities while remaining strictly in conformance with the standard
DTD.

To allow this, the MathML 1.0 specification [MathML1]
allowed the attribute other on all elements, for use as a hook to pass
on renderer-specific information. In particular, it was intended as a hook for
passing information to audio renderers and computer algebra systems, and for pattern
matching in future macro/extension mechanisms. The motivation for this approach
was historical: in PostScript, for example, comments are widely used to pass
information that is not part of PostScript itself.

In the meantime, however, the development of a general XML namespace
mechanism has made the use of the other
attribute obsolete. In MathML 2.0, the other
attribute is deprecated
in favor of the use of namespace
prefixes to identify non-MathML attributes.

For example, in MathML 1.0, it was recommended that any additional information
used in a renderer-specific implementation of the maction element
(Section 3.6.1 Bind Action to Sub-Expression
(maction))
be passed in using the other attribute. In MathML 2.0, such information
should instead be carried by an attribute from a non-MathML namespace.
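For illustration, the two styles might look as follows, here embedded in a small Python sketch. The color value and the http://www.example.com/MathML/extensions namespace URI are placeholders, not values defined by MathML. The sketch shows why namespaces are preferable: a namespace-aware parser exposes the prefixed attribute under its expanded name, whatever prefix the author chose, whereas the content of other is an opaque string.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/1998/Math/MathML}"
# Placeholder namespace for renderer-specific attributes (not defined by MathML).
EXT = "{http://www.example.com/MathML/extensions}"

# MathML 1.0 style, now deprecated: the hint is packed into the 'other' attribute.
MATHML1 = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    '<maction actiontype="highlight" other="color=\'#ff0000\'"><mi>x</mi></maction>'
    '</math>'
)

# MathML 2.0 style: the same hint as an attribute in a non-MathML namespace.
MATHML2 = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML"'
    ' xmlns:my="http://www.example.com/MathML/extensions">'
    '<maction actiontype="highlight" my:color="#ff0000"><mi>x</mi></maction>'
    '</math>'
)


def highlight_color(markup: str):
    """Return the renderer-specific color hint on the first maction, if any."""
    maction = ET.fromstring(markup).find(NS + "maction")
    return maction.get(EXT + "color")


print(highlight_color(MATHML2))  # #ff0000
print(highlight_color(MATHML1))  # None: the 'other' string is opaque to XML tools
```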

Note that the intent of allowing non-standard attributes is
not to encourage software developers to use this as a
loophole for circumventing the core conventions for MathML markup.
Authors and applications should use non-standard attributes
judiciously.

2.4 Future Extensions

If MathML is to remain useful in the future, it is to be expected
that MathML will need to be extended and revised in various ways. Some
of these extensions can be easily foreseen; for example, as work on
behavioral extensions to CSS proceeds, MathML will likely need to be
extended as well.

Similarly, there are several kinds of functionality that are fairly
obvious candidates for future MathML extensions. These include macros,
style sheets, and perhaps a general facility for "labeled
diagrams". However, there will no doubt be other desirable
extensions to MathML that will only emerge as MathML is widely used. For
these extensions, the W3C Math Working Group relies on the extensible
architecture of XML, and the common sense of the larger Web community.

2.4.1 Macros and Style Sheets

The development of style-sheet mechanisms for XML is part of the ongoing
XML activity of the World Wide Web Consortium. Both XSL and CSS are working
to incorporate greater support for mathematics.

In particular, XSL Transformations [XSLT] are likely
to have a large impact on the future development of MathML. Macros
have traditionally contributed greatly to the usability and effectiveness
of mathematics encodings. Further work developing applications of
XSLT tailored specifically to MathML is clearly called for.

Some of the possible uses of macro capabilities for MathML include:

Abbreviation

One common use of macros is for
abbreviation. Authors needing to repeat some complicated but constant
notation can define a macro. This greatly facilitates hand authoring.
Macros that allow for substitution of parameters facilitate such usage
even further.

Extension of Content Markup

By defining macros for semantic objects, for example a binomial
coefficient, or a Bessel function, one can in effect extend the content
markup for MathML. Such a macro could include an explicit semantic
binding, or such a binding could be easily added by an external
application. Narrowly defined disciplines should be able to easily
introduce standardized content markup by using standard macro packages. For
example, the OpenMath project could release macro packages for attaching
OpenMath content markup.

Rendering and Style Control

Another basic way in which macros are often used is to provide a
way of controlling style and rendering behavior by replacing high-level
macro definitions. This is especially important for controlling the
rendering behavior of MathML content tags in a context-sensitive way. Such
a macro capability is also necessary to provide a way of attaching
renderings to user-defined XML extensions to the MathML core.

Accessibility

Reader-controlled style sheets are important in providing
accessibility to MathML. For example, a reader listening to a voice
renderer might, by default, hear a bit of MathML presentation markup read as
"D sub x sup 2 of f". Knowing the context to be multi-variable
calculus, the reader may wish to use a style sheet or macro package that
instructs the renderer to render this <msubsup>
element as "second derivative with respect to x of f".

2.4.2 XML Extensions to MathML

The set of elements and attributes specified in the MathML
specification is necessary for rendering common mathematical expressions.
It is recognized that not all mathematical notation is covered by this
set of elements, that new notations are continually invented, and that
sub-communities within mathematics often have specialized notations;
and furthermore that the explicit extension of a standard is a
necessarily slow and conservative process. This implies that the
MathML standard could never explicitly cover all the presentational
forms used by every sub-community of authors and readers of
mathematics, much less encode all mathematical content.

In order to facilitate the use of MathML by the widest possible
audience, and to enable its smooth evolution to encompass more
notational forms and more mathematical content (perhaps eventually
covered by explicit extensions to the standard), the set of tags and
attributes is open-ended, in the sense described in this section.

MathML is described by an XML DTD, which necessarily limits the elements
and attributes to those occurring in the DTD. Renderers desiring to accept
non-standard elements or attributes, and authors desiring to include these
in documents, should accept or produce documents that conform to an
appropriately extended XML DTD that has the standard MathML DTD
as a subset.

MathML renderers are allowed, but not required, to accept
non-standard elements and attributes, and to render them in any way. If a
renderer does not accept some or all non-standard tags, it is encouraged
either to handle them as errors as described above for elements with the
wrong number of arguments, or to render their arguments as if they were
arguments to an mrow, in either case rendering all
standard parts of the input in the normal way.
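The second recovery option, rendering the arguments of an unrecognized element as if they were arguments to an mrow, can be sketched in Python as below. The SUPPORTED set is a hypothetical, deliberately incomplete stand-in for whatever a given renderer implements, and the x:box element and urn:example namespace are invented for the example. Character data directly inside an unsupported element is kept as-is, which a real renderer would need to handle separately.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/1998/Math/MathML}"
# Hypothetical, deliberately incomplete set of elements this renderer implements.
SUPPORTED = {"math", "mrow", "mi", "mn", "mo", "mtext", "mfrac", "msup", "msub", "merror"}


def demote_unsupported(elem: ET.Element) -> None:
    """Treat unsupported elements as mrow: keep their children, drop the rest."""
    for child in elem:
        demote_unsupported(child)
    # Elements outside the MathML namespace are never "supported" here.
    name = elem.tag.removeprefix(NS) if elem.tag.startswith(NS) else None
    if name not in SUPPORTED:
        elem.tag = NS + "mrow"
        elem.attrib.clear()


doc = ET.fromstring(
    '<math xmlns="http://www.w3.org/1998/Math/MathML" xmlns:x="urn:example">'
    '<x:box frame="1"><mi>a</mi><mi>b</mi></x:box></math>'
)
demote_unsupported(doc)  # x:box now renders like <mrow><mi>a</mi><mi>b</mi></mrow>
```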

2.5 Embedding MathML in other Documents

While MathML can be used in isolation as a language for exchanging
mathematical expressions between MathML-aware applications, the
primary anticipated use of MathML is to encode mathematical expressions
within larger documents. MathML is ideal for embedding math
expressions in other applications of XML.

In particular, the focus here is on the mechanics of embedding
MathML in [XHTML]. XHTML is a W3C Recommendation
formulating a family of current and future XML-based document types
and modules that reproduce, subset, and extend HTML. While [HTML4] is the dominant language of the Web at the time of
this writing, one may anticipate a shift from HTML to XHTML.
Indeed, XHTML can already be made to render properly in most HTML user
agents.

Since MathML and XHTML share a common XML framework, namespaces
provide a standard mechanism for embedding MathML in XHTML. While
some popular user agents also support inclusion of MathML directly in
HTML as "XML data islands," this is a transitional strategy.
Consult user agent documentation for specific information on
its support for embedding XML in HTML.

2.5.1 MathML and Namespaces

Embedding MathML in XML-based documents in general, and XHTML in
particular, is a matter of managing namespaces. See the W3C
Recommendation "Namespaces in XML" [Namespaces] for full details.

An XML namespace is a collection of names identified by a URI. The
URI for the MathML namespace is:

http://www.w3.org/1998/Math/MathML

Using namespaces, embedding a MathML expression in a larger XML
document is merely a matter of identifying the MathML markup as
residing in the MathML namespace. This can be accomplished either by
explicitly identifying each MathML element name by attaching a
namespace prefix, or by declaring a default namespace on an enclosing
element.

To declare a namespace, one uses an xmlns
attribute, or an attribute with an xmlns prefix (for example,
xmlns:m), which binds that prefix to a namespace URI. When the xmlns
attribute is used alone, it sets the default namespace for the element
on which it appears and for all elements it contains, unless a nested
declaration overrides it.

These two methods of namespace declaration can be used together.
For example, by using both an explicit document-wide namespace prefix,
and default namespace declarations on individual mathematical
elements, it is possible to localize namespace related markup to the
top-level math element.
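The following Python sketch shows three spellings of the same expression: an explicit prefix, a default namespace, and the combination just described. A namespace-aware parser resolves all three to the same expanded names, which is what MathML processors should rely on. The XHTML body wrapper is included only to show where a document-wide prefix declaration would typically sit.

```python
import xml.etree.ElementTree as ET

MATHML = "http://www.w3.org/1998/Math/MathML"
MATH_TAG = f"{{{MATHML}}}math"

# 1. Explicit prefix on every MathML element.
PREFIXED = f'<m:math xmlns:m="{MATHML}"><m:mi>x</m:mi></m:math>'
# 2. Default namespace declared on the math element.
DEFAULT = f'<math xmlns="{MATHML}"><mi>x</mi></math>'
# 3. Both: a document-wide prefix declared on an enclosing XHTML body, plus a
#    default namespace on the top-level math element, so only m:math is prefixed.
MIXED = (
    f'<body xmlns="http://www.w3.org/1999/xhtml" xmlns:m="{MATHML}">'
    f'<m:math xmlns="{MATHML}"><mi>x</mi></m:math>'
    '</body>'
)


def expanded_names(markup: str) -> list[str]:
    """Expanded names of the first math element and all of its descendants."""
    math = next(ET.fromstring(markup).iter(MATH_TAG))
    return [elem.tag for elem in math.iter()]


for form in (PREFIXED, DEFAULT, MIXED):
    print(expanded_names(form))
# Each line: ['{http://www.w3.org/1998/Math/MathML}math', '{http://www.w3.org/1998/Math/MathML}mi']
```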

2.5.1.1 Document Validation Issues

The use of namespace prefixes creates an issue for DTD validation
of documents embedding MathML. DTD validation requires knowing the
literal (possibly prefixed) element names used in the document.
However, the Namespaces in XML Recommendation [Namespaces] allows the prefix to be changed at arbitrary points
in the document, since namespace prefixes may be declared on any
element.

The 'historical' method of bridging this gap was to write a DTD
with a fixed prefix, or in the case of XHTML and MathML, with no
prefix, and mandate that the specified form must be used throughout
the document. However, this is somewhat restrictive for a modular DTD
that is intended for use in conjunction with another DTD, which is
exactly the situation with MathML in XHTML. In essence, the MathML
DTD would have to allocate a prefix for itself and hope no
other module uses the same prefix to avoid name clashes, thus losing
one of the main benefits of XML namespaces.

One strategy for addressing this problem is to make every element
name in the DTD be accessed by an entity reference. This means that by
declaring a couple of entities to specify the prefix before the DTD is
loaded, the prefix can be chosen by a document author, and compound
DTDs that include several modules can, without changing the module
DTDs, specify unique prefixes for each module to avoid clashes. The
MathML DTD has been designed in this fashion. See Section A.2.5 The MathML DTD and [Modularization] for
details.

An extra issue arises in the case where explicit prefixes are used
on the top-level math element, but a default
namespace is used for other MathML elements. In this case, one wants
the MathML module to be included into XHTML with the prefix set to
empty. However, the 'driver' DTD file that sets up the inclusion of
the MathML module would then need to define a new element called
m:math. This would allow the top-level math
element to use an explicit prefix, for attaching rendering behaviors
in current browsers, while the contents would not need an explicit
prefix, for ease of interoperability between authoring tools, etc.

2.5.1.2 Compatibility Suggestions

While the use of namespaces to embed MathML in other XML
applications is completely described by the relevant W3C
Recommendations, a certain degree of pragmatism is still called for at
present. Support for XML, namespaces and rendering behaviors in
popular user agents is not always fully in alignment with W3C
Recommendations. In some cases, the software predates the relevant
standards, and in other cases, the relevant standards are not yet
complete.

During the transitional period, in which some software may not be
fully namespace-aware, a few conventional practices will ease
compatibility problems:

When using namespace prefixes with MathML markup, use m: as a
conventional prefix for the MathML namespace. Using an explicit
prefix is probably safer for compatibility in current user agents.

When using namespace prefixes, pick one and use it
consistently within a document.
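Since ElementTree discards prefixes after parsing, checking these conventions requires looking at the namespace declarations themselves. The Python sketch below uses `XMLPullParser` with "start-ns" events to collect every prefix bound to the MathML namespace, and accepts a document only if the sole explicit prefix is m (a default namespace declaration is also allowed). It inspects declarations, not usage, so an unused binding still counts; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

MATHML = "http://www.w3.org/1998/Math/MathML"


def mathml_prefixes(markup: str) -> set[str]:
    """Every prefix bound to the MathML namespace ('' means default namespace)."""
    parser = ET.XMLPullParser(events=("start-ns",))
    parser.feed(markup)
    parser.close()
    return {prefix for _event, (prefix, uri) in parser.read_events() if uri == MATHML}


def follows_prefix_conventions(markup: str) -> bool:
    """True if the only explicit prefix bound to MathML (if any) is 'm'."""
    return mathml_prefixes(markup) - {""} <= {"m"}


good = f'<body xmlns:m="{MATHML}"><m:math xmlns="{MATHML}"><mi>x</mi></m:math></body>'
mixed_up = f'<body xmlns:m="{MATHML}"><mml:math xmlns:mml="{MATHML}"/></body>'
print(follows_prefix_conventions(good))      # True
print(follows_prefix_conventions(mixed_up))  # False
```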

Note that these suggestions alone may not be sufficient for
creating functional Web pages containing MathML markup. It will
generally be the case that some additional document-wide markup will
be required. Additional work may also be required to make all MathML
instances in a document compatible with document-wide declarations.
This is particularly true when documents are created by cutting and
pasting MathML expressions, since current tools will probably not
be able to query global namespace information.

Consult the W3C Math Working
Group home page for compatibility and implementation suggestions
for current browsers and other MathML-aware tools.

2.5.2 The Top-Level math Element

MathML specifies a single top-level or root math element,
which encapsulates each instance of
MathML markup within a document. All other MathML content must be
contained in a math element; equivalently,
every valid, complete MathML expression must be contained in
<math> tags. The math
element must always be the outermost element in a MathML expression;
it is an error for one math element to contain
another.
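Both rules are easy to check mechanically. A short Python sketch, assuming the expression has already been parsed with ElementTree (the function name is hypothetical):

```python
import xml.etree.ElementTree as ET

MATH = "{http://www.w3.org/1998/Math/MathML}math"


def check_top_level(root: ET.Element) -> list[str]:
    """Check the two structural rules for a single MathML expression."""
    if root.tag != MATH:
        return ["outermost element is not math"]
    # root.iter(MATH) includes root itself, so exclude it from the count.
    nested = sum(1 for elem in root.iter(MATH) if elem is not root)
    return [f"{nested} nested math element(s)"] if nested else []


ok = ET.fromstring('<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>')
print(check_top_level(ok))  # []
```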

Applications that return sub-expressions of other MathML
expressions, for example, as the result of a cut-and-paste operation,
should always wrap them in <math>
tags. Ideally, the presence of enclosing <math>
tags should be a very good heuristic test for MathML
material. Similarly, applications which insert MathML expressions in
other MathML expressions must take care to remove the
<math> tags from the inner expressions.
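These cut-and-paste rules can be sketched in Python with ElementTree; the function names are hypothetical. When a pasted expression has several children, the sketch groups them in an mrow, since the children of math render as if they were contained in an mrow; inserting them one by one could change the argument count of the receiving element. Attributes of the discarded math element, such as display, are simply dropped here.

```python
import copy
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/1998/Math/MathML}"


def wrap_for_export(sub: ET.Element) -> ET.Element:
    """Cut/copy: return a standalone expression with the sub-expression in <math>."""
    math = ET.Element(NS + "math")
    piece = copy.deepcopy(sub)
    piece.tail = None  # text after the original element belongs to its old context
    math.append(piece)
    return math


def unwrap_for_insert(pasted: ET.Element) -> ET.Element:
    """Paste: strip the pasted expression's own <math> wrapper.

    Several children are grouped in an mrow to preserve their grouping.
    """
    if pasted.tag != NS + "math":
        return copy.deepcopy(pasted)
    children = [copy.deepcopy(child) for child in pasted]
    if len(children) == 1:
        return children[0]
    row = ET.Element(NS + "mrow")
    row.extend(children)
    return row


frac = ET.fromstring('<mfrac xmlns="http://www.w3.org/1998/Math/MathML"><mn>1</mn><mi>n</mi></mfrac>')
clip = wrap_for_export(frac[1])  # <math><mi>n</mi></math>
target = ET.fromstring('<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>a</mi></math>')
target.append(unwrap_for_insert(clip))  # no <math> ends up inside <math>
```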

The math element can contain an arbitrary number
of children schemata. The children schemata render by default as if they
were contained in an mrow element.

macros

This attribute provides a way of pointing to
external macro definition files. Macros are not part of the MathML
specification, and much of the functionality provided by macros in MathML can be
accommodated by XSL transformations [XSLT]. However, the
macros attribute is provided to make possible future
development of more streamlined, MathML-specific macro mechanisms. The
value of this attribute is a sequence of URLs or URIs, separated by
whitespace.

mode

The mode attribute specifies whether
the enclosed MathML expression should be rendered in a display style
or an in-line style. Allowed values are
"display" and
"inline" (default).
This attribute is deprecated in
favor of the new display attribute, or the
CSS2
'display' property with the analogous block and
inline values.

display

The display attribute replaces the
deprecated mode attribute.
It specifies whether
the enclosed MathML expression should be rendered in a display style
or an in-line style. Allowed values are
"block" and
"inline" (default).
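A converter that updates old documents could rewrite mode as display using the correspondence given above ("display" becomes "block", "inline" stays "inline"). This Python sketch is illustrative; its handling of conflicting or unrecognized values is an assumption, not something this specification prescribes.

```python
import xml.etree.ElementTree as ET

MATH = "{http://www.w3.org/1998/Math/MathML}math"
# Deprecated mode values and their display equivalents.
MODE_TO_DISPLAY = {"display": "block", "inline": "inline"}


def migrate_mode(root: ET.Element) -> None:
    """Replace the deprecated mode attribute on math elements with display."""
    for math in root.iter(MATH):
        mode = math.attrib.pop("mode", None)
        if mode is None:
            continue
        # Assumption: an explicit display attribute wins over a conflicting mode,
        # and unrecognized mode values are dropped (display then defaults to inline).
        if "display" not in math.attrib and mode in MODE_TO_DISPLAY:
            math.set("display", MODE_TO_DISPLAY[mode])


doc = ET.fromstring('<math xmlns="http://www.w3.org/1998/Math/MathML" mode="display"><mi>x</mi></math>')
migrate_mode(doc)
print(doc.attrib)  # {'display': 'block'}
```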

The attributes of the math element affect
the entire enclosed expression. They are, in a sense, "inward
looking". However, to render MathML properly in a browser, and
to integrate it properly into an XHTML document, a second collection
of "outward looking" attributes is also useful.

While general mechanisms for attaching rendering behaviors to
elements in XML documents are under development, wide variations in
strategy and level of implementation remain between various existing
user agents. Consequently, the remainder of this section describes
attributes and functionality that are desirable for integrating
third-party rendering modules with user agents:

overflow

In cases where size negotiation is not possible or fails (for example in the case of an expression that is too long to fit
in the allowed width), this attribute is provided to suggest a processing method to the renderer. Allowed values are:

linebreak

(Default) The expression will be broken across several lines. The line breaking algorithm is not specified, but it is recommended
that line breaking should try to keep meaningful subexpressions together and indent lines in a manner that aids in understanding
the expression.

scroll

The window provides a viewport
into the larger complete display of the mathematical
expression. Horizontal or vertical scrollbars are added to the window
as necessary to allow the viewport to be moved to a different
position.

elide

The display is abbreviated by removing enough of it so that
the remainder fits into the window. For example, a large polynomial
might have the first and last terms displayed with "+ ... +" between
them. Advanced renderers may provide a facility to zoom in on elided
areas.

truncate

The display is abbreviated by simply truncating it at the right and
bottom borders. It is recommended that the viewer be given some
indication of truncation.

scale

The fonts used to display the mathematical expression are
chosen so that the full expression fits in the window. Note that this
only happens if the expression is too large. In the case of a window
larger than necessary, the expression is shown at its normal size
within the larger window.

altimg

This attribute provides a graceful fall-back for browsers that do
not support embedded elements. The value of the
attribute is a URL.

alttext

This attribute provides a graceful fall-back for browsers that do
not support embedded elements or images.
The value of the attribute is a text string.

altimg-width

This attribute provides a width for the altimg (if any). The value of the attribute is an h-unit. This value is useful for high-resolution images whose display at their full resolution would be too large.

altimg-height

This attribute provides a total height for the altimg (if any). The value of the attribute is a v-unit. This value is useful for high-resolution images whose display at their full resolution would be too large.

altimg-baseline

This attribute specifies the distance from the top of the image to the baseline of the altimg (if any). The value of the attribute is a v-unit. This value is useful for aligning the baseline of math in inline images with the baseline of the surrounding text.
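A renderer module might model these outward-looking attributes as in the Python sketch below. The names are hypothetical, and several choices are assumptions: unrecognized overflow values fall back to the default linebreak, elision is illustrated on the terms of a sum given as strings rather than on real layout boxes, and the fallback order is MathML, then altimg, then alttext. The scale behavior follows the description above: the expression is shrunk to fit, never enlarged.

```python
from enum import Enum
import xml.etree.ElementTree as ET


class Overflow(Enum):
    LINEBREAK = "linebreak"  # the default
    SCROLL = "scroll"
    ELIDE = "elide"
    TRUNCATE = "truncate"
    SCALE = "scale"


def parse_overflow(value):
    """Missing or unrecognized values fall back to linebreak (an assumption)."""
    try:
        return Overflow(value)
    except ValueError:
        return Overflow.LINEBREAK


def scale_factor(natural_w, natural_h, avail_w, avail_h):
    """overflow="scale": shrink uniformly to fit, but never enlarge."""
    return min(1.0, avail_w / natural_w, avail_h / natural_h)


def elide_terms(terms, keep=1):
    """overflow="elide", illustrated on the terms of a sum: first + ... + last."""
    if len(terms) <= 2 * keep:
        return " + ".join(terms)
    return " + ".join(terms[:keep]) + " + ... + " + " + ".join(terms[-keep:])


def choose_fallback(math, can_render_mathml, can_show_images):
    """Pick what to show for a math element: MathML, then altimg, then alttext."""
    if can_render_mathml:
        return ("mathml", None)
    src = math.get("altimg")
    if can_show_images and src is not None:
        sizes = {key: math.get(key)
                 for key in ("altimg-width", "altimg-height", "altimg-baseline")
                 if math.get(key) is not None}
        return ("image", {"src": src, **sizes})
    return ("text", math.get("alttext", ""))


print(parse_overflow(None))                                 # Overflow.LINEBREAK
print(scale_factor(800, 100, 400, 200))                     # 0.5
print(elide_terms(["x^5", "x^4", "x^3", "x^2", "x", "1"]))  # x^5 + ... + 1
```

In the fallback helper, an image file name such as eq1.png would be supplied by the author through altimg; none is implied by MathML itself.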