4.1 Introduction

In MathML3, content markup is divided into two subsets, "Strict"
and "Pragmatic" Content MathML. The first subset uses a minimal
set of elements representing the meaning of a mathematical expression in a uniform
structure, while the second one tries to strike a pragmatic balance between
verbosity and formality. Both forms of content expressions are legitimate and have
their role in representing mathematics. Strict Content MathML is canonical in a
sense and simplifies the implementation of content MathML processors and the
comparison of content expressions, while Pragmatic Content MathML is much simpler and
more intuitive for humans to understand, read, and write.

Strict Content MathML3 expressions can directly be given a formal semantics in terms
of "OpenMath Objects" [OpenMath2004], and we interpret
Pragmatic Content MathML3 expressions by specifying equivalent Strict variants, so
that they inherit their semantics.

Editorial note: MiKo

We are using the notions of "Strict" and "Pragmatic"
Content MathML in this working draft, even though they do not fully convey the
intention behind the representation choices. However, they carry the intuition much
better than the terms "canonical" and "legacy" we used
before, since they are less judgmental.

4.2 Strict Content MathML

4.2.1 The structure of MathML3 Content Expressions

MathML content encoding is based on the concept of an expression tree built up from terminal nodes and internal nodes.

As a general rule, the terminal nodes in the tree represent basic mathematical objects
such as numbers, variables, arithmetic operations and so on. The internal nodes in the
tree generally represent some kind of function application or other mathematical
construction that builds up a compound object. Function application provides the most
important example; an internal node might represent the application of a function to
several arguments, which are themselves represented by the terminal nodes underneath the
internal node.

This section provides the basic XML encoding of content MathML expression trees.
General usage and the mechanism used to associate mathematical meaning with symbols are
provided here. Appendix C MathML3 Content Dictionaries provides a complete listing of the specific Content
MathML symbols defined by this specification along with full reference information
including attributes, syntax, and examples. It also describes the intended semantics of
those symbols and suggests default renderings. The rules for using presentation markup
within content markup are explained in Section 5.4.2 Presentation Markup in Content Markup. An informal EBNF
grammar describing the syntax for content markup is provided in Appendix B Content Markup Validation Grammar.

4.2.2 Encoding OpenMath Objects

Strict Content MathML is designed to be an XML encoding of OpenMath Objects (see
[OpenMath2004]), which constitute the semantics of strict content MathML
expressions. The table below gives an element-by-element correspondence between the
OpenMath XML encoding of OpenMath objects and strict content MathML.

Note that with this correspondence, strict content MathML also gains the OpenMath
binary encoding as a space-efficient way of encoding content MathML expressions.
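To make the element-by-element correspondence concrete, here is a minimal Python sketch that translates a small subset of the OpenMath XML encoding into strict content MathML. The element pairs in `OM_TO_MATHML` are an assumption limited to objects, applications, bindings, symbols, variables, and integers; namespaces are ignored, and the function name `om_to_mathml` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed element correspondence for a small OpenMath subset (illustrative only).
OM_TO_MATHML = {
    "OMOBJ": "math",
    "OMA": "apply",
    "OMBIND": "bind",
    "OMBVAR": "bvar",
    "OMS": "csymbol",
    "OMV": "ci",
    "OMI": "cn",
}


def om_to_mathml(om):
    """Recursively translate an un-namespaced OpenMath element tree."""
    tag = OM_TO_MATHML.get(om.tag)
    if tag is None:
        raise ValueError(f"unsupported OpenMath element: {om.tag}")
    out = ET.Element(tag)
    if om.tag == "OMS":
        # OMS and csymbol identify a symbol by the same three attributes.
        for attr in ("cdbase", "cd", "name"):
            if attr in om.attrib:
                out.set(attr, om.get(attr))
    elif om.tag == "OMV":
        # OMV names the variable in an attribute; ci uses PCDATA content.
        out.text = om.get("name")
    elif om.tag == "OMI":
        out.set("type", "integer")
        out.text = (om.text or "").strip()
    for child in om:
        out.append(om_to_mathml(child))
    return out


om = ET.fromstring('<OMOBJ><OMA><OMS cd="arith1" name="plus"/>'
                   '<OMV name="x"/><OMI>1</OMI></OMA></OMOBJ>')
print(ET.tostring(om_to_mathml(om), encoding="unicode"))
```

Because the mapping is purely structural, the recursion only has to rename elements and move the variable name from an attribute into content; the CD name `arith1` in the usage line is an illustrative OpenMath CD.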

4.2.3 Numbers

Editorial note: MiKo

This section will be reworked for OpenMath compatibility. It currently reflects the
state of MathML2(2e).

Number containers such as cn represent mathematical
numbers. For example, the number 12345 is encoded as

<cn>12345</cn>

The attributes and PCDATA content
together provide the data necessary for an application to parse the number. For
example, a default base of 10 is assumed, but to communicate that the underlying data
was actually written in base 8, simply set the base attribute to 8 as in

<cn base="8">12345</cn>

while the complex number
3 + 4i can be encoded as

<cn type="complex-cartesian">3<sep/>4</cn>

Such information makes it possible for
another application to easily parse this into the correct number.

The cn element is the MathML token element used to represent numbers. The
supported types of numbers include: "real", "integer",
"rational", "complex-cartesian", and
"complex-polar", with "real" being the default type. An
attribute base is used to help specify how the content is to be parsed. Its
value (any numeric string) indicates the numerical base of the number. The default value is
"10".

The content itself is essentially PCDATA, separated by <sep/> when two parts are needed in order to fully describe a
number. For example, the real number 3 is constructed by <cn type="real">3</cn>,
while the rational number 3/4 is constructed as
<cn type="rational"> 3<sep/>4 </cn>. The detailed structure
and specifications of each type are given below.

Note: Each data type implies that the data adheres to certain formatting conventions,
detailed below. If the data fails to conform to the expected format, an error is
generated. Details of the individual formats are:

real

A real number is presented in decimal notation. Decimal notation consists of an
optional sign ("+" or "-") followed by a string of
digits possibly separated into an integer and a fractional part by a
"decimal point". Some examples are 0.3, 1, and -31.56. If a different
base is specified, then the digits are interpreted as being digits
computed to that base.

e-notation

A real number may also be presented in scientific notation. Such numbers have
two parts (a mantissa and an exponent) separated by <sep/>. The first part
is a real number, while the second part is an integer exponent indicating a power
of the base. For example, 12.3<sep/>5 represents 12.3 × 10⁵.
The default presentation of this example is 12.3e5.

integer

An integer is represented by an optional sign followed by a string of one or more
"digits". What a "digit" is depends on the
base attribute. If base is present, it specifies the base
for the digit encoding; the value of base itself is written in base 10. Thus
base='16' specifies a hex encoding. When base >
10, letters are added in alphabetical order as digits. The legitimate values for
base are therefore between 2 and 36.

rational

A rational number is two integers separated by <sep/>. If
base is present, it specifies the base used for the digit encoding of
both integers.

complex-cartesian

A complex number in Cartesian form is given as two real numbers (the real and imaginary parts) separated by <sep/>.

complex-polar

A complex number is specified in the form of a magnitude and an angle (in
radians). The raw data is in the form of two real numbers separated by <sep/>.
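The format rules above can be checked mechanically. The following Python sketch is one possible reading of them, not a normative parser: the function names are hypothetical, real digits are evaluated exactly with `fractions.Fraction` before conversion, and malformed content raises `ValueError`, matching the statement that an error is generated.

```python
import cmath
import xml.etree.ElementTree as ET
from fractions import Fraction

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"
# Number of <sep/>-separated parts each type expects.
PARTS = {"real": 1, "integer": 1, "e-notation": 2, "rational": 2,
         "complex-cartesian": 2, "complex-polar": 2}


def _digit(ch, base):
    value = DIGITS.find(ch.lower())
    if not 0 <= value < base:
        raise ValueError(f"invalid digit {ch!r} for base {base}")
    return value


def _exact(text, base):
    """Exact value of an optionally signed digit string with optional point."""
    s = text.strip()
    sign = -1 if s[:1] == "-" else 1
    if s[:1] in ("+", "-"):
        s = s[1:]
    whole, _, frac = s.partition(".")
    if not (whole or frac):
        raise ValueError(f"no digits in {text!r}")
    value = Fraction(0)
    for ch in whole:
        value = value * base + _digit(ch, base)
    for i, ch in enumerate(frac, start=1):
        value += Fraction(_digit(ch, base), base ** i)
    return sign * value


def parse_cn(cn):
    """Parse a <cn> element into a Python number, per its type and base."""
    kind = cn.get("type", "real")
    base = int(cn.get("base", "10"))
    if not 2 <= base <= 36:
        raise ValueError(f"base must be between 2 and 36, got {base}")
    # Text before the first <sep/>, then the text following each <sep/>.
    parts = [cn.text or ""] + [sep.tail or "" for sep in cn.findall("sep")]
    if PARTS.get(kind) != len(parts):
        raise ValueError(f"malformed <cn type={kind!r}>")
    if kind == "integer":
        if "." in parts[0]:
            raise ValueError("an integer has no decimal point")
        return int(_exact(parts[0], base))
    if kind == "real":
        return float(_exact(parts[0], base))
    a, b = (_exact(p, base) for p in parts)
    if kind == "e-notation":
        if b.denominator != 1:
            raise ValueError("the exponent must be an integer")
        return float(a * Fraction(base) ** int(b))
    if kind == "rational":
        if a.denominator != 1 or b.denominator != 1:
            raise ValueError("both parts of a rational must be integers")
        return a / b
    if kind == "complex-cartesian":
        return complex(float(a), float(b))
    return cmath.rect(float(a), float(b))  # complex-polar: magnitude, radians


print(parse_cn(ET.fromstring('<cn base="8">12345</cn>')))
print(parse_cn(ET.fromstring('<cn type="complex-cartesian">3<sep/>4</cn>')))
```

Splitting on the `tail` of each `sep` child mirrors how the PCDATA parts sit around the empty `<sep/>` element in the XML tree.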

MathML2 also allowed type "constant" with the Unicode symbols for
certain numeric constants. In MathML3 this is only allowed as part of the pragmatic markup.

4.2.4 Symbols and Identifiers

The notion of constructing a general expression tree is essentially that of
applying an operator to sub-objects. For example, the sum
"x+y" can be thought of as an application of the
addition operator to two arguments x and y. Likewise, the expression
"cos(π)" can be thought of as the application of the cosine function to the number
π.

In Content MathML, elements are used for operators and functions to capture the
crucial semantic distinction between the function itself and the expression
resulting from applying that function to zero or more arguments. This is addressed
by making the functions self-contained objects with their own properties and
providing an explicit apply construct corresponding to function
application. We will consider the apply construct in the next section.

In the sum expression "x+y" above, x
and y are typically taken to be "variables", since they
have properties, but no fixed value, whereas the addition function is a
"constant" or "symbol" as it denotes a
specific function, which is defined somewhere externally. (Note that
"symbol" is used here in the abstract sense and has no connection with
any presentation of the construct on screen or paper).

Strict Content MathML3 uses the ci element (for "content
identifier") to construct a variable, or an identifier that is not a
symbol. Its PCDATA content is interpreted as a name that identifies it. Two
variables are considered equal iff their names are identical within the respective scope (see
Section 4.2.6 Bindings and Bound Variables for a discussion).

Due to the nature of mathematics the meaning of the mathematical expressions must
be extensible. The key to extensibility is the ability of the user to define new
functions and other symbols to expand the terrain of mathematical discourse. The
csymbol element is used to represent a "symbol" in much the same
way that ci is used to construct a variable. The difference in usage is
that csymbol is empty and should refer to some mathematically defined
concept with an external definition referenced via the csymbol attributes,
whereas ci is used for identifiers that are essentially
"local" to the MathML expression.

We need three bits of information to fully identify a symbol: a symbol
name, a Content Dictionary name, and (optionally) a
Content Dictionary base URI, which we encode in three attributes of the
csymbol element: name, cd, and cdbase.
The Content Dictionary is the location of the declaration of the symbol, consisting
of a name and, optionally, a unique prefix called a cdbase which is
used to disambiguate multiple Content Dictionaries of the same name. There are
multiple encodings for content dictionaries; this referencing scheme does not
distinguish between them. If a symbol does not have an explicit cdbase
attribute, then it inherits its cdbase from the first ancestor in the XML
tree with one, should such an element exist. In this document we have tended to
omit the cdbase for brevity.
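The inheritance rule can be sketched as a recursive walk that carries the nearest cdbase value downwards. This Python sketch assumes that any element (including math) may carry cdbase, and the generator name `resolve_cdbases` is hypothetical; it yields None when no ancestor supplies a value.

```python
import xml.etree.ElementTree as ET


def resolve_cdbases(elem, inherited=None):
    """Yield (csymbol, effective_cdbase) for every csymbol under elem.

    An explicit cdbase attribute wins; otherwise the value of the nearest
    ancestor that carries one is used (None if there is no such ancestor).
    """
    current = elem.get("cdbase", inherited)
    if elem.tag == "csymbol":
        yield elem, current
    for child in elem:
        yield from resolve_cdbases(child, current)


doc = ET.fromstring(
    '<math cdbase="http://www.example.com">'
    '<apply><csymbol cd="VectorCalculus" name="Christoffel"/>'
    '<csymbol cdbase="http://other.example.org" cd="c" name="s"/></apply>'
    '</math>')
for symbol, base in resolve_cdbases(doc):
    print(symbol.get("name"), base)
```

Passing the inherited value as a parameter avoids the parent pointers that ElementTree does not provide.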

There are other properties of the symbol that are not explicit in these fields
but whose values may be obtained by inspecting the Content Dictionary
specified. These include the symbol definition, formal properties and examples and,
optionally, a Role which is a restriction on where the symbol may
appear in a MathML expression tree. The possible roles are described in Section 4.5.4 Symbol Roles.

For backwards compatibility with MathML2 and to facilitate the use of MathML
within a URI-based framework (such as RDF [rdf] or OWL [owl]), the content of the name, cd, and
cdbase can be combined in the definitionURL attribute: we
provide the following scheme for constructing a canonical URI for a MathML
symbol, which can be given in the definitionURL attribute.

URI = cdbase-value + '/' + cd-value + '#' + name-value

In the case of the Christoffel symbol above this would be the URL

http://www.example.com/VectorCalculus#Christoffel
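The scheme is plain string concatenation, so it can be sketched in a few lines of Python, together with the inverse split that a processor would need to recognize a definitionURL of this shape. Both function names are hypothetical, and no URI normalization is attempted (a cdbase ending in "/" would yield a doubled slash).

```python
def canonical_uri(cdbase, cd, name):
    """URI = cdbase-value + '/' + cd-value + '#' + name-value."""
    return f"{cdbase}/{cd}#{name}"


def split_canonical_uri(uri):
    """Recover (cdbase, cd, name), or None if uri does not have that shape."""
    head, hash_sign, name = uri.partition("#")
    cdbase, slash, cd = head.rpartition("/")
    if not (hash_sign and slash and cdbase and cd and name):
        return None
    return cdbase, cd, name


uri = canonical_uri("http://www.example.com", "VectorCalculus", "Christoffel")
print(uri)
print(split_canonical_uri(uri))
```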

For backwards compatibility with MathML2, we do not require that the
definitionURL point to a content dictionary. But if the URL in this
attribute is of the form above, it will be interpreted as the canonical URL of a
MathML3 symbol. So the representation above would be equivalent to the one below:

The URI encoding of the triplet we propose here does not work (not yet for
MathMLCDs and not at all for OpenMath2 CDs). The URI reference proposed uses a bare
name pointer #Christoffel at the end, which points to the element that
has an ID-type attribute with value Christoffel, which is
not present in either of these formats. Moreover, it does not scale well with
extended CD formats like the OMDoc 1.8 format currently under development.

For the inheritance mechanism to be complete, it would make sense to define a
default cdbase attribute value, e.g. at the math element. This would support
expressions that omit cdbase, as all of them do thus far. Perhaps something such as
http://www.w3.org/Math/CDs/official? Moreover, the MathML content
dictionaries should contain such a default.

Resolution

None recorded

4.2.5 Function Application

The most fundamental way of building a compound object in mathematics is by
applying a function or an operator to some arguments. MathML supplies an
infrastructure to represent this in expression trees, which we will present in this
section.

An apply element is used to build an expression tree that represents the
result of applying a function or operator to its arguments. The tree corresponds to
a complete mathematical expression. Roughly speaking, this means a piece of
mathematics that could be surrounded by parentheses or "logical
brackets" without changing its meaning.

The opening and closing tags of apply specify exactly the scope of any
operator or function. The most typical way of using apply is simple and
recursive. Symbolically, the content model can be described as:

<apply> op a b </apply>

where the operands a and b are MathML
expression trees themselves, and op is a MathML expression tree that
represents an operator or function. Note that apply constructs can be
nested to arbitrary depth.

There is no need to introduce parentheses or to resort to operator precedence in
order to parse the expression correctly. The apply tags provide the proper
grouping for the re-use of the expressions within other constructs. Any expression
enclosed by an apply element is viewed as a single coherent object.
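Because apply fixes the grouping, a processor can evaluate an expression tree by plain recursion without any precedence rules. The Python sketch below is a toy evaluator over cn, ci, csymbol, and apply; the meanings in `SYMBOLS` are hypothetical and keyed by the csymbol name only (a real processor would resolve cd/cdbase against a Content Dictionary), and `arith1` is an illustrative CD name.

```python
import operator
import xml.etree.ElementTree as ET
from functools import reduce

# Hypothetical meanings for a few symbols, keyed by csymbol name only.
SYMBOLS = {
    "plus": lambda *args: sum(args),                        # zero or more args
    "times": lambda *args: reduce(operator.mul, args, 1),
    "minus": lambda a, b=None: -a if b is None else a - b,  # one or two args
}


def evaluate(expr, env):
    """Evaluate cn, ci, csymbol and apply elements; env maps ci names to values."""
    if expr.tag == "cn":
        return float(expr.text)
    if expr.tag == "ci":
        return env[expr.text.strip()]
    if expr.tag == "csymbol":
        return SYMBOLS[expr.get("name")]
    if expr.tag == "apply":
        # The apply element fixes the grouping: evaluate the operator (first
        # child) and the argument subtrees, then call. No precedence needed.
        op, *args = list(expr)
        return evaluate(op, env)(*[evaluate(a, env) for a in args])
    raise ValueError(f"unsupported element: {expr.tag}")


# x * (y + 2), where the nested apply plays the role of the parentheses.
tree = ET.fromstring(
    '<apply><csymbol cd="arith1" name="times"/><ci>x</ci>'
    '<apply><csymbol cd="arith1" name="plus"/><ci>y</ci><cn>2</cn></apply>'
    '</apply>')
print(evaluate(tree, {"x": 3, "y": 4}))  # prints 18.0
```

The arity differences mentioned below (plus with zero or more arguments, minus with one or two) show up here simply as the signatures of the Python callables.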

Both the function and the arguments may be simple identifiers or more complicated
expressions.

The apply element is conceptually necessary in order to distinguish
between a function or operator, and an instance of its use. The expression
constructed by applying a function to 0 or more arguments is always an element from
the codomain of the function. Proper usage depends on the operator that is being
applied. For example, the plus operator may have zero or more arguments,
while the minus operator requires one or two arguments to be properly
formed.

If the object being applied as a function is not already one of the elements
known to be a function (such as sin or plus) then it is treated as
if it were a function.

4.2.6 Bindings and Bound Variables

Some complex mathematical objects are constructed by the use of bound
variables. For instance, the integration variable in an integral expression is a
bound variable. Such expressions are represented as MathML expression trees using the
bind and bvar elements, possibly augmented by the qualifier
element condition (see Section 4.2.7 Qualifiers).

The bvar element is a special qualifier element that is used to denote
the bound variable of a binding expression, e.g. in sums, products, and quantifiers
or user-defined functions.

We need to say something about alpha-conversion here for OpenMath compatibility.

4.2.7 Qualifiers

The integrals we have seen so far have all been indefinite, i.e. the range of the
bound variables is unspecified. In many situations, we also want to specify the
range of bound variables, e.g. in definite integrals. MathML3 provides the optional
condition element as a general restriction mechanism for binding expressions.

A condition element contains a single child that represents a truth
condition. Compound conditions are indicated by applying operators such as
and in the condition. Consider for instance the following representation of a
definite integral.

Here the condition element restricts the bound variables to range over the
non-negative integers. A number of common mathematical constructions involve such
restrictions, either implicit in conventional notation, such as a bound variable, or
thought of as part of the operator rather than an argument, as is the case with the
limits of a definite integral.

A typical use of the condition qualifier is to define sets by rule, rather
than enumeration. The following markup, for instance, encodes the set {x |
x < 1}:

In the context of quantifier operators, this corresponds to the "such
that" construct used in mathematical expressions. The next example encodes
"for all x in N there exist prime numbers p, q
such that p+q = 2x".

4.2.8 Structure Sharing

To conserve space, MathML3 expression trees can make use of structure sharing via the
share element. This element has an href attribute whose value is a URI
referencing an id attribute of a MathML expression tree. When
building the MathML expression tree, the share element is replaced by a copy of
the MathML expression tree referenced by the href attribute. Note that this
copy is structurally equal, but not identical, to the element referenced. The
values of the href attribute will often be relative URI references, in which case they
are resolved using the base URI of the document containing the share element.
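The copying step can be sketched as a recursive rebuild of the tree. This Python sketch only handles same-document fragment references of the form `#id` (resolving relative URIs against a base URI is omitted), keeps every attribute including id in the copies, and refuses a share that points back into an element currently being expanded; the function name `expand_shares` is hypothetical.

```python
import xml.etree.ElementTree as ET


def expand_shares(root):
    """Return a new tree in which every <share href="#id"/> is replaced by a
    copy of the element with that id. Copies keep their attributes, id included."""
    targets = {e.get("id"): e for e in root.iter() if e.get("id") is not None}

    def expand(elem, active):
        if elem.tag == "share":
            ref = elem.get("href", "")
            ref_id = ref[1:] if ref.startswith("#") else None
            if ref_id not in targets:
                raise ValueError(f"unresolvable share reference {ref!r}")
            if ref_id in active:
                raise ValueError(f"share reference {ref!r} is cyclic")
            replacement = expand(targets[ref_id], active)
            replacement.tail = elem.tail
            return replacement
        new = ET.Element(elem.tag, dict(elem.attrib))
        new.text, new.tail = elem.text, elem.tail
        eid = elem.get("id")
        # ids of the elements on the current path; reaching one of them again
        # through a share would mean an element dominates itself.
        inner = (active | {eid}) if eid is not None else active
        for child in elem:
            new.append(expand(child, inner))
        return new

    return expand(root, frozenset())


doc = ET.fromstring('<math><apply id="t1"><csymbol cd="c" name="f"/>'
                    '<apply id="t11"><ci>x</ci></apply>'
                    '<share href="#t11"/></apply></math>')
print(ET.tostring(expand_shares(doc), encoding="unicode"))
```

Building fresh elements rather than reusing the target makes each expansion a structurally equal but distinct copy, as the text requires.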

We say that an element dominates all its children and all elements they dominate. A
share element dominates its target, i.e. the element that carries the
id attribute pointed to by the href attribute. For instance, in the
representation above the apply element with id="t1" and also the
second share dominate the apply element with id="t11".

The occurrences of the share element must obey the following global
acyclicity constraint: An element may not dominate itself. For instance the
following representation violates this constraint:

Here, the apply element with id="foo" dominates its third child,
which dominates the share element, which dominates its target: the element with
id="foo". So by transitivity, this element dominates itself, and by the
acyclicity constraint, it is not a MathML expression tree. Even though it could be given
the interpretation of the continued fraction
this would correspond to an infinite tree of applications, which is not admitted by
Content MathML.

Note that the acyclicity constraint is not restricted to such simple cases, as the following
example shows:

Here, the apply with id="bar" dominates its third child, the
share with href="#baz", which dominates its target apply
with id="baz", which in turn dominates its third child, the share
with href="#bar", which finally dominates its target, the original
apply element with id="bar". So this pair of representations
violates the acyclicity constraint.
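Since domination follows both child edges and share-to-target edges, the acyclicity constraint amounts to the absence of a cycle in that directed graph. The Python sketch below checks this with a standard three-colour depth-first search; it assumes same-document `#id` references, and the function name `violates_acyclicity` is hypothetical. A node reached again while still on the DFS stack (grey) signals a cycle, whereas reaching a finished (black) node is ordinary sharing.

```python
import xml.etree.ElementTree as ET


def violates_acyclicity(root):
    """True iff some element dominates itself, i.e. the graph of child edges
    plus share -> target edges contains a cycle."""
    targets = {e.get("id"): e for e in root.iter() if e.get("id") is not None}

    def successors(elem):
        yield from elem                      # children
        if elem.tag == "share":
            ref = elem.get("href", "")
            if ref.startswith("#") and ref[1:] in targets:
                yield targets[ref[1:]]       # the share's target

    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(elem):
        color[elem] = GREY
        for nxt in successors(elem):
            state = color.get(nxt, WHITE)
            if state == GREY:                # back edge: a cycle
                return True
            if state == WHITE and visit(nxt):
                return True
        color[elem] = BLACK
        return False

    return any(color.get(e, WHITE) == WHITE and visit(e) for e in root.iter())


pair = ET.fromstring(
    '<math><apply id="bar"><csymbol cd="c" name="f"/><ci>x</ci><share href="#baz"/></apply>'
    '<apply id="baz"><csymbol cd="c" name="g"/><ci>y</ci><share href="#bar"/></apply></math>')
print(violates_acyclicity(pair))
```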

Note that the share element is a syntactic referencing mechanism:
a share element stands for the exact element it points to. In particular,
referencing does not interact with binding in a semantically intuitive way, since it
allows for variable capture. Consider for instance

it represents the term which has two sub-terms of the form , one with id="orig"
(the one explicitly represented) and one with id="copy", represented by the
share element. In the original, the variable x is bound by the
outer bind element, and in the copy, the variable x is
bound by the inner bind element. We say that the inner bind
has captured the variable x.

It is well-known that variable capture does not conserve semantics. For instance, we
could use α-conversion to rename the inner occurrence of x into, say,
y, arriving at the (same) object

Using references that capture variables in this way can easily lead to
representation errors and is not recommended.
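Equality up to α-conversion can be decided by comparing bound occurrences through the binders that introduce them rather than through their names. The Python sketch below does this by mapping each bound name to the depth of its binder; it assumes each bind has the shape operator, bvar elements with one ci each, then body, and the names `alpha_equal` and `lam` (and the illustrative `fns1`/`lambda` symbol) are hypothetical. The second comparison shows the capture problem: once the inner binder also uses the name y, the body y refers to a different binder than x did.

```python
import xml.etree.ElementTree as ET


def alpha_equal(a, b, env_a=None, env_b=None, depth=0):
    """Compare two strict content trees up to consistent renaming of bound
    variables. env_a/env_b map a bound ci name to the depth of its binder."""
    env_a, env_b = env_a or {}, env_b or {}
    if a.tag != b.tag:
        return False
    if a.tag == "ci":
        na, nb = a.text.strip(), b.text.strip()
        if na in env_a or nb in env_b:
            return env_a.get(na) == env_b.get(nb)  # must name the same binder
        return na == nb                            # free variables: by name
    ca, cb = list(a), list(b)
    if len(ca) != len(cb) or a.attrib != b.attrib:
        return False
    if a.tag == "bind":
        # Assumed shape: operator, then bvar elements (one ci each), then body.
        if not alpha_equal(ca[0], cb[0], env_a, env_b, depth):
            return False
        env_a, env_b = dict(env_a), dict(env_b)
        pairs = []
        for x, y in zip(ca[1:], cb[1:]):
            if (x.tag == "bvar") != (y.tag == "bvar"):
                return False
            if x.tag == "bvar":
                depth += 1
                env_a[x[0].text.strip()] = depth
                env_b[y[0].text.strip()] = depth
            else:
                pairs.append((x, y))
        return all(alpha_equal(x, y, env_a, env_b, depth) for x, y in pairs)
    if (a.text or "").strip() != (b.text or "").strip():
        return False
    return all(alpha_equal(x, y, env_a, env_b, depth) for x, y in zip(ca, cb))


def lam(var, body):
    """Hypothetical helper: a lambda binder over var with the given body."""
    return (f'<bind><csymbol cd="fns1" name="lambda"/>'
            f'<bvar><ci>{var}</ci></bvar>{body}</bind>')


same = alpha_equal(ET.fromstring(lam("x", lam("y", "<ci>x</ci>"))),
                   ET.fromstring(lam("z", lam("y", "<ci>z</ci>"))))
captured = alpha_equal(ET.fromstring(lam("x", lam("y", "<ci>x</ci>"))),
                       ET.fromstring(lam("y", lam("y", "<ci>y</ci>"))))
print(same, captured)  # prints True False
```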

4.2.9 Attribution via semantics

Content elements can be adorned with additional information via the
semantics element, see Section 5.3 Attributions in Strict Content MathML for details. As
such, the semantics element should be considered part of both presentation
MathML and content MathML. MathML3 considers a semantics element (strict)
content MathML iff its first child is (strict) content MathML. All MathML
processors should process the semantics element, even if they only process
one of those subsets.

4.2.10 In Situ Error Markup

An error is made up of a symbol and a sequence of zero or more MathML expression
trees. This object has no direct mathematical meaning. Errors occur as the result of
some treatment on an expression tree and are thus of real interest only when some sort
of communication is taking place. Errors may occur inside other objects and also inside
other errors. Error objects might consist only of a symbol as in the object:

To encode an error caused by a division by zero, we would employ an
aritherror Content Dictionary with a DivisionByZero symbol
with role error and use the following expression tree:

4.4 Pragmatic Content MathML

MathML3 content markup differs from earlier versions of MathML in that it has been
regularized and based on the content dictionary model introduced by OpenMath [OpenMath2004].

MathML3 also supports MathML2 markup as a pragmatic representation that is easier to
read and more intuitive for humans. We will discuss this representation in the following
and indicate the equivalent strict representations. Thus the "pragmatic content
MathML" representations inherit the meaning from their strict counterparts.

4.4.1 Numbers with "constant" type

The cn element can be used with the value "constant" for
the type attribute and the Unicode symbols for the content. This use of
the cn element is deprecated in favor of the number constants
exponentiale,
imaginaryi,
true,
false,
notanumber,
pi,
eulergamma, and
infinity
in the content dictionary constants CD, or the use of csymbol
with an appropriate value for the definitionURL. For example, instead of
an instance of <cn type="constant">&pi;</cn>, the pi element should be used.
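Rewriting a deprecated constant-typed cn into a csymbol from the constants CD is a table lookup. In this Python sketch the MathML2 content characters in `CONSTANTS` are an assumption (true, false, and notanumber are omitted because their character forms are not given here), entity references such as &pi; are assumed to be already resolved to characters (ElementTree does not know the MathML entity definitions), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed correspondence between MathML2 "constant" characters and the
# symbols of the constants CD listed above.
CONSTANTS = {
    "\u03c0": "pi",            # GREEK SMALL LETTER PI (&pi;)
    "\u2147": "exponentiale",  # DOUBLE-STRUCK ITALIC SMALL E (&ExponentialE;)
    "\u2148": "imaginaryi",    # DOUBLE-STRUCK ITALIC SMALL I (&ImaginaryI;)
    "\u03b3": "eulergamma",    # GREEK SMALL LETTER GAMMA (&gamma;)
    "\u221e": "infinity",      # INFINITY (&infin;)
}


def constant_to_csymbol(cn):
    """Rewrite <cn type="constant">...</cn> as a csymbol of the constants CD."""
    if cn.tag != "cn" or cn.get("type") != "constant":
        raise ValueError("not a constant-typed cn element")
    name = CONSTANTS.get((cn.text or "").strip())
    if name is None:
        raise ValueError(f"unknown constant: {cn.text!r}")
    return ET.Element("csymbol", {"cd": "constants", "name": name})


strict = constant_to_csymbol(ET.fromstring('<cn type="constant">\u03c0</cn>'))
print(ET.tostring(strict, encoding="unicode"))
```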

4.4.2 Non-empty csymbol Elements

In pragmatic MathML3 the csymbol can be given content that caches the
presentation of the symbol. The content is either PCDATA, or a general
presentation-MathML layout tree. For example,

encodes an atomic symbol that displays visually as C² and
that, for purposes of content, is treated as a single symbol representing the
space of twice-differentiable continuous functions. This pragmatic representation is equivalent to

encodes an atomic symbol that displays visually as c₁ which, for
purposes of content, is treated as an atomic concept representing a real number.

Instances of the bound variables are normally recognized by comparing the XML
information sets of the relevant ci elements after first carrying out XML
space normalization. Such identification can be made explicit by placing an
id on the ci element in the bvar element and referring
to it using the definitionURL attribute on all other instances. An
example of this approach is

This id-based approach is especially helpful when constructions involving
bound variables are nested.

It can be necessary to associate additional information with a bound variable or with
one or more instances of it. The information might be something like a detailed
mathematical type, an alternative presentation or encoding, or a domain of
application. Such associations are accomplished in the standard way by replacing a
ci element (even inside the bvar element) by a semantics
element containing both it and the additional information. Recognition of an
instance of the bound variable is still based on the actual ci elements and
not the semantics elements or anything else they may contain. The
id based approach outlined above may still be used.

A ci element with Presentation MathML content is equivalent to a
semantics construction where the first child is a ci whose content is
the value of the definitionURL attribute and whose second child is an
annotation-xml element with the MathML Presentation. For example the Strict
Content MathML equivalent to the example above would be

4.4.4 Elementary MathML Types on Tokens

The ci element uses the type attribute to specify the basic type
of object that it represents. While any CDATA string is a valid type, the
predefined types include "integer", "rational",
"real", "complex", "complex-polar",
"complex-cartesian", "constant", "function"
and more generally, any of the names of the MathML container elements (e.g.
vector) or their type values. For a more advanced treatment of types, the
type attribute is inappropriate. Advanced types require significant
structure of their own (for example, vector(complex)) and are probably best constructed
as mathematical objects and then associated with a MathML expression through use of the
semantics element.

Editorial note: MiKo

Give the Strict equivalent here by techniques from the Types Note

4.4.5 Token Elements

For convenience and backwards compatibility MathML3 provides empty token elements
for the operators and functions of the K-14 fragment of mathematics. The general rule
is that for any symbol defined in the MathML3 content dictionaries (see Appendix C MathML3 Content Dictionaries), there is an empty content element with the same name. For instance, the
empty MathML element

In MathML2, the definitionURL attribute could be used to modify the
meaning of an element to allow essentially the same notation to be re-used for a
discussion taking place in a different mathematical domain. This use of the attribute is
deprecated in MathML3, in favor of using a
csymbol with the same definitionURL attribute.

4.4.6 Tokens with Attributes

In MathML2, the meaning of various token elements could be specialized via various
attributes, usually the type attribute. Strict Content MathML does not
have this possibility, therefore these attributes are either passed to the symbols as
extra arguments in the apply or bind elements, or MathML3 adds new
symbols for the non-default case to the respective content dictionaries.

We will summarize the cases in the following table:

pragmatic Content MathML | strict Content MathML
-------------------------+----------------------------------------------
<diff type="function"/>  | <csymbol name="diff" cd="calculus_veccalc"/>
<diff type="algebraic"/> | <csymbol name="aDiff" cd="calculus_veccalc"/>

Editorial note: MiKo

systematically consider all the cases here

4.4.7 Container Markup

To retain compatibility with MathML2, MathML3 provides an alternative
representation for applications of constructor elements. For instance for the
set element, the following two representations are considered equivalent:

4.4.8 Domain of Application in Applications

The domainofapplication element was used in MathML2 as a child of an apply
element to denote the domain over which a given function is being applied. In
contrast to its use as a qualifier
in the bind element, the usage in the apply element only marks the
argument position for the range argument of the definite integral.

MathML3 supports this representation as a pragmatic form. For instance, the
integral of a function f over an arbitrary domain C can be
represented as

in the Pragmatic Content MathML representation, it is considered equivalent to

<apply><int/><ci>C</ci><ci>f</ci></apply>

Editorial note: MiKo

be careful with Int and int here

4.4.9 Domain of Application in Bindings

The domainofapplication element was intended as an alternative to condition for
specifying the range of bound variables. Generally, a domain
of application D can be specified by a condition element
requesting that the bound variable is a member of D. For instance, we consider
the Pragmatic Content MathML representation

4.4.10 Integrals with Calling patterns

MathML2 used the int element for the definite or indefinite integral of
a function or algebraic expression on some sort of domain of application. There are
several forms of calling sequences depending on the nature of the arguments, and
whether or not it is a definite integral. Those forms using interval,
condition, lowlimit, or uplimit, provide convenient
shorthand notations for an appropriate domainofapplication.

MathML3 separates the functionality of the int element into three
different symbols: int, defint, and defintset. The first two are integral operators
that can be applied to functions, and the last is a binding operator for integrating
an algebraic expression with respect to a bound variable.

4.4.11 degree

The degree element is a qualifier used by some MathML containers to specify that,
for example, a bound variable is repeated several times.

Editorial note: MiKo

specify a complete list of containers that allow degree elements,
so far I see diff, partialdiff, root

The degree element is the container element for the "degree"
or "order" of an operation. There are a number of basic mathematical
constructs that come in families, such as derivatives and moments. Rather than
introduce special elements for each of these families, MathML uses a single general
construct, the degree element for this concept of "order".

A variable that is to be bound is placed in a bvar container. In a derivative, the bvar
element indicates the variable with respect to which a function is being differentiated.
When the bvar element is used to qualify a derivative, the bvar
element may contain a child degree element that specifies the order of the
derivative with respect to that variable.

Note that the degree element is only allowed in the container representation. The strict representation takes
the degree as a regular argument as the second child of the apply or
bind element.
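For the apply case, turning the container form into the strict form is a small tree rewrite. This Python sketch assumes a pragmatic apply with an operator, at most one degree child holding a single expression, and further arguments; it leaves the operator element as it is (a full translation would also replace a token such as root by its csymbol), ignores degree inside bvar, and the function name is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def degree_to_argument(apply):
    """Return a copy of a pragmatic apply in which the content of its degree
    child becomes a regular argument, placed right after the operator."""
    children = list(apply)
    degrees = [c for c in children if c.tag == "degree"]
    others = [c for c in children if c.tag != "degree"]
    if not degrees:
        return copy.deepcopy(apply)
    if len(degrees) != 1 or len(degrees[0]) != 1 or not others:
        raise ValueError("expected an operator and one degree with one child")
    op, *args = others
    new = ET.Element(apply.tag, dict(apply.attrib))
    new.extend([copy.deepcopy(c) for c in [op, degrees[0][0], *args]])
    return new


pragmatic = ET.fromstring(
    '<apply><root/><degree><cn>3</cn></degree><ci>x</ci></apply>')
print(ET.tostring(degree_to_argument(pragmatic), encoding="unicode"))
```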

Editorial note: MiKo

Make sure that all MMLdefinitions of degree-carrying symbols get a
paragraph like the one for root.

The default rendering of the degree element and its contents depends on
the context. In the example above, the degree elements would be rendered as
the exponents in the differentiation symbols:

4.4.12 Upper and Lower Limits

The uplimit and lowlimit elements are Pragmatic Content MathML
qualifiers that can be used to restrict the range of a bound variable to an interval,
e.g. in some integrals and sums. uplimit/lowlimit pairs can be
expressed via the interval element from
the CD Basic Content
Elements. For instance, we consider the Pragmatic Content MathML representation

4.4.13 Lifted Associative Commutative Operators

MathML2 allowed the use of n-ary operators as binding operators
with bound variables induced by them. For instance, union could be used as
the equivalent of the TeX \cup as well as \bigcup.
While the relation between the n-ary and the set-based operators is deterministic,
i.e. the induced big operators are fully determined by them, the concepts are
quite different in nature (different notational conventions, different types,
different occurrence schemata). I therefore propose to extend the MathML K-14 CDs
with symbols for big operators, much like we already have sum as the big
operator for the n-ary plus symbol, and prod for
times. For the new symbols, I propose the naming convention of
capitalizing the big operators (as an alternative, we could follow TeX and
prepend a big). For example, we could have Union as a big
operator for union.

Resolution

None recorded

MathML2 allowed associative operators to be "lifted" to
"big operators", for instance the n-ary union operator to
the union operator over sets, as the union of the U-complements over a
family F of sets in this construction

While the relation between the nary and the set-based operators is deterministic,
i.e. the induced big operators are fully determined by them, the concepts are quite
different in nature (different notational conventions, different types, different
occurrence schemata). Therefore the MathML3 content dictionaries provide explicit
symbols for the "big operators", much like MathML2 did with sum
as the big operator for the n-ary plus symbol, and prod for
times. Concretely, these are
Union,
Intersect,
Max,
Min,
Gcd,
Lcm,
Or,
And, and
Xor. With these, we can express all lifted operators of
Pragmatic Content MathML. For instance, the union above can be represented
strictly as

The large operators can be handled in two ways. One is the way described here, by inventing
large operators (David does not like symbol names distinguished only by case, and I
tend to agree with him). The other is to extend the role mechanism to allow multiple
roles per symbol; then we could re-use the symbols as we did in MathML2, but we
would have to extend OpenMath for that.

Resolution

None recorded

4.4.14 Declare (declare)

The declare element is a construct with two primary roles. The first
is to change or set the default attribute values for a mathematical identifier. The
second is to introduce a new identifier "name" for an object. Once a
declaration is in effect, the identifier <ci>name</ci> acquires
the new attribute settings and (if the second argument is present) stands for the
object. The actual instances of a declared ci element are normally recognized
by comparing their content with that of the declared element. Equality of two elements
is determined by comparing the XML information sets of the two expressions after XML
space normalization (see [XPath]).

All declare elements must occur at the beginning of a math element.
The scope of a declaration is "local" to the surrounding
math element. The scope attribute can only take the value
"local". It was intended to support future extensions, but MathML3
contains no provision for making document-wide declarations, so the scope remains
fixed to "local".

Occurrences of declare with only one argument can be eliminated by adding
the declare element's attributes to all other occurrences of the same identifier in the
enclosing math element. E.g.

Occurrences of the declare element with a second argument can be eliminated
with the help of the MathML share element. If the declared identifier (the first
child of the declare element) is not used in the expression, the declare element
can be dropped. If it is used once, it can simply be replaced with the second
declare child. If it is used two or more times, we replace one of its occurrences
with the second declare child, add a new id attribute to it, and replace all
other occurrences by share elements that point to this id. For instance
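Both elimination rules are mechanical enough to sketch in a few lines of Python with the standard xml.etree.ElementTree module. This is a minimal, non-normative sketch under simplifying assumptions: elements are un-namespaced, identifier equality is approximated by comparing whitespace-normalized ci text (instead of a full infoset comparison), variable binding is ignored, all attributes of a one-argument declare except scope are copied onto the identifier, and the generated id values (decl1, decl2, ...) follow a hypothetical naming scheme.

```python
import copy
import xml.etree.ElementTree as ET


def _norm(text):
    # Simplified stand-in for XML space normalization of token content.
    return " ".join((text or "").split())


def _replace(parent, old, new):
    # Put `new` where `old` was, keeping the trailing whitespace (tail).
    new.tail = old.tail
    parent[list(parent).index(old)] = new


def eliminate_declares(math):
    """Remove the leading <declare> children of a <math> element in place."""
    counter = 0
    while len(math) and math[0].tag == "declare":
        decl = math[0]
        math.remove(decl)
        name = _norm(decl[0].text)
        # Parent map and uses are computed on the tree without this declare.
        parent = {child: p for p in math.iter() for child in p}
        uses = [e for e in math.iter("ci") if _norm(e.text) == name]
        if len(decl) == 1:
            # One argument: copy the declare's attributes (except scope) to every use.
            attrs = {k: v for k, v in decl.attrib.items() if k != "scope"}
            for e in uses:
                e.attrib.update(attrs)
        elif len(uses) == 1:
            # Used once: substitute the declared object directly.
            _replace(parent[uses[0]], uses[0], copy.deepcopy(decl[1]))
        elif uses:
            # Used several times: the first use gets the object plus a fresh id,
            # all other uses become <share> elements pointing to it.
            counter += 1
            ident = f"decl{counter}"  # hypothetical id naming scheme
            target = copy.deepcopy(decl[1])
            target.set("id", ident)
            _replace(parent[uses[0]], uses[0], target)
            for e in uses[1:]:
                _replace(parent[e], e, ET.Element("share", href="#" + ident))
        # Unused identifier with a second child: the declare is simply dropped.
    return math


doc = ET.fromstring(
    "<math><declare><ci>f</ci><lambda><bvar><ci>x</ci></bvar><ci>x</ci></lambda></declare>"
    "<apply><plus/><apply><ci>f</ci><cn>1</cn></apply>"
    "<apply><ci>f</ci><cn>2</cn></apply></apply></math>"
)
print(ET.tostring(eliminate_declares(doc), encoding="unicode"))
```

In the usage, f is used twice, so the first use becomes the lambda carrying the new id and the second becomes a share element referring to it.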

4.5 MathML3 Content Dictionaries

The primary role of MathML content elements is to provide a mechanism for recording
that a particular notational structure has a particular mathematical meaning. To this
end, every content element must have a mathematical definition associated with it in
some form. These definitions are provided in the form of content
dictionaries, XML files of a certain structure.

The concept of a content dictionary was initially introduced by the OpenMath1
format [OpenMath2000] and was stabilized and generalized to
abstract content dictionaries in the OpenMath2 standard [OpenMath2004], which keeps a
variant of OpenMath1 CDs as a reference encoding.

MathML 3 introduces a content dictionary format that is designed to support the
MathML language while meeting the requirements of OpenMath abstract CDs. We will
introduce the format in the rest of the section and give an overview of the MathML3
content dictionaries for the K-14 fragment of mathematics, which are part of the MathML3
recommendation.

Editorial note: Miko

reference the final resting place or
joint OM/W3C document here.

4.5.1 MathML3 Content Dictionaries

We will now detail the MathML3 Content Dictionary format on an abstract level and
discuss the special case of the MathML Recommendation CDs. Note that the latter are
not the only possible ones; any individual or group can set up and publish CDs for the
purposes of communication.

MathML uses the namespace URI http://www.w3.org/ns/mathml-cd for the
XML encoding of MathML content dictionaries. In the examples below, we will use the
namespace prefix mcd for visual disambiguation and assume that it is
bound to the URI above by the context of the example.

Do we want a separate namespace for MathML CDs? David is suggesting that
http://www.w3.org/ns/mathml-cd could be obtained without the Director's
approval. In the following I am assuming that we will do this, even though we do not
have a formal decision on this in the group.

Resolution

None recorded

A MathML Content Dictionary consists of a set of symbol
declarations (see Section 4.5.2 Symbol Declarations) together with
administrative information about them. A MathML content dictionary is represented by
the mcd:mcd element. The first child of the mcd:mcd element is a
mcd:description element that contains a description of the collection of
symbols defined by the CD. The content of the mcd:description is ????.

We need to fix a content model for text fields in MCDs. This should probably
be some fragment of XHTML+MathML, most probably the inline content model.

Resolution

None recorded

Further administrative information about the CD as a whole is given by the
following required attributes of the mcd:mcd element.

The CD name is given in the id attribute.

The revision date, i.e. the date of the last change to the
Content Dictionary, is specified in the revision-date attribute. Dates in MathML
CDs are stored in the ISO-compliant format YYYY-MM-DD, e.g. 1966-02-03. For the
MathML specification CDs, the revision date is the date of the publication of the
respective MathML recommendation.

The review date, i.e. a date until which the content dictionary
is guaranteed to remain unchanged, is specified in the review-date attribute.

There is not really a sensible review date for MathML3 spec CDs. We have no
idea when the spec will be revised. Unless we want to make them less normative
than the MathML3 spec and set up a revision process, we will have to make
this attribute optional (and remove the requirement in OpenMath3).

Resolution

None recorded

The CD version number consists of a major and a minor part; it is
specified in the version attribute. For the MathML specification CDs,
this is the version number of the respective MathML recommendation.

The status of the CD is given in the status
attribute. Its value is one of:

"official": approved by W3C as part of the MathML
specification;

"experimental": under development and thus liable to change;

"private": used by a private group of users;

"obsolete": an obsolete Content Dictionary kept only for
archival purposes.

The OpenMath2 standard only allows official status for CDs approved
by the OM Society. This seems overly proprietary. The MathML CDs should also be
official. Change in OpenMath 3?

Resolution

None recorded

The CD base is a URI which, when combined with the CD name,
forms a unique identifier for the Content Dictionary. It may or may not refer to
an actual location from which it can be retrieved. The CD base is specified by
the cdbase attribute.
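To make the administrative attributes concrete, here is a minimal Python sketch that checks them on a parsed mcd:mcd element, using the namespace URI given above. It is an illustration, not a normative validator: it treats all six attributes as required, checks only the YYYY-MM-DD date format and a digits.digits version pattern, and the function name check_cd_header and the sample values (arith1, the dates, the example.org CD base) are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

MCD_NS = "http://www.w3.org/ns/mathml-cd"
STATUSES = {"official", "experimental", "private", "obsolete"}
REQUIRED = ("id", "revision-date", "review-date", "version", "status", "cdbase")


def _is_date(value):
    # YYYY-MM-DD that is also a real calendar date, e.g. 1966-02-03.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def check_cd_header(cd):
    """Return a list of problems with the administrative attributes of mcd:mcd."""
    if cd.tag != f"{{{MCD_NS}}}mcd":
        return [f"not an mcd:mcd element: {cd.tag}"]
    problems = [f"missing attribute {a}" for a in REQUIRED if a not in cd.attrib]
    for a in ("revision-date", "review-date"):
        if a in cd.attrib and not _is_date(cd.get(a)):
            problems.append(f"{a} is not YYYY-MM-DD: {cd.get(a)}")
    # Assumption: major and minor parts are both non-negative integers.
    if "version" in cd.attrib and not re.fullmatch(r"\d+\.\d+", cd.get("version")):
        problems.append(f"version is not major.minor: {cd.get('version')}")
    if "status" in cd.attrib and cd.get("status") not in STATUSES:
        problems.append(f"unknown status: {cd.get('status')}")
    return problems


header = ET.fromstring(
    '<mcd:mcd xmlns:mcd="http://www.w3.org/ns/mathml-cd" id="arith1" '
    'revision-date="2008-01-01" review-date="2010-01-01" version="3.0" '
    'status="official" cdbase="http://www.example.org/cd"/>'
)
print(check_cd_header(header))  # []
```

Note that the prefix mcd itself is irrelevant to the check; ElementTree resolves it to the namespace URI, which is what identifies the element.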

4.5.2 Symbol Declarations

Editorial note: MiKo

The material here is still partly copied from the OM-style abstract content
dictionaries; it will soon be reworked into a description of the MathML3 CD
format.

MathML Content Dictionaries use the mcd:MMLdefinition element for symbol
declarations. This element carries a mandatory name attribute that
specifies the name of the declared symbol. Its value is an XML1.1 name
[xml11]. The role of the symbol is specified in the
optional role attribute (see Section 4.5.4 Symbol Roles for details and
values). The syntactic and semantic properties of the symbol are given by the following
specialized elements in the body of the mcd:MMLdefinition element:

A short description of the symbol. It can be accompanied by a
discussion, which can be as formal or informal as the author
likes. These are given as description and discussion elements
whose content is ???.

Zero or more commented mathematical properties which are
mathematical properties of the symbol expressed in a human-readable way.

Zero or more properties which are mathematical properties of the
symbol. A property can be expressed in natural language and as a MathML
expression tree in the same property. The former is directly aimed at human
readers, and the latter could be used for validation or evaluation in
mathematical software systems.

property may be given an optional kind attribute. An author of
a Content Dictionary may use this to indicate whether, for example, the property
provides an algorithm for evaluation of the concept it is associated with. At
present no fixed scheme is mandated for how this information should be encoded
or used by an application.

Zero or more mathematical examples which are intended to
demonstrate the use of the symbol within a content MathML expression tree.

4.5.3 Type Declarations

Editorial note: MiKo

copy parts of the types note here, develop
signature declarations for all symbols in the CDs, and make mathmltypes
and STS CDs.

4.5.4 Symbol Roles

We say that a symbol is used to construct a MathML expression tree
if it is the first child of an apply, bind, or error
element. The role of a symbol is a restriction on how it may be used to
construct a compound expression tree and, in the case of the key in an attribution
object, a clarification of how that attribution should be interpreted. The possible
roles are:

binder The symbol may appear as the first child of a
bind element.

application The symbol may appear as the first child of an
apply element.

constant The symbol cannot be used to construct a compound
expression tree.

Those are the roles in OpenMath. Do we need more in MathML? We could have one
for constructor (so that we know that it is a container element in legacy
markup)... That could later be mapped to application. But maybe this would be
better done by the classification element.

Resolution

None recorded

A symbol cannot have more than one role and cannot be used to construct a compound
expression tree object in a way which requires a different role (using the
definition of construct given earlier in this section). This means that one cannot
use a symbol which binds some variables to construct, say, an application object.
However, this does not prevent the use of that symbol as an argument in an
application object (where by argument we mean a child with index greater than
1).

If no role is indicated then the symbol can be used anywhere. Note that this is
not the same as saying that the symbol's role is constant.
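The role restrictions can be checked mechanically. The following minimal Python sketch walks a content MathML tree and reports csymbol elements that head an apply or bind element although their declared role differs. Assumptions: the symbol table roles is a hypothetical mapping from (cd, name) pairs to roles; the error element is not checked, since the role list above has no role corresponding to it; and the OpenMath-style symbol names in the usage are for illustration only.

```python
import xml.etree.ElementTree as ET

# Role required of a symbol that constructs a tree as the first child of these elements.
REQUIRED_ROLE = {"apply": "application", "bind": "binder"}


def role_violations(tree, roles):
    """List csymbols used to construct a tree in a way their role forbids."""
    problems = []
    for node in tree.iter():
        needed = REQUIRED_ROLE.get(node.tag)
        if needed is None or len(node) == 0:
            continue
        head = node[0]
        if head.tag != "csymbol":
            continue
        key = (head.get("cd"), (head.text or "").strip())
        role = roles.get(key)  # None: no role declared, usable anywhere
        if role is not None and role != needed:
            problems.append(f"{key[0]}:{key[1]} has role {role} but heads <{node.tag}>")
    return problems


roles = {("fns1", "lambda"): "binder",
         ("arith1", "plus"): "application",
         ("nums1", "pi"): "constant"}
ok = ET.fromstring('<apply><csymbol cd="arith1">plus</csymbol>'
                   '<csymbol cd="nums1">pi</csymbol><ci>x</ci></apply>')
bad = ET.fromstring('<apply><csymbol cd="fns1">lambda</csymbol><ci>x</ci></apply>')
print(role_violations(ok, roles), role_violations(bad, roles))
```

A constant such as pi is reported whenever it heads an apply or bind, but not when it occurs as an argument, and a symbol without an entry in roles is never reported, matching the rule that a symbol without a role can be used anywhere.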

4.6 Rendering of Content Elements

While the primary role of the MathML content element set is to directly encode the
mathematical structure of expressions independent of the notation used to present the
objects, rendering issues cannot be ignored. Therefore it is important that content
MathML have a native infrastructure for specifying notations for content
symbols. These specifications can be used either directly as a parameter to a
generic rendering process that can thus adapt to extensible mathematical
vocabularies or to inform the implementation of specialized rendering
processes for restricted vocabularies.

As notations are tied to content symbols, content dictionaries seem like a natural
place for them: a generic rendering process can look up notation definitions in the CD
specified in the csymbol element; a specialized rendering procedure can delimit its area
of applicability by the CDs it covers (e.g. the MathML3 CDs). As mathematical notation
is highly variable even for fundamental concepts, CDs can only contain (sets of)
default notation specifications that can be overridden, e.g. by user
preferences. The MathML3 specification does not specify any mechanism for building
generic or specialized rendering processors, or for selecting the relevant notations in
a given context. Instead, we specify a function from content MathML expressions to
presentation MathML expressions that takes a list of notation specifications as an
input. We will call this function the MathML3 rendering function, even
though strictly speaking its values are presentation MathML expressions that still have
to be processed by a MathML-aware renderer for visual or aural consumption.

Note that the mechanism of notation specifications can be directly transferred to
generating renderings or representations in other target formats. In fact, it can even be
reversed to serve as a source of information for interpretation procedures.

In the rest of the section we will specify the format of notation specifications and
then define the MathML3 rendering function based on this.

4.6.1 Notation Specifications

MathML specifies notations using a template-based mechanism. In essence, a notation
specification is a pair consisting of a schematic content MathML expression (called the
prototype) and a schematic presentation MathML expression (called
the rendering). Schematic MathML expressions are expressions that can contain
metavariables, and thus stand for a set of MathML expressions. A
prototype/rendering pair directly specifies a correspondence between a set of content
MathML expressions (those that match the prototype) and presentation MathML expressions
(the result of instantiating the metavariables in the rendering with the pMathML
expressions corresponding to the values of the matcher). Consider, for instance, the
following prototype/rendering pair:

This specifies that any application expression whose first child is a
"minus" symbol will correspond to an mrow with an infix
"-" operator. The metavariables are represented by expr elements in the
prototype and by mcd:render elements in the rendering; they correspond if
they share the value of the name attribute. Thus the content MathML expression

MathML3 uses four elements for representing metavariables, i.e. named variables that
stand for arbitrary MathML expression trees or tree lists: expr,
exprlist, mcd:render, and iterate. The first two are called
content metavariables and the latter two rendering
metavariables. The expr and mcd:render elements are called
element metavariables and stand for single MathML expressions, while the
exprlist and iterate elements are called sequence metavariables
and stand for possibly empty sequences of MathML expressions. Content metavariables stand
for (lists of) cMathML expressions whereas rendering metavariables stand for (lists of)
pMathML expressions. Finally, we call a MathML expression schematic, if it
contains one or more metavariables.

All metavariables carry the required name attribute, whose value is an XML
name. The expr element is empty, and the exprlist element can contain an arbitrary
schematic cMathML expression. The mcd:render element contains an arbitrary
sequence of schematic pMathML expressions. The iterate element carries the
optional reverse attribute, whose values can be "yes" or
"no". The body of the iterate element consists of a
separator element and an arbitrary sequence of schematic pMathML expressions. Finally,
the content of the separator element is an arbitrary sequence of schematic pMathML
expressions.

Prototypes and renderings are represented by the mcd:prototype and
mcd:rendering elements in notation specifications. The former contains a schematic
content MathML expression with content metavariables, and the latter contains a schematic
presentation MathML expression with rendering metavariables. A prototype may not contain
two sibling exprlist elements, and no two content metavariables may have the same
name attribute. The mcd:prototype element carries the optional
priority attribute, whose value is a natural number; if the attribute is
missing, the priority of the prototype defaults to 0. The mcd:rendering
element carries the optional xml:lang, context, and
variant attributes. The values of the xml:lang attribute are ISO 639
language specifiers. The values of the context and variant attributes
are currently not specified and left to applications.

Intuitively, a cMathML expression E matches a prototype
P, iff there is a mapping σ from metavariables in P
to (lists of) cMathML expressions, such that if σ is applied to
P, then the result is E. The formal definition of matching schematic
content MathML expressions below is somewhat more complicated, since we need to take
sequence metavariables into account, and the semantics elements have a built-in
notion of flattening. For given E and P, we know that if
E matches P, then σ is unique; it is called the
matcher for E and P.

Let σ be a mapping from metavariable names to lists of MathML
expressions. We say that a sequence
E of cMathML expressions matches a sequence of prototypes
P via σ iff one of the following holds:

P is a single expr metavariable with name n and
σ(n)=E

P and E are flattened semantics elements,
P' and E' are their first children, and Q=Q(1)
... Q(n) and C=C(1) ... C(n+k) are the sequences of those
annotation-xml children of P and E whose keys have the
role "semantic-annotation", sorted lexicographically by their
cdbase, cd, and name attributes. Then E
matches P via σ iff C(i+k) matches Q(i) via σ
for 1≤i≤n, and
<semantics>E' C(1) ... C(k)</semantics>
matches P' via σ.

P and E are single elements that have the same name, the
attributes of P are a subset of those of E with coinciding values,
and the sequence of children of E matches that of P
via σ.

P and E are sequences of the form P(1) ... P(n) and
E(1) ... E(m) and

none of the P(i) is a sequence metavariable and n=m and

all the E(i) match P(i) via σ.

P and E are sequences of the form P(1) ... P(n) and
E(1) ... E(m) and

P(j) is an exprlist element with name ν for some
1≤j≤n (thus, by the no-sibling constraint above, none of the
other P(i) is an exprlist element).

E(i) matches P(i) via σ for i<j.

E(m-i) matches P(n-i) via σ for
0≤i<n-j.

σ(ν)=E(j) ... E(m-n+j)

P(j) has children C=C(1) ... C(l) and m-n+1=kl

σ(ν) matches the sequence given by the k-fold
concatenation of C with itself via σ.

For given prototype P and expression E, we know that there is at
most one mapping σ such that E matches
P via σ.
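A direct transcription of the element and sequence cases of this definition may help. The following Python sketch is a simplification under stated assumptions: exprlist is treated as a bare sequence metavariable (the repeated-pattern case with children C is ignored), the semantics flattening case is not implemented, and token content such as the name inside a csymbol is required to coincide, an assumption added here so that symbol names are compared. The cd value arith1 in the usage is likewise only illustrative. On success, match returns True and sigma holds the matcher.

```python
import xml.etree.ElementTree as ET


def match(e, p, sigma):
    """Try to match expression `e` against prototype `p`, extending `sigma`."""
    if p.tag == "expr":
        # Element metavariable: binds exactly one expression.
        sigma[p.get("name")] = e
        return True
    return (
        e.tag == p.tag
        # The attributes of P must be a subset of those of E, with equal values.
        and all(e.get(k) == v for k, v in p.attrib.items())
        # Assumption: token content (e.g. a csymbol name) must coincide too.
        and (e.text or "").strip() == (p.text or "").strip()
        and match_seq(list(e), list(p), sigma)
    )


def match_seq(es, ps, sigma):
    """Match a sequence of expressions against a sequence of prototypes."""
    slots = [i for i, p in enumerate(ps) if p.tag == "exprlist"]
    if not slots:
        return len(es) == len(ps) and all(match(e, p, sigma) for e, p in zip(es, ps))
    j = slots[0]              # at most one exprlist among siblings
    n_tail = len(ps) - j - 1  # prototypes after the exprlist
    if len(es) < len(ps) - 1:
        return False
    rest = len(es) - n_tail   # index of the first expression matched by the tail
    if not all(match(e, p, sigma) for e, p in zip(es[:j], ps[:j])):
        return False
    if not all(match(e, p, sigma) for e, p in zip(es[rest:], ps[j + 1:])):
        return False
    sigma[ps[j].get("name")] = es[j:rest]  # binds a (possibly empty) list
    return True


proto = ET.fromstring('<apply><csymbol cd="arith1">minus</csymbol>'
                      '<expr name="a"/><expr name="b"/></apply>')
expr = ET.fromstring('<apply><csymbol cd="arith1">minus</csymbol>'
                     '<ci>x</ci><cn>3</cn></apply>')
sigma = {}
if match(expr, proto, sigma):
    print({k: v.text for k, v in sigma.items()})  # {'a': 'x', 'b': '3'}
```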

A notation element usually occurs as part of an MMLdefinition element
for its top symbol (i.e. the highest token or csymbol in the pattern expression
trees). If it appears in some other context (e.g. as part of a user's notation preferences
file), then it should reference the top symbol via the cd, name, and
possibly cdbase attributes to ensure that it can be found by the rendering
process.

For conciseness and manageability, MathML3 groups prototypes and
renderings in notation elements. We call a list of mcd:rendering elements in
a notation element a rendering block, iff it is delimited by
mcd:prototype elements or the end tag of the notation element itself. No two
mcd:rendering elements in a rendering block may have the same values for all three of
the xml:lang, context, and variant attributes. We say
that a cMathML expression selects the rendering block following the prototype
with the highest priority among those that it matches.

Multiple mcd:prototype elements can be used, e.g. to make a notation specification
applicable to both strict and pragmatic cMathML expressions by adding a prototype
involving the <plus/> element. Multiple mcd:rendering
elements in a rendering block can be used to model different notations or language
conventions; they are differentiated by their xml:lang, context, and
variant attributes. The xml:lang and context attributes
allow a rendering variant to be selected via a global context: the former via the
language context, to enable multilingual rendering; the selection mechanism of the
context attribute is currently unspecified and left to applications. The variant
attribute allows a notation variant to be selected locally on a cMathML element via
that element's class attribute: if a content element carries a class attribute
and matches one of the prototypes of a notation element, then the mcd:rendering
element whose variant attribute has the same value will be chosen for
rendering (see Section 4.6.3 General Rules).

The intended meaning of a notation element is that a cMathML expression
corresponds to that rendering in the selected rendering block that is suitable in the
current context.
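The selection process described above can be sketched as follows. In this minimal Python sketch, prototype matching is abstracted into a predicate and renderings into plain strings. The class names Prototype and Rendering, the function select_rendering, and the fallback policies (a rendering without xml:lang is acceptable in any language, ties in priority go to the first prototype in document order, and an unmatched variant falls back to the first acceptable rendering) are assumptions, since the text does not fix them. The context attribute is omitted because its selection mechanism is left to applications.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Rendering:
    body: str                      # stands in for the schematic pMathML
    lang: Optional[str] = None     # xml:lang
    variant: Optional[str] = None  # compared with the content element's class


@dataclass
class Prototype:
    matches: Callable[[object], bool]  # stands in for prototype matching
    priority: int = 0                  # default priority is 0
    block: List[Rendering] = field(default_factory=list)  # following rendering block


def select_rendering(expr, prototypes, lang=None, variant=None):
    """Pick a rendering for `expr` from one notation element (sketch)."""
    matching = [p for p in prototypes if p.matches(expr)]
    if not matching:
        return None  # fall back to the default rendering rules
    # The highest-priority matching prototype selects its rendering block;
    # max() returns the first maximal element, i.e. document order on ties.
    block = max(matching, key=lambda p: p.priority).block
    # Assumption: renderings without xml:lang are acceptable in every language.
    candidates = [r for r in block if r.lang in (None, lang)]
    if variant is not None:
        for r in candidates:
            if r.variant == variant:
                return r
    return candidates[0] if candidates else None


interval = Prototype(matches=lambda e: e == "interval_oo",
                     block=[Rendering("(a, b)", lang="en"),
                            Rendering("]a, b[", lang="fr")])
print(select_rendering("interval_oo", [interval], lang="fr").body)  # ]a, b[
```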

We could have a possibility to re-use the selector language
of CSS 2.1, which is big but well accepted,
or we could have a possibility to restrict
to style class-names (as described above). Both would offer
some predictability for authors.

Resolution

None recorded

The following simple example describes the notations for the open real
interval in English and other languages:

matches the first prototype in the notation specification above. The matcher
σ maps <arg name="a"/> to
<apply><minus/><ci>n</ci><ci>ε</ci></apply> and <arg
name="b"/> to
<apply><plus/><ci>n</ci><ci>ε</ci></apply>. In a French-language
environment, the second mcd:rendering is selected, since the English notation in
the first mcd:rendering does not apply. Therefore the cMathML expression renders to

Multiple notation specifications per symbol are explicitly allowed. They can be used,
e.g., when writing a notation specification for the derivative, which would be presented
differently when applied to a simple function than when applied to a function defined
using the lambda binder, which indicates the variable of differentiation explicitly. Thus
the following notation specification with two rendering blocks is used:

Many operations take an arbitrary number of arguments. These are modeled using
schematic MathML expressions containing the exprlist metavariable, which matches a
list of cMathML expressions in a prototype. In a mcd:rendering element, an
exprlist element can contain arbitrary pMathML content. Consider, for instance, the
notation specification for addition:

then the metavariable summands matches the list
<ci>a</ci><ci>b</ci><ci>c</ci>. In this situation, rendering of
the corresponding exprlist is the list of rendered elements interleaved with the
content of the exprlist. In our case, we obtain

<mi>a</mi><mo>+</mo><mi>b</mi><mo>+</mo><mi>c</mi>
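The interleaving step can be illustrated with a small Python sketch that operates on already rendered pMathML strings. The helper names render_ci and render_sequence are hypothetical, and reading reverse="yes" on the iterate element as simply reversing the item order is an assumption.

```python
from typing import List


def render_ci(name: str) -> str:
    # Default rendering of an identifier: its content as an <mi> element.
    return f"<mi>{name}</mi>"


def render_sequence(rendered: List[str], separator: str, reverse: bool = False) -> str:
    """Interleave rendered arguments with the separator content.

    `reverse` mirrors reverse="yes" on iterate (assumed to reverse the items).
    """
    items = list(reversed(rendered)) if reverse else list(rendered)
    return separator.join(items)


summands = ["a", "b", "c"]  # what the summands metavariable matched
print(render_sequence([render_ci(s) for s in summands], "<mo>+</mo>"))
# <mi>a</mi><mo>+</mo><mi>b</mi><mo>+</mo><mi>c</mi>
```

An empty list of summands yields an empty rendering, which is consistent with sequence metavariables standing for possibly empty sequences.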

Editorial note: MiKo

The following examples might be material for the primer though

To further strengthen our intuition, let us consider a complex example. We want to render
a content representation of a multiple integral expression, e.g.:

Note that the prototype picks up the structural backbone of the integral expression in the
mcd:prototype element and introduces metavariables for the variable parts of
integral expressions. While matching the example, the metavariable sets is
bound to the list P, Q, R of sets that are integrated
over. This list is used twice in the rendering: once for generating the three integral
glyphs in the integration operator (here we do not have an mcd:render element, so the
sets themselves are not rendered) and once in the Cartesian product under the integration
operator. The list of bound variables is picked up at the end of the rendering to generate
the dx dy dz postfix.

Note that the matcher here consists of two mappings: one binds the top-level metavariables
domain, variables, condition, and
body; the second one is indexed by the sequence variable
domain and binds the metavariables lower_bound and
upper_bound in its scope. The latter two can only be used in the rendering
inside the iterate element that renders domains.

4.6.2 Precedence-based Elisions

In content MathML expressions, the function/argument relation is fully specified by the
prefix notation in expression trees. In traditional mathematical notation, operator
placement is much less restricted and brackets are used sparingly to disambiguate the
functional structure. However, the use of brackets is usually restricted to cases where
the functional structure cannot be derived from the two-dimensional structure of the
notations and conventions about binding strength of operators. This distribution of
brackets has to be re-created to obtain high-quality renderings for cMathML expressions,
and therefore the binding strengths have to be modeled in notation specifications.

The precedence currently used here may clash with the precedence rules explained
within the mo description in presentation MathML. We should clarify whether it is
compatible with them, the same, or an extension. Also, using numbers might be a bad
idea. Why?

Resolution

None recorded

To account for this, MathML3 adds a precedence attribute to the
mcd:rendering element (to specify the operator precedence) and to the
mcd:render and iterate elements (to specify the argument
precedence). The value of this attribute is an integer, +∞, or
-∞. If the precedence attribute is not present on an
mcd:rendering element, its operator precedence has the default value 0. If the
precedence attribute is not present on an mcd:render or iterate
element, the respective argument precedence is the same as the operator precedence.

The operator precedence allows rendering agents to decide on fencing the constructed
rendering. Intuitively, operators with larger precedence bind more strongly, so they need
not be fenced. Correspondingly, the rendering of an expression is enclosed in fences iff the
operator precedence is greater than the current precedence. For the next level of
rendering, the current precedence is set to the argument precedence of the rendering
metavariable that triggers the recursive rendering. For top-level formula renderings, the
current precedence has the default value 0. This ensures that outer brackets are usually
elided, since most operators have positive operator precedence.

In MathML, fences are considered a special case of semantic components that are
subject to elision, i.e. components that can be left out of the presentation
in certain situations. Even though brackets are the prime example, other components of
mathematical formulae, e.g. bases of logarithms, are also commonly elided if they can be
derived from the context with little effort. In MathML, a component of a rendering is
marked as elidable by adding the mcd:egroup attribute. This attribute
specifies the elision group. MathML reserves the value "fence"
for fences. In a situation where the current precedence is c and the operator
precedence of the fence (i.e. the value of the precedence attribute on the
notation element that contains it) is o, the fence is given a
visibility level of o-c (we take ∞-∞ to
be ∞ to make fences visible for safety in this degenerate
case). Rendering applications can specify other elision groups and give elidable
components an explicit integer visibility level using the mcd:elevel
attribute. If none is given, it defaults to the value 0.

Elision can take various forms in print and digital media. In static media like
traditional print on paper or the PostScript format, we have to fix the elision level, and
can decide at presentation time which elidable tokens will be printed and which will
not. In this case, the presentation algorithm will take visibility thresholds
T(g) for every elision group g as a user parameter and then
elide (i.e. not render) all tokens in elision group g with a visibility level
l<T(g).

In an output format that is capable of interactively changing its appearance, e.g. dynamic
XHTML+MathML (i.e. XHTML with embedded Presentation MathML formulae, which can be
manipulated via JavaScript in browsers), an application can export the information
about elision groups and levels to the target format, and can then dynamically change the
visibility thresholds by user interaction.

For example, the following notation element could be used for the factorial
operator:

assuming an operator precedence of 100 for the addition operator (the argument precedences
do not matter here, since identifiers are never fenced): 500-800=-300<0, so
the outer fences are elided, and 500-100=400>0, so the factorial fences are
rendered. In a situation with a smaller current precedence e.g. 400, the outer fences
would be rendered as well.

A dynamic renderer would pass the computed visibility levels to the rendering. In our
situation with a current precedence of 800 we would obtain

Thus for a fencing threshold of 0 we would get the same result. With a user-given
threshold larger than 400 (only components of a high visibility level are not elided) all
fences would be elided, and with a threshold smaller than -300, all brackets would be made
visible.
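The arithmetic of the factorial example above can be reproduced with a short Python sketch. It computes visibility levels as o-c, taking ∞-∞ as ∞ (treating -∞ minus -∞ the same way is an assumption), and, as in the example, elides a component when its visibility level is below the user-given threshold. The function names visibility_level and is_elided are hypothetical.

```python
import math


def visibility_level(op_prec: float, cur_prec: float) -> float:
    """Visibility level o - c of a fence; infinite precedences are allowed."""
    if math.isinf(op_prec) and math.isinf(cur_prec) and op_prec == cur_prec:
        # Degenerate case: take the difference of equal infinities as +inf,
        # so the fence stays visible (plain float arithmetic would give nan).
        return math.inf
    return op_prec - cur_prec


def is_elided(level: float, threshold: float = 0) -> bool:
    # As in the example: low visibility levels are elided, high ones are shown.
    return level < threshold


outer = visibility_level(500, 800)      # -300
factorial = visibility_level(500, 100)  # 400
for t in (0, 401, -301):
    print(t, is_elided(outer, t), is_elided(factorial, t))
# 0 True False
# 401 True True
# -301 False False
```

The three thresholds reproduce the three situations discussed above: the default threshold 0, a threshold above 400, and a threshold below -300.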

The requirement to have separate operator and argument precedences is probably most
clearly seen in the case of binary associative operators, for which we would use the
following notation specification:

Left- or right-associative binary operators (i.e. operators like the function space
constructor that only elide parentheses in one of their arguments) are simply constructed
by decreasing only one of their argument precedences.

4.6.3 General Rules

In this section we will specify the default rendering of cMathML expressions that do
not match any of the notation definitions given to the rendering function. The default
renderings of pragmatic content MathML are given by the default rendering of the
corresponding strict ones. We will go over the cases for strict MathML expressions in
turn:

4.6.3.1 Numbers

The default rendering of a simple cn-tagged object is the same as for
the presentation element mn with some provision for overriding the
presentation of the PCDATA by providing explicit mn tags. This is
described in detail in Section 4.2.3 Numbers.

4.6.3.2 Symbols and Identifiers

If the content of a ci or csymbol element is tagged using
presentation tags, that presentation is used. If no such tagging is supplied then
the PCDATA content is rendered as if it were the content
of an mi element. In particular, if an application supports bidirectional
text rendering, then the rendering follows the Unicode bidirectional algorithm.

4.6.3.3 Applications

If F is the rendering of f and Ai those of ai,
then the default rendering of an application element of the form

If the condition is not present, then the fragment
<mo separator="true">:</mo>C
is omitted from the rendering.

4.6.3.5 Attributions

The default rendering of a semantics element is the default rendering
of its first child: the annotation and annotation-xml are not
rendered. When a MathML-presentation annotation is provided, a MathML renderer may
optionally use this information to render the MathML construct. This would
typically be the case when the first child is a MathML content construct and the
annotation is provided to give a preferred rendering differing from the default
for the content elements.

Editorial note: MiKo

do all the rest

4.6.4 Attributes Modifying Content Markup Rendering

All content elements support the general attributes class, style, id, and other, which can be used to modify the
rendering of the markup. The first three are intended for compatibility with Cascading
Style Sheets (CSS), as described in Section 2.1.4 Attributes Shared by all MathML Elements.

MathML elements accept an attribute other (see Section 2.3.3 Attributes for unspecified data), which can be used to specify things not specifically
documented in MathML. On content tags, this attribute can be used by an author to
express a preference between equivalent forms for a particular content
element construct, where the selection of the presentation has nothing to do with the
semantics. Examples might be

inline or displayed equations

script-style fractions

use of x with a dot for a derivative in preference to dx/dt

Thus, if a particular renderer recognized a display attribute to select between
script-style and display-style fractions, an author might write

The information provided in the other attribute is intended for use by
specific renderers or processors, and therefore, the permitted values are determined
by the renderer being used. It is legal for a renderer to ignore this
information. This might be intentional, as in the case of a publisher imposing a house
style, or simply because the renderer does not understand it or is unable to carry
it out.

4.6.5 Limitations and Extensions of Notation Documents

The elements proposed in this section provide a basis for exchangeable
notation documents, which can be processed by rendering agents to convert
content elements to presentation MathML.

There is a great wealth of conversion tools from content to presentation. Compared to
hand-written XSLT stylesheets such as ctop.xsl (TODO: quote), the expression matching of
notation documents is quite poor and their programming facilities are almost nonexistent
(thus it does not yet seem possible to specify that the 7 in is computed
automatically). On the other hand, the approach of notation documents is more declarative
and opens the door to exchangeability; moreover, it has the potential to respect user- and
language-dependent notations.

For the many more dynamic rendering agents, which include all content-oriented
input editors, notation documents may be a good way to dynamically render symbols
just found on the web.