I'm a software development engineer in Microsoft Office and have been working mostly on the RichEdit editor since 1994. In this blog I focus on mathematics in Office, along with some posts on RichEdit and the early Windows days.

The Math Paragraph

The earlier post Breaking Equations into Multiple Lines describes equation line breaking and alignment. In particular, long equations often do not fit on a single line and need to be broken up for display on multiple lines. Word 2007 offers two approaches: automatic and manual line breaking. A related feature is alignment of multiple equations, such as aligning the equal signs in a group of equations, which is described in that post and in More on Math Context Menus.

The present post describes how these features are implemented in Microsoft Office using the mathematical paragraph, or “math para” for short. It also mentions some additional properties supported by the underlying layout component (PTS) that aren’t yet implemented in Word or RichEdit, such as elegant equation numbering and customized vertical spacing. The post concludes with some observations about representing the math para in OMML, RTF and MathML. At some point I’ll post more about PTS (Page/Table Services), which is the intimate companion of LineServices, the Microsoft line layout component.

What’s a math para?

A math para is a group of one or more “equations” attached to a line in a text paragraph. The text paragraph is the ordinary paragraph discussed in the blog post Paragraphs and Paragraph Formatting (it may be helpful to read that post first). If a math para ends the text paragraph, the math para is terminated by the CR (inserted by typing Enter) that terminates the text paragraph. In all other cases, the math para ends with a VT (inserted by typing Shift+Enter) and is followed by a nonempty text line to which it is attached. If the math para contains more than one “equation”, the equations are separated by VTs, and each equation consists of one or more math lines. Here “equation” is quoted because the entity involved is actually a display math zone, which might be only a mathematical expression instead of a whole equation. But display math zones usually are equations, hence the name. It’s also handy to refer to text ending with a VT as a “soft paragraph”, as distinguished from text ending with a CR, which is a “hard paragraph”. This terminology is explained further in Paragraphs and Paragraph Formatting.
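To make the plain-text structure concrete, here is a minimal Python sketch of how soft paragraphs could be grouped into math paras. It is only an illustration, not how Office does it: the function names and the parallel list of display-math flags are assumptions (in Office, whether a soft paragraph is a display math zone comes from character formatting, not from a separate list).

```python
VT = "\x0b"  # vertical tab, typed as Shift+Enter: ends a soft paragraph
CR = "\r"    # carriage return, typed as Enter: ends a hard paragraph

def split_soft_paragraphs(hard_paragraph):
    """Split a text paragraph at VT/CR; each piece keeps its terminator."""
    pieces, start = [], 0
    for i, ch in enumerate(hard_paragraph):
        if ch in (VT, CR):
            pieces.append(hard_paragraph[start:i + 1])
            start = i + 1
    if start < len(hard_paragraph):
        pieces.append(hard_paragraph[start:])  # unterminated tail, if any
    return pieces

def group_math_paras(soft_paragraphs, is_display_math):
    """Group consecutive display-math soft paragraphs into math paras.

    is_display_math is a hypothetical parallel list of booleans that
    stands in for the character-format information a real editor has.
    """
    math_paras, current = [], []
    for text, is_math in zip(soft_paragraphs, is_display_math):
        if is_math:
            current.append(text)
        elif current:
            math_paras.append(current)
            current = []
    if current:  # a math para that ends the text paragraph (CR-terminated)
        math_paras.append(current)
    return math_paras

paragraph = "Consider" + VT + "a=b" + VT + "c=d" + VT + "and then" + VT + "e=f" + CR
soft = split_soft_paragraphs(paragraph)
math_paras = group_math_paras(soft, [False, True, True, False, True])
```

In this example the first math para holds two equations separated by a VT and is attached to the text line “and then”, while the second math para ends the text paragraph and is therefore terminated by the CR.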

Okay, what’s a display math zone? First note that a math zone is a text range within which math typography rules usually apply and outside of which math typography rules do not apply. The caveat “usually” appears because math zones can contain specially marked normal text runs for which math typography rules don’t apply. Such text is handy for making equations like rate = distance/time, in which text occurs that should be displayed in normal upright style rather than in the math italic style used for most mathematical variables.

Math zones can be inline or display, corresponding to TeX’s $ and $$ toggle keys, respectively. If a math zone fills an entire soft or hard paragraph, it is a display math zone, i.e., it is displayed on its own line(s). If a math zone is preceded or followed by nonmath text other than a CR or VT, the math zone is inline and is rendered in a more compact fashion. Inline math zones usually consist of math expressions or variables, whereas display math zones usually consist of equations or formulas. Inside Microsoft Office, math zones are identified internally by a character-format effect bit, much like bold. Hence if you delete the ordinary text separating two math zones, you get a single merged math zone. (With hindsight, less overall code would have been required if we had delimited the math zones with special characters rather than using a character-format effect, but that’s another story.)
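The inline/display distinction and the merging behavior can be sketched in a few lines of Python. This is a toy model under stated assumptions: the per-character list of booleans stands in for the character-format effect bit, the VT and CR terminators are assumed not to carry the math bit, and the function names are hypothetical.

```python
VT, CR = "\x0b", "\r"

def find_math_zones(text, math_bits):
    """Return (start, end, kind) for each maximal run of math characters.

    math_bits[i] is True when character i carries the (modeled) math
    effect bit. A zone is 'display' only if it fills its whole soft or
    hard paragraph; otherwise it is 'inline'.
    """
    zones, i, n = [], 0, len(text)
    while i < n:
        if not math_bits[i]:
            i += 1
            continue
        start = i
        while i < n and math_bits[i]:
            i += 1
        starts_paragraph = start == 0 or text[start - 1] in (VT, CR)
        ends_paragraph = i == n or text[i] in (VT, CR)
        kind = "display" if starts_paragraph and ends_paragraph else "inline"
        zones.append((start, i, kind))
    return zones

def delete_range(text, math_bits, start, end):
    """Delete characters [start, end) together with their format bits."""
    return text[:start] + text[end:], math_bits[:start] + math_bits[end:]

text = "If a and b" + VT + "a+b=c" + CR
bits = [False] * len(text)
for index in (3, 9, 11, 12, 13, 14, 15):  # 'a', 'b', and 'a+b=c'
    bits[index] = True

zones = find_math_zones(text, bits)

# Deleting the ordinary text " and " between the two inline zones merges them.
merged_text, merged_bits = delete_range(text, bits, 4, 9)
merged_zones = find_math_zones(merged_text, merged_bits)
```

Because a zone here is just a run of characters with the bit set, nothing has to be done explicitly to merge zones: once the separating text is gone, the two runs are adjacent and are found as one zone.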

Note that a text paragraph can contain more than one math para as is often the case in technical documents and books.

Math Para Properties

In addition to the underlying plain-text structure of math paras described in the preceding section, there are math para properties representing equation alignment, manual breakpoints, and various horizontal and vertical parameters. The ways you can specify equation alignment and manual breaks are discussed in Breaking Equations into Multiple Lines and More on Math Context Menus. An alignment point is represented internally (and in OMML/RTF) by an operator character format property, not by a character of its own. The operator character so used for alignment can be any kind of operator and can be nested arbitrarily deeply inside a mathematical expression.

An equation break also occurs on an operator character and is represented by a character format property of that operator. But the operators that can be used for equation breaks are restricted in both type and context. First, equation breaks occur only on relational or binary operators. Second, context restricts the break possibilities further. For example, there’s no breaking inside a subscript or superscript object, unless the script or base being broken is bracketed. This restriction exists for readability. The software could break a math expression almost anywhere, but the reader needs to know the scope of the expression being broken. Parentheses or brackets reveal such scope, as does the math zone itself if the operator isn’t contained within any math object, like a fraction or superscript object.
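The two restrictions can be summarized as a small predicate. The Python sketch below is a simplified reading of the rules just described, not the actual Office logic: the operator sets are a small illustrative sample, and the idea of passing the chain of enclosing containers as a list of strings is an assumption made for the example.

```python
# Illustrative samples only; the real operator classes are much larger.
RELATIONAL = {"=", "<", ">", "≤", "≥", "≠"}
BINARY = {"+", "-", "−", "×", "·", "±"}

# Containers whose extent is visible to the reader (assumed names).
SCOPE_REVEALING = {"zone", "brackets"}

def can_break_at(operator, containers):
    """Decide whether a line break is allowed at this operator.

    containers lists what encloses the operator, outermost first, e.g.
    ["zone", "superscript", "brackets"]. A break is allowed only on a
    relational or binary operator whose innermost container is the math
    zone itself or a bracketed expression.
    """
    if operator not in RELATIONAL and operator not in BINARY:
        return False
    return bool(containers) and containers[-1] in SCOPE_REVEALING

top_level = can_break_at("=", ["zone"])
in_script = can_break_at("+", ["zone", "superscript"])
in_bracketed_script = can_break_at("+", ["zone", "superscript", "brackets"])
```

Under this model a plus sign directly inside a superscript is rejected, while the same plus sign inside parentheses within that superscript is accepted, because the parentheses show the reader where the broken expression begins and ends.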

The manual break property also specifies to which operator on the first line of the broken equation the break should be aligned. As discussed in Breaking Equations into Multiple Lines, this choice can be made using the Tab key.

A math paragraph has its own alignment and indents, independent of the parent text paragraph. The blog post Default Document Math Properties includes discussion of the default math para alignment and various horizontal and vertical spacing parameters. The math para alignment can be one of Center as a Group, Left, Right, or Center. This kind of alignment is different from aligning a set of equations at various operators: it affects the whole math para as a block. In principle the alignment and positioning parameters could be specified on individual math paras as well, but Word 2007 only implements the math para alignment for individual math paras. The various positioning parameters are math para space before/after, inter/intra equation spacing, left/right indent, and wrap indent. Wrapped lines of an equation can either go to the wrap indent or be aligned left/right/center. Word 2007 implements the indents on a document default basis, but doesn’t implement the vertical spacings.

Another important feature of the math para is optional equation numbers for one or more equations within the math para. The equation numbers are specified by soft paragraphs of their own and can be placed to the left or right of the equation with a variety of vertical alignment options. Hopefully someday Office will take advantage of this important functionality.

OMML, RTF, and MathML Representations of Math Paragraphs

The OMML tag <oMathPara> contains a math para, and similarly the RTF group {\*\moMathPara..} contains a math para. Either one can contain multiple equations consisting of display math zones with the various alignment and breaking options described above. MathML doesn’t have the concept of a math paragraph. It could be modeled using MathML’s <mtable> entity, but it’s simpler to represent it as a sequence of one or more <math display="block"> entities separated by soft paragraph marks and terminated by a soft or hard paragraph mark. These marks are part of the parent document format into which the MathML is embedded. The equation alignment and breaking properties can be represented in MathML 3.0 using the ID attribute. The math para properties can be specified in the parent document format.
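As a rough illustration of the OMML shape, the following Python sketch builds an <oMathPara> containing two equations whose equal signs are marked as alignment points. It uses only the standard xml.etree.ElementTree module. The helper names are hypothetical, and the output is a bare fragment under simplifying assumptions: markup produced by Word carries additional wrapping and run properties that are omitted here.

```python
import xml.etree.ElementTree as ET

# OMML namespace from the Office Open XML math schema.
M = "http://schemas.openxmlformats.org/officeDocument/2006/math"
ET.register_namespace("m", M)

def m(tag):
    """Qualify a tag or attribute name with the OMML namespace."""
    return f"{{{M}}}{tag}"

def add_run(parent, text, align=False):
    """Append a math run; align=True marks it as an alignment point."""
    run = ET.SubElement(parent, m("r"))
    if align:
        run_props = ET.SubElement(run, m("rPr"))
        ET.SubElement(run_props, m("aln"))  # a property of the operator run
    ET.SubElement(run, m("t")).text = text
    return run

def build_math_para(equations, justification="centerGroup"):
    """Build an oMathPara from (lhs, operator, rhs) triples (hypothetical helper)."""
    para = ET.Element(m("oMathPara"))
    para_props = ET.SubElement(para, m("oMathParaPr"))
    ET.SubElement(para_props, m("jc")).set(m("val"), justification)
    for lhs, operator, rhs in equations:
        equation = ET.SubElement(para, m("oMath"))
        add_run(equation, lhs)
        add_run(equation, operator, align=True)
        add_run(equation, rhs)
    return para

para = build_math_para([("a", "=", "b+c"), ("d+e", "=", "f")])
xml_text = ET.tostring(para, encoding="unicode")
```

Note that the alignment point is carried as a property of the run holding the operator rather than as a separate character, which matches the description of alignment points given earlier in this post.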