EPUB Math: Best Practices for Mathematics in Ebooks

5/28/2009 First draft

Scientific, technical, engineering and medical (STEM) content
presents unique challenges to publishers because it contains large
amounts of mathematical notation, tables and diagrams. The
Open Publication Structure (OPS) v2.0 Specification, the standard for content
in EPUB ebooks, handles tables and diagrams by relying on the table and
image capabilities of XHTML. For equations, however, OPS provides several
possible approaches, and as a result content providers and reading system
vendors alike have been unsure what to produce and implement. In addition,
publishers want to provide enhanced access to such content, including
accessibility for readers with disabilities, copying of equations into
analysis software, and making the mathematics visible to search. The purpose
of this document is to propose a single set of conventions for incorporating
mathematical content in EPUB ebooks that achieves these objectives.

Goals

OPS offers essentially two ways to publish equations in EPUB ebooks:
raster images (PNG, GIF) or vector images (SVG). Each has weaknesses. Raster
images do not scale smoothly with the text. Neither raster nor vector images
represent the equation's mathematical structure, which rules out useful reading
system features such as accessibility for readers with disabilities, copying to
calculation and authoring applications, or math-based search. MathML is the
standard XML format for representing mathematical notation; therefore,
associating MathML with the equation's visual representation is an essential goal.
There are also other requirements for high-quality mathematics in ebooks,
such as baseline alignment. This section identifies the goals for ideal
support of mathematical content in EPUB ebooks, so that there is a basis
for assessing the proposed format.

Math notation must render with typographic quality comparable to
surrounding text. In particular:

Equations should scale together with the surrounding text.

Inline expressions should properly align on the baseline of
the surrounding text. Here is an illustration:

It must be possible to make math notation accessible in reading
systems that wish to do so.

Reading systems should make MathML available to assistive technology
(AT) applications.

Math notation should degrade gracefully in reading systems with
limited capabilities, such as those lacking SVG support.

It must be possible to implement enhanced functionality, such as
math-aware search and clipboard copy
of mathematical structure.

Implementation must be feasible for reading systems (both
technically and economically):

It should be straightforward to implement the minimum
requirements.

Libraries should be readily available, through commercial licensing or
as open source.

There should be enough conformant content to make the
effort worthwhile.

Implementation must be feasible for content producers (both
technically and economically):

It should be possible to generate conformant content with available
workflow and production software.

Implementation must adhere to the OPS 2.0 specification.
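The enhanced-functionality goals above (math-aware search, clipboard copy of mathematical structure, handing MathML to assistive technology) all begin with locating the MathML in a document. The following is a minimal Python sketch using only the standard library's xml.etree.ElementTree. It assumes well-formed XHTML with MathML in its standard namespace; the function names extract_mathml and math_text are hypothetical and not part of any reading system API.

```python
import copy
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
MATH_TAG = f"{{{MATHML_NS}}}math"

# Serialize MathML with an unprefixed <math xmlns="..."> root element.
ET.register_namespace("", MATHML_NS)

def extract_mathml(xhtml: str) -> list[str]:
    """Return every MathML equation in an XHTML document as a string (e.g. for the clipboard)."""
    root = ET.fromstring(xhtml)
    equations = []
    for m in root.iter(MATH_TAG):
        m = copy.copy(m)   # shallow copy, so the document tree is left untouched
        m.tail = None      # tostring() would otherwise include the text after </math>
        equations.append(ET.tostring(m, encoding="unicode"))
    return equations

def math_text(xhtml: str) -> list[str]:
    """Flatten each equation to its token text, e.g. as a crude key for a search index."""
    root = ET.fromstring(xhtml)
    return ["".join(m.itertext()) for m in root.iter(MATH_TAG)]
```

Flattening to token text, as math_text does, is only a rough search key; genuinely math-aware search would compare the MathML structure that extract_mathml preserves.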

Format for Mathematical Notation

Content should be encoded using the XHTML preferred
vocabularies. Equations should adhere to the following format:

The <img> element provides an acceptable fallback rendering and is well
supported in current reading systems.

The <ops:switch> element provides for graceful fallback functionality
across a range of reading systems.
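The fallback behavior of <ops:switch> can be sketched in a few lines. Under OPS 2.0, a reading system renders the first <ops:case> whose required-namespace it supports, and otherwise renders <ops:default>. This minimal Python sketch implements that selection with xml.etree.ElementTree; it considers only the required-namespace attribute, and the function choose_rendering and its `supported` set of namespace URIs (standing in for the reading system's capabilities) are hypothetical.

```python
import xml.etree.ElementTree as ET

OPS_NS = "http://www.idpf.org/2007/ops"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

def choose_rendering(switch: ET.Element, supported: set[str]) -> list[ET.Element]:
    """Return the content of the single branch an <ops:switch> resolves to."""
    # Cases are tried in document order; the first one whose
    # required-namespace the reading system supports wins.
    for case in switch.findall(f"{{{OPS_NS}}}case"):
        if case.get("required-namespace") in supported:
            return list(case)
    # No case matched: fall back to <ops:default>, if there is one.
    default = switch.find(f"{{{OPS_NS}}}default")
    return list(default) if default is not None else []
```

Because exactly one branch is returned, a reading system built this way displays a single rendering of each equation even when fallback images are present.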

The main considerations for implementation in reading systems are:

The <ops:switch> functionality is required by OPS 2.0, and it is simple
to implement minimally compliant functionality.

Support for SVG is also required of reading systems, and both open
source and commercial implementations are readily available.

Note that the positioning attributes are given in font-relative units (em),
and reading systems must honor them in those units: correct scaling and
baseline alignment, and therefore high-quality display, depend on it.

Implementing advanced functionality using the MathML encoding is not
required. However, MathML support is widespread in math-aware
software, and implementations of accessibility and other advanced
functionality are readily available to reading systems through open source
and commercial products. Consequently, reading systems should make <ops:switch>
data (particularly MathML) programmatically available to third-party software
when possible.

Support for embedded fonts for use in SVG (or MathML)
implementations is also a long-term requirement, since fonts are
critical to math typography.
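The point about font-relative units can be made concrete with a little arithmetic. This Python sketch assumes, hypothetically, that a fallback image carries its width, height and a baseline shift (vertical-align) in em; the EquationMetrics class, both functions and the example numbers are illustrative only. The em values stay fixed in the content, and the reading system multiplies them by the current font size.

```python
from dataclasses import dataclass

@dataclass
class EquationMetrics:
    """Size of a rendered equation in em (hypothetical metrics for illustration)."""
    width_em: float
    height_em: float  # total height: ascent above the baseline plus depth below it
    depth_em: float   # how far the equation extends below the baseline

def css_style(m: EquationMetrics) -> str:
    """Font-relative style a content producer could attach to the fallback image."""
    return (f"width:{m.width_em:g}em;height:{m.height_em:g}em;"
            f"vertical-align:-{m.depth_em:g}em")

def resolve_px(m: EquationMetrics, font_size_px: float) -> tuple[float, float, float]:
    """What a reading system computes at the current font size: (width, height, baseline shift)."""
    s = font_size_px
    # A negative shift moves the image down so its own baseline meets the text baseline.
    return (m.width_em * s, m.height_em * s, -m.depth_em * s)
```

With a 16 px font, an equation 4 em wide and 1.5 em tall with a 0.25 em depth resolves to 64 by 24 px, shifted 4 px below the baseline; at 32 px every value doubles, so the equation scales with the text and stays on the baseline. Pixel values baked into the content would lose both properties.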

The format can also be generated using industry-standard tools:

MathML is widely supported in publishing workflow software.

Many tools can produce high-quality fallback images from MathML or other
equation sources.

Support for SVG generation is increasing, with several converters
to SVG from MathML and other equation encodings available today.
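Generating the markup is mechanical once the MathML and a fallback image exist. As a minimal sketch, assuming the structure is an <ops:switch> with a MathML <ops:case> followed by an <ops:default> holding an <img> (SVG and em-based sizing are left out), the following Python wraps one equation using xml.etree.ElementTree. The function build_switch and the file name in the test are hypothetical.

```python
import xml.etree.ElementTree as ET

OPS_NS = "http://www.idpf.org/2007/ops"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Output prefixes. "m:math" is namespace-equivalent to <math xmlns="...MathML">.
ET.register_namespace("ops", OPS_NS)
ET.register_namespace("m", MATHML_NS)
ET.register_namespace("", XHTML_NS)

def build_switch(mathml: str, img_src: str, alt: str) -> str:
    """Wrap one equation: MathML as the preferred case, an <img> as the default."""
    math = ET.fromstring(mathml)
    if math.tag != f"{{{MATHML_NS}}}math":
        raise ValueError("expected a <math> element in the MathML namespace")
    switch = ET.Element(f"{{{OPS_NS}}}switch")
    case = ET.SubElement(switch, f"{{{OPS_NS}}}case",
                         {"required-namespace": MATHML_NS})
    case.append(math)
    default = ET.SubElement(switch, f"{{{OPS_NS}}}default")
    # The alt text keeps the fallback usable when the image itself cannot be shown.
    ET.SubElement(default, f"{{{XHTML_NS}}}img", {"src": img_src, "alt": alt})
    return ET.tostring(switch, encoding="unicode")
```

Building the tree with a real XML library, rather than concatenating strings, keeps namespaces and escaping correct for any MathML input.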

Reading System Compliance

The following table may be used as a checklist to assess reading system
compliance with this specification. Note that the items in the first section are
requirements that would be met simply by fully implementing the EPUB standard.
They are included here because they are important for math and because, as of
this writing, most EPUB reading systems do not fully implement the standard and,
in particular, lack support for these items. The items in the second
section are allowed by the EPUB standard but required for good math support.

Pass/Fail | Technical Requirement | Reader Experience

EPUB Requirements:

          | support for inline SVG | text in math renders with the same quality as body text
          | <ops:switch> properly implemented | only one math rendering appears when the document includes fallback images