At the Summer 2011 meeting,
there were several problems with the readability of various presentations.
Readability is another side of accessibility,
the ability of a wide variety of readers
to read the page under a wide variety of conditions.

This paper provides guidelines and tools
for producing widely accessible WG21 papers.
It is based on my experience in dealing with inaccessible web pages
and my experience writing accessible web pages.

These guidelines are for the production of WG21 papers.
While many of the concepts and techniques carry over into other uses,
they are incomplete with respect to those other uses.

Contrast was the primary problem at the summer meeting.
When contrast is low, readability is poor.
Further, low contrast exaggerates focus problems.

Reliance on color is a significant problem.
First, close to 10% of men are color deficient,
which means they cannot see colors normally.
There are several kinds of deficiency,
but by far the most common is an inability to distinguish red and green.
Second, many browsers support a "high contrast" mode,
which generally ignores page-specified colors.
Third, to save costs, WG21 papers are often printed without color.
The net effect is that color differences, and particularly red versus green,
are not sufficient to convey information.

Reliance on font face is a significant problem.
The "high contrast" mode generally ignores page-specified fonts,
so font differences are also not sufficient to convey information.

Reliance on long lines is a significant problem.
Low-vision readers rely on being able to increase font size to read the text.
Larger fonts mean relatively shorter lines.
Smaller windows also achieve the effect of forcing shorter lines.
Pages need to adapt to those shorter lines.

Reliance on external tools is not presently a problem,
but could become one.
Browsers behave differently.
They are configured in various ways.
They have different sets of extensions and plug-ins.
All of this variety leads to problems when straying from plain HTML.

The primary approach to solving these problems
is to rely on text to convey information,
and secondarily, to enable that text to adapt to the reader's needs.
One can decorate with style and color;
make it easier to read with style and color;
but one must make the text itself convey all needed information.

A consequence of a reliance on text is that
pages should avoid technologies that
displace or obscure text.
Examples include embedding text in images and using Flash.

Text more effectively adapts to readers' needs
when the semantic structure of the paper
is separated from the presentational choices.
In other words, the HTML elements should carry the paper's meaning,
and separate CSS should specify presentation.
Readers can alter the applied CSS,
but altering the elements is much harder.

Plain HTML is the most accessible and most reliable way to convey information.
So, we should encode documents with HTML elements
that best represent semantics.

Reading is more comfortable when the author
respects and accepts the users' choices
in browser, settings, colors and sizes.

Finally, papers should avoid reliance on problematic technologies,
like JavaScript, Java, and video.

Use clean, well-structured HTML.
Doing so reduces document construction and maintenance costs,
as well as making documents easier to read.

Where possible, comply with all relevant standards.
We cannot control where our documents go,
so we should help them travel easily.

Avoid machine generation of HTML,
as the results tend to work towards a particular paper-based layout
rather than provide general readability.
In particular, word processors, such as Microsoft Word,
produce really bad HTML.

Never put style information within the body of the document.
[HTMLstyleinline]
Instead, use the class attribute
to give an element additional semantic information,
which can then be decorated with CSS specified in the document head.
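
As a minimal sketch of this separation,
the fragment below gives a paragraph a class and styles it from the head.
The class name "rationale", the title, and the chosen style are hypothetical.

```html
<head>
<title>Example paper</title>
<style type="text/css">
/* Presentation lives here, not in the body. */
p.rationale { margin-left: 2em; }
</style>
</head>
<body>
<!-- Avoid: <p style="margin-left: 2em"> -->
<p class="rationale">
The committee preferred the simpler rule.
</p>
</body>
```

A reader who overrides the style still sees an ordinary paragraph.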

Browsers may ignore font specifications,
but they generally do not ignore the phrase markup elements
[HTMLphrase]
em, strong, dfn,
code, samp, kbd,
var, cite,
and abbr.
So, one should use one of them rather than the font style elements
[HTMLfontstyle]
tt, i, b,
big, small, strike,
s, and u.
One should certainly not use
the font element.
[HTMLfontstyle]

Normal emphasis should use the em element,
rather than the i element.

Strong emphasis should use the strong element,
rather than the b element.

The definition of a term should use the dfn element,
rather than the i element.

Citations should use the cite element.

Abbreviations should use the abbr element.
This element requires extra work to be effective,
and so may not find much application within WG21 papers.

Text that is variable, that is intended for substitution,
should use the var element,
rather than the i element.
Grammar symbols fall directly into this category.

C++ identifiers, keywords, punctuation, and the like should use
the code element.
Sample output should use the samp element.
User input should use the kbd element.
Such text that is variable, that is intended for substitution,
should also use the var element.
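
The phrase elements above might be combined as in the following sketch.
The sentences are hypothetical and serve only to show each element in a plausible role.

```html
<p>
A <dfn>translation unit</dfn> is <em>not</em> the same as a source file.
Compile with <kbd>c++ -c <var>file</var>.cpp</kbd>;
a call to <code>std::puts("ok")</code> prints <samp>ok</samp>.
You <strong>must not</strong> rely on font changes alone.
</p>
```

Here var nests inside kbd to mark the part of the user input that is substituted.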

Within C++ code, the characters
&, <, and >
must be quoted.
The script quote_code.sh
will convert C++ code into properly quoted HTML.
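
For instance, a hypothetical line of C++ might look as follows after quoting.
This is a hand-written sketch; the exact output of quote_code.sh may differ.

```html
<!-- C++ source:  if (a < b && c > d) swap(a, b);  -->
<pre>
<code>if (a &lt; b &amp;&amp; c &gt; d) swap(a, b);</code>
</pre>
```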

Browsers often use the same representation for more than one phrase element.
The usual representations are shown in the following table.

Common Phrase Representations

representation   tags
normal           abbr acronym
italic           cite dfn em i var
bold             b strong
fixed-width      code kbd samp tt

Authors should exercise care to ensure that
these overloaded elements are used
in contexts where the intent is reasonably clear.
Fortunately, some of these can be mixed,
such as var with code.

There are two types of quotes: block quotes and inline quotes.
[HTMLquote]

Block quotes use the blockquote element
and denote paragraph-level quotations.
As such they always have a block-level element within them,
such as an explicit p element.

Inline quotes use the q element,
and generally enclose short quotations.
Use inline quotes in place of quotation marks.
Some browsers fail to add the quotation marks as specified,
so this element may require some more time before it is reliable.
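
The two kinds of quote might be encoded as in this sketch.
The quoted wording is illustrative only.

```html
<p>
The guideline is to <q>rely on text to convey information</q>.
</p>
<blockquote>
<p>
One can decorate with style and color,
but the text itself must convey all needed information.
</p>
</blockquote>
```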

Use the pre element to enclose preformatted text.
[HTMLpre]
The pre element has definitional problems.
In particular, the browser may or may not change to a fixed-width font,
which means the author can neither avoid nor rely on a fixed-width font.
Therefore, authors should always specify a fixed-width font
immediately within the preformatted text
and ensure that it is active throughout the block.
That is,

<pre>
line of wait for it text
    followed by some indentation
</pre>

is not reliable.
Instead, authors should specify

<pre>
<code>line of ... wait for it ... code
    some of which is indented</code>
</pre>

Furthermore,
while

<pre><code>
line of ... wait for it ... code
    some of which is indented
</code></pre>

is cleanest,
some browsers incorrectly
[HTMLline]
add an extra blank line
at the beginning of the preformatted text.

In any event, preformatted text does not wrap lines,
which makes it very difficult to read when the line width
is greater than the window width.
(This problem happens when either characters are large or windows are narrow.)
Therefore, authors should strive to keep preformatted lines short.

Examples are a significant part of WG21 documents.
Represent examples with class=example
applied to p paragraphs,
pre preformatted text or
div document divisions.
Divisions contain any number of paragraphs.
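
A sketch of the three forms follows.
The example text and code are hypothetical.

```html
<p class="example">
A short example that fits in one paragraph.
</p>

<pre class="example">
<code>int i = 0;</code>
</pre>

<div class="example">
<p>
An example that needs some explanation first.
</p>
<pre>
<code>int j = i + 1;</code>
</pre>
</div>
```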

Tables
[HTMLtable]
may consist of a caption (caption),
a head (thead),
a body (tbody), and
a foot (tfoot).
The last three elements contain rows.
The head and foot elements
enable browsers to duplicate headings and footings
when splitting a table across multiple pages.
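
A minimal sketch of this structure, using part of the phrase table from this paper as content.
The foot is omitted here; a tfoot element would hold footer rows in the same way that thead holds header rows.

```html
<table>
<caption>Common Phrase Representations</caption>
<thead>
<tr><th>representation</th><th>tags</th></tr>
</thead>
<tbody>
<tr><td>italic</td><td>cite dfn em i var</td></tr>
<tr><td>bold</td><td>b strong</td></tr>
</tbody>
</table>
```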

HTML provides direct representation of deleted and inserted text.
[HTMLdelins]
These should be used in preference to ad hoc mechanisms.

The HTML standard intended these elements
for showing modifications to the document itself.
However, that is rarely a problem with WG21 papers.
Instead they need to show edits to the working draft,
and this repurposing of the elements is reasonable.

When the deletion occurs before an insertion,
readers can use the deletion to set the context for the insertion.
So, when paired, the deletion should come before its insertion.

Unless spacing is critical to the changes,
deletions and insertions should be spaced.
However, in the presence of changing punctuation,
non-spacing markup is preferable to excessive markup,
particularly when readers may not notice it.
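
As a sketch of both styles, with hypothetical sentences rather than wording from the working draft:
the first paragraph spaces the deletion from the insertion,
while the second uses non-spacing markup because the punctuation itself changes.

```html
<p>
The function returns <del>zero</del> <ins>a null pointer</ins> on failure.
</p>
<p>
The value is unchanged<del>.</del><ins>, unless overflow occurs.</ins>
</p>
```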

The del and ins elements
are supposed to act as either block-level or inline-level elements,
but some browsers fail to render them properly as block-level elements.
Therefore, authors should use these elements as inline elements only.
(This workaround is most annoying for tables and lists.)

When headings follow a simple format,
they can be easily and automatically converted into a table of contents.
The format consists of a single line
containing a heading element
and directly within that an anchor element.
The anchor provides not a reference, but a name.
That name must be unique within the file.
Names taken from the standard's own tagging system
are usually unique, but not always.
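
Headings in this format might look like the following sketch.
The heading text and anchor names are hypothetical.

```html
<h2><a name="Introduction">Introduction</a></h2>
<h3><a name="intro.scope">Scope</a></h3>
```

A table-of-contents entry can then link to a heading with, for example, href="#intro.scope".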

The script contents.sh
generates a table of contents.
The resulting file can be simply included into the HTML source.

Alternatively, one can use JavaScript within the HTML itself
to dynamically generate the contents.
The script dynacontents.js,
courtesy of Jeffrey Yasskin,
does this task.
It assumes that the user has previously included

It also assumes a p element, somewhere in the document,
with the id "toc", which it will fill in with the table of contents.

While the table of contents serves as an outline,
a more specific command-line tool that emits the outline
can be helpful during development.
There are two such scripts,
one that emits just the headings
and one that also emits the anchor names.
They are available as
outline.sh and
outline_with_names.sh,
respectively.

Conventions on the use of HTML
in representing concepts of the C++ standard
will help in cooperative editing,
sharing of helpful tools,
and automatic translation into the LaTeX source of the standard itself.

The C++ grammar has the structure of a descriptive list,
several terms each of which may have several definitions.
We exploit that parallel structure
by representing C++ grammar rules with descriptive lists.

Grammar terms are denoted by
a var variable-phrase element.
When a grammar term is defined,
it is contained within
a dt descriptive-term element,
and marked by a dfn definition-phrase element.
(The colon outside the dfn element
makes automatic indexing easier.)
Each substitution rule is denoted by
a dd descriptive-definition element.
The optional marker is denoted by
a sub subscript element
within the var element.
Literal code is denoted by the code phrase element.
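
One plausible encoding of a single grammar rule under these conventions is sketched below.
The nesting order of dfn and var is an assumption; the text above does not fix it.

```html
<dl>
<dt><dfn><var>expression-statement</var></dfn>:</dt>
<dd><var>expression<sub>opt</sub></var> <code>;</code></dd>
</dl>
```

A term with several substitution rules would simply have several dd elements after its dt.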

Note that here we use the font style element i
because it does not really fit any of the phrase markers
and because it makes searching for such uses easier.

While not part of the final standard,
rationale, editor's notes and notes to the editor
can also be represented this way.

Comments on the paper itself,
and particularly notes on work still to be done
can be marked the same way,
except using the b element instead of the i element.
This change enables rapid searching for unfinished parts of the document.

The library has special formatting requirements
for representing functions and their attributes.
Each function prototype is contained within
a p class="function" element.
The attribute paragraphs are all contained within
a dl class="attribute" element.
Each attribute is labeled with a dt element
and has its body in a dd element.
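
A sketch of this layout for a hypothetical function follows.
The function frob, its attributes, and their wording are invented for illustration
and do not come from the standard.

```html
<p class="function">
<code>int frob(int n);</code>
</p>
<dl class="attribute">
<dt>Requires:</dt>
<dd><code>n &gt;= 0</code>.</dd>
<dt>Returns:</dt>
<dd>The frobbed value of <code>n</code>.</dd>
</dl>
```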

The first step in editing the standard is to quote the standard.
For that we use the blockquote element
with class="std".
Each quoted portion of the standard
must be preceded by a paragraph
indicating where in the standard it comes from.
The section may be known from context,
but if not, it should be stated explicitly.
So, the quote appears as

Section 1.8 [intro.object] paragraph 5 says:

Unless it is a bit-field (9.6),
a most derived object shall have a non-zero size
and shall occupy one or more bytes of storage.
Base class subobjects may have zero size.
An object of trivially copyable or standard-layout type (3.9)
shall occupy contiguous bytes of storage.

and is encoded as

<p>
Section 1.8 [intro.object] paragraph 5 says:
</p>
<blockquote class="std">
<p>
Unless it is a bit-field (9.6),
a most derived object shall have a non-zero size
and shall occupy one or more bytes of storage.
Base class subobjects may have zero size.
An object of trivially copyable or standard-layout type (3.9)
shall occupy contiguous bytes of storage.
</p>
</blockquote>

One can show edits to a paragraph
by combining the quoting of the standard
with the delete and insert markup described above.
So, an edit appears as

Edit section 1.8 [intro.object] paragraph 5 as follows.

Unless it is a bit-field (9.6),
a most derived object shall have a non-zero size
and shall occupy one or more bytes of storage.
Base class subobjects may have zero size.
An object of trivially copyable, trivially movable
or standard-layout type (3.9)
shall occupy contiguous bytes of storage.
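
One plausible encoding of this edit is sketched below,
assuming that the inserted text is ", trivially movable".
The insertion is non-spacing because it begins with punctuation.

```html
<p>
Edit section 1.8 [intro.object] paragraph 5 as follows.
</p>
<blockquote class="std">
<p>
Unless it is a bit-field (9.6),
a most derived object shall have a non-zero size
and shall occupy one or more bytes of storage.
Base class subobjects may have zero size.
An object of trivially copyable<ins>, trivially movable</ins>
or standard-layout type (3.9)
shall occupy contiguous bytes of storage.
</p>
</blockquote>
```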

When deleting or inserting whole paragraphs or sections,
the del and ins elements
need not be used,
but the introductory text
should clearly indicate the edit.
In addition, the blockquote elements
use class="stddel" or class="stdins", respectively.
So, full paragraph deletions and insertions
appear as

Delete paragraph 12 of 2.14.5 String literals [lex.string].

Whether all string literals are distinct
(that is, are stored in nonoverlapping objects)
is implementation-defined.
The effect of attempting to modify a string literal is undefined.

After paragraph 12 of 2.14.5 String literals [lex.string],
insert a new paragraph.

All string literals are distinct;
their characters never share addresses.
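
Following the classes named above, these two edits might be encoded as in this sketch.

```html
<p>
Delete paragraph 12 of 2.14.5 String literals [lex.string].
</p>
<blockquote class="stddel">
<p>
Whether all string literals are distinct
(that is, are stored in nonoverlapping objects)
is implementation-defined.
The effect of attempting to modify a string literal is undefined.
</p>
</blockquote>

<p>
After paragraph 12 of 2.14.5 String literals [lex.string],
insert a new paragraph.
</p>
<blockquote class="stdins">
<p>
All string literals are distinct;
their characters never share addresses.
</p>
</blockquote>
```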

The format of the HTML source itself
can improve its interaction with tools.

Starting each sentence on a new line
improves the stability of diff,
and hence of source code version control systems.
The same applies to
putting block-level elements on lines separate from live text.

When editing the source,
separating block-level elements
makes them more quickly identifiable.

More regularity in the HTML source
simplifies tools that convert HTML source to other forms,
like the LaTeX of the standard itself.

The C++ standard's papers are a good application
of literate programming.
[LPcom][LPwiki]
Particularly when a paper includes
normative declarations or sample implementations,
an automatic process for extracting the code from the paper itself
helps ease adoption concerns.

The essential idea is to identify code to be extracted with a distinct class,
e.g. "extract",
and then remove everything but that within those code elements.
That process is eased considerably
when all code text is on lines separate from other text.
Typically, this is accomplished with HTML of the form:
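
One plausible shape, assuming the class name "extract" from above
and a hypothetical one-line declaration as the code to be extracted:

```html
<pre>
<code class="extract">
int answer();
</code>
</pre>
```

Keeping the tags on their own lines lets a line-oriented tool select the code,
at the cost of the extra blank line in some browsers.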

Within the code,
all HTML elements should be removed,
which enables links, phrase tags, and other markup
within the code.
Further, the critical HTML character entities,
&lt;, &gt;,
&amp;, and &nbsp;,
must be recognized and substituted.

For presentational purposes,
it is also helpful to identify the pre element
containing the code for extraction.
Again, use the same class as above, e.g. "extract".

The HTML files can contain either a single code file,
as in
N2648,
or multiple code files,
as in
N2427.
In the latter case, the multiple files
are actually generated from a single-file shell script.
The scripts in this paper follow that approach.

The script extract_code.sh
will extract the code from the HTML source of this paper.
Simply execute the resulting shell script to get copies of the scripts.

Once the paper is well structured and independent of the presentation,
we must address creating a readable presentation.
We encode that presentation in a style element
within the head element of the document.
(Alternatively, we could create a standard location
for a separately read CSS file.)
The proposed style element
is style.hinc
in the Scripts section.

Color and contrast must meet specific technical requirements.
These are embodied in the
Web Content Accessibility Guidelines (WCAG).
[WCAG]
In particular, the intensity of the foreground and background
must be sufficiently different.
In addition, the hue of the foreground and background
should be sufficiently different.
Web pages exist to test colors against the various criteria.
[Snook]
Further, consideration must be given to red-green color deficiency.

By far, the most common use of color in WG21
is to mark inserted and deleted text.
The normal convention
is to use red for deleted text and green for inserted text.
However, this color combination is problematic for red-green deficient readers.
Instead, we use magenta in place of red.
The added blue to the color makes it visually distinct from green.
The other typical problem with the colors chosen
is that they are too bright to provide good contrast
with the typical light background.
So, these colors need to be reasonably dark,
but still light enough to be distinct.
The foreground colors
#005100 green and #8B0040 red-magenta
meet the criteria against a fairly broad range of light backgrounds.
Unfortunately, once we specify a foreground color,
we must also specify a background color.
A white background reduces printing costs.

However, these colors alone are not sufficient
to identify inserted and deleted text.
For that we must add text decoration.
In particular, we follow existing convention
and mark deleted text with a line struck through
and inserted text with an underline.
Now, even in the absence of color,
deletions and insertions are distinct.
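
A minimal sketch of a style element that applies these choices follows.
The colors are the ones given above;
placing the white background on the ins and del rules themselves is an assumption,
and the actual style.hinc may differ.

```html
<style type="text/css">
/* Dark green and underlined: still marked when color is lost. */
ins { color: #005100; background-color: #FFFFFF;
      text-decoration: underline; }
/* Dark red-magenta and struck through: distinct from the green. */
del { color: #8B0040; background-color: #FFFFFF;
      text-decoration: line-through; }
</style>
```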

Earlier, we described the need to
mark whole quoted paragraphs of the standard as deleted or inserted.
We do this by changing the background for the paragraph.
In particular, deleted quotes
have a #FFEBFF light magenta background
and inserted quotes
have a #C8FFC8 light green background.
Regular quoted paragraphs of the standard
have a #F1F1F1 light grey background.
Extracted code
has a #F5F6A2 light yellow background.
Finally, each of these backgrounds
is surrounded by a thin, slightly darker, border.
This border provides an attractive edge to the quote.
More importantly,
when browsers ignore color, as in high-contrast mode,
they typically do not ignore borders.
These thin subtle borders become very visible when the color is lost.
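
These backgrounds and borders might be written as in the sketch below.
The background colors are the ones listed above;
the border colors, border width, padding, and black foreground are assumptions
chosen only to illustrate a thin, slightly darker edge.

```html
<style type="text/css">
blockquote.std    { color: #000000; background-color: #F1F1F1;
                    border: 1px solid #D1D1D1; padding: 0.5em; }
blockquote.stddel { color: #000000; background-color: #FFEBFF;
                    border: 1px solid #ECD7EC; padding: 0.5em; }
blockquote.stdins { color: #000000; background-color: #C8FFC8;
                    border: 1px solid #B3EBB3; padding: 0.5em; }
pre.extract       { color: #000000; background-color: #F5F6A2;
                    border: 1px solid #D1D29E; padding: 0.5em; }
</style>
```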

The default formatting of tables
makes identifying table cells difficult.
To address this problem
and to be consistent with the formatting of many (but not all)
of the standard's tables,
we make several formatting choices.

Cell text is vertically aligned to the top,
which makes identifying rows easier.

Cell text is horizontally aligned to the left,
which makes identifying columns easier.
(Authors may choose to use right alignment for numeric columns.)

Cells are given a little bit of extra spacing.

Use a thin border around the table, but not around individual cells or the caption.
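
The table choices above could be expressed roughly as follows.
The specific widths and padding are assumptions, not values taken from style.hinc.

```html
<style type="text/css">
/* Thin border around the table as a whole. */
table   { border: 1px solid black; border-spacing: 0; }
/* No border around the caption or the individual cells. */
caption { border: none; }
th, td  { text-align: left; vertical-align: top;
          padding-left: 0.8em; border: none; }
</style>
```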

References to WG21 papers can simply use the N-number.
References to WG14 papers can simply use "WG14" and the N-number.
These references should link to the appropriate documents,
via HTML like the following.

This paper analyses the compatibility between the draft standards,
<a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3035.pdf">
N3035</a> and WG14
<a href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1425.pdf">
N1425</a> with respect to alignment.

All other references should be elaborated in a references section,
such as this one.
The purpose of the references section
is to enable following references from printed documents.
NOTE: when there is a reference section,
it is unclear whether references should
link to the reference section entry
or link directly to the referent.

This script extracts code from an HTML source.
It can serve as the inverse of the above,
but is intended to extract more generally annotated code.
This script takes the class name as the first parameter.