Additional named entities for HTML

W3C Working Draft 25-Nov-96

Status of This Document

This draft is work under review by the W3C HTML Working Group,
for potential incorporation in an upcoming version of the HTML
specification, code named Cougar. Please remember this is subject
to change at any time, and may be updated, replaced or obsoleted
by other documents. It is inappropriate to use W3C Working Drafts
as reference material or to cite them as other than "work in
progress".

Please send detailed comments to
www-html-editor@w3.org.
We cannot guarantee a personal response, but summaries will
be maintained off the Cougar page. Public discussion on HTML
features takes place on www-html@w3.org. To subscribe send a message to
www-html-request@w3.org
with subscribe in the subject.

Abstract

This specification extends HTML to support additional
named entities for all characters in ISO 8859-1, all characters
representable by glyphs in the Adobe Symbol font, entities required
for internationalisation (in particular for bidirectional text), and
characters from that portion of CP-1252 which lies outside the
character repertoire of ISO 8859-1.

Additional named entities for ISO 8859-1

All new entities in this section produce characters which can
already be represented by all conforming HTML 2.0 User Agents (UAs),
but currently in a less convenient form. For example, they allow typing
&divide; instead of &#247; to obtain a division sign
(÷). The new named entities conform to the Proposed
Entities section of the HTML 2.0
specification.

Adding support for these entities to a UA
merely entails recognising the entity names and converting them to
characters which are within the repertoire of ISO 8859-1.
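
The name-to-character conversion described above can be sketched as follows. This is a minimal illustration, not a complete UA implementation: the table LATIN1_ENTITIES is a hypothetical four-entry excerpt, and the function name expand_latin1_entities is an assumption made for this example.

```python
import re

# Hypothetical excerpt of an entity table: entity name -> ISO 8859-1 code point.
LATIN1_ENTITIES = {
    "divide": 247,  # division sign
    "times": 215,   # multiplication sign
    "copy": 169,    # copyright sign
    "eacute": 233,  # small e with acute accent
}

# Matches references of the form &name;
_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_latin1_entities(text):
    """Replace known named entities; leave unknown references untouched."""
    def _substitute(match):
        code_point = LATIN1_ENTITIES.get(match.group(1))
        if code_point is None:
            return match.group(0)
        return chr(code_point)
    return _ENTITY_RE.sub(_substitute, text)


print(expand_latin1_entities("6 &divide; 3"))
```

Because every value in the table is below 256, the result can always be stored in a single ISO 8859-1 byte.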

Entities in this section refer to characters which are outside the
repertoire of ISO 8859-1 but within the repertoire of
ISO-10646. These characters may be used in an HTML document in
one of three ways:

For documents using any character encoding, by inserting the named
entities defined here. For example, &sum;

For documents using any character encoding, by inserting the
numeric entity references used to define the named entities. For
example, &#8721;

For documents whose character encoding has a repertoire that
includes these characters (for example, UTF-8), by directly inserting
the character.
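
As a rough check that the three forms above denote the same character, the following sketch uses the Python standard library function html.unescape, which resolves both named and numeric references. The variable names are chosen for this example only.

```python
import html

# 1. Named entity reference.
named = html.unescape("&sum;")

# 2. Numeric character reference (decimal ISO-10646 code position).
numeric = html.unescape("&#8721;")

# 3. The character inserted directly, here as it would arrive in UTF-8 bytes.
direct = b"\xe2\x88\x91".decode("utf-8")

# All three forms resolve to U+2211 (N-ARY SUMMATION).
print(named == numeric == direct, hex(ord(named)))
```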

Adding support for these entities to a UA may be achieved by
supporting full ISO-10646 or by other means. Display of glyphs for
these characters may be obtained by being able to display the relevant
ISO-10646 characters or by other means, such as internally mapping the
listed entities, NCRs and characters to the appropriate position in
some font which contains the requisite glyphs.
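
The internal mapping mentioned above could look something like the sketch below. The table SYMBOL_FONT_POSITIONS, the function glyph_for and the font name "Default" are all hypothetical, and the positions shown are assumptions about the Symbol font encoding that should be checked against the font actually in use.

```python
# Hypothetical glyph table: ISO-10646 code position -> (font name, position in font).
# The Symbol font positions below are assumptions for illustration only.
SYMBOL_FONT_POSITIONS = {
    0x03B1: ("Symbol", 0x61),  # Greek small letter alpha
    0x03B2: ("Symbol", 0x62),  # Greek small letter beta
    0x2211: ("Symbol", 0xE5),  # n-ary summation
}


def glyph_for(code_point, default_font="Default"):
    """Return the (font, position) pair used to draw a character.

    Characters without a table entry fall back to the default font at
    their own code position.
    """
    return SYMBOL_FONT_POSITIONS.get(code_point, (default_font, code_point))


print(glyph_for(0x2211))
print(glyph_for(ord("A")))
```

A UA using this approach would resolve the named entity, the NCR and the directly inserted character to the same code position first, and only then consult the table.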

Note: This entity set contains all the letters used in
modern Greek. However, it does not include Greek punctuation,
precomposed accented characters nor the non-spacing accents (tonos,
dialytika) required to compose them. There are no archaic letters,
Coptic-unique letters, or precomposed letters for Polytonic Greek.

The entities defined here are not intended for the
representation of modern Greek text and would not be an efficient
representation; rather, they are intended for occasional Greek letters
used in technical and mathematical works.

Special named entities

This section adds named entities for escaping markup-significant
characters (these are the same as those in HTML 2.0 and 3.2), for denoting
spaces and dashes, and for disambiguating bidirectional text (these
are the same as those in RFC 2070).

Entities are also added for the remaining characters occurring in
CP-1252 which do not occur in the HTMLlat1 or HTMLsymbol entity sets.
These all occur in the 128 to 159 range within the CP-1252 charset. These
entities permit the characters to be denoted in a platform-independent
manner.

Note: some authoring tools on the MS Windows platform
incorrectly generate numeric character references (NCRs) in the range
&#128; to &#159; which are illegal Unicode characters. NCRs
always refer to the 'Document Character Set' which, for HTML, is
always ISO-10646 (which has the same characters as Unicode). NCRs do
not refer to code positions in the charset used to
transmit the document. Using the entity names defined here avoids such
errors by referring the characters to their Unicode values.
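
The error described in this note can be illustrated with a small sketch that reinterprets a number in the range 128 to 159 as a CP-1252 byte and returns the ISO-10646 code position the author most likely intended. The function name fix_cp1252_ncr is hypothetical, and the sketch relies on the cp1252 codec shipped with Python, whose table may differ in detail from the CP-1252 definition current when this draft was written.

```python
def fix_cp1252_ncr(code_point):
    """Map a mistaken NCR number to the intended ISO-10646 code position.

    Numbers from 128 to 159 are treated as CP-1252 byte values.
    Anything else is returned unchanged.
    """
    if 128 <= code_point <= 159:
        try:
            return ord(bytes([code_point]).decode("cp1252"))
        except UnicodeDecodeError:
            # This byte value has no character assigned in the codec's table.
            return code_point
    return code_point


# &#147; was probably meant as a left double quotation mark, U+201C.
print(fix_cp1252_ncr(147))
```

A tool that repairs documents this way would then emit either the corrected NCR or the corresponding entity name defined in this section.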

Adding support for these entities to a UA may be achieved by
supporting full ISO-10646 or by other means. Display of glyphs for
these characters may be obtained by being able to display the relevant
ISO-10646 characters or by other means, such as internally mapping the
listed entities, NCRs and characters to the appropriate position in
some font which contains the requisite glyphs.