Message-Id: <s1f76810.018@videodiscovery.com>
Date: Thu, 25 Jul 1996 12:21:55 -0800
From: Jim Taylor <JHTaylor@videodiscovery.com>
To: www-html@w3.org
Subject: ISO 8879 diacritical marks as HTML character entities -Reply
>>> Chung-Chieh Shan <t-chungs@microsoft.com> - 7/25/96 12:05 AM >>>
>I am interested in the list of character entities that are/will be
>included in HTML 3.2. In particular, I am working on computerization
>of several Taiwanese languages, the romanization of which requires
>diacritics to be placed over letters such as "m" and "n". Since there
>are already entities like &acute; and &grave; defined in
>ftp://ftp.ifi.uio.no/pub/SGML/ENTITIES/ISOdia, I suppose the only
>question is whether these entities will be included in HTML 3.2 (I'm
>actually not absolutely sure that they haven't been included in
>previous versions; I'd be very happy if they have), and -- if they
>will -- whether any specific rendering behavior is to be specified by
>HTML. If it is HTML's responsibility to specify rendering behavior for
>these entities, I think the logical way to proceed is to follow
>Unicode's placement of non-spacing marks, i.e., use m&acute; (rather
>than &acute;m) for m with acute above, and so on.
Entities for these diacriticals have not been in any HTML standard,
and are not in the experimental Cougar document[1]. However, spacing
versions of the most common ones are included in the ISO 8859-1
repertoire, so you can use those characters directly, which should
work in any browser that correctly supports 8859-1. If you want
non-spacing diacriticals, you could use numeric character references
(from Unicode), but most browsers won't support them.
acute: character 180 (&#180;)
acute, non-spacing: character 769 (&#769;)
grave: character 96 (&#96;)
grave, non-spacing: character 768 (&#768;)
Unicode also includes precomposed characters such as M with acute
accent (&#7742;), but you're not likely to find many browsers that
support those either.
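To make the relationship between the spacing, non-spacing, and precomposed forms concrete, here is a minimal Python sketch. It assumes the non-spacing marks are the Unicode combining characters U+0300 (grave) and U+0301 (acute); the helper name ncr is made up for illustration and is not part of any HTML or SGML tool.

```python
import unicodedata

# Assumption: non-spacing acute/grave are the Unicode combining marks.
COMBINING_GRAVE = "\u0300"
COMBINING_ACUTE = "\u0301"

def ncr(text: str) -> str:
    """Return text as a string of decimal numeric character references."""
    return "".join(f"&#{ord(ch)};" for ch in text)

# Unicode places a non-spacing mark AFTER its base letter.
decomposed = "M" + COMBINING_ACUTE

# NFC normalization folds the pair into the precomposed character
# when Unicode defines one (it does for M with acute).
composed = unicodedata.normalize("NFC", decomposed)

print(ncr(decomposed))  # &#77;&#769;
print(ncr(composed))    # &#7742;
```

Either form names the same abstract letter; whether a given browser renders it is a separate question.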
You could propose that the SGML entities for diacriticals (ISO
8879:1986//ENTITIES Diacritical Marks//EN) [2] be added to HTML, but
most of these are already included in the 8859-1 set and supported by
decent browsers. I.e., why write &grave; when you can write `?
<!ENTITY acute SDATA "[acute ]"--=acute accent-->
<!ENTITY breve SDATA "[breve ]"--=breve-->
<!ENTITY caron SDATA "[caron ]"--=caron-->
<!ENTITY cedil SDATA "[cedil ]"--=cedilla-->
<!ENTITY circ SDATA "[circ ]"--=circumflex accent-->
<!ENTITY dblac SDATA "[dblac ]"--=double acute accent-->
<!ENTITY die SDATA "[die ]"--=dieresis-->
<!ENTITY dot SDATA "[dot ]"--=dot above-->
<!ENTITY grave SDATA "[grave ]"--=grave accent-->
<!ENTITY macr SDATA "[macr ]"--=macron-->
<!ENTITY ogon SDATA "[ogon ]"--=ogonek-->
<!ENTITY ring SDATA "[ring ]"--=ring-->
<!ENTITY tilde SDATA "[tilde ]"--=tilde-->
<!ENTITY uml SDATA "[uml ]"--=umlaut mark-->
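As a rough check on how many of these entities have a counterpart in 8859-1, the Python sketch below tries to encode one spacing character per entity as Latin-1. The mapping from entity names to characters is my own assumption (the SDATA strings above do not specify code points), and it uses the ASCII characters ^ and ~ for circ and tilde.

```python
# Hypothetical mapping of ISOdia entity names to spacing characters;
# the entity set itself does not define these code points.
ISODIA = {
    "acute": "\u00B4", "breve": "\u02D8", "caron": "\u02C7",
    "cedil": "\u00B8", "circ": "\u005E", "dblac": "\u02DD",
    "die": "\u00A8", "dot": "\u02D9", "grave": "\u0060",
    "macr": "\u00AF", "ogon": "\u02DB", "ring": "\u02DA",
    "tilde": "\u007E", "uml": "\u00A8",
}

def in_latin1(ch: str) -> bool:
    """True if the character can be encoded in ISO 8859-1."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

covered = sorted(name for name, ch in ISODIA.items() if in_latin1(ch))
missing = sorted(name for name, ch in ISODIA.items() if not in_latin1(ch))

print("in 8859-1:    ", ", ".join(covered))
print("not in 8859-1:", ", ".join(missing))
```

Under this mapping, breve, caron, dblac, dot, ogon, and ring have no 8859-1 character, so those are the ones that would need an entity or a numeric character reference.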
-----
[1] http://www.w3.org/pub/WWW/MarkUp/Cougar/HTML.dtd
[2] ftp://ftp.ifi.uio.no/pub/SGML/ENTITIES/ISOdia