David G. Durand wrote:
> I have yet to hear anyone offer an argument as to why a character
> string describing missing data (an unknown glyph) is inferior to a number
> describing missing data, especially when the web infrastructure does not
> provide a convenient way to make the private arrangements needed for
> private-use to work (funny property of a publishing medium, isn't it?).
OK, let's decree that all names should be written in Chinese, since the
Chinese are the most numerous people and have the most numerous
characters. They have a phonetic system, so we can spell out any English
words in bopomofo: we in the West will only have to learn a few dozen
characters, which is trivial, and we can figure out what a name means by
looking in some (online) list with representative glyphs :-)
Less flippantly, even simple English names are not clear. It took me a
long time to realise Americans seem to mean what we call "hash" by
"pound".
So I don't have any confidence that English names are much use to most
people in the world. That is the first reason why names are inferior.
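
To make the "pound"/"hash" confusion concrete, here is a small sketch
using Python's standard unicodedata and html.entities modules (my choice
of tools for illustration, nothing more). It prints the standard
character names for "#" and for the character that the HTML 4 entity
"pound" refers to; the number is unambiguous where the informal English
name is not.

```python
import unicodedata
from html.entities import name2codepoint

# The character many Americans call "pound" and others call "hash".
hash_char = "#"
# The character that the HTML 4 entity set calls "pound".
pound_entity_char = chr(name2codepoint["pound"])

hash_name = unicodedata.name(hash_char)
pound_name = unicodedata.name(pound_entity_char)

# Print code point and standard name side by side.
print(f"U+{ord(hash_char):04X} {hash_name}")
print(f"U+{ord(pound_entity_char):04X} {pound_name}")
```

Neither standard name for "#" is "pound" or "hash", which is rather the
point.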
The second reason is that there are so many characters that giving them
identifiers that also describe them means some of those identifiers must
get very complicated, unless you adopt a Hungarian notation (semi-acronym
contractions) like Microsoft advocates for C code. In which case you
don't have a name anyway, since you need to know the contraction. The
Omega thread earlier shows how deceptive it can be to use
meaningful-looking identifiers.
The third reason is that the way people seem to like to handle lots of
characters is to use a "Keycaps" utility applet, like Windows and Macs
(and FrameMaker on UNIX) provide. In this case, the user doesn't care
what the identifier is. So the main concern is for machine efficiency
rather than readability, IMHO, since users increasingly won't edit in
dumb text editors. So both our points may be moot (but my point is
less moot than yours :-).
You mention 'unknown glyphs', and I think you really mean arbitrary
glyphs not specifiable by ISO10646+markup+stylesheet/DSSSL+locale. I
have been assuming that XML was targeted at 'resolved' (closed-system)
data, not (open-system) system-independent data. In a closed system,
resolution of arbitrary foreign glyphs must be a transparent function of
the application, not an XML parser function that requires user
intervention. That sounds to me like something better handled in place in
element markup rather than entity markup, given the rarity of the glyphs,
and in the spirit of element-set-less simplicity from HTML (or is that
spectre?)
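
A rough sketch of what I mean by handling it in element markup, using
Python's xml.etree.ElementTree. The <glyph> element, its "desc"
attribute, and the fallback() lookup are all invented for this example;
the only point is that the parser sees ordinary markup and the
application decides how to resolve the glyph.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <glyph> and "desc" are invented for this sketch.
DOC = '<p>The seal reads <glyph desc="unencoded seal character"/> here.</p>'

def render(elem, resolve):
    """Flatten an element to text, asking the application for each glyph."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "glyph":
            # Resolution is the application's job, not the parser's.
            parts.append(resolve(child.get("desc")))
        else:
            parts.append(render(child, resolve))
        parts.append(child.tail or "")
    return "".join(parts)

def fallback(desc):
    # Stand-in for an application's own glyph lookup (font, image, etc.).
    return f"[glyph: {desc}]"

text = render(ET.fromstring(DOC), fallback)
print(text)
```

No entity declarations are needed, and a different application can plug
in a different resolver without touching the document.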
--
Regards
Rick Jelliffe email: ricko@allette.com.au
_______________________________________________________________
Allette Systems (Australia) email: info@allette.com.au
Level 10, 91 York Street www: http://www.allette.com.au
Sydney 2000 NSW Australia phone: +61 2 262 4777
fax: +61 2 262 4774
_______________________________________________________________