Re: IDN hostname resolution in NetBSD

On 27 May 2010, at 3:59 AM, Johnny Billquist wrote:
> Personally, I stay far away from UTF-8 whenever I can. It's not a good
> solution, the only problem being that it's now the standard, so no other
> better solution is going to come along. :-(
> (Actually, Unicode is part of the problem, but that is here to stay as well.)
Hmmm. I don't particularly want to get into a character set or encoding war,
here, but I've been working extensively with multilingual systems for over a
decade, now, and I find the increasing adoption of Unicode to be a welcome
development. It's finally possible to operate seamlessly in multiple languages
at once, and I only wish there were more universal adoption.
I won't argue about the "correctness" or "bestness" of Unicode; it largely
doesn't matter. We needed a universal character set, and now we've got one. Its
existence as a standard is, in fact, its most useful property.
UTF-8, though, is a fantastic thing. It's very well thought-out and has several
technical features that make it really useful. US-ASCII, the ISO 8859-*
character sets (including 8859-1, aka Latin 1), and Unicode all share the same
first 128 code points, and UTF-8 encodes those 128 code points using the exact
same single bytes as US-ASCII. That means any US-ASCII text can be correctly
interpreted as UTF-8. This makes it very easy to take
many systems that were implemented only understanding US-ASCII and convert them
to full Unicode support without breaking backward compatibility. In some cases,
code doesn't have to change at all.
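
To make the compatibility point concrete, here is a minimal Python sketch. The
helper name decodes_identically and the sample hostname are my own
illustrations, not anything from NetBSD. It only checks that a pure US-ASCII
byte string reads the same under all three encodings, and that a non-ASCII
character becomes bytes that all have the high bit set:

```python
def decodes_identically(data: bytes) -> bool:
    """Return True if data reads the same as US-ASCII, Latin-1, and UTF-8.

    Hypothetical helper for illustration only.
    """
    try:
        as_ascii = data.decode("ascii")
    except UnicodeDecodeError:
        # At least one byte is >= 0x80, so this is not US-ASCII text.
        return False
    return as_ascii == data.decode("latin-1") == data.decode("utf-8")


# A pure US-ASCII name is already valid UTF-8, byte for byte.
print(decodes_identically(b"www.netbsd.org"))

# A non-ASCII character encodes to several bytes, none of which can be
# mistaken for a US-ASCII character because each has the high bit set.
encoded = "\u00e9".encode("utf-8")  # U+00E9, LATIN SMALL LETTER E WITH ACUTE
print(encoded, all(byte >= 0x80 for byte in encoded))
```

That second property is why code that only looks for ASCII delimiters (dots,
slashes, NUL) can often pass UTF-8 through untouched.
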
All that aside, I really just wanted to make a point about why Chris's attempts
to set an IDN hostname are important. Now that the root zone contains some IDN
top-level domains, there are real, live non-US-ASCII domain names in the wild
right now. (Well, or punycode domain names, if you prefer.) For instance:
http://وزارة-الأتصالات.مصر/ (that's Arabic for <Ministry of
Communications>.<Egypt>)
Of course, even today, that web server could live on NetBSD, since (a) it's
probably using virtual hosting (in fact, that server's PTR record is
'mcit.gov.eg'), and (b) it could have its hostname be the punycode version of
that name even if it weren't.
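
For anyone who hasn't seen what the punycode version of a name looks like,
here is a rough Python sketch using the standard library's "idna" codec (which
implements the older IDNA 2003 rules). The name bücher.example is a stand-in I
picked for illustration, not the Egyptian domain above:

```python
# Hypothetical example name; "example" is a reserved TLD, so nothing here
# refers to a real host.
unicode_name = "b\u00fccher.example"  # "bücher.example"

# Each non-ASCII label becomes an ASCII-compatible "xn--" label.
ace_name = unicode_name.encode("idna")
print(ace_name)

# The conversion is reversible, so the Unicode form can be recovered for
# display.
print(ace_name.decode("idna") == unicode_name)
```

The ASCII form is what actually goes over the wire in DNS, and it is also a
perfectly legal hostname on a system that only understands US-ASCII.
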
Still, it will be more and more desirable to have hostnames (which should,
really, match the host's name in DNS) be IDNs, and it would be really good if
NetBSD could support this.
The in-kernel hostname, ideally, should be stored as Unicode (UTF-8 would be
nice), and only DNS-resolving software would need to worry about the
conversion to punycode. Of course, there's a lot of DNS-aware software out
there. How much of it is also aware of the local hostname? Certainly, MTAs are.
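
As a sketch of that division of labour, assuming the kernel hands back the
hostname as UTF-8 bytes (which is the proposal here, not what NetBSD does
today), a DNS-aware program could do the conversion at the last moment. The
function name to_dns_name is hypothetical, and Python's "idna" codec stands in
for whatever IDNA library the real software would use:

```python
def to_dns_name(kernel_hostname: bytes) -> bytes:
    """Convert a UTF-8 hostname into the ASCII form used in DNS queries.

    Hypothetical helper: assumes the hostname is stored as UTF-8.
    """
    text = kernel_hostname.decode("utf-8")
    # Pure US-ASCII names pass through unchanged; labels containing
    # non-ASCII characters are converted to "xn--" punycode labels.
    return text.encode("idna")


# An existing ASCII hostname needs no special handling.
print(to_dns_name(b"mail.example.org"))

# A UTF-8 hostname is converted only at the DNS boundary.
print(to_dns_name("b\u00fccher.example".encode("utf-8")))
```

Everything that never talks to DNS would just see the UTF-8 name; the hard
part is finding all the software that does.
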
And, of course, this also brings up all the issues of what encoding the user's
locale is set to, the other issues Chris brought up, and no doubt yet others.
It won't be a quick fix, but will take some careful planning and coordination.
- Geoff