
philalethiac writes "Millions of Chinese language users will soon be able to access the Internet using Chinese script following a decision today by ICANN's Board of Directors to approve a set of Chinese language internationalized domain names."

2. "tie3" (http://zdic.net/zd/zi/ZdicE5ZdicB8Zdic96.htm) is a better alternative than "biao1". There's an even more idiomatic piece of Chinese slang for "first post" that is common among Internet users: "the sofa" ;)

I guess that, until Slashdot enables Unicode (UTF-8) like everyone else has for the past decade or so,

1. There will be some domain names that we can't link to on Slashdot

Slashdot did allow Unicode. Then things like this [slashdot.org] happened. Blame the comment trolls for forcing Slashdot to use a whitelist of allowed characters.

As for domain names, from what I see, they start with a standard prefix ("xn--") followed by an ASCII encoding of the Unicode code points called Punycode (RFC 3492), just so they're compatible across all systems. Browsers can either display the decoded Unicode name or (I'm seeing an option for this) keep showing the raw xn-- form, so you can tell Paypal.com from xn--blahblabblah.xn--blah.
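Python's built-in "idna" codec is a convenient way to see this mapping. It implements the older IDNA 2003 rules (nameprep, then Punycode, applied label by label); current browsers follow newer rules plus their own display policies, so treat this as a sketch of the encoding, not of what any particular browser does. The helper names to_ascii and to_unicode are just illustrative.

```python
# Round-trip between a Unicode host name and its ASCII ("xn--") form using
# Python's built-in "idna" codec (IDNA 2003: nameprep + Punycode, per label).

def to_ascii(host: str) -> str:
    """Encode each non-ASCII label as "xn--" + Punycode; ASCII labels pass through."""
    return host.encode("idna").decode("ascii")


def to_unicode(host: str) -> str:
    """Decode any "xn--" labels back to Unicode for display."""
    return host.encode("ascii").decode("idna")


for name in ["münchen.de", "中国"]:
    ascii_form = to_ascii(name)
    print(f"{name} -> {ascii_form} -> {to_unicode(ascii_form)}")
# münchen.de -> xn--mnchen-3ya.de -> münchen.de
# 中国 -> xn--fiqs8s -> 中国
```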

I imagine that Slashdot staff ruled that on an English board, Chinese characters are more useful for SJIS art and lameness filter evasion than for text.

It's only certain control characters that can cause mess like that.

Slashdot staff apparently thinks that the effort to track new control characters that get added to new versions of Unicode or implemented in operating systems' text layout engines isn't worth the additional ad revenue.

Ah, pinyin, the writing system that 1.3 billion people can write and only primary school kids can read. To be able to write in a valid language and never fear that it will be read by anyone important is liberating. For example:

Though on a more serious note, this is a little bit worrying. OK, ICANN is allowing Chinese domain names; this is no huge problem for me, since I can read and write Chinese anyway. But the Chinese will be pissed off when the Japanese start using kana and they are no longer able to enter the correct domain names to look up porn. I think this just screws up the world in the long run; at least EVERYONE knows ASCII.

Kana [wikipedia.org] developed out of man'yougana, the old "rebus" method of using Chinese characters for their sounds to spell Japanese words. Katakana came from fragments of characters, and hiragana from cursive forms of them. Chinese has its own analogous system, called zhuyin fuhao [wikipedia.org], whose alphabet begins bo-po-mo-fo.

Whoops, I meant juan3 (U+5377); I don't know why I used zh there, since the input method usually picks that stuff up. I have never seen a taco in China, so I just called it a Mexican roll, which is something they have at KFC.

I live in Beijing, but I was attempting to write CCTV Mandarin rather than dialect, so it wouldn't surprise me if I screwed up once or twice.

I think you need a browser extension to make it show Chinese characters; otherwise it just shows ASCII characters. I'm not even sure why ICANN needs to be involved. Thailand does this without any ICANN involvement.


Don't you think that the intended target audience of domains with national-character names is likely to be just those who are able to read and write the language concerned? I can't see how it screws anybody; no one forces you to make your domain name in any particular language.

I have heard conflicting information about this. I know the new ccTLDs for China (they approved two - traditional and simplified) are aliases for each other (resolve to the same sites), but are they also both aliases for the existing cn ccTLD or do they resolve to an entirely new domain? If they are separate, why did they choose to do it this way? It seems like it would only cause confusion.

Oh, and damn Slashdot and its lack of Unicode support. It would be nice to be able to type the damn things when talking about them.

I would assume that deciding to do it separately would be the only logical decision. Traditional to Simplified mapping is not 1:1. There are a decent number of cases where two or more Traditional characters map to 1 simplified character. There are also other cases that are 2:2. Managing the transformations centrally would likely be a nightmare.
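Two small illustrations of that point, as a sketch. First, the simplified and traditional spellings of the TLD encode to different ASCII labels, so the DNS has no idea they are related unless someone explicitly sets up the aliasing. Second, even a three-entry sample table (hand-picked for illustration, not a real conversion dictionary) shows one simplified character with two traditional sources, so going back from simplified to traditional can't be done mechanically.

```python
# Simplified and traditional forms of the same name are different strings,
# so in the DNS they are unrelated labels; aliasing must be configured explicitly.
simplified_tld = "中国"
traditional_tld = "中國"
print(simplified_tld.encode("idna"), traditional_tld.encode("idna"))
# b'xn--fiqs8s' b'xn--fiqz9s'

# A tiny, hand-picked sample (NOT a real conversion table) showing why a
# mechanical simplified -> traditional mapping is ambiguous.
TRAD_TO_SIMP = {
    "國": "国",  # country
    "發": "发",  # to emit / send out
    "髮": "发",  # hair
}


def traditional_candidates(simp_char: str) -> set[str]:
    """Return every traditional character in the sample that simplifies to simp_char."""
    return {trad for trad, simp in TRAD_TO_SIMP.items() if simp == simp_char}


print(traditional_candidates("发"))  # two traditional candidates for one simplified form
```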

I know the new ccTLDs for China (they approved two - traditional and simplified) are aliases for each other (resolve to the same sites), but are they also both aliases for the existing cn ccTLD or do they resolve to an entirely new domain? If they are separate, why did they choose to do it this way? It seems like it would only cause confusion.

those ccTLDs work, and there is no commercial, money-sucking rush to register new ccTLD domains. Looks like a really good reason to me. Introduction of new international

Before the Hanzi Chinese ccTLDs were approved by ICANN, when the only way to use them was to install CNNIC's "Official Client-end CDN Software" on your computer, registering a .cn domain name with Chinese characters automatically gave you the version with the Hanzi Chinese ccTLD.

Or does the board have members that would be opposed to the casting of runes?

I suspect you would have to be authorized by your guild. There's most likely a minimum intelligence level. Probably a save or skill check is involved too. Ohhh, and don't pray to Thor beforehand. He does not listen.

With all the non-Latin address character sets being approved, I imagine there is a world of new opportunities that completely void all the "inspect the address bar" education pushed on the general public for so many years. ICANN has managed to turn the net into a pretty much anything-goes place: almost every major company is practically extorted into buying the new extension flavour of the month to stop spammers and fraudsters from sending seemingly legitimate email, and the general public is left completely confused, with no guiding address principles.

There are some attempts to mitigate the problem, though you're right that it can be one. Some registries are limiting the characters that can appear in their domain, and there's a push to make that more widespread. One approach is to allow "local" scripts only, e.g. Cyrillic or Latin in .ru, but no Telugu or CJK in .ru. That greatly reduces the number of clashing pairs compared to allowing all of Unicode. Some registries also have policies against certain known clashes, such as two domains that are identical except that one has a Latin 'a' where the other has a Cyrillic 'a' (which look identical in most fonts).
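A rough sketch of that kind of registry check, assuming a hypothetical per-TLD policy table (ALLOWED_SCRIPTS is made up for illustration). Python's standard library doesn't expose the Unicode Script property, so the script here is approximated by the first word of each character's Unicode name; a real registry would use the actual Unicode script data and its own tables.

```python
import unicodedata

# Hypothetical per-TLD policy in the spirit of "Cyrillic or Latin in .ru".
ALLOWED_SCRIPTS = {"ru": {"CYRILLIC", "LATIN"}}


def rough_script(ch: str) -> str:
    """Crude script guess: the first word of the character's Unicode name.

    ASCII digits and the hyphen are treated as script-neutral ("COMMON").
    """
    if ch.isascii() and (ch.isdigit() or ch == "-"):
        return "COMMON"
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def label_allowed(label: str, tld: str) -> bool:
    """Accept a label only if it uses at most one script and the TLD allows it.

    TLDs with no policy entry allow no scripts at all in this sketch.
    """
    scripts = {rough_script(ch) for ch in label} - {"COMMON"}
    return len(scripts) <= 1 and scripts <= ALLOWED_SCRIPTS.get(tld, set())


print(label_allowed("paypal", "ru"))       # True: all Latin
print(label_allowed("пример", "ru"))       # True: all Cyrillic
print(label_allowed("p\u0430ypal", "ru"))  # False: Latin mixed with a Cyrillic "а"
print(label_allowed("中国", "ru"))         # False: CJK is not on the .ru list
```

One consequence of this crude one-script rule: Japanese names that legitimately mix kanji and kana would be rejected as "mixed", which is why real policies define the allowed combinations per language rather than a blanket rule.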

Firefox and Opera will only display the internationalized Unicode name for TLDs that are whitelisted as having a "safe" policy on the subject, and will display the punycode for other domains. Here [mozilla.org] is Mozilla's current policy.
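The display rule can be sketched like this. IDN_DISPLAY_WHITELIST is a made-up stand-in, not Mozilla's actual list, and the real policy involves more than a TLD lookup; the sketch only shows the basic "decode if trusted, otherwise show punycode" decision, using Python's IDNA 2003 "idna" codec.

```python
# Hypothetical whitelist of TLDs whose registries have an acceptable IDN policy.
IDN_DISPLAY_WHITELIST = {"de", "jp"}


def display_host(ascii_host: str, whitelist: set[str] = IDN_DISPLAY_WHITELIST) -> str:
    """Show the Unicode form only when the host's TLD is whitelisted; otherwise
    keep the raw "xn--" (punycode) form so look-alike names stand out."""
    tld = ascii_host.rsplit(".", 1)[-1].lower()
    if tld in whitelist:
        return ascii_host.encode("ascii").decode("idna")
    return ascii_host


print(display_host("xn--mnchen-3ya.de"))       # münchen.de
print(display_host("xn--mnchen-3ya.example"))  # xn--mnchen-3ya.example
```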

With all the non-Latin address character sets being approved, I imagine there is a world of new opportunities which completely void all the "inspect the address bar" education which was pushed on the general public for so many years.

Seems like a good browser feature would be to highlight any non-ASCII characters in the address bar in a contrasting color, such as red or bright green. Then it would take only a minimal amount of additional education to understand that it means something is amiss, unless you're clearly expecting an address composed of foreign characters.
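The highlighting idea is easy to sketch. Here square brackets stand in for the contrasting color, and highlight_non_ascii is a hypothetical helper, not an existing browser API.

```python
def highlight_non_ascii(host: str, start: str = "[", end: str = "]") -> str:
    """Wrap every non-ASCII character in markers (a stand-in for a contrasting color)."""
    return "".join(ch if ch.isascii() else f"{start}{ch}{end}" for ch in host)


print(highlight_non_ascii("paypal.com"))       # paypal.com
print(highlight_non_ascii("p\u0430ypal.com"))  # p[а]ypal.com  (the "а" is Cyrillic)
```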

ICANN has managed to turn the net into a pretty much anything goes place

Good. Because it was either "anything goes," or it was "only what we say goes goes," with the "we" inevitably being someone with a different set of values or agenda than a substantial percentage of the human population.

So will browsers start supporting vertical address bars, [wikipedia.org] in addition to left-to-right and right-to-left?

Sorry to editorialize, but it's amazing how worked up the Americans and French get over the intrusion of foreign languages, considering they've done more than anyone to change how the rest of the world speaks and writes.

Americans don't get worked up over the intrusion of other languages; we just think that everyone in the world should speak English as a second language.

Sure, there'll be some Americans who will get worked up, but for the most part it is not a deeply held belief. It's not like we require people to speak English; many government forms are available in many, many languages. It's not uncommon for larger cities to have areas where advertising is in Spanish or one of the many Asian languages.

You might have missed it for the last little while, but English is pretty much the de facto trade language anywhere you go. But no, people don't get worked up over the intrusion of foreign languages into English. English itself is highly malleable, which is why it's considered a trade language. French, on the other hand, gets bent out of shape because they see it as pollution of the language. They're all about purity.

While I don't like to raise too much sturm und drang about it, as a native English speaker I must still take some affront at the chutzpah with which these dirty foreigners waltz into our tongue, thinking they have carte blanche to sully our language.

Notice that everything he wrote in there was understandable to an English speaker. However, many of those terms come straight from foreign languages and have been adopted into English. Sturm und Drang is German (storm and stress), chutzpah is Hebrew (audacity, more or less), waltz is German too (from Walzer), although most musical terms in English are Italian, and carte blanche is French (blank check).

He's showing how English takes on new words from other languages all the time. There is no effort made to keep it "pure"

Wrong. QQ actually means rage quit; it's from Battle.net, where ALT+Q+Q immediately quit the match and the program. It is a way of telling people to rage quit. Its origin is unfortunately often mistaken for crying eyes by what people on Battle.net call 'noobs'.

I'm just telling you what was relayed to me by the HK players in Everquest. They'd see an item and link it and sometimes say "QQ" on the end. This was generally when it was an item they couldn't use or something. When I got a hold of one of them who spoke English well enough to understand the somewhat odd question, he told me it was sad eyes. As in "I like this item but I can't have it."

In current usage in games, people say it to mean crying or bitching. The phrase "Less qq more pew pew," is popular, meanin

Being pedantic here, "carte blanche" actually means "white card", a card bearing a royal seal but nothing else, which allowed the bearer to get whatever he or she wanted from the royalty (sort of like a blank cheque on crack).

"Americans and French" "worked up" about "the intrusion of foreign languages"?

I'm looking up and down this whole thread and I don't see any evidence of what you're saying. Or maybe you're just anti-American/French and are projecting your own opinions onto others?

Besides, language is about communication. It doesn't matter how it gets done, just that it gets done. Sure, the world has hundreds of languages, but in today's world English is the common language that binds the world together. If that hur

Hm, there was some of it ("what for / useless / why those people won't just learn our script") on the occasion of last such ICANN news (regarding TLDs in, among others, Arabic script IIRC)

Yeah, language is about communication. And in today's world, there are lots of people to whom even the Latin alphabet itself looks the way, say, the Georgian alphabet looks to you. Incidentally, they are often among those with the most to gain if they had fewer roadblocks in communication.

I am an American, and I can't seem to go two weeks without hearing some blowhard blaming the country's problems on "lazy immigrants who refuse to learn our national language". If anyone points out that most other countries aren't trying to enforce a sole national language, they usually claim that America is in a unique situation; that no other country on the planet has to deal with job-seeking immigrants "the way we do

This looks like a perfect opportunity to highlight this [pinyin.info] recent post at the Pinyin News blog, closely related to the issue at hand!
(Disclaimer: I'm not affiliated with the blog in any way, but as a former student of Japanese I can relate to the general message.)

However, my biggest concern is that the use of non-ASCII characters in domain names breaks the whole international nature of the web and imposes regional barriers. Your mail client and mail server software might not be too happy with you trying to send an e-mail to "joe@.jp" or "joe@.jp-r14k153opxc" in punycode. (Crap, it looks like Slashdot does not accept international characters in comment submissions; the stripped characters were "日本人".)
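For reference, the three kanji 日本人 (nihonjin) are U+65E5, U+672C and U+4EBA; written as HTML numeric character references they are &#26085;&#26412;&#20154;. The sketch below decodes those references and then encodes a 日本人.jp domain with Python's IDNA 2003 "idna" codec, which converts each dot-separated label on its own, so the ASCII form is an "xn--" label followed by an unchanged "jp" label.

```python
import html

# Numeric character references for the three kanji decode back to the text.
word = html.unescape("&#26085;&#26412;&#20154;")
print(word)  # 日本人

# IDNA works per label: the kanji label becomes "xn--" + Punycode,
# while the ASCII label "jp" is left as-is.
ascii_domain = (word + ".jp").encode("idna")
print(ascii_domain)
```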

Remember that very few people have rendering support and fonts for every written language on the planet, so most people will be cut off from many websites. And with the current IPv4 shortage, one can no longer reliably just use an IP address to reach a specific website, e-mail address, etc., since a single IP address can host many domain names.

Personally, I think the best compromise would be to allow non-ASCII domain names only if they are submitted together with a paired romanized version that is equally accepted for the same domain. So, using my previous example, one could equally specify ".jp" in Japanese kanji, ".jp-yn9d427hcvb" in punycode, or "nihonjin.jp" in romaji. That way you can still cater to a local or regional audience and still allow everyone else on the planet to reach you.

For those who argue that it doesn't matter if a domain name is only written in a foreign language, since all of the hosted content is in that language anyway: forget about all the current international collaboration in mathematics, science, engineering, programming, and other fields. (You can write an entire math proof or software program using only symbols, without a single human word.)

Even for one-on-one e-mail between people in different countries who can communicate in a common language, this would still be a problem: a large percentage of e-mail accounts are hosted by the user's local ISP, which in future may leave them stuck with a non-ASCII e-mail address that cuts them off from the rest of the world.

Requiring everything to be ASCII breaks with the whole international nature of the web by forcing everyone to use English alphabet characters.

Everyone has to use English ASCII characters for top level domains (*.com, *.jp, *.cn,...) and protocols (http, https, ftp,...), so everyone online in every country has to continue to use ASCII whether they want to or not, even after these International domain names are in common use.

BTW, I never said that everything had to be in English ASCII, just something like a domain name or e-mail address that is used to identify a website or person should be.

Circular reasoning. One could just as easily say that TLDs should be allowed to be non-ASCII as well (and who says they won't be?), which resolves the current dependencies on English. But that wouldn't make you happy because you use English and you want everyone else to use it too. Only for the domain or email address, you say... but those just happen to be the most important parts, right?

Your postal system example is just further evidence of systemic bias. Yes, you can send mail in Japan using addresses wr

You don't get it: gTLDs and ccTLDs are being translated (aliased) as well. When this is done, for, say, the Japanese user, there will be no need for any ASCII, whatsoever.
As for mapping to ASCII, all IDNs are mapped to punycode, which is ASCII, but it will be invisible.
And mixed scripts aren't allowed, so phishing fears are overblown; it won't be any worse than it is today.
IDNs should have been a part of the original DN structure, but better late than never. It's simply idiotic to have an entire website in Japanese, except for the DN.

All of the major programming languages I'm aware of use ASCII and have English overtones.

Perhaps it is time to accept that something useful to world communication is to pick a standard in terms of character set and so on and use that. That doesn't mean other characters can't be used locally, just that for global communication having a standard is a good thing.

I'd argue the same thing with language. I think it is a useful idea for everyone to learn a second language that everyone else speaks. Trying to get e

In practice, everyone does anyway. I don't know of any country where the people who use the internet haven't already developed at least some informal way of writing in the Latin alphabet, at least for short snippets like addresses. Many seem to prefer it even when alternatives are available--- for example, Facebook supports UTF-8 status updates, but my Greek cousins use Greek transliterated into ASCII more often than they use the Greek alphabet.

A huge number of people with non-Latin character sets will have access to the internet through their mobile phones, and they should not have to learn Latin letters to access local information, or have to switch input language to go to another site. Delivery of accurate information to farmers about the price of produce, for example, should not be restricted to those who know English, even if it was good enough for Jesus.

There is more to the internet than updating your facebook status and chat rooms.

Let me give you Afrikaans as a very simple example that I can type in/.:

There are far more than 5 vowels, so the rest are written using diacritics (such as ê) and pairs of vowels (such as ou).

In order to distinguish between (for example) oe as a single vowel and oe as two separate vowels, we indicate the syllabic breaks with yet more diacritics. So "hoër" means higher, while "hoer" means "whore". There is absolutely no other way to write it, because unlike the Germans we cannot
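This carries over to domain names too. A sketch, assuming Python's built-in IDNA 2003 "idna" codec: "hoër" can be stored either with a precomposed ë or as e plus a combining diaeresis, and IDNA's nameprep step normalizes both to the same label before Punycode, while "hoer" stays a different name.

```python
import unicodedata

composed = "ho\u00ebr"     # "hoër" with a precomposed ë (U+00EB)
decomposed = "hoe\u0308r"  # "hoe" + COMBINING DIAERESIS (U+0308) + "r"
plain = "hoer"             # a different word altogether

# The two spellings of "hoër" differ as raw strings...
print(composed == decomposed)                                # False
# ...but are identical once normalized to NFC.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

# nameprep (NFKC normalization) runs before Punycode, so both spellings
# produce the same ASCII label, which differs from the label for "hoer".
print(composed.encode("idna") == decomposed.encode("idna"))  # True
print(composed.encode("idna") == plain.encode("idna"))       # False
```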

Yeah, and requiring XHTML & HTML breaks the ability of people to set up websites formatted in PDF (or Flash :-P). Standards aren't necessarily here to support every single crazy idea around.

Currently, the Roman alphabet is the single common thing that you are bound to see on every single computer and other input-equipped machine connecting to the net. If you want to still be accessible by everyone else, and not only by the people who have the proper font/keyboard layout/etc. combination installed, you need to suppor

So long as ONE character set is required, it works. It was the Latin charset; it may as well have stayed that way. Now we'll have places where you simply cannot type in the domain name. Hurrah for giving China's censors another easy way to cut off access to everything else!

Websites probably aren't so much of a problem because if you can't read the script the URL is written in, you probably can't read the rest of the site either. If they want an English language version of the site, they will probably put it on a site with a Roman URL.

For example if you want to visit Cardiff City Council's website, you can visit either www.cardiff.gov.uk (english) or www.caerdydd.gov.uk (welsh).

As someone who can read and write Japanese characters (similar to, but a bit different from, Chinese), I don't know why ICANN thought this was a good idea. It's not like the actual contents of pages had to be in Latin characters, so "allowing use of other languages" is not really an issue. Only the address had to be in Latin characters.

Having all Internet users use the 26 (x2 for capitals) letters of the Latin charset and 10 digits is much, much simpler than having everyone try to learn all the letters of all the character sets out there.
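For what it's worth, the classic host-name rule (the "LDH" rule from RFC 1035/1123: letters, digits and hyphen) also allows the hyphen, caps each label at 63 characters, and the DNS compares ASCII names case-insensitively, so capitals don't create extra names. A rough sketch of that check, with illustrative helper names:

```python
import re

# LDH label: letters, digits, hyphen; 1-63 chars; no leading or trailing hyphen.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """True if the label follows the traditional letters-digits-hyphen rule."""
    return LDH_LABEL.fullmatch(label) is not None


def same_host(a: str, b: str) -> bool:
    """ASCII host names are compared case-insensitively."""
    return a.lower() == b.lower()


print(is_ldh_label("caerdydd"))  # True
print(is_ldh_label("café"))      # False: "é" is outside the LDH set
print(same_host("www.Cardiff.gov.uk", "www.cardiff.GOV.UK"))  # True
```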

This is going to make administration harder.

If you started getting hacking attacks from .com, would you even know how to type that into your firewall? If you got an email from @.com, do you think you could describe the address over the phone to a colleague? From the preview, it appears Slashdot is filtering out the Japanese characters I used for the addresses. The above examples would be tokyo.com and shujin@osaka.com if they were forced to be in Latin characters. And that's something usable by both Japanese and foreigners, whereas the Japanese-character addresses are for 'Japanese only'.

> I hope ICANN reconsiders and returns to Latin-and-numbers-only addresses.

ICANN is in the business of hyping domain name sales and cashing in on it. Look at their TLD selloff. Applying requires a $185K non-refundable "application fee", which ICANN claims it needs to cover its overheads. Justified if they read applications while drinking Dom Pérignon from a gold slipper. The only way to convince ICANN not to do something is to convince them it won't make them money. Speculators and squatters are still out

Yup - by doing this the people that already own acmeco.com, acmeco.net, acmeco.org, acmeco.edu, acmeco.co.uk, acmeco.tv, acmeco.biz, acmeco.name, and the 387 other TLD variations on this have to also go out now and buy the same thing but in every character set in use anywhere on the planet.

Don't worry - I'm sure they'll announce a waiting period so that "legitimate" domain holders can buy their 10k new domain names each before the squatters take them.

Ya. To me this just smacks of people whining about their language and culture in a rather meaningless way. The French are known to do this, they want to have French words for everything, don't want to adopt foreign terms. Very silly, and it happens despite their efforts. This is just more of the same like that.

I think people forget that using ASCII isn't only because computers were invented in English speaking countries. It is also because it is a reasonably small and distinct character set. You get languag

Sorry it took me a while to reply, I did not notice your post until now.

Oh, yes, THAT makes you an expert and an authority. Carry on.

I was pointing out that I knew a language with a radically different character set, as most English-speakers who know a different language know a western European one with Latin characters and a few modifiers. I am trying to point out the difficulties most people would have with very different character sets, and ICANN also approved use of Japanese addresses so I think it was

The article is a verbatim repost of a verbatim repost of the ICANN PDF press release, right down to leaving the ccTLDs themselves out of their list of "new IDN country code top-level domains (ccTLDs) and the associated organizations".

The original PDF linked to the meeting minutes collection on the ICANN site, a link that was lost by the reposting process.

The ICANN Meeting Minutes themselves are quite clear on what has been done: http://www.icann.org/en/minutes/resolutions-25jun10-en.htm#2

I have to say I don't like this one bit. It pretty much guarantees that sites with Chinese domains are essentially blocked off to the rest of the world. With pinyin, at least a non-Chinese speaker can visit a Chinese site without too much difficulty. Good luck trying to enter the address of a Chinese site when you won't even have a clue how to type it in. And I'm curious to know how they will deal with Japanese or Korean input. I mean, if someone in Japan uses Japanese input methods to enter the same characte