Friday, October 30, 2009

On the 40th birthday of the Internet last week, the Internet Corporation for Assigned Names and Numbers (ICANN) formally announced domain name support for non-Latin characters in URLs. This concept, called internationalized domain names or IDNs, will allow URLs written in the scripts of languages such as Korean, Chinese, Hebrew, Arabic and Hindi.

A little digging on Wikipedia about IDNs reveals that the underlying implementation translates Unicode names into DNS-compatible (ASCII) names and vice versa, so that the current DNS keeps working unchanged. This makes the system backward compatible with the name resolution infrastructure already deployed. In fact, most of the translation to and from non-Latin scripts will be done in the user's browser.
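To make the translation step concrete, here is a minimal sketch using the `idna` codec that ships with Python's standard library. The helper names `to_ascii` and `to_unicode` and the domain `bücher.example` are my own illustrations, not anything from the ICANN announcement, and Python's built-in codec follows the older IDNA 2003 rules, so a browser may behave differently in edge cases.

```python
def to_ascii(hostname: str) -> str:
    """Convert a Unicode hostname to the ASCII form that DNS can carry."""
    # The "idna" codec encodes each label separately and prefixes
    # any label containing non-ASCII characters with "xn--".
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Convert an ASCII-encoded hostname back to its Unicode form."""
    return hostname.encode("ascii").decode("idna")


# Hypothetical example domain, chosen only for illustration.
unicode_name = "bücher.example"
ascii_name = to_ascii(unicode_name)

print(ascii_name)              # the name actually sent to DNS resolvers
print(to_unicode(ascii_name))  # what the browser can show the user

# A plain ASCII name passes through unchanged.
print(to_ascii("doctor.com"))
```

The DNS servers only ever see the ASCII form, which is why nothing in the existing resolution infrastructure has to change.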

But what does this mean for the fabric of DNS address space and the web?

Dilution of Latin namespace (?)

Will we see some dilution in the value of address real estate? For example, will http://www.doctor.com become less valuable because folks in Germany can now remember the more meaningful http://ärzte.com instead (Ärzte is German for medical doctors)? And what of the tens of thousands of domain names that spell words from other languages in Latin script, such as http://naukri.com in India (naukri is Hindi for job)?

The registration rush

Initially, web content providers will scurry to buy up non-Latin names. This will matter most to providers that either have no global brand name or have a brand name that is a generic word describing their product or service. For a content provider like doctor.com, it will make sense to buy the synonyms of "doctor" in other languages, in addition to the transliteration of "doctor" into other scripts. Microsoft.com, on the other hand, will only need to buy the transliterations of "Microsoft" in the scripts becoming available through IDNs. At the very least, I foresee most businesses re-evaluating their namespace position on the web.

Security and phishing

Completely unrelated characters in different scripts can look the same to the human eye; the Cyrillic letter "а" and the Latin letter "a", for instance, are rendered almost identically in most fonts. This means that users can be tricked into thinking that the address displayed in the address bar points to a legitimate page when in fact it points to a phishing page. It may be prudent for businesses to be aware of this vulnerability in their URLs and perhaps to register look-alike names in other scripts proactively.
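As a rough illustration of how such look-alikes could be flagged, the sketch below treats the first word of each character's Unicode name (for example LATIN or CYRILLIC) as its script and reports labels that mix more than one. This is my own simplification rather than how any particular browser or registry does it; the function names `scripts_in` and `is_mixed_script` and the spoofed label are hypothetical.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Approximate the set of scripts used by the letters in a label.

    Assumption: the first word of a character's Unicode name
    (e.g. "LATIN", "CYRILLIC") is a good enough stand-in for its script.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():  # skip digits, hyphens and dots
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    """Return True if the label draws letters from more than one script."""
    return len(scripts_in(label)) > 1


genuine = "doctor"
spoofed = "d\u043ector"  # second letter is CYRILLIC SMALL LETTER O

print(genuine == spoofed)        # False, although they look alike on screen
print(scripts_in(spoofed))       # both LATIN and CYRILLIC appear
print(is_mixed_script(genuine))  # False
print(is_mixed_script(spoofed))  # True
```

A check like this only catches labels that mix scripts. A name spelled entirely in one non-Latin script that happens to resemble a Latin word would pass, so it is a first filter at best.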

Impact on search engines

Search engines are known to give weight to the words in a URL in their ranking algorithms. This may need some rethinking. At the very least, some search engines may need automated translation to link up semantically similar web pages, regardless of how the address space separates copies of the same information in different languages and scripts.