Internationalized Top Level Domain Names in Indian Languages

ICANN (Internet Corporation for Assigned Names and Numbers) has approved four additional proposed Indic TLDs (top level domain names), in Malayalam, Kannada, Assamese and Oriya languages. The TLDs are yet to be delegated to NIXI (National Internet exchange of India). While Malayalam, Kannada and Oriya will use their own scripts, Assamese TLDs will use the Bengali script.

The news title says “domain names” and the report talks about TLDs. For many people domain name is simply something like “google.com” or “amazon.in” etc. So people may misinterpret the news report as approval for domain names like “കേരളസർവ്വകലാശാല.ഭാരതം”. Many people asked me if that is the case. We are going to have such domain names in future, but not yet.

I will try to explain the concept of TLD and IDN and the current status in this post.

The Internet Corporation for Assigned Names and Numbers (ICANN) is a non-profit organization which takes care of the whole internet domain name system and registration process. It achieves this with the help of lot of domain process and policies and domain registrars. In India NIXI owns the .in registration process.

A domain name is a string, used to identify member of a network based on a well defined Domain Name System(DNS). So, “google.com”, “thottingal.in” etc are domain names. There are dots in the domain name. They indicate the hierarchy from right to left. In the domain name “thottingal.in”, “.in” indicates a top level or root in naming and under that there is “thottingal”. If there is “blog.thottingal.in”, “blog” is a subdomain under “thottingal.in” and so on.

The top level domains are familiar to us. “.org”, “.com”, “.in”, “.uk”, “.gov” are all examples. Out of these “.com”, “.org” and “.gov” are generic top level domains. “.in” and “.uk” are country code top level domains, often abbreviated as ccTLD. “.in” is obviously for India.

In November 2009, ICANN decided to allow these domain name strings in the script used in countries. So “.in” should be able to represent in Indian languages too. They are called Internationalized country code Top Level Domain names, abbreviated as IDN ccTLD.

ICANN also defined a fast track process to do the definition of these domains and delegation to registrars so that website owners can register such domain names. The actual policy document on this is available at ICANN website[pdf], but in short, the steps are (1) preparation, (2) string validation and approval, (3) delegation to registrars.

So far the following languages finished all 3 steps in 2014.

Hindi: .भारत

Urdu: بھارت

Telugu: .భారత్

Gujarati: .ભારત

Punjabi: .ਭਾਰਤ

Bengali: .ভারত

Tamil: .இந்தியா

What this means is, NIXI owns this TLDs and can assign domains to website owners. But as far as I know, NIXI is yet to start that.

And the following languages, just got approval for second step — string validation. ICANN announced this on April 13, 2016. String validation means, Requests are evaluated in accordance with the technical and linguistic requirements for the IDN ccTLD string(s) criteria. IDN ccTLD requesters must fulfill a number of requirements:

The script used to represent the IDN ccTLDs must be non-Latin;

The languages used to express the IDN ccTLDs must be official in the corresponding country or territory; and

A specific set of technical requirements must be met.

The languages passed the second stage now are:

Kannada: .ಭಾರತ

Malayalam: .ഭാരതം

Assamese: .ভাৰত

Oriya: .ଭାରତ

As a next step, these languages need delegation- NIXI as registrar. So in short, nothing ready yet for people want to register domain names with the above TLDs.

We were talking about TLDs- top level domain names. Why there is a delay in allowing people to register domains once we have TLD? It is not easy. The domain names are unique identifiers and there should be well defined rules to validate and allow registering a domain. The domain should be a valid string based on linguistic characteristics of the language. There should be a de-duplication process- nobody should be allowed to take a domain that is already registered. You may think that it is trivial, string comparison, but nope, it is very complex. There are visually similar characters in these scripts, there are rules about how a consonant-vowel combination can appear, there are canonically equivalent letters. There are security issues[pdf] to consider.

Before allowing domain names, the IDN policy for each script need to be defined and approved. You can see a sample here: Draft IDN Policy for Tamil[PDF]. The definition of these rules were initially attempted by CDAC and was controversial and did not proceed much. I had reviewed the Malayalam policy in 2010 and participated in the discussion meetings based on a critique we prepared.

ICANN has created Generation Panels to Develop Root Zone Label Generation Rules with specific reference to Neo-Brahmi scripts. I am a member of this panel as volunteer. Once the rules are defined, registration will start, but I don’t know exactly when it will happen. The Khmer Generation Panel has completed their proposal for the Root Zone LGR. The proposal has been released for public comments.