From Deccan Herald, Nov 27
Breaking the language barrier in IT:
IIT Bombay develops new multilingual search engine
From Devika Sequeira
vicki at goatelecom dot com
DH News Service
PANAJI, Nov 26
Move over, Google. A team of researchers from the Indian Institute of Technology Bombay
says it has developed an internet search engine that is both multilingual and
meaning-specific, giving it broader applicability and greater accuracy than existing
models.
"Our search engine eliminates the language barrier, and its results are much more accurate
than those of any other technique in use," says Dr Pushpak Bhattacharya, professor in the
Department of Computer Science and Engineering, IIT Bombay. By using the Universal Networking
Language (UNL), "the model has integrated the user's language requirement with the knowledge
the user seeks," he points out.
In a paper to be presented at the ongoing International Conference on Universal Knowledge
and Language here, Dr Bhattacharya and his team of students, Sarvjeet Singh, Tushar Chandra,
Upmanyu Misra and Ushhan D Gundevia, argue that their search engine retrieves only relevant
knowledge and attempts to bridge the language gap by using an underlying, structured language
as a back-end translator. "As far as we know, we are the first to employ this technique,"
they say.
Google, widely believed to be the best search engine, is restricted to English.
According to an estimate by the World Wide Web, English-language content makes up about 80
per cent of the trillions upon trillions of bytes of textual information on the internet.
Though content in other languages is catching up rapidly, especially Chinese and South Asian
languages, the digital divide between nations and peoples is still huge.
It was against this backdrop that the United Nations began the UNL project in 1996.
Universal Networking Language is, simply put, an electronic language. It uses EnConverter
software to automatically convert natural-language text into UNL. So far, thirteen languages
(Japanese, Chinese, Korean, Indonesian, English, Hindi, Marathi, Arabic, Italian, Russian,
French, Spanish and Portuguese) have deconverters in place, which automatically render UNL
text in those languages. With a lakh (100,000) concepts in place, English has the largest
wordnet so far.
IIT Bombay, which is in the process of developing translation software for Hindi, Marathi
and Konkani, has so far developed 15,000 concepts for Hindi, says Bhattacharya. He points to
the immense extension of the internet's reach once computer translation between languages
becomes available at the click of a button.