6 Ways to Use Open Tools to Better Support Indian Languages

By Subhashish Panigrahi | November 6, 2016

India is a large, populous country that accounts for a large base of Google consumers. So in recent years, Google's widened support of world languages across its products has been a blessing. It has specifically helped people in India grow their use of, and participation on, the Internet.

For one, Google Summer of Code helps students experiment with and build prototypes that enhance language-based software. Another way is through Google Translate, a web and app-based platform that provides machine translation from one language to another. It is predominantly maintained and serviced by volunteer contributions. Yet there are more ways Google can support greater inclusivity through its support of world languages, particularly for people who speak South Asian languages.

This article is a collection of ideas expressed by some Wikimedia contributors.

1. Update Google search

Google's Indian homepage currently has options for changing the interface and for searching in a few Indian languages, but as many as 13 Indian languages that are part of the Eighth Schedule of the Indian constitution are missing.

2. Openly license Google Translate

More people could help improve Google Translate if it were available under a free license. Volunteers and many organizations have made their own sources available under open licenses, so it seems fair for Google to open its source under a free license for others to use.

3. Use Wikidata for Google Translate

Odia is an example of a widely used Indian language that is not supported on Google Translate. The reason could be that the Google platform has fewer translated strings in Odia for translators to build on when contributing words and phrases. One way to improve the translation of many words from English, and from other languages, is to use existing translations on free knowledge platforms like Wikidata.

Wikidata is a sister project of Wikipedia and a free knowledge base that contains 23,906,929 entries (at the time of writing) in multiple languages. The entries are structured and linked to one another, so this would be a great source. For instance, the entry about artificial intelligence on Wikidata not only has the commonly used word for AI in the native language but also connects to various other entries that are related to AI.
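To illustrate how structured labels could feed a translation glossary, here is a minimal Python sketch. It assumes an entity record shaped like the JSON that Wikidata returns for an item, in which `labels` maps a language code to an object holding a `value`. The sample record, its placeholder ID, and the function name `label_pairs` are hypothetical, and the label values are illustrative rather than copied from Wikidata.

```python
from typing import List, Tuple

# Hand-written sample in the shape of a Wikidata entity record.
# The ID is a placeholder and the label values are illustrative,
# not fetched from Wikidata.
SAMPLE_ENTITY = {
    "id": "Q0000",
    "labels": {
        "en": {"language": "en", "value": "water"},
        "hi": {"language": "hi", "value": "पानी"},
        "or": {"language": "or", "value": "ପାଣି"},
    },
}


def label_pairs(entity: dict, source: str, targets: List[str]) -> List[Tuple[str, str, str]]:
    """Return (target_language, source_label, target_label) tuples.

    Target languages that have no label on the entity are skipped.
    If the source language has no label, the result is empty.
    """
    labels = entity.get("labels", {})
    if source not in labels:
        return []
    source_label = labels[source]["value"]
    pairs = []
    for lang in targets:
        if lang in labels:
            pairs.append((lang, source_label, labels[lang]["value"]))
    return pairs


# "ta" (Tamil) has no label in the sample, so only two pairs come back.
pairs = label_pairs(SAMPLE_ENTITY, "en", ["hi", "or", "ta"])
```

Pairs like these are only single-term matches. They could seed a glossary for a poorly covered language such as Odia (language code `or`), but they would not replace sentence-level training data.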

Note: Google Translate is now improved by Google Neural Machine Translation, so errors are expected to fall by 80%, and Spanish-English translation has scored 5.0 out of 6, where human translators score about 5.1.

4. Use Wiktionary for Google Translate

Google Translate currently provides pronunciation on both web and mobile devices for some languages. Let's expand that coverage. The Indian language community is currently adding the meanings of words and their pronunciations to Wiktionary, a Wikimedia project and a free online multilingual dictionary.

5. Place a common forum on Google Translate

If outside contributions were allowed, a forum would let contributors discuss and reach consensus on language standards and grammar.

6. Use Wikisource to improve Optical Character Recognition

Google Drive uses Optical Character Recognition (OCR) to extract text from scanned pages in a range of scripts. Some contributors use OCR to digitize freely licensed (public domain, CC-BY, and CC-BY-SA licensed) books on Wikisource, which serves as a free library to digitize and preserve books. After digitization, these books are proofread to ensure that the resulting text is correct. The output of this process can be used by Google to improve its records and the OCR tool, which is not very effective for old, printed texts in Indian languages.
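One way proofread text could help is as a reference for measuring how well an OCR tool performs. The Python sketch below assumes a character error rate based on edit distance, which is a common OCR metric but not one the contributors or Google specify. The function names and the sample strings are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr_text: str, proofread_text: str) -> float:
    """Edit distance divided by the length of the proofread reference."""
    if not proofread_text:
        raise ValueError("proofread reference text must not be empty")
    return edit_distance(ocr_text, proofread_text) / len(proofread_text)


# Made-up strings standing in for an OCR result and its proofread version.
ocr_output = "Tbe quick brown f0x"
proofread = "The quick brown fox"
cer = character_error_rate(ocr_output, proofread)  # 2 errors over 19 characters
```

Python compares strings by code point, so in Indic scripts a single visible character built from several code points counts as several characters here. A real evaluation would likely need to segment text into grapheme clusters first.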

6 ways to use open tools to better support Indian languages was authored by Subhashish Panigrahi and published in Opensource.com. It is being republished by Open Health News under the terms of the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0). The original copy of the article can be found here.