Open Source Project Brings 11th Century Kannada Verses Online

Vachana sahitya is a form of rhythmic writing in Kannada poetry that evolved in the 11th century C.E. and flourished in the 12th century as a part of the Lingayatha movement. More than 259 Vachanakaras (Vachana writers) have compiled over 11,000 vachanas. 21,000 verses, which were published in a 15-volume set, “Samagra Vachana Samputa,” by the Government of Karnataka, a state in southwest India, have been digitized. Two Wikimedians, along with Kannada linguist and author O. L. Nagabhushana Swamy, are involved in the Unicode conversions, corrections and writing the preface for these verses. The entire work is now available as a standalone project called Vachana Sanchaya and is ready to enrich Kannada WikiSource.

This project was started a year ago when Kannada Wikimedian Omshivaprakash was trying to help Professor O. L. Nagabhushana Swamy and Kannada author and publisher Vasudhendra to easily access the vachana (verses) of Vachana Sanchaya. Swamy had difficulty using publicly available content on Vachanas because the data was in ASCII and searching the text was a huge problem. Pavithra Hanchagaiah started helping to collect information about vachanas and bring them into Unicode by writing scripts that customize open source software to convert the Kannada fonts from ASCII into Unicode.
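To give a rough idea of what such a conversion script does, here is a minimal sketch of table-driven, longest-match replacement. It is not the project's actual script. The glyph table is entirely hypothetical and does not correspond to any real Kannada font, and the function name is made up for illustration.

```python
import unicodedata

# HYPOTHETICAL glyph table: these ASCII codes are placeholders for
# illustration only, not the mapping of any real Kannada font.
GLYPH_TO_UNICODE = {
    "ka": "\u0c95",        # KANNADA LETTER KA
    "ma": "\u0cae",        # KANNADA LETTER MA
    "m": "\u0cae\u0ccd",   # MA followed by virama
    "r^": "\u0cb0\u0ccd",  # RA followed by virama
}


def legacy_to_unicode(text, table=GLYPH_TO_UNICODE):
    """Replace legacy glyph codes with Unicode, trying longer codes first."""
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No glyph code matches here: keep the character unchanged.
            out.append(text[i])
            i += 1
    # Normalize so equivalent sequences compare equal when searching.
    return unicodedata.normalize("NFC", "".join(out))


converted = legacy_to_unicode("kar^ma")
```

Trying the longest codes first keeps a short code such as "m" from pre-empting a longer one such as "ma". Real legacy fonts typically also need glyph reordering and error cleanup, which this sketch leaves out.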

After further discussions, it was decided to get thousands of vachanas into a database, making them easily searchable with an index. This required us to build a platform on which this could be done. The fruits of our labors will help linguistic researchers and students as well as the public at large: anybody who’s interested in reading and studying Vachana literature. With this idea, Omshivaprakash started designing the model and his colleague Devaraju started building it. In the meantime, Pavithra was running various scripts to fix errors in the conversion of the ASCII text to Unicode, confirming that the data was ready to be consumed by the modules developed for the concordance. We spent weekends and holidays executing this project from home and would sync up online once in a while.
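As an illustration of the database-plus-index idea, the following sketch stores verses in SQLite and keeps a word index beside them. The schema, table names and function names are assumptions made for this example, not the platform's actual design, and whitespace tokenization is a simplification.

```python
import sqlite3


def build_db(records):
    """Load (author, body) pairs into an in-memory database with a word index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vachana (id INTEGER PRIMARY KEY, author TEXT, body TEXT)"
    )
    conn.execute("CREATE TABLE word_index (word TEXT, vachana_id INTEGER)")
    conn.execute("CREATE INDEX idx_word ON word_index (word)")
    for author, body in records:
        cur = conn.execute(
            "INSERT INTO vachana (author, body) VALUES (?, ?)", (author, body)
        )
        vachana_id = cur.lastrowid
        # One index row per distinct word in the verse.
        conn.executemany(
            "INSERT INTO word_index (word, vachana_id) VALUES (?, ?)",
            [(word, vachana_id) for word in set(body.split())],
        )
    conn.commit()
    return conn


def lookup(conn, word):
    """Return (vachana id, author) for every verse containing the exact word."""
    rows = conn.execute(
        "SELECT v.id, v.author FROM word_index AS w "
        "JOIN vachana AS v ON v.id = w.vachana_id "
        "WHERE w.word = ? ORDER BY v.id",
        (word,),
    )
    return rows.fetchall()
```

Looking a word up then touches only the index rows for that word instead of scanning every verse, which is the property a concordance relies on.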

With constant feedback and guidance from Mr. Swamy and Vasudhendra, we learned how a concordance of text is used by researchers and what would make it easier for them to do their research. Omshivaprakash worked on the architecture of the platform, decided the infrastructure requirements and managed the entire project. Free and open source software technologies were used for keeping the platform active. Pavithra was involved in providing critical hacks for digitization and offered valuable input through suggestions, feedback and Q&A.

Working system

At present, the system has around 200,000 unique words in the repository. It was an extensive learning process, as we used our free time to solve real-time issues. Moreover, it was a work of the Kannada language that needed quick attention. Vachana Sanchaya is meant to be more than just a repository of the text online; it’s meant to be a tool for researchers.

For example, when a user searches for a word on our system, he or she can see which Vachanakaras have used the word and in which Vachanas. To improve readability, the searched text string is highlighted in each Vachana that is displayed. To repeat the search for a specific Vachanakara, the user needs only to click on his or her name on the graph provided on the result page. We have used the MediaWiki jquery-ime input tool architecture, which lets the user enter Kannada text directly in Unicode for a search.
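The search behaviour described above can be sketched roughly as follows. This is a simplified illustration, not the site's code: the function name, the square-bracket highlight markers and the in-memory list of records are all assumptions.

```python
from collections import Counter


def search(records, term):
    """Find verses containing `term`; mark matches and count hits per author."""
    hits = []
    per_author = Counter()
    for author, body in records:
        if term in body:
            # Wrap each occurrence in brackets as a stand-in for highlighting.
            hits.append((author, body.replace(term, "[" + term + "]")))
            per_author[author] += 1
    return hits, per_author


def filter_by_author(hits, author):
    """Narrow earlier results to one author, like clicking a name on the graph."""
    return [hit for hit in hits if hit[0] == author]
```

The per-author counts are the kind of data a result-page graph could be drawn from, and filtering by author mirrors repeating the search for a single Vachanakara.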

Public Response

We are glad to see people accessing vachanas from our Facebook, Twitter and Google+ channels. Thousands read them every day and it has become a part of many people’s daily routine. There have been more than 50,000 page views on social networks and 500,000 page views on our site in the first few months after our platform’s public launch. Some of the most commonly searched Kannada words are “ಕರ್ಮ” (Karma en: Work/Deed), “ಸತ್ಯ” (Sathya en: Truthfulness) and “ನದಿ” (Nadi en: River).

Our system is extensible with respect to adding new features. We have a review desk for researchers to help with the review of content. Later we will be adding required references to Vachanas from various research works on this literature. The content is available to the public through an OpenData API and will be distributed in the public domain through WikiSource once the review work is complete. This will open up the system for students, developers, researchers and anyone interested in working to build linguistic tools for Kannada and other Indic languages.

This system will evolve so it can be used for other literature projects. Vachana Sahitya will further help us to initiate Natural Language Processing (NLP) projects if more researchers get together to tag the words, glossary, etc. We can also add various language tools such as a spell checker and grammar checker through crowd-sourced development. The forthcoming projects under “Kannada Sanchaya” are Sarvagnana Vachanagalu and Dāsa Sanchaya, which are already in the pipeline. Our idea is to extend this platform to include works from antiquity (Vyasa, for example) to the early 20th century (e.g., Muddanna) and possibly even include contemporary literature that’s available in the public domain.