Monday, January 7, 2013

Last week I spent some time working on the WikipediaHI activity for the Sugar desktop environment. I must say it is one of the most awesome activities I have come across. The best part is that it can serve you data in offline mode: even if you don't have the internet connection that is otherwise required to access Wikipedia, your WikipediaHI activity will still serve your purpose.

There are many developers and contributors collaborating on such awesome projects, and they continuously inspire you to take up new things and create something that others around the world can use. Sugar's developers and contributors epitomize such a group.

I came across a few of these developers, two of whom, Anish Mangal and Gonzalo Odiard, have made significant contributions to Sugar. I took up the task of creating WikipediaHI from the freely available Hindi Wikipedia dump. I followed the steps specified on this page [hosted by Gonzalo] for creating a Wikipedia activity in your own language.

4) Process the dump using the page parser:
    ../tools2/pages_parser.py
This operation generates these files:
    hiwiki-20121225-pages-articles.xml.links
    hiwiki-20121225-pages-articles.xml.page_templates
    hiwiki-20121225-pages-articles.redirects
    hiwiki-20121225-pages-articles.templates
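For anyone curious what a step like this involves: a dump is one big XML file, so it has to be streamed rather than loaded whole. Below is a minimal Python sketch of that idea, not the actual code of pages_parser.py. It assumes each <page> element has a <title> child and that redirect pages carry a <redirect title="..."/> element; the function names scan_dump and local_name are my own. It strips the XML namespace from every tag so the matching does not depend on the dump's schema version.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix so different export schema versions parse alike."""
    return tag.rsplit("}", 1)[-1]


def scan_dump(source):
    """Stream <page> elements from a MediaWiki XML dump.

    Returns (articles, redirects): a list of non-redirect titles and a
    dict mapping each redirect title to its target title.
    """
    articles = []
    redirects = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = None
        target = None
        for child in elem:
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "redirect":
                target = child.get("title")
        if target is not None:
            redirects[title] = target
        else:
            articles.append(title)
        elem.clear()  # release the page's children; real dumps are huge
    return articles, redirects
```

Usage: scan_dump(open("hiwiki-20121225-pages-articles.xml", "rb")) returns the article titles and the redirect map.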

5) Then you can include selected articles, or all articles, from this dump in your activity with this command:
    ../tools2/make_selection.py
* Make sure favorites.txt and blacklist.txt are filled with the appropriate keywords.
If you want to include all articles, use this command instead:
    ../tools2/make_selection.py --all
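I haven't described the exact rules make_selection.py applies, so the sketch below only illustrates the basic idea under a simplifying assumption of my own: favorites.txt and blacklist.txt list one article title per line, a title is kept if it is a favorite (or always, in the --all case), and a blacklisted title is never kept. read_keywords and select_articles are hypothetical helpers, not part of tools2.

```python
from pathlib import Path


def read_keywords(path):
    """Return the non-blank, stripped lines of a keyword file as a set."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def select_articles(titles, favorites, blacklist, select_all=False):
    """Keep favorites (or every title when select_all is set), never blacklisted ones.

    The original order of `titles` is preserved.
    """
    return [
        title
        for title in titles
        if title not in blacklist and (select_all or title in favorites)
    ]
```

The blacklist is checked first so that it wins even in the --all case, which is what you usually want from a blacklist.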
6) Then proceed to create the index for these articles:
    ../tools2/create_index.py
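To make "creating the index" concrete, here is a sketch that stores article titles in an SQLite database with Python's sqlite3 module. I am assuming an SQLite file purely for illustration; the table name `articles` and its columns are hypothetical, and create_index.py may well store search.db differently.

```python
import sqlite3


def build_index(db_path, titles):
    """Create (or extend) a title index in an SQLite file; schema is hypothetical."""
    conn = sqlite3.connect(db_path)
    with conn:  # commits the transaction if no exception is raised
        conn.execute(
            "CREATE TABLE IF NOT EXISTS articles ("
            " id INTEGER PRIMARY KEY,"
            " title TEXT UNIQUE NOT NULL)"
        )
        # INSERT OR IGNORE skips titles that are already indexed.
        conn.executemany(
            "INSERT OR IGNORE INTO articles (title) VALUES (?)",
            ((title,) for title in titles),
        )
    return conn
```

Called as build_index("search.db", titles), it returns an open connection to the new index.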

7) To test the index created in the previous step, use this command:
    ../tools2/test_index.py
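A test of the index boils down to looking things up in it. The sketch below assumes a hypothetical SQLite table `articles(title)`: find_prefix returns titles starting with a given prefix, and missing_titles reports selected articles the index cannot find. These helpers are illustrations of mine, not what test_index.py actually runs.

```python
import sqlite3


def find_prefix(conn: sqlite3.Connection, prefix: str, limit: int = 10) -> list:
    """Titles that start with `prefix`, in alphabetical order."""
    rows = conn.execute(
        "SELECT title FROM articles WHERE substr(title, 1, ?) = ? "
        "ORDER BY title LIMIT ?",
        (len(prefix), prefix, limit),
    )
    return [title for (title,) in rows]


def missing_titles(conn: sqlite3.Connection, titles) -> list:
    """Titles from the selection that have no row in the index."""
    missing = []
    for title in titles:
        row = conn.execute(
            "SELECT 1 FROM articles WHERE title = ?", (title,)
        ).fetchone()
        if row is None:
            missing.append(title)
    return missing
```

Comparing a substr() of the title with the prefix avoids LIKE, whose wildcards and ASCII case-folding would need escaping.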
8) The next step is to expand the templates of the articles:
    cd ..
    ./tools2/expandtemplates.py hi

10) Download the images for the articles you selected:
    cd hi
    ../tools2/download_images.py
If you want to download the images for the pages you selected in the previous step, use:
    ../tools2/download_images.py --all
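Finding which images an article needs means scanning its wikitext for file links. Here is a small sketch of that part, plus the URL of the original file on Wikimedia Commons, which stores files under directories named after the MD5 hash of the file name. It assumes the images live on Commons (files uploaded locally to Hindi Wikipedia live elsewhere), handles only plain [[File:...]] / [[Image:...]] links, and does no downloading; download_images.py surely does more, and these function names are my own. The Hindi Wikipedia's localized names for the file namespace can be passed in `prefixes`.

```python
import hashlib
import re
from urllib.parse import quote


def image_names(wikitext, prefixes=("File", "Image")):
    """File names used in [[Prefix:Name|...]] links, in order, without duplicates."""
    pattern = re.compile(
        r"\[\[\s*(?:%s)\s*:\s*([^|\]]+)" % "|".join(re.escape(p) for p in prefixes),
        re.IGNORECASE,
    )
    names = []
    for match in pattern.finditer(wikitext):
        name = match.group(1).strip().replace(" ", "_")
        if name not in names:
            names.append(name)
    return names


def commons_url(name):
    """Original-file URL on Commons: /<md5[0]>/<md5[0:2]>/<name>."""
    name = name.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return "https://upload.wikimedia.org/wikipedia/commons/%s/%s/%s" % (
        digest[0], digest[:2], quote(name))
```

MediaWiki also capitalizes the first letter of file names, which this sketch does not attempt.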

11) Create the files specific to your language:
(a) activity/activity.info.lang : activity info file for your language's activity
(b) activity/activity-wikipedia-lang.svg : activity icon for your language
(c) activity_lang.py : activity file for your language
(d) static/about_lang.html : about page for Wikipedia in your language
(e) static/index_lang.html : index page for Wikipedia in your language. This page is displayed when the activity is launched, so you need to know which articles are included in search.db (generated when the index is created) before you create it.
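The list above follows a simple naming pattern, and activity.info is a small INI-style file that Sugar reads to find the activity's name, bundle id, icon and the class to run. Here is a sketch that derives the per-language paths and renders an info file with Python's configparser. The bundle id format, the version and the class path (activity_hi.WikipediaActivityHI) are hypothetical placeholders, and I am assuming the `exec` key; check the existing activity.info in the Wikipedia activity for the exact keys it uses.

```python
import configparser
import io


def language_files(lang):
    """Per-language file paths following the (a)-(e) naming pattern."""
    return {
        "info": "activity/activity.info.%s" % lang,
        "icon": "activity/activity-wikipedia-%s.svg" % lang,
        "activity": "activity_%s.py" % lang,
        "about": "static/about_%s.html" % lang,
        "index": "static/index_%s.html" % lang,
    }


def activity_info(lang, name, version, class_path):
    """Render an activity.info body; every value passed in is a placeholder."""
    config = configparser.ConfigParser(interpolation=None)
    config["Activity"] = {
        "name": name,
        "activity_version": str(version),
        "bundle_id": "org.laptop.Wikipedia%s" % lang.upper(),  # hypothetical id
        "exec": "sugar-activity %s" % class_path,
        "icon": "activity-wikipedia-%s" % lang,  # icon file name without .svg
    }
    out = io.StringIO()
    config.write(out)
    return out.getvalue()
```

Usage: activity_info("hi", "WikipediaHI", 1, "activity_hi.WikipediaActivityHI") returns the text to save at language_files("hi")["info"].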

I went through the search.db file to identify the articles present in it and created the index page accordingly. This gave me the idea of writing a script that generates the index page (in part or in whole) from search.db, to be used as the activity's home page [stay tuned for my next blog post on this idea]. Here you go: you can see WikipediaHI.

On launching it, you can see the index page listing the articles you can view offline using WikipediaHI.
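Here is a rough sketch of the index-page generator idea: read the titles out of search.db and write them as an HTML list of links. It assumes search.db is an SQLite file with a hypothetical `articles(title)` table, and the /wiki/<Title> link format is also an assumption; check how the activity's own pages link to articles before using it. The optional limit gives you part of the page instead of the whole of it.

```python
import html
import sqlite3
from urllib.parse import quote


def index_page(conn: sqlite3.Connection, title: str, limit=None) -> str:
    """Render an HTML page listing article titles, alphabetically."""
    sql = "SELECT title FROM articles ORDER BY title"
    params = ()
    if limit is not None:
        sql += " LIMIT ?"
        params = (limit,)
    items = []
    for (article,) in conn.execute(sql, params):
        href = "/wiki/" + quote(article.replace(" ", "_"))  # hypothetical URL scheme
        items.append('<li><a href="%s">%s</a></li>' % (html.escape(href), html.escape(article)))
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8"><title>%s</title></head>\n'
        "<body>\n<h1>%s</h1>\n<ul>\n%s\n</ul>\n</body></html>\n"
        % (html.escape(title), html.escape(title), "\n".join(items))
    )
```

Percent-encoding the link keeps Devanagari titles valid in the href, while html.escape keeps the visible text safe.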

I must thank Gonzalo for his amazing help and guidance in getting this done. I should mention that Wikipedia changed the XML format of its dumps, which caused an error when I was creating the index; Gonzalo helped me get it resolved. Thanks to Anish, who motivated me to pick this up and guided me to complete it. Thanks guys !! :D