Sunday, February 28, 2010

I explained the Sahana project in my previous blog post. Today I'm going to explain the Sahana OCR project, which is a sub-module of the main Sahana project. You can refer to my research paper about the extension I did for this project.

The Sahana OCR project was started to automate data entry into the system, and it was developed on the .NET Framework using Visual C++. Over the past two years it has been developed by many volunteers as well as GSoC contributors such as Omega and Gihan. I selected this project as my level 3 project and made some improvements to the existing system to improve the module's character recognition.

After a disaster we have to collect data about the victims and manage it using Sahana, as I mentioned in my earlier post. We can download the form template from the Sahana server, print the forms as hard copies, distribute them to the victims, ask them to fill the forms in, and collect the forms back. Then we have to enter that data manually into the Sahana system. But in a large disaster, manual data entry is difficult and very time consuming. So the Sahana OCR project was developed to automate that data entry process using optical character recognition (OCR) technology.

This is a sample of the forms distributed to the victims. You can refer to the wiki about this module to get a better understanding of the forms and the xforms.

[Image: sahana_xform]

So here we have to provide the Sahana OCR module with the scanned image of the filled form and an XML file called the xform, which holds the coordinates of the data fields in the form template. The module then sends the two files to the Form_processor, which is the manager class of the system. First the Form_processor applies rotation compensation to the image. The system uses five black squares on the form to identify the correct orientation of the form. Then it validates the form image against the xform, checking whether the xform coordinates match the form image. If they do not, the system gives an error.
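The validation step can be pictured with a small Python sketch. This is only an illustration, not the module's actual Visual C++ code: the XML layout, the `field` element and its attribute names are hypothetical (the real Sahana xform schema may differ), and the check is simplified to "does every field rectangle fit inside the scanned image".

```python
import xml.etree.ElementTree as ET

# Hypothetical xform layout, invented for this sketch.
SAMPLE_XFORM = """
<xform width="1000" height="1400">
  <field label="Name" x="100" y="200" w="600" h="40"/>
  <field label="DateOfBirth" x="100" y="300" w="300" h="40"/>
</xform>
"""

def load_fields(xform_text):
    """Return a list of (label, x, y, w, h) tuples read from the xform text."""
    root = ET.fromstring(xform_text)
    fields = []
    for node in root.iter("field"):
        fields.append((
            node.get("label"),
            int(node.get("x")), int(node.get("y")),
            int(node.get("w")), int(node.get("h")),
        ))
    return fields

def validate_form(image_width, image_height, fields):
    """Return the labels of fields whose rectangle falls outside the image."""
    bad = []
    for label, x, y, w, h in fields:
        inside = (x >= 0 and y >= 0
                  and x + w <= image_width
                  and y + h <= image_height)
        if not inside:
            bad.append(label)
    return bad
```

An empty list from `validate_form` means every field fits the image; a non-empty list is the case where the system would report an error.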

After validation, the image is sent to the image processor, which finds the input areas of the form, then the data fields within them, and finally the letter boxes, according to the xform's coordinates. The data fields, input areas and letter boxes are illustrated in the image above.
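As a rough illustration of the last step, here is a Python sketch that splits one data field into its letter boxes. It assumes the letter boxes are equal-width cells laid side by side inside the data field, which is my simplification for the example; the function name is hypothetical and the real module works from the xform coordinates.

```python
def letter_boxes(x, y, width, height, count):
    """Split a data field rectangle into `count` equal-width letter boxes.

    Each box is returned as an (x, y, width, height) tuple, left to right.
    """
    if count <= 0:
        raise ValueError("count must be positive")
    box_width = width // count
    return [(x + i * box_width, y, box_width, height) for i in range(count)]
```

For example, `letter_boxes(100, 200, 600, 40, 3)` gives three boxes of width 200 starting at x = 100, 300 and 500.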

The image processor was developed using the OpenCV libraries, which worked very well in this project. You can find out more about the OpenCV libraries by referring here.

Then those letter boxes are sent to the Ocr class, which recognizes the character in each letter box. The initial Ocr class was developed using the FANN neural network library, and its recognition accuracy was very poor because the neural network had not been trained enough. That is where I contributed to this project. After discussing with the previous mentors and developers of the project, I replaced the FANN neural network with the Tesseract OCR engine, an open source engine sponsored by Google.

I did some testing with Tesseract on handwritten characters to check whether it was suitable for our module. The initial recognition accuracy with the FANN library was below 30%. With Tesseract it rose to 80% or more across the different data sets I tested, with 80% being the worst case. I realized that it is a very powerful tool and selected it for the project. Then I customized Tesseract to meet our requirements and changed the Form_processor to make it compatible with Tesseract.
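For readers who want to run a similar comparison, this is one simple way to measure character-level accuracy and the worst case over several data sets, sketched in Python. The function names are my own and the sample strings are made up for illustration; they are not results from the project.

```python
def accuracy(expected, recognized):
    """Fraction of letter boxes whose recognized character matches the expected one."""
    if not expected or len(expected) != len(recognized):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for e, r in zip(expected, recognized) if e == r)
    return correct / len(expected)

def worst_case(data_sets):
    """Lowest accuracy over a list of (expected, recognized) pairs."""
    return min(accuracy(e, r) for e, r in data_sets)

# Made-up strings for illustration only, not results from the project.
samples = [("SAHANA", "SAHANA"), ("COLOMBO", "C0LOMBO")]
```

Calling `worst_case(samples)` returns the accuracy of the weakest data set, which is the figure to watch when comparing two engines.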

Finally the recognized letters are consolidated into strings and assigned to the appropriate labels such as Name, Address and Date of Birth. I also developed a GUI for the module, which gives a graphical view of this process. The results are displayed in the GUI's result tab, and the user can then save the data to a file by submitting it.
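The consolidation step can be sketched in a few lines of Python. This is a minimal sketch under my own assumptions: each recognized character arrives as a hypothetical (label, position, character) triple, and an empty letter box is represented by a space.

```python
def consolidate(recognized):
    """Group recognized characters by label and join them in box order.

    `recognized` is a list of (label, position, char) triples; the result
    maps each label to its string, with surrounding blanks removed.
    """
    grouped = {}
    for label, position, char in recognized:
        grouped.setdefault(label, []).append((position, char))
    return {
        label: "".join(char for _, char in sorted(items)).strip()
        for label, items in grouped.items()
    }
```

Sorting by position means the letter boxes can be recognized in any order and still end up in the right place in the string.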

So this is the greatest project that I have been involved in up to now, because it gave me a lot of knowledge and was a great opportunity to make contacts with some interesting personalities involved in the project. I plan to work on this project further, to raise the accuracy to at least 97% and to complete the rest of the project, such as uploading the data directly to the server. Here are some more ideas I have for further work on this project.

·Train Tesseract further on handwritten characters.

·Increase the accuracy by reading the labels. Currently the OCR only reads the letters in the letter boxes and outputs the result. That causes conflicts when recognizing “0” (zero) and “O” (the letter O). If we read the label, we can add a rule that the letter boxes under the Name label cannot contain the digit zero, and that the letter boxes under the Date of Birth label cannot contain the letter O. In that way label reading can further improve the accuracy.

·Embed the data in the URL given to each form. That may make it easier to transfer data from one place to another.
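The label-reading idea from the list above could look something like this in Python. It is only a sketch of the rule, not something that exists in the module: the label sets and the function name are hypothetical, and it assumes the Date of Birth field is written purely in digits.

```python
# Hypothetical label groups; a real form would define its own.
ALPHA_LABELS = {"Name"}
NUMERIC_LABELS = {"Date of Birth"}

def fix_by_label(label, text):
    """Resolve the 0/O confusion using what the label allows."""
    if label in ALPHA_LABELS:
        # A name cannot contain the digit zero, so read it as the letter O.
        return text.replace("0", "O")
    if label in NUMERIC_LABELS:
        # A numeric date cannot contain the letter O, so read it as zero.
        return text.replace("O", "0")
    # Labels such as Address can mix letters and digits, so leave them alone.
    return text
```

With this rule `fix_by_label("Name", "J0HN")` returns `"JOHN"`, while an Address value is returned unchanged because either character could be correct there.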

So please try this project out and come up with some innovative ideas to serve people around the world when they are in need.

Wednesday, February 24, 2010

In my last post I described how to link your public_html folder to the www folder. Today I will describe a simple way to link a folder to your /var/www to make web development on Linux easier. 1. To do that, first make a folder in your favorite place using the mkdir command:

mkdir /home/abc/myFolder

If you work with any kind of web server on a Linux operating system, you might have suffered enough from warning messages such as “This is write disabled file, please make the file write enable” or “Cannot move the file in to www folder”. Is it a headache for you? Here is a great and simple solution to your problem.

On Linux you usually have to copy your files, such as HTML and PHP files, into the /var/www/ folder to run them on your favorite web server, such as the Apache web server. But this folder is protected, and you need permission to copy, remove or change files in it. So when we are involved in web development with a large number of files moving here and there, it is difficult to work in this kind of environment.

So you can instead use another, unprotected folder away from the www folder as your www folder. You can make changes in that folder as you wish and still get the same functions as in the /var/www folder. Following are the steps for making a public_html folder and enabling access to it by your user name at http://localhost/~ABC.

Make a folder in your favorite place in your folder structure, but make sure it is not a protected folder. As an example, make a folder in your user folder by using the following command in the command prompt:

mkdir /home/ABC/public_html/

Go to the configuration folder of your web server. As an example, if you are using the apache2 web server, you can use the following command:

cd /etc/apache2/

Then enable the userdir module, which links your public_html folder to the web server:

sudo cp -r mods-available/userdir.* mods-enabled/

Then restart your web server

sudo /etc/init.d/apache2 restart

Now you are ready to use your public_html folder as your web server's working folder.
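To check the setup quickly, you can drop a small test page into the folder. The Python sketch below is just one way to do that; the function name and the page content are my own, and you pass in whatever public_html path you created.

```python
from pathlib import Path

def write_test_page(public_html, user):
    """Create index.html inside the given folder and return its path."""
    folder = Path(public_html)
    # Create the folder if it is missing; no sudo is needed in your own home.
    folder.mkdir(parents=True, exist_ok=True)
    page = folder / "index.html"
    page.write_text(
        "<html><body><h1>It works for " + user + "</h1></body></html>\n",
        encoding="utf-8",
    )
    return page
```

For example, `write_test_page("/home/ABC/public_html", "ABC")` writes the page without any permission warnings, because the folder belongs to you.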

You can access the folder by typing its address, http://localhost/~ABC as given above, in the address bar of your browser.

The most important thing is this: if your computer is connected to a network and others access your computer over that network, you can use this folder as the folder exposed to the outside. Others can reach that folder without you enabling permissions on the www folder, and you can keep your protected data away from this folder so that they cannot access it. Otherwise, if you enabled permissions on the www folder, others might be able to access other important data on your computer. Others can access this folder using your computer's Internet address. As an example

Wednesday, February 3, 2010

Sahana is an open source disaster management system which has been deployed to manage a number of disaster situations all over the world.

Initially it was developed to manage the situation that arose from the Indian Ocean tsunami of the 26th of December 2004. The Lanka Software Foundation did the initial development of Sahana phase 1 with the help of many organizations and volunteers from various parts of the world. At that time there was no other free software that could handle all of these tasks, such as handling victims, managing camps, managing the volunteers and managing the NGOs. So this was a good solution for all of these issues.

Initially Sahana contained 7 modules to help with the management of a disaster. Those are