1. Text mining add-on for Orange: I have used many data mining tools (including Orange) and found that a lot of functionality for processing text data could be added, such as stemming, parsing (regex search), and phrase detection. I also have experience working on data mining projects (classification and clustering) that deal with text data.

Think about concrete proposals for how you could contribute to Orange for GSoC. You can also start coding something now and show it to us, so that we can see your ideas and how you work. If you are really interested in all this, you can contribute to Orange anyway, with or without participating in GSoC.

Hi, I really want to contribute to Orange, but I do not understand how to get started. I have downloaded the source code and built it, but I am not sure which feature I should implement first (with regard to the text mining project). Please guide me.

You should try to use our text-mining add-on, and I am sure you will find some room for improvement. We welcome new features, but we would also like to refactor the existing text-mining add-on. Many features are implemented, but some of them probably don't work with the current version of Orange, some aren't documented, and some don't have widgets.

Thanks for the reply. I was going through the documentation of the text mining add-on and found that the add-on is badly lacking in documentation. Many pre-processing functions (like stemming, or a scoring function other than tf*idf) could be implemented. But I am finding it very hard to use the current text mining add-on, so please provide me some help in this regard.
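To make the "scoring function other than tf*idf" idea concrete, here is a minimal sketch in plain Python. It does not use the add-on's API; the names `raw_tf`, `log_tf`, and `score_terms` are hypothetical and exist only for illustration. It assumes whitespace tokenization and shows how the term-frequency part could be made pluggable, so that plain tf*idf is just one choice among several.

```python
import math
from collections import Counter


def raw_tf(count, doc_len):
    """Plain term frequency: occurrences divided by document length."""
    return count / doc_len


def log_tf(count, doc_len):
    """Sublinear alternative: dampens the effect of very frequent terms."""
    return 1.0 + math.log(count)


def score_terms(docs, tf_weight=raw_tf):
    """Return one {term: weight} dict per document.

    The weight is tf_weight(...) * idf, where idf = log(N / df).
    Tokenization is a naive lowercase whitespace split (an assumption).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: tf_weight(count, len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

Calling `score_terms(docs)` gives plain tf*idf, while `score_terms(docs, tf_weight=log_tf)` swaps in the sublinear variant without touching the rest of the pipeline.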

Harshit, one very important quality we look for in candidates is independence. As we have explained to you, the text mining add-on is in a poor state. This is why we are proposing that a student take care of it. Of course, this means he or she should be capable of getting a grasp of the existing code mostly independently and then proposing a path for its improvement. If you need as much help as you keep requesting, then maybe you are not the candidate we are searching for. If you are just trying to engage the community, then a much better way is to start proposing code patches than to ask for very generic help ("please help me"). You should be much more precise if you want any help. And, as I explained above, you should try to solve problems yourself and prove to us that you are capable of doing so.

Sorry, I may not have been precise about the kind of help I needed, but I will take care of that from now on. I have also read the source code of the text mining add-on and tried to integrate it with the Porter Stemmer. I have already sent a pull request. I am willing to contribute, and in doing so I may have asked for too much help, but I will be more careful in the future.

Along with exploring Orange's text mining add-on, I have been exploring other tools available for text mining (for example Lucene) and found several features that we could include. They are as follows:

1. Parsing of different file formats: currently only XML parsing is available; we could add parsing of JSON, HTML, and other file formats.

2. Storing the index created for a file in an index file, so that the user can reuse it later and does not need to index the same file again and again.

3. Providing more detailed documentation and writing a tutorial for the text mining add-on.

Please give me some feedback on how useful these changes would be and what more I could include.
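As a rough illustration of items 1 and 2, the sketch below reads documents from a JSON file, builds a simple inverted index, and caches it on disk so that it is rebuilt only when the source file changes. This is plain standard-library Python, not the add-on's API. The JSON layout (a list of objects with `id` and `text` fields) and all function names are assumptions made for the example.

```python
import json
import os
import re
from collections import defaultdict


def load_json_documents(path):
    """Read documents from a JSON file.

    Assumes a hypothetical layout: a list of {"id": ..., "text": ...} objects.
    """
    with open(path, encoding="utf-8") as f:
        return {str(doc["id"]): doc["text"] for doc in json.load(f)}


def build_index(documents):
    """Build an inverted index mapping each token to the set of document ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(doc_id)
    return index


def save_index(index, path):
    """Write the index to disk; sets become sorted lists so JSON can store them."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({token: sorted(ids) for token, ids in index.items()}, f)


def load_index(path):
    """Read a previously saved index back into {token: set of ids}."""
    with open(path, encoding="utf-8") as f:
        return {token: set(ids) for token, ids in json.load(f).items()}


def get_index(corpus_path, index_path):
    """Reuse the saved index unless the corpus file is newer than it."""
    if (os.path.exists(index_path)
            and os.path.getmtime(index_path) >= os.path.getmtime(corpus_path)):
        return load_index(index_path)
    index = build_index(load_json_documents(corpus_path))
    save_index(index, index_path)
    return dict(index)
```

Comparing modification times is only one possible way to detect a stale index; a content hash stored alongside the index would be more robust at the cost of reading the corpus file each time.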

I can only speak about the time series idea. As we currently do not have anything specific for that, any basic or initial support for time series analysis would be good. Keep that in mind while writing a proposal (if you decide on time series). It would also be important for us to find some way of integrating time series with the rest of Orange, for example a widget that extracts features from a time series which existing widgets could then work with.