My name is Flavio Figueiredo and I am interested in developing the support for parallel computation for scripting/backend during GSoC 2011. I have experience in parallel and distributed programming, having been a developer of grid middleware in the past.

Currently I do research on social networks and IR, and Orange has been one of the tools I have used most over the last few months. In fact, I currently do some simple parallel computation with Orange + IPython. If you wish to know more about me, please check out my website and curriculum (which is on the website) or just ask me.

I am currently writing the proposal for this. Basically, I can clearly see a parallelization of Orange.evaluation.testing (used to be orngTest), cross validation and repetitions as stated in the project idea, but many other aspects of Orange can be parallelized. For example:

* When dealing with multiple datasets, any analysis can be parallelized between datasets, since they are independent.
* Distance calculations between distinct pairs of examples can also be parallelized.
* Different runs of k-means clustering.
* Parameter selection from Orange's orngWrap.

Amongst others.

The proposal can be written considering one module only, such as orngTest, or it can aim at a more complex solution that enables the parallelization of any completely independent tasks (such as the ones above). I'm not sure if the latter is your idea for Summer of Code, since it can take more time. This other idea would be to implement a Workqueue for functions which could run in parallel without affecting one another. We would have to:

* Identify such functions (pre-processing is definitely not one of them, unless the ExampleTable is cloned).
* Implement asynchronous method calls, in which the user would submit tasks to the Workqueue and wait for results accordingly.

# I know some methods do not represent the current API; this is a simple idea sketch.
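
For readers who want something concrete, here is a minimal illustration of the work-queue idea described above. It is not the sketch the post refers to and it is not an Orange API: `WorkQueue`, `submit`, `wait_all` and `evaluate_fold` are hypothetical names, and the code uses Python's standard `concurrent.futures` module. Threads are used only to keep the example short; CPU-bound Orange work would more likely need separate processes.

```python
from concurrent.futures import ThreadPoolExecutor


class WorkQueue:
    """Hypothetical queue for independent tasks (not part of Orange)."""

    def __init__(self, workers=4):
        # Assumption: a thread pool stands in for whatever backend is chosen.
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._futures = []

    def submit(self, func, *args, **kwargs):
        """Queue func(*args, **kwargs) and return a future without blocking."""
        future = self._pool.submit(func, *args, **kwargs)
        self._futures.append(future)
        return future

    def wait_all(self):
        """Block until all submitted tasks finish; results in submission order."""
        results = [future.result() for future in self._futures]
        self._futures = []
        return results

    def close(self):
        """Release the worker threads."""
        self._pool.shutdown(wait=True)


def evaluate_fold(fold):
    # Stand-in for training and testing a learner on one fold.
    return fold * fold


queue = WorkQueue(workers=2)
for fold in range(5):
    queue.submit(evaluate_fold, fold)
fold_scores = queue.wait_all()
queue.close()
```

The user submits tasks and collects results later, which matches the asynchronous style in the second bullet. Tasks that modify a shared ExampleTable would still have to be excluded or given their own copy.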

I am not sure if the description was clear, but in summary I would like to know: 1) whether there is a priority of modules to be parallelized (I personally like orngTest since I use it the most), 2) or whether the idea is to have something more generic, 3) or whether this is to be studied during the Summer of Code.

Modules listed in the proposal are definitely those that we would like to see prioritized (as those are the modules for which we have wished to have parallelization in practice). But yes, everything else is also welcome. Just do not over-plan. Summer is short.

I would say that the implementation should be generic, with an example use in the modules mentioned in the project idea and, if you have time, in others as well.

This should be studied mostly now; this is your homework for the proposal. Of course we can and will refine it together if you are accepted. And you can ask questions now if you have any.

I am not sure about your proposed API sketch; I will ask others to look into it and maybe comment.

Users should be able to take advantage of parallel computation without explicitly saying what should be run in parallel. They should be able to enable parallel cross-validation in existing scripts with a single line of code. Something like "Orange.core.enable_parallelization(no_of_cores=4)" at the beginning of the script should make all calls to Orange.evaluation.testing.cross_validation(...) run their folds in parallel.

The API you described seems reasonable enough to use. Keep in mind that for parallelization of parameter selection you would have to run several cross-validations in parallel, each of which itself runs in parallel, so the API should support this.
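
One possible way to handle this nesting, offered only as a sketch: instead of starting a parallel cross-validation inside each parallel parameter trial, flatten every (parameter, fold) pair into a single task list and run it on one pool. This avoids outer tasks blocking on inner tasks in the same fixed-size pool, which can exhaust the workers. `score_fold` and `select_parameter` are hypothetical names, not Orange or orngWrap functions, and the scoring formula is made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def score_fold(param, fold):
    # Stand-in for training with `param` and testing on fold `fold`.
    return -abs(param - 3) + 0.01 * fold


def select_parameter(params, folds, workers=4):
    """Return the parameter with the best mean fold score, plus all means."""
    # One flat list of (parameter, fold) tasks instead of nested pools.
    tasks = list(product(params, range(folds)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda task: score_fold(*task), tasks))
    means = {}
    for (param, _fold), score in zip(tasks, scores):
        means[param] = means.get(param, 0.0) + score / folds
    return max(means, key=means.get), means


best_param, mean_scores = select_parameter([1, 2, 3, 4, 5], folds=3)
```

Flattening keeps all workers busy even when there are fewer parameter values than cores, at the cost of the parameter search knowing about folds.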

Unfortunately I will not be able to dedicate as much time as I would have wished to GSoC. Yesterday I received some exam (not medical, academic) results and they were worse than expected. Since summer up north is not summer in Brazil, the GSoC period is not necessarily vacation time for me. Also, I am visiting another university and my time would already be split. After much thought, I decided to review my priorities and leave GSoC out this year.

I hope someone else gets this project, and I am still an enthusiastic user of Orange =) Maybe I can still develop this project (or help someone with it), but not for GSoC, since you and Google would probably expect more effort than I can allocate.