I am Pankaj Singhal from Jaipur, India. I am very interested in getting involved in this Neural Network project.

My previous experience in this field is relevant. Last semester I worked on a similar task: ranking the URLs of a very large dataset based on their attribute values, but using decision trees. The dataset consisted of hundreds of thousands of URLs, and each URL had around 33,000 features and a binary class label (+1 or -1). I applied decision tree induction (using the Gini index) to filter the URLs, and then applied a RANKSUM[1] metric, which uses a weighted-sum approach, to rank them.

Here is the idea I want to incorporate, which would be a good extension to the Neural Network project and to Orange. First, regarding the survey of available open-source neural network implementations, there are various open-source libraries available (as mentioned in the other discussion), such as:

FANN[2], Libann[3], Flood[4]

More libraries like these could be surveyed and integrated. However, I would highlight that the neural network implementations in these libraries are not very recent and may not be optimal. Hence, exploring the other option of building the neural network from scratch, I suggest the following idea.

I want to implement the ListMLE[5] algorithm in Orange. The algorithm takes a listwise approach, using a neural network as the model and gradient descent as the optimization algorithm, with a likelihood-based loss function. ListMLE is an extension of ListNet[6], which is itself (loosely) an extension of RankNet[7]. ListMLE has shown better performance than the other two, and its loss computation has linear complexity in the list length.
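To make the listwise idea concrete, here is a minimal sketch of the ListMLE loss, i.e. the negative log-likelihood of the ground-truth permutation under the Plackett-Luce model of the scores. The function name and NumPy-based shape are my own illustration, not code from the paper or from Orange:

```python
import numpy as np

def listmle_loss(scores, true_order):
    """Negative Plackett-Luce log-likelihood of the ground-truth
    permutation, given model scores for each document.

    scores      -- model score for each document
    true_order  -- document indices sorted from most to least relevant
    """
    # Reorder scores so position 0 holds the most relevant document.
    s = np.asarray(scores, dtype=float)[np.asarray(true_order)]
    loss = 0.0
    for i in range(len(s)):
        # At each step, the chosen document competes (via softmax)
        # only against the documents not yet placed -- one pass, O(n).
        loss += np.log(np.exp(s[i:]).sum()) - s[i]
    return loss
```

A ranking function that scores documents in the correct order yields a lower loss than one that reverses them, which is exactly what gradient descent on the network parameters would exploit.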

Regarding the features for each query-document pair, research has identified many useful features that help tune the parameters of the ranking function so that it can discriminate between documents more effectively. These can be computed from a basic set of features (tf, idf, BM25, etc.); the more, the better.
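As a rough illustration of how such a basic feature vector could be computed for one query-document pair, here is a toy sketch of summed tf, idf and BM25 values over the query terms. The function and the exact idf/BM25 variants are my own assumptions for illustration, not the precise recipe used in LETOR:

```python
import math
from collections import Counter

def basic_features(query_terms, doc_tokens, collection, k1=1.2, b=0.75):
    """Toy feature vector [tf, idf, bm25] for one query-document pair,
    computed over a small tokenized document collection."""
    N = len(collection)
    avgdl = sum(len(d) for d in collection) / N
    tf_counts = Counter(doc_tokens)
    tf = idf = bm25 = 0.0
    for t in query_terms:
        df = sum(1 for d in collection if t in d)       # document frequency
        t_idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)
        t_tf = tf_counts[t]                             # term frequency in doc
        tf += t_tf
        idf += t_idf
        denom = t_tf + k1 * (1 - b + b * len(doc_tokens) / avgdl)
        bm25 += t_idf * (k1 + 1) * t_tf / denom
    return [tf, idf, bm25]
```

Each document then gets one such vector per query, and the ranking network is trained on these vectors rather than on raw text.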

Regarding the training data, we can use the OHSUMED[8] dataset, a benchmark dataset released in LETOR 2.0 (Microsoft Research) and used by the developers of the algorithm for training and testing. This dataset is reliable because the relevance degrees of documents with respect to the queries were judged by humans, and it adopts the 'standard' features proposed in the IR community. Similar features can be incorporated when implementing the algorithm in Orange.
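Loading such data into Orange would start with parsing the LETOR-style text format, where each line looks like `<relevance> qid:<id> <feature>:<value> ... # comment`. A minimal parser sketch (the function name is mine, and the comment-handling is an assumption about the file layout):

```python
def parse_letor_line(line):
    """Parse one LETOR-style line into (relevance, query id, features).

    Expected shape: '<rel> qid:<id> 1:<val> 2:<val> ... # optional comment'
    """
    line = line.split('#', 1)[0].strip()          # drop trailing comment
    parts = line.split()
    rel = int(parts[0])                           # human-judged relevance degree
    qid = parts[1].split(':', 1)[1]               # query identifier
    feats = {int(k): float(v)
             for k, v in (p.split(':', 1) for p in parts[2:])}
    return rel, qid, feats
```

Lines would then be grouped by query id to form the per-query document lists that the listwise training procedure consumes.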