I’m an Associate Professor at Moscow State University. Participating in Kaggle challenges gives me a lot of valuable experience. I write popular science lectures about data mining in which I describe that experience, for example Introduction to Data Mining and Tricks in Data Mining (both in Russian).

What made you decide to enter?

In my last three competitions I took first, third and fourth place, so I looked for a competition where I could take second place. 🙂 And I found it!

What preprocessing and supervised learning methods did you use?

My approach was to reduce this problem to a standard classification problem. I generated a feature description for every “student – question” pair and used the pairs from valid_test.csv to tune the algorithms. Here are some examples of features: the student's average score, the student's average score today, the student's answering time, the weighted average score (with different weighting schemes), the question difficulty, the question difficulty today, etc. There were also some features from SVD. I also added some linear combinations of the features (which increased performance). I blended GBMs (from R), a GLM (from MATLAB) and neural nets (from the CLOP library in MATLAB).
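To make the idea of per-pair features concrete, here is a minimal Python sketch of how a few of them might be computed. It is not the code used in the competition (that was written in R and MATLAB). The record layout `(student, question, correct)`, the function name `build_features`, the neutral default of 0.5, and the exponential decay used as one possible weighting scheme are all assumptions made for illustration.

```python
from collections import defaultdict


def build_features(history, decay=0.5):
    """Build one feature row per (student, question) answer record.

    history: list of (student, question, correct) tuples, oldest first,
    with correct equal to 1 or 0. This layout is hypothetical.
    decay: weight kept by older answers in the weighted average (assumed scheme).
    """
    student_sum = defaultdict(float)   # correct answers per student so far
    student_cnt = defaultdict(int)     # answers per student so far
    question_sum = defaultdict(float)  # correct answers per question so far
    question_cnt = defaultdict(int)    # answers per question so far
    student_ewma = {}                  # exponentially weighted score per student

    rows = []
    for student, question, correct in history:
        # Features are computed only from earlier answers, so the label
        # of the current record never leaks into its own features.
        if student_cnt[student]:
            student_avg = student_sum[student] / student_cnt[student]
        else:
            student_avg = 0.5  # neutral default for an unseen student
        if question_cnt[question]:
            difficulty = 1.0 - question_sum[question] / question_cnt[question]
        else:
            difficulty = 0.5  # neutral default for an unseen question
        weighted_avg = student_ewma.get(student, 0.5)

        rows.append({
            "student_avg": student_avg,
            "question_difficulty": difficulty,
            "weighted_avg": weighted_avg,
            "label": correct,
        })

        # Update the running statistics with the current answer.
        student_sum[student] += correct
        student_cnt[student] += 1
        question_sum[question] += correct
        question_cnt[question] += 1
        if student in student_ewma:
            student_ewma[student] = decay * student_ewma[student] + (1 - decay) * correct
        else:
            student_ewma[student] = float(correct)
    return rows


if __name__ == "__main__":
    demo = [("a", "q1", 1), ("a", "q2", 0), ("b", "q1", 0), ("a", "q1", 1)]
    for row in build_features(demo):
        print(row)
```

Rows like these could then be passed to any standard classifier. The "today" variants of the features would restrict the same running averages to records from the current day.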

What was your most important insight into the data?

None. I solved it as a standard classification problem and did not look at the data.

Were you surprised by any of your insights?

I was surprised that Random Forests were substantially worse than GBMs and didn't improve the blend.

Which tools did you use?

R and MATLAB (with the CLOP library).

What have you taken away from this competition?

I really liked the winner’s method, and I should admit that it is more effective than mine. But when I worked on the problem, I was testing the hypothesis that it could be solved as an ordinary classification problem. I think that hypothesis has proved to be true.

Any chance he's planning to translate his two papers into English? I'd love to dig into his work.

Vladimir Nikulin

Many thanks, Alexander, for your texts in Russian: they are particularly useful in terms of terminology, both for me and for my students at Vyatka State University.
Your results are very impressive! Congratulations!
By the way, I have the same set of results {1-4-2-3} on the TunedIT platform (admittedly, the order is a little different).

Margit Zwemer

Hi Vladimir,

Any chance one of your students might be interested in translating the papers into English for the non-Russian-speaking section of the Kaggle community?

Sounds like Vladimir has responded, but maybe it would be worthwhile to see if we can pool $ to have a 3rd party translate? I'd be willing to toss in $10 to read both documents -- maybe others would as well.

Vladimir Nikulin

No, Margit, the chances here are very slim. Until the end of this year my students will be completely busy working in the opposite direction.