Monday, July 27, 2009

Reflections on the Netflix Competition

Thanks and Congratulations

1. First and foremost to Netflix, for organising such a well-designed competition. It was run in an exemplary fashion throughout and should, I believe, become the model for other competitions that people might choose to run. Some of the key features that made it such a success are:

a. A clear, unambiguous and challenging target. How the 10% target was chosen will, I suspect, remain forever a mystery, but it was almost perfect: seemingly unattainable at the beginning and difficult enough that it took almost 3 years to crack, but not so difficult as to be impossible.

b. Continuous feedback, so competitors could tell whether the approaches they were investigating were going in the right direction.

c. A forum so that the competitors could share ideas and help each other (more about that later).

d. Conference sessions so competitors could meet and discuss ideas.

e. Zero entry cost (apart, of course, from the contestant's time).

f. A clear set of rules.

2. Brandyn Webb, a.k.a. Simon Funk, for giving away early on, in complete detail, one of the (at the time) leading approaches to the problem, thereby fostering a spirit of co-operation between the contestants.

3. The contestants. Despite the $1 million prize, the competition was conducted in a spirit of openness and co-operation throughout, with contestants sharing hints, tips and ideas in the forum, through academic papers and at the conference sessions set up to discuss approaches. This undoubtedly helped us all progress, and made the process a whole lot more enjoyable.

4. And of course, the winners, for driving us all forwards, keeping us focused on trying to improve, and getting to the target first. As all of us who tried know, it wasn't easy.

Was the competition worth it?

There will undoubtedly be some discussion about whether the science generated was worth the $1 million, plus the untold researcher and other time spent trying to achieve the goal. I think the answer is unambiguously yes, because:

a. The competition has taught several hundred people, if not more, how to properly implement machine learning algorithms on a real-world, large-scale dataset. I'm not sure how many people already had these skills, but I would be prepared to bet that the total pool of such ability has widened considerably. This can only be a good thing.

b. It has widened awareness of machine learning techniques and recommender systems within the broader business community. I have had many, many requests from businesses asking how to implement recommender systems as a result of the competition, and I guess other competitors have too. The wider non-machine-learning community is definitely looking for new applications (see my previous posts for some examples), and this can only be good for the field as a whole.

c. It has improved the science. I leave it to the academics to argue by how much, but it is certainly true that matrix factorization techniques have been the runaway success of this competition. Marrying such techniques with real-world understanding of the problem (incorporation, for example, of date and day effects) has provided by far the most effective single technique. Such techniques, it seems to me, now need to be applied to a much wider set of problems to test their general applicability.

d. It has gifted the research community a huge dataset for analysis by computer scientists, statisticians and, I hope from a personal perspective, psychologists and behavioural economists too. It was a disappointment to me that, as far as I'm aware, I'm still the only contestant from a social sciences background. This is, almost undoubtedly, the world's largest set of data on repeated decision making, and it is ripe for analysis. The analysis may not win the competition, but it sure should provide some insights into the way that humans make decisions.

e. It was a lot of fun. I certainly enjoyed it, and I get the impression that most of the other contestants did too.
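For readers who have not met the matrix factorization idea mentioned in point c, here is a minimal sketch of the general technique: each user and each movie gets a short vector of learned factors, and a rating is predicted as the global average plus the dot product of the two vectors. This is an illustration only, not any contestant's actual method. The function names, the toy ratings and the learning-rate and regularisation values are all assumptions made up for this example, and the date and day effects are left out.

```python
import random

def train_mf(ratings, n_users, n_items, n_factors=2,
             lr=0.05, reg=0.02, epochs=500, seed=0):
    """Fit user and item factors by stochastic gradient descent.

    ratings is a list of (user_index, item_index, rating) tuples.
    All hyperparameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Global mean rating, used as the starting point for every prediction.
    mu = sum(r for _, _, r in ratings) / len(ratings)
    # Small random initial factors for each user and each item.
    P = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = mu + sum(pf * qf for pf, qf in zip(P[u], Q[i]))
            err = r - pred
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Move both factors to reduce the error, with L2 shrinkage.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return mu, P, Q

def predict(model, u, i):
    """Predicted rating: global mean plus user-item factor dot product."""
    mu, P, Q = model
    return mu + sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def rmse(predict_fn, ratings):
    """Root mean squared error of predict_fn(u, i) over the ratings."""
    total = sum((r - predict_fn(u, i)) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5

# Made-up toy data: 4 users, 3 movies, ratings on a 1 to 5 scale.
TOY_RATINGS = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 1),
               (2, 1, 2), (2, 2, 5), (3, 0, 1), (3, 2, 4)]

model = train_mf(TOY_RATINGS, n_users=4, n_items=3)
mean_rating = model[0]
baseline_rmse = rmse(lambda u, i: mean_rating, TOY_RATINGS)
fitted_rmse = rmse(lambda u, i: predict(model, u, i), TOY_RATINGS)
print(f"baseline RMSE {baseline_rmse:.3f}, fitted RMSE {fitted_rmse:.3f}")
```

Note that the error here is measured on the same ratings the model was trained on, so it says nothing about how well the factors generalise. In the competition, of course, scores were computed on held-out ratings.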

3 comments:

An excellent reflection, Gavin. As for how the 10% mark was chosen, the official reason is posted on the Netflix Prize FAQ page:

http://www.netflixprize.com//faq

"The RMSE on the test subset if you just predicted the average rating for each movie based on the training dataset is 1.0540. Cinematch’s RMSE on the test subset is 0.9525, a 9.6% improvement. We figured roughly to double that."
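The arithmetic in the quoted FAQ answer can be checked with a few lines. This is a small sketch that assumes the 10% target is measured relative to Cinematch's score. The two RMSE figures come from the quote, while the function and variable names are made up for this example.

```python
def relative_improvement(baseline_rmse, new_rmse):
    """Fractional reduction in RMSE relative to the baseline."""
    return (baseline_rmse - new_rmse) / baseline_rmse

# Figures quoted from the Netflix Prize FAQ.
MOVIE_AVERAGE_RMSE = 1.0540
CINEMATCH_RMSE = 0.9525

# Cinematch's gain over simply predicting each movie's average rating.
cinematch_gain = relative_improvement(MOVIE_AVERAGE_RMSE, CINEMATCH_RMSE)

# Assumption: the prize target is a 10% improvement on Cinematch's RMSE.
target_rmse = CINEMATCH_RMSE * (1 - 0.10)

# Total gain of a prize-winning entry over the movie-average baseline.
total_gain = relative_improvement(MOVIE_AVERAGE_RMSE, target_rmse)

print(f"Cinematch over movie averages: {cinematch_gain:.1%}")
print(f"Target RMSE: {target_rmse:.4f}")
print(f"Target over movie averages: {total_gain:.1%}")
```

Under that assumption, a winning entry improves on the movie-average baseline by a little under twice as much as Cinematch did, which fits the "roughly to double that" remark.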