Sunday, October 29, 2006

More on Netflix contest

First, 27 days after the start of the contest, there are now 36 entries that beat the performance of Netflix Cinematch. The top entry already has a 5% improvement. It is still a long way from the 10% improvement required to win the grand prize of $1M, but the gap is closing.

Second, there is a fun post in the Netflix contest forums by Benji Smith about the "most hated", "most loved", and "most contentious" popular movies according to the Netflix data. [Found via kottke.org]

Third, Jim Bennett, Netflix's VP of Recommendation Systems, gave a September 2006 talk (PDF) about their Cinematch recommender system. The talk mentions that Cinematch uses an item-to-item algorithm -- the same type of algorithm used by Amazon.com's recommender system -- and includes some nice tidbits such as the characteristics of movies that they can accurately predict. At the end of the talk, Jim provides some justification for why Netflix is spending $1M on this contest, saying that higher quality recommendations are "absolutely critical to retaining users." [via Recommenders06]
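For readers unfamiliar with item-to-item collaborative filtering, here is a minimal sketch of the general idea: compute similarities between items from users' rating vectors, then predict a user's rating for a movie as a similarity-weighted average of that user's ratings on other movies. The toy data and the use of plain cosine similarity are my own assumptions for illustration; Netflix has not published Cinematch's exact formulas.

```python
import math

# Toy ratings table: user -> {movie: rating}. Hypothetical data,
# not from the Netflix Prize dataset.
ratings = {
    "ann":  {"Heat": 5, "Speed": 4, "Amelie": 1},
    "bob":  {"Heat": 4, "Speed": 5, "Amelie": 2},
    "cara": {"Heat": 1, "Speed": 2, "Amelie": 5},
}

def item_vector(movie):
    """Ratings for one movie as a vector indexed by user (0 if unrated)."""
    return [ratings[u].get(movie, 0) for u in sorted(ratings)]

def cosine(a, b):
    """Cosine similarity between two rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def predict(user, movie):
    """Similarity-weighted average of the user's ratings on other movies."""
    num = den = 0.0
    for other, rating in ratings[user].items():
        if other == movie:
            continue
        sim = cosine(item_vector(movie), item_vector(other))
        num += sim * rating
        den += abs(sim)
    return num / den if den else 0.0
```

The appeal of the item-to-item approach is that item similarities can be precomputed offline, so serving a recommendation is cheap even with millions of users. Production systems typically add refinements (mean-centering, shrinkage toward a prior, neighborhood pruning) that are omitted here.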

Does anyone know how Netflix decided the train/test split? It would be interesting to know how they selected it to ensure the right winner is picked (difficult if the holdout is very small).

For Netflix, it's an interesting question. The absolute number of records held out is large (3 million), but the relative size of the holdout is small (~3% of the data), especially given the number of parameters many recommenders fit.
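A rough back-of-envelope calculation suggests the absolute size matters more than the relative size here. The numbers below are assumptions of mine (an RMSE near Cinematch's ~0.95 and a guessed spread of per-rating squared errors), not figures from Netflix, but they illustrate how tightly a 3-million-rating holdout pins down RMSE:

```python
import math

# Assumed numbers, not from Netflix: a baseline RMSE around 0.95
# (Cinematch-like) and a guessed standard deviation of the per-rating
# squared errors.
n = 3_000_000         # holdout size
mse = 0.9025          # mean squared error, i.e. RMSE = 0.95
sigma_se = 2.0        # assumed std dev of individual squared errors

se_mse = sigma_se / math.sqrt(n)   # standard error of the mean squared error
rmse = math.sqrt(mse)
# Delta method: std error of sqrt(X) is approximately se(X) / (2 * sqrt(X))
se_rmse = se_mse / (2 * rmse)
print(f"RMSE ~= {rmse:.4f} +/- {se_rmse:.5f}")
```

Under these assumptions the standard error on RMSE is well under 0.001, while the 10% improvement target is a difference of roughly 0.095 in RMSE, so even a "small" 3% holdout should separate genuinely different entries. The bigger worry is the one the holdout can't fix: with repeated submissions, teams can slowly overfit the test set itself.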