Monday, February 25, 2008

Over a year ago, Netflix announced a Recommendation Engine contest - anyone who invents an algorithm that does 10% better than their current recommendation system will win $1 million. Many research teams raced to attack the problem, excited by the unprecedented amount of data available. Initially quite a lot of progress was made, but then progress slowly stalled, and now teams are stuck at around the 8.5% improvement mark.

In this post we argue that the improvement in recommendation engines is not an algorithmic problem, but rather a presentation issue. Respinning recommendations as filters and delivering them without setting high expectations is more likely to yield progress than crunching more data faster.

Building a recommendation engine is a complex endeavor, which we discussed here a year ago. But in addition to being a technical challenge, there are also fundamental psychological questions: do people want recommendations and if so, when are they open to them? Perhaps an even bigger question is: what happens when the user receives one or more bad recommendations? How tolerant will they be?

Genetics of Recommendation Engines

All recommendation engines are trying to solve the following problem: given a set of ratings for a particular user, along with those of the whole user base, come up with new items that this user will like. There are many algorithms that can be applied to the problem, but all of them focus on three elements: personal, social and fundamental. These give the following approaches:

Personal recommendation - recommend things based on the user's own past behavior

Social recommendation - recommend things based on the past behavior of similar users

Item recommendation - recommend things based on the item itself

A combination of the three approaches above

A social recommendation is also known as collaborative filtering - people who liked X also like Y. For example, people who liked Lord of the Rings are likely to enjoy Eragon and The Chronicles of Narnia. The problem with this approach is that people's tastes do not in reality fall into simple categories. If two people share the same taste in fantasy movies, it does not mean that they will also both like dramas or mysteries. A good way to think about this problem comes from genetics. Many times we meet people who have features that we recognize and have seen in others. For example, the eyes or the lips might look familiar, but it is a totally different person.
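The "people who liked X also like Y" idea can be sketched in a few lines of Python. This is only a minimal illustration, not the algorithm any real service uses: the users, the star ratings and the similarity measure (one over one plus the average rating gap on shared movies) are all assumptions made up for the example.

```python
# Hypothetical 1-5 star ratings; the users and numbers are invented for illustration.
RATINGS = {
    "alice": {"Lord of the Rings": 5, "Eragon": 4, "The Chronicles of Narnia": 5},
    "bob": {"Lord of the Rings": 5, "Eragon": 4},
    "carol": {"Lord of the Rings": 1, "Eragon": 2, "The Chronicles of Narnia": 1},
}


def similarity(a, b):
    """Assumed measure: 1 / (1 + mean absolute rating gap on shared items)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    gap = sum(abs(a[item] - b[item]) for item in shared) / len(shared)
    return 1.0 / (1.0 + gap)


def predict(user, item, ratings=RATINGS):
    """Similarity-weighted average of other users' ratings for `item`."""
    weighted_sum = 0.0
    total_weight = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        weight = similarity(ratings[user], their_ratings)
        weighted_sum += weight * their_ratings[item]
        total_weight += weight
    # None means nobody comparable has rated the item yet.
    return weighted_sum / total_weight if total_weight else None


print(predict("bob", "The Chronicles of Narnia"))
```

Bob's ratings match Alice's exactly and differ sharply from Carol's, so the prediction (about 4.2 stars) leans toward Alice's 5. The sketch also shows the weakness described above: the similarity is computed only on fantasy titles, so it says nothing about whether Bob and Alice agree on dramas.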

The other kind of recommendation is an item-based recommendation. The best example of this system is the Pandora music recommendation service. It works by ranking each musical piece on more than 400 different characteristics, its musical genes. It then automatically matches the pieces based on these characteristics. Tuning the algorithm to work well is a challenge, and so is applying it to other verticals. For movies, for example, you'd need to rank each movie along many scales, starting with director, cast and plot, and then moving on to more obscure things like musical score, locations, lighting, camera work, etc. It certainly can be done, but it is complicated.
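Matching on item characteristics can be sketched as comparing feature vectors. The snippet below is an illustration under stated assumptions, not Pandora's actual method: the three songs, their four "genes" and the choice of cosine similarity are all invented for the example, whereas a real system would score hundreds of characteristics.

```python
from math import sqrt

# Hypothetical "genes" per song, each scored 0..1; names and values are invented.
# Order of features: tempo, acoustic-ness, distortion, vocal softness.
GENES = {
    "song_a": [0.9, 0.1, 0.8, 0.2],
    "song_b": [0.8, 0.2, 0.9, 0.1],
    "song_c": [0.1, 0.9, 0.2, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def most_similar(title, catalog=GENES):
    """Return the other catalog item whose genes are closest to `title`."""
    target = catalog[title]
    scored = [
        (cosine(target, genes), name)
        for name, genes in catalog.items()
        if name != title
    ]
    return max(scored)[1]


print(most_similar("song_a"))
```

No user ratings are involved at all here, which is the appeal of the approach. The cost is hidden in the GENES table: someone has to score every item on every scale.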

The Guy In The Garage

The complexity of the recommendation problem is due to its vast space of possibilities. Much like it's hard to figure out which exact gene is responsible for a particular human trait, it is hard to figure out which bits of the movie or music make us rate it as 5 stars. Reverse engineering human thinking is hard. Which is exactly why one of the contestants highlighted in the Wired article is relying on a very different trick to make his algorithm work.

Nicknamed Guy In The Garage, Gavin Potter from London is relying on human inertia. Apparently, the rating we give a movie depends on the ratings of the movies we have just seen. For example, if you watch three movies in a row and rate them with 4 stars, and then watch the next one which is slightly better, you will rate it 5. Conversely, if you rate three movies in a row with 1 star, then the same movie that you would otherwise rate as 5 would only get 4 stars from you.
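One way to picture this inertia is as a pull toward the average of the viewer's most recent ratings. The sketch below is purely illustrative and is not Potter's algorithm, whose details are not given here: the function name, the blending formula and the pull weight of 0.25 are all assumptions, chosen so that the second example above comes out as described.

```python
def anchored_rating(intrinsic, recent_ratings, pull=0.25):
    """Blend the rating a viewer would otherwise give with their recent average.

    `intrinsic` is the hypothetical rating without any inertia; `pull` is an
    assumed weight for how strongly recent ratings drag the new one.
    """
    if not recent_ratings:
        return round(intrinsic)
    anchor = sum(recent_ratings) / len(recent_ratings)
    blended = (1 - pull) * intrinsic + pull * anchor
    # Keep the result on the 1-5 star scale.
    return max(1, min(5, round(blended)))


print(anchored_rating(5, [1, 1, 1]))  # three 1-star ratings in a row
print(anchored_rating(5, []))         # no recent history
```

With three 1-star ratings behind it, a movie that would otherwise earn 5 stars comes out at 4; with no history it keeps its 5. A model that knows about this drift can subtract it before comparing users to each other.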

Just when you think that this is not true, you will discover that this algorithm now sits in 5th place and is still making progress, while other algorithms are spinning their wheels. Enhancing formulas with a bit of human psychology is a really good idea, and this is where we turn next.

Replacing Recommendations with Filters

How many times has this happened to you: a friend recommended a movie or a restaurant, so you went there all excited - but ended up disappointed? A lot! It is obvious that hype sets the bar high, increasing the chances of a miss. In math speak, this kind of miss is known as a false positive. Consider now what would happen if instead of recommending a movie, a friend tells you that you are not going to like a certain movie, so do not bother renting it.

What harm can come of that? Not much, because you are unlikely to watch it. But even if you do and you like it, you are not going to experience negative feelings. This example demonstrates the difference between our reaction to a false negative and a false positive. False positives upset us, but false negatives do not. The idea of respinning recommendations as filters is about leveraging this phenomenon.

When Netflix makes recommendations, it sets itself up for a sure failure. Sooner rather than later it is going to miss and recommend you a movie that you are not going to like. What if instead of doing that, it showed you new releases with a button: filter out the ones I am not going to like. The algorithm is the same, but the perception is different.
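The point that the algorithm stays the same while the presentation changes can be made concrete. In this minimal sketch, both functions read the same predicted ratings. The movie titles, the scores and both thresholds are invented for illustration.

```python
# Hypothetical predicted star ratings for one user's new releases.
PREDICTED = {"Movie A": 4.6, "Movie B": 3.1, "Movie C": 1.4, "Movie D": 2.0}


def recommend(predicted, min_stars=4.0):
    """Recommendation framing: promise only the titles predicted to be hits."""
    return sorted(t for t, stars in predicted.items() if stars >= min_stars)


def filter_new_releases(predicted, max_stars=2.0):
    """Filter framing: show everything except the predicted misses."""
    return sorted(t for t, stars in predicted.items() if stars > max_stars)


print(recommend(PREDICTED))            # a short list with a promise attached
print(filter_new_releases(PREDICTED))  # a longer list with the duds removed
```

The recommendation list contains only Movie A and stakes the system's credibility on it. The filtered list contains Movie A and Movie B and promises nothing about either; its only claim is that Movie C and Movie D were not worth showing, and a wrong call there is a false negative the user will probably never notice.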

Filters in Real-Time Culture

This idea becomes increasingly important and powerful in the age of real-time news. We are increasingly oriented towards continuously filtering new information. We do this with our RSS readers every day. We think of the world in terms of streams of news, where things of the past are not relevant. We do not need recommendations, because we are already oversubscribed. We need noise filters: an algorithm that says 'hey, you are definitely not going to like that' and hides it.

If the machines can do the work of aggressively throwing information out for us, then we can deal with the rest on our own. Borrowing from the spam box in email, if all the tools around us had a button that said 'filter this for me', and maybe even had a mode where such a filter is on by default, we'd all get more things done.

Conclusion

Building a perfect recommendation engine is a very complex task. Regardless of the method, whether collaborative filtering or the inherent properties of things, recommendations are an unforgiving business, where false positives quickly turn users off. Perhaps applying psychology to the problem can make people appreciate what these complex algorithms are doing. If instead of recommending things, machines filtered out the things we definitely won't like, we might be more forgiving and understanding.

Now tell us please about your experiences with recommendation engines. Were there ones that worked really well? Would you be open to filtering instead of recommendation? Besides movies and news, where would you like to have these filters?