How the Netflix Prize Was Won

This equation, from Yehuda Koren's prize-winning documentation, shows the winning team adding a third set of movie-movie weights, with an emphasis on adjacent ratings made by a user.

Like BellKor’s Pragmatic Chaos, the winner of the Netflix Prize, second-place The Ensemble was an amalgam of teams that had been competing individually for the million-dollar prize. But it wasn’t until leaders joined forces with also-rans that real progress was made toward the contest’s goal: improving the accuracy of the Netflix movie recommendation algorithm by 10 percent.

Besides the sheer thrill of seeing the winner place first by mere minutes after years of work, the Netflix Prize competition has proffered hard proof of a basic crowdsourcing concept: Better solutions come from unorganized people who are allowed to organize organically. But something else happened that wasn’t entirely expected: Teams that had it basically wrong — but for a few good ideas — made the difference when combined with teams that had it basically right, but couldn’t close the deal on their own.

The secret sauce for both BellKor’s Pragmatic Chaos and The Ensemble was collaboration between diverse ideas, and not in some touchy-feely, unquantifiable, “when people work together things are better” sort of way. The top two teams beat the challenge by combining teams and their algorithms into more complex algorithms incorporating everybody’s work. The more people joined, the more the resulting team’s score would increase.
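The mechanics of that combination can be sketched in miniature. The snippet below is a hypothetical illustration, not any team's actual method: three noisy stand-in "models" are blended with weights fit by least squares on a held-out set, and the blend's error never exceeds that of the best single model. All data and model names here are invented.

```python
# A minimal sketch of linear blending, assuming we can score candidate
# blends against a held-out set of true ratings. Everything is synthetic.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true ratings (1-5 stars) on a small holdout set.
truth = rng.integers(1, 6, size=200).astype(float)

# Three imperfect stand-in models: truth plus different noise, playing
# the role of diverse algorithms (matrix factorization, neighborhood,
# temporal models, and so on).
preds = np.column_stack([
    truth + rng.normal(0, 1.0, truth.size),
    truth + rng.normal(0, 1.2, truth.size),
    truth + rng.normal(0, 1.5, truth.size),
])

def rmse(p, t):
    """Root-mean-square error, the contest's scoring metric."""
    return float(np.sqrt(np.mean((p - t) ** 2)))

# Fit blend weights by ordinary least squares on the holdout ratings.
weights, *_ = np.linalg.lstsq(preds, truth, rcond=None)
blend = preds @ weights

best_single = min(rmse(preds[:, k], truth) for k in range(3))
print(f"best single model RMSE: {best_single:.3f}")
print(f"blended RMSE:           {rmse(blend, truth):.3f}")
```

Because any single model is itself a trivial blend (weight 1 on one column, 0 on the rest), the fitted blend can only match or beat the best individual model on the set it was fit to — which is why adding even mediocre but *different* models kept nudging the combined score upward.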

“It’s been quite a drama,” said Netflix chief product officer Neil Hunt at Monday’s awards ceremony. “At first, a whole lot of teams got in — and they got 6-percent improvement, 7-percent improvement, 8-percent improvement, and then it started slowing down, and we got into year two. There was this long period where they were barely making progress, and we were thinking, ‘maybe this will never be won.’

“Then there was a great insight among some of the teams — that if they combined their approaches, they actually got better. It was fairly unintuitive to many people [because you generally take the smartest two people and say ‘come up with a solution’]… when you get this combining of these algorithms in certain ways, it started out this ‘second frenzy.’ In combination, the teams could get better and better and better.”

Ironically, the most outlying approaches — the ones farthest away from the mainstream way to solve a given problem — proved most helpful towards the end of the contest, as the teams neared the summit.

For instance, BellKor’s Pragmatic Chaos credits some of its success to slicing the data by what they called “frequency.” As it turns out, people who rate a whole slew of movies at one time tend to be rating movies they saw a long time ago. The data showed that people employ different criteria to rate movies they saw a long time ago than ones they saw recently — and that in addition, some movies age better than others, skewing either up or down over time. (Finally, someone has explained why Snakes On A Plane seemed more fun at the time than it does now.)

By tracking the number of movies rated on a given day as an indicator of how long it had been since a given viewer had seen a movie, and by tracking how memory affected particular movie ratings, Pragmatic Theory (later part of the winning team) was able to gain a slight edge, even though this particular algorithm isn’t particularly good at predicting which movies people will like when run on its own.
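The core of that idea is simple to sketch: count how many ratings each user entered on each day, and treat high-count "batch" days as a signal that the ratings refer to older viewings. The sample below uses a tiny invented dataset — the field names and numbers are illustrative only, not drawn from the actual Netflix data or Pragmatic Theory's models.

```python
# A hypothetical sketch of the "frequency" feature: ratings-per-day as a
# proxy for how long ago a movie was seen. All data here is invented.
from collections import defaultdict

# (user, movie, date, rating) tuples - a tiny made-up sample.
ratings = [
    ("u1", "m1", "2005-01-03", 4.0),
    ("u1", "m2", "2005-01-03", 2.0),
    ("u1", "m3", "2005-01-03", 2.5),   # a batch day: older viewings
    ("u1", "m4", "2005-02-10", 4.5),   # a lone rating: recent viewing
    ("u2", "m1", "2005-03-01", 3.0),
    ("u2", "m5", "2005-03-01", 3.5),
]

# Step 1: the "frequency" of each (user, day) pair.
per_day = defaultdict(int)
for user, movie, day, r in ratings:
    per_day[(user, day)] += 1

# Step 2: compare mean ratings on batch days vs. single-rating days.
buckets = defaultdict(list)
for user, movie, day, r in ratings:
    freq = per_day[(user, day)]
    buckets["batch" if freq > 1 else "single"].append(r)

for name, vals in sorted(buckets.items()):
    print(name, sum(vals) / len(vals))
```

In a real model, the per-bucket shift would become one more small correction term blended into the overall prediction, rather than a predictor used on its own — which matches the article's point that this signal is weak in isolation.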

Another example: According to Joe Sill of The Ensemble, Big Chaos (the Austrians who also became part of the winning team) discovered that viewers in general tend to rate movies differently on Fridays versus Mondays, and certain users are in good moods on Sundays, and so on. The team essentially devised a three-dimensional model that incorporated time into the relationship between people and movies.
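One simple way to picture such a temporal term is a per-weekday rating bias layered on top of a baseline prediction. The sketch below is a deliberately toy version of that idea — Big Chaos's actual models were far richer, and every number here is invented for illustration.

```python
# A minimal sketch of one temporal effect: a per-day-of-week rating bias,
# estimated from hypothetical data and added to a baseline prediction.
from collections import defaultdict
from datetime import date

# (user, movie, date, rating) - invented sample data.
ratings = [
    ("u1", "m1", date(2005, 1, 3), 3.0),   # a Monday
    ("u2", "m2", date(2005, 1, 3), 2.5),   # a Monday
    ("u1", "m3", date(2005, 1, 7), 4.0),   # a Friday
    ("u2", "m4", date(2005, 1, 7), 4.5),   # a Friday
]

global_mean = sum(r for *_, r in ratings) / len(ratings)

# Per-weekday deviation from the global mean rating.
by_day = defaultdict(list)
for *_, d, r in ratings:
    by_day[d.weekday()].append(r)
day_bias = {d: sum(v) / len(v) - global_mean for d, v in by_day.items()}

def predict(baseline, when):
    """Baseline prediction plus the weekday bias (0 if day unseen)."""
    return baseline + day_bias.get(when.weekday(), 0.0)

print(predict(3.5, date(2005, 1, 10)))  # a Monday: nudged below 3.5
print(predict(3.5, date(2005, 1, 14)))  # a Friday: nudged above 3.5
```

Extending this from "bias per weekday" to "a full user-movie-time interaction" is what turns the flat user-movie matrix into the three-dimensional model the article describes.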

Taken on its own, the fact that a viewer rated a given movie on a Monday is a horrible indicator of what other movies they’ll want to rent — a crucial part of Netflix’s business (it says its recommendations are better indicators of what people will rent than its “most popular” lists). But combined with hundreds of other algorithms from other minds, each weighted with precision, and combined and recombined, that otherwise inconsequential fact takes on huge importance.

“One of the big lessons was developing diverse models that captured distinct effects,” said Sill, “even if they’re very small effects.”